| Title: | Using a Common Data Model on Spark |
|---|---|
| Description: | Use health data in the Observational Medical Outcomes Partnership Common Data Model format in Spark. |
| Authors: | Edward Burn [aut, cre] (ORCID: <https://orcid.org/0000-0002-9286-1128>), Martí Català [aut] (ORCID: <https://orcid.org/0000-0003-3308-9905>) |
| Maintainer: | Edward Burn <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 0.1.0 |
| Built: | 2026-05-24 08:05:33 UTC |
| Source: | https://github.com/ohdsi/omoponspark |
Disconnect the connection of the cdm object

    ## S3 method for class 'spark_cdm'
    cdmDisconnect(cdm, dropWriteSchema = FALSE, ...)


| Argument | Description |
|---|---|
| cdm | A cdm reference. |
| dropWriteSchema | Whether to drop tables in the writeSchema. |
| ... | Not used. |

Create a cdm_reference object from a sparklyr connection.

    cdmFromSpark(
      con,
      cdmSchema,
      writeSchema,
      cohortTables = NULL,
      cdmVersion = NULL,
      cdmName = NULL,
      achillesSchema = NULL,
      .softValidation = FALSE,
      writePrefix = NULL,
      cdmPrefix = NULL
    )


| Argument | Description |
|---|---|
| con | A Spark connection created with `sparklyr::spark_connect()`. |
| cdmSchema | Schema where the OMOP standard tables are located. The schema is defined with a named character list/vector; allowed names are: 'catalog', 'schema' and 'prefix'. |
| writeSchema | Schema with write permissions. The schema is defined with a named character list/vector; allowed names are: 'catalog', 'schema' and 'prefix'. |
| cohortTables | Names of cohort tables to be read from |
| cdmVersion | The version of the cdm (either "5.3" or "5.4"). If NULL |
| cdmName | The name of the cdm object, if NULL |
| achillesSchema | Schema where the Achilles tables are located. The schema is defined with a named character list/vector; allowed names are: 'catalog', 'schema' and 'prefix'. |
| .softValidation | Whether to use soft validation. This is not recommended, as analysis pipelines assume the cdm fulfills the validation criteria. |
| writePrefix | A prefix that will be added to all tables created in the writeSchema. This can be used to create a namespace for your tables within the writeSchema of your database. |
| cdmPrefix | A prefix used with the OMOP CDM tables. |

A cdm reference object

    ## Not run:
    con <- sparklyr::spark_connect(...)
    cdmFromSpark(
      con = con,
      cdmSchema = c(catalog = "...", schema = "...", prefix = "..."),
      writeSchema = list() # use `list()`/`c()`/`NULL` to use temporary tables
    )
    ## End(Not run)

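The schema arguments of `cdmFromSpark()` all take the same kind of value: a named character vector or list whose names are drawn from 'catalog', 'schema' and 'prefix'. As a minimal sketch of what such values can look like, the snippet below builds a few and checks their names. The helper `checkSchemaNames()` and the catalog, schema and prefix values are hypothetical and only for illustration; they are not part of the package.

```r
# Hypothetical helper (not part of the package): checks that a schema
# definition only uses the names the documentation lists as allowed.
checkSchemaNames <- function(schema) {
  allowed <- c("catalog", "schema", "prefix")
  # An empty definition (NULL, c() or list()) has no names to check
  if (length(schema) == 0) {
    return(TRUE)
  }
  nms <- names(schema)
  !is.null(nms) && all(nms %in% allowed) && !anyDuplicated(nms)
}

# Placeholder values; replace them with those of your own database
cdmSchema <- c(catalog = "my_catalog", schema = "omop", prefix = "cdm_")
writeSchema <- list(schema = "results", prefix = "study1_")

checkSchemaNames(cdmSchema)             # all three allowed names
checkSchemaNames(writeSchema)           # a list works as well as a vector
checkSchemaNames(list())                # empty definition
checkSchemaNames(c(database = "omop"))  # 'database' is not an allowed name
```

Only the last call returns `FALSE`. An empty `writeSchema`, as in the example above, requests temporary tables.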
Create OMOP CDM tables

    createOmopTablesOnSpark(
      con,
      schemaName,
      cdmVersion = "5.4",
      overwrite = FALSE,
      bigInt = FALSE,
      cdmPrefix = NULL
    )


| Argument | Description |
|---|---|
| con | Connection to a Spark database. |
| schemaName | Schema in which to create tables. |
| cdmVersion | Which version of the OMOP CDM to create. Can be "5.3" or "5.4". |
| overwrite | Whether to overwrite existing tables. |
| bigInt | Whether to use big integers for person identifiers (person_id or subject_id). |
| cdmPrefix | Prefix to add to the names of the tables created (not generally recommended). |

OMOP CDM tables created in database
Drop Spark tables in the write schema of the connection behind the cdm reference.

    ## S3 method for class 'spark_cdm'
    dropSourceTable(cdm, name)


| Argument | Description |
|---|---|
| cdm | A cdm reference. |
| name | The names of the tables to drop. Tidyselect statements can be used. |

Drops the Spark tables.
Insert a local dataframe into the cdm.

    ## S3 method for class 'spark_cdm'
    insertTable(cdm, name, table, overwrite = TRUE, temporary = FALSE, ...)


| Argument | Description |
|---|---|
| cdm | A cdm reference. |
| name | The name of the table to insert. |
| table | The table to insert. |
| overwrite | Whether to overwrite an existing table. |
| temporary | If TRUE, a Spark dataframe will be written (which persists until the end of the current session). If FALSE, a Spark table will be written (which persists beyond the end of the current session). |
| ... | For compatibility. |

The cdm reference with the table added.
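To illustrate the `temporary` argument, here is a minimal sketch that inserts the same local data frame twice, once as a session-scoped Spark dataframe and once as a persistent Spark table. It assumes that the package is attached under the name `OmopOnSpark`, that the `insertTable()` generic comes from omopgenerics, and that a local Spark installation is available; the table and column names are placeholders.

```r
# Assumption: the package is attached under the name OmopOnSpark
library(OmopOnSpark)

# Synthetic cdm on a local Spark connection (needs a local Spark installation)
path <- file.path(tempdir(), "spark_insert_demo")
dir.create(path, showWarnings = FALSE)
cdm <- mockSparkCdm(path = path)

# A small local data frame; table and column names are placeholders
scores <- data.frame(person_id = c(1L, 2L), score = c(0.3, 0.8))

# temporary = TRUE: written as a Spark dataframe that lasts for this session only
cdm <- omopgenerics::insertTable(
  cdm, name = "scores_tmp", table = scores, temporary = TRUE
)

# temporary = FALSE (the default): written as a Spark table that outlives the session
cdm <- omopgenerics::insertTable(
  cdm, name = "scores", table = scores, overwrite = TRUE
)

# The inserted table is now part of the cdm reference
cdm$scores
```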
Create a cdm reference to local Spark OMOP CDM tables.

    mockSparkCdm(path)


| Argument | Description |
|---|---|
| path | A directory for files. |

A cdm reference with synthetic data in a local spark connection

    ## Not run:
    mockSparkCdm()
    ## End(Not run)

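The functions documented here can be combined into a short session: create a mock cdm, add a scratch table, drop it again, and disconnect. The sketch below is one possible way to do this, not a prescribed workflow. It assumes the package name `OmopOnSpark`, the omopgenerics generics for `insertTable()`, `dropSourceTable()` and `cdmDisconnect()`, and a local Spark installation; the directory and table names are placeholders.

```r
# Assumption: the package is attached under the name OmopOnSpark
library(OmopOnSpark)

# Directory that will hold the files of the mock cdm (placeholder location)
path <- file.path(tempdir(), "mock_spark_cdm")
dir.create(path, showWarnings = FALSE)

# Synthetic data in a local Spark connection
cdm <- mockSparkCdm(path = path)

# Add a scratch table so that there is something to drop
cdm <- omopgenerics::insertTable(
  cdm, name = "tmp_scratch", table = data.frame(x = 1L)
)

# Drop every table in the write schema whose name starts with "tmp_",
# using a tidyselect statement
omopgenerics::dropSourceTable(cdm, name = tidyselect::starts_with("tmp_"))

# Close the connection and also drop the tables in the writeSchema
omopgenerics::cdmDisconnect(cdm, dropWriteSchema = TRUE)
```

Setting `dropWriteSchema = TRUE` is a choice suited to throwaway mock data; with a real database the default `FALSE` leaves the tables in the writeSchema in place.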