Package 'OmopOnSpark' reference manual

Title:	Using a Common Data Model on Spark
Description:	Use health data in the Observational Medical Outcomes Partnership Common Data Model format in Spark.
Authors:	Edward Burn [aut, cre] (ORCID: <https://orcid.org/0000-0002-9286-1128>), Martí Català [aut] (ORCID: <https://orcid.org/0000-0003-3308-9905>)
Maintainer:	Edward Burn <[email protected]>
License:	Apache License (>= 2)
Version:	0.1.0
Built:	2026-07-02 09:59:50 UTC
Source:	https://github.com/ohdsi/omoponspark

Disconnect the connection of the cdm object

Description

Disconnect the Spark connection behind the cdm object, optionally dropping the tables in the write schema.

Usage

## S3 method for class 'spark_cdm'
cdmDisconnect(cdm, dropWriteSchema = FALSE, ...)

Arguments

cdm

cdm reference

dropWriteSchema

Whether to drop the tables in the writeSchema when disconnecting.

...

Not used

Create a `cdm_reference` object from a `sparklyr` connection.

Description

Create a cdm_reference object from a sparklyr connection.

Usage

cdmFromSpark(
  con,
  cdmSchema,
  writeSchema,
  cohortTables = NULL,
  cdmVersion = NULL,
  cdmName = NULL,
  achillesSchema = NULL,
  .softValidation = FALSE,
  writePrefix = NULL,
  cdmPrefix = NULL
)

Arguments

con

A Spark connection created with sparklyr::spark_connect().

cdmSchema

Schema where the OMOP standard tables are located. The schema is defined as a named character list or vector; allowed names are 'catalog', 'schema' and 'prefix'.

writeSchema

Schema with write permissions. The schema is defined as a named character list or vector; allowed names are 'catalog', 'schema' and 'prefix'.

cohortTables

Names of the cohort tables to read from the writeSchema.

cdmVersion

The version of the cdm (either "5.3" or "5.4"). If NULL, cdm_source$cdm_version is used instead.

cdmName

The name of the cdm object. If NULL, cdm_source$cdm_source_name is used instead.

achillesSchema

Schema where the Achilles tables are located. The schema is defined as a named character list or vector; allowed names are 'catalog', 'schema' and 'prefix'.

.softValidation

Whether to use soft validation. This is not recommended, as analysis pipelines assume that the cdm fulfills the validation criteria.

writePrefix

A prefix added to all tables created in the writeSchema. This can be used to create a namespace for your tables within the write schema.

cdmPrefix

A prefix used with the OMOP CDM tables.
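The schema arguments above take a named character vector or list that uses only the names 'catalog', 'schema' and 'prefix'. The plain R sketch below checks that convention and builds a dot-separated table name from it. `checkSchemaSpec()` and `qualifiedName()` are hypothetical helpers, not OmopOnSpark functions, and the way the prefix is pasted onto the table name is an assumption, not documented package behavior.

```r
# Hypothetical helper (not part of OmopOnSpark): checks that a schema
# specification only uses the documented names 'catalog', 'schema', 'prefix'.
checkSchemaSpec <- function(x) {
  x <- unlist(x)  # accept a named list or a named character vector
  if (length(x) == 0) {
    # list(), c() and NULL all end up here (temporary tables for writeSchema)
    return(character())
  }
  allowed <- c("catalog", "schema", "prefix")
  nms <- names(x)
  if (is.null(nms) || any(nms == "") || !all(nms %in% allowed)) {
    stop("schema names must be among: ", paste(allowed, collapse = ", "))
  }
  if (!is.character(x)) {
    stop("schema values must be character strings")
  }
  x
}

# Hypothetical helper: catalog.schema.<prefix><table>, ASSUMING the prefix is
# pasted directly in front of the table name.
qualifiedName <- function(spec, table) {
  spec <- checkSchemaSpec(spec)
  prefix <- if ("prefix" %in% names(spec)) spec[["prefix"]] else ""
  # intersect() keeps catalog before schema whatever order the user gave
  location <- unname(spec[intersect(c("catalog", "schema"), names(spec))])
  paste(c(location, paste0(prefix, table)), collapse = ".")
}

qualifiedName(c(catalog = "main", schema = "omop", prefix = "cdm_"), "person")
# "main.omop.cdm_person"
qualifiedName(list(schema = "results"), "cohort")
# "results.cohort"
```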

Value

A cdm reference object

Examples

## Not run: 
con <- sparklyr::spark_connect(...)
cdmFromSpark(
  con = con,
  cdmSchema = c(catalog = "...", schema = "...", prefix = "..."),
  writeSchema = list() # use `list()`/`c()`/`NULL` to use temporary tables
)

## End(Not run)

Create OMOP CDM tables

Description

Create OMOP CDM tables

Usage

createOmopTablesOnSpark(
  con,
  schemaName,
  cdmVersion = "5.4",
  overwrite = FALSE,
  bigInt = FALSE,
  cdmPrefix = NULL
)

Arguments

con

Connection to a Spark database.

schemaName

Schema in which to create tables.

cdmVersion

Which version of the OMOP CDM to create. Can be "5.3" or "5.4".

overwrite

Whether to overwrite existing tables.

bigInt

Whether to use big integers for the person identifier (person_id or subject_id).

cdmPrefix

Prefix to add to the names of the tables created (not generally recommended).
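`bigInt` matters once person identifiers no longer fit in a signed 32-bit integer, whose largest value is 2^31 - 1 = 2147483647 (the range of Spark SQL's `INT`, and also R's `.Machine$integer.max`). The hypothetical helper below, which is not part of the package, checks a vector of identifiers against that bound to suggest whether `bigInt = TRUE` is needed. It assumes, rather than confirms, that `bigInt = FALSE` stores these columns as 32-bit integers.

```r
# Largest value of a signed 32-bit integer (Spark SQL INT); equals
# .Machine$integer.max in R.
int32Max <- 2^31 - 1
stopifnot(int32Max == .Machine$integer.max)

# Hypothetical helper (not part of OmopOnSpark): TRUE when any identifier is
# above the 32-bit range, i.e. when bigInt = TRUE would be needed.
needsBigInt <- function(personIds) {
  ids <- as.numeric(personIds)  # doubles represent integers exactly up to 2^53
  any(ids > int32Max, na.rm = TRUE)
}

needsBigInt(c(1, 2, 2147483647))  # FALSE
needsBigInt(c(1, 3e9))            # TRUE
```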

Value

OMOP CDM tables created in database

Drop Spark tables

Description

Drop Spark tables in the write schema of the connection behind the cdm reference.

Usage

## S3 method for class 'spark_cdm'
dropSourceTable(cdm, name)

Arguments

cdm

A cdm reference

name

The names of the tables to drop. Tidyselect statements can be used.
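Because `name` accepts tidyselect statements, several tables can be dropped at once with helpers such as `starts_with()`. The sketch below uses `tidyselect::eval_select()` on a hypothetical vector of table names to show what such a selection resolves to. It does not touch Spark, and the commented `dropSourceTable()` call assumes an existing Spark-backed `cdm`.

```r
library(tidyselect)

# Hypothetical names standing in for the tables in the write schema.
tables <- c("tmp_cohort", "tmp_counts", "my_cohort")

# eval_select() needs a named object; the values are just positions.
selected <- eval_select(
  quote(starts_with("tmp_")),
  data = setNames(seq_along(tables), tables)
)
names(selected)  # "tmp_cohort" "tmp_counts"

# With a live cdm reference (not created here), the equivalent call would be:
# dropSourceTable(cdm, name = starts_with("tmp_"))
```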

Value

Drops the Spark tables.

Insert a table into a cdm object

Description

Insert a local dataframe into the cdm.

Usage

## S3 method for class 'spark_cdm'
insertTable(cdm, name, table, overwrite = TRUE, temporary = FALSE, ...)

Arguments

cdm

A cdm reference.

name

The name of the table to insert.

table

The table to insert.

overwrite

Whether to overwrite an existing table.

temporary

If TRUE, a Spark dataframe is written that persists until the end of the current session. If FALSE, a Spark table is written that persists beyond the end of the current session.

...

For compatibility.
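The `table` argument is a local dataframe. As a sketch, the code below builds a small dataframe with the four standard columns of an OMOP cohort table. The values are made up, and the `insertTable()` call is left commented out because it needs an existing Spark-backed `cdm`.

```r
# Made-up rows with the standard OMOP cohort table columns.
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L),
  subject_id = c(101L, 102L),
  cohort_start_date = as.Date(c("2020-01-01", "2021-06-15")),
  cohort_end_date = as.Date(c("2020-12-31", "2021-12-31"))
)
str(cohort)

# With a live cdm reference (not created here); temporary = TRUE would keep
# the result only for the current session:
# cdm <- insertTable(cdm, name = "my_cohort", table = cohort, temporary = FALSE)
```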

Value

The cdm reference with the table added.

Create a cdm reference to local Spark OMOP CDM tables

Description

Creates a cdm reference to local Spark OMOP CDM tables.

Usage

mockSparkCdm(path)

Arguments

path

A directory in which to store files.

Value

A cdm reference with synthetic data in a local spark connection

Examples

## Not run: 
mockSparkCdm(path = tempdir())

## End(Not run)
