Creating synthetic clinical tables

The omock package lets you quickly create a cdm reference populated with synthetic data, based on population settings that you specify.

First, let’s load the packages required for this vignette.

library(omock)
library(dplyr)
library(ggplot2)

Now, in three lines of code, we can create a cdm reference with person and observation period tables for 1,000 people.

cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(nPerson = 1000) |>
  mockObservationPeriod()

cdm
#> 
#> ── # OMOP CDM reference (local) of synthetic cdm ───────────────────────────────
#> • omop tables: person, observation_period
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

cdm$person |> glimpse()
#> Rows: 1,000
#> Columns: 18
#> $ person_id                   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
#> $ gender_concept_id           <int> 8532, 8507, 8507, 8507, 8532, 8507, 8507, …
#> $ year_of_birth               <int> 1960, 1988, 1959, 1961, 1950, 1950, 1960, …
#> $ month_of_birth              <int> 10, 1, 10, 1, 6, 3, 9, 1, 1, 3, 3, 4, 11, …
#> $ day_of_birth                <int> 6, 14, 15, 23, 12, 22, 9, 16, 31, 4, 12, 9…
#> $ race_concept_id             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_concept_id        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ birth_datetime              <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ location_id                 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ provider_id                 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ care_site_id                <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ person_source_value         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ gender_source_value         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ gender_source_concept_id    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ race_source_value           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ race_source_concept_id      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_source_value      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_source_concept_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

cdm$observation_period |> glimpse()
#> Rows: 1,000
#> Columns: 5
#> $ observation_period_id         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
#> $ person_id                     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
#> $ observation_period_start_date <date> 1998-05-30, 2007-03-27, 2019-01-19, 199…
#> $ observation_period_end_date   <date> 2003-07-28, 2009-05-03, 2019-07-30, 200…
#> $ period_type_concept_id        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
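
As a quick sanity check on the generated tables, the sketch below computes the length of each observation period and counts any period that ends before it starts. It relies only on the omock functions already shown above plus standard dplyr verbs; the names `obs_summary` and `days_observed` are illustrative and not part of omock.

```r
library(omock)
library(dplyr)

# Rebuild the same cdm reference as above so the example runs on its own
cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(nPerson = 1000) |>
  mockObservationPeriod()

obs_summary <- cdm$observation_period |>
  collect() |>
  mutate(
    # Length of each observation period in whole days
    days_observed = as.integer(difftime(
      observation_period_end_date,
      observation_period_start_date,
      units = "days"
    ))
  ) |>
  summarise(
    n_periods = n(),
    min_days = min(days_observed),
    median_days = median(days_observed),
    max_days = max(days_observed),
    # Should be zero if every period ends on or after its start date
    n_end_before_start = sum(days_observed < 0)
  )

obs_summary
```

Because the data are generated randomly, the exact durations will differ from run to run unless a seed is set beforehand.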

We can add further requirements on the population we create. For example, we can require that everyone was born between 1960 and 1980, and then plot the resulting distribution of birth years:

cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(
    nPerson = 1000,
    birthRange = as.Date(c("1960-01-01", "1980-12-31"))
  ) |>
  mockObservationPeriod()

cdm$person |>
  collect() |>
  ggplot() +
  geom_histogram(aes(as.integer(year_of_birth)),
    binwidth = 1, colour = "grey"
  ) +
  theme_minimal() +
  xlab("Year of birth")
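
The histogram gives a visual check. A numeric check of the same requirement can be sketched as follows, assuming `birthRange` restricts birth dates to the interval given; `birth_check` and its column names are illustrative only.

```r
library(omock)
library(dplyr)

# Same restricted population as above, rebuilt so the example runs on its own
cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(
    nPerson = 1000,
    birthRange = as.Date(c("1960-01-01", "1980-12-31"))
  ) |>
  mockObservationPeriod()

birth_check <- cdm$person |>
  collect() |>
  summarise(
    earliest_year = min(year_of_birth),
    latest_year = max(year_of_birth),
    # Number of people whose birth year falls outside the requested range
    n_outside_range = sum(year_of_birth < 1960 | year_of_birth > 1980)
  )

birth_check
```

If the range is respected, `n_outside_range` should be zero, and the earliest and latest years should lie within 1960 to 1980.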