---
title: "Summarise clinical tables records"
output: 
  html_document:
    pandoc_args: [
      "--number-offset=1,0"
      ]
    number_sections: yes
    toc: yes
vignette: >
  %\VignetteIndexEntry{A-summarise_clinical_tables_records}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In this vignette, we will explore the *OmopSketch* functions designed to provide an overview of the clinical tables within a CDM object (*observation_period*, *visit_occurrence*, *condition_occurrence*, *drug_exposure*, *procedure_occurrence*, *device_exposure*, *measurement*, *observation*, and *death*). Specifically, there are four key functions that facilitate this:

-   `summariseClinicalRecords()` and `tableClinicalRecords()`: Use them to create summary statistics with key information about a clinical table (e.g., number of records, number of concepts mapped, etc.)
-   `summariseRecordCount()` and `plotRecordCount()`: Use them to summarise the number of records within a specific time interval.

## Create a mock cdm

Let's see an example of these functions in action. To start, we load the required packages and create a mock cdm with the `mockOmopSketch()` function.

```{r, warning=FALSE}
library(dplyr)
library(OmopSketch)

# Connect to mock database
cdm <- mockOmopSketch()
```

# Summarise clinical tables

Let's now use `summariseClinicalRecords()` from the OmopSketch package to get an overview of one of the clinical tables of the cdm (i.e., **condition_occurrence**).

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, "condition_occurrence")

summarisedResult |> print()
```

Notice that the output is in the summarised result format.

We can use the function's arguments to specify which statistics we want to compute. For example, use the argument `recordsPerPerson` to indicate which estimates you are interested in for the number of records per person.

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm,
                                             "condition_occurrence",
                                             recordsPerPerson = c("mean", "sd", "q05", "q95"))

# Keep only the rows describing the number of records per person
summarisedResult |>
  filter(variable_name == "records_per_person") |>
  select(variable_name, estimate_name, estimate_value)
```

You can further specify whether to include the number of records in observation (`inObservation = TRUE`), the number of concepts mapped (`standardConcept = TRUE`), which source vocabularies the table contains (`sourceVocabulary = TRUE`), which domains the concepts belong to (`domainId = TRUE`), or the concept's type (`typeConcept = TRUE`).

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm,
                                             "condition_occurrence",
                                             recordsPerPerson = c("mean", "sd", "q05", "q95"),
                                             inObservation = TRUE,
                                             standardConcept = TRUE,
                                             sourceVocabulary = TRUE,
                                             domainId = TRUE,
                                             typeConcept = TRUE)

summarisedResult |>
  select(variable_name, estimate_name, estimate_value) |>
  glimpse()
```

Additionally, you can stratify the previous results by sex and age groups:

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm,
                                             "condition_occurrence",
                                             recordsPerPerson = c("mean", "sd", "q05", "q95"),
                                             inObservation = TRUE,
                                             standardConcept = TRUE,
                                             sourceVocabulary = TRUE,
                                             domainId = TRUE,
                                             typeConcept = TRUE,
                                             sex = TRUE,
                                             ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)))

summarisedResult |>
  select(variable_name, strata_level, estimate_name, estimate_value) |>
  glimpse()
```

Notice that, by default, the "overall" group is also included, as well as the crossed strata (for example, `sex == "Female"` and `ageGroup == ">=35"`).

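In the summarised result format, a crossed stratum is stored in a single row, with the strata names and levels joined by ` &&& `. The sketch below illustrates how such a row can be split back into its parts. It is a toy example in base R so that it runs on its own: the data frame `toyStrata` and its values are made up for illustration, and the order of the names in a real result may differ.

```{r, warning=FALSE}
# Toy stand-in for the strata columns of a summarised result (made-up values)
toyStrata <- data.frame(
  strata_name  = c("overall", "sex", "age_group", "sex &&& age_group"),
  strata_level = c("overall", "Female", ">=35", "Female &&& >=35"),
  stringsAsFactors = FALSE
)

# Crossed strata are the rows whose name contains the " &&& " separator
crossed <- toyStrata[grepl(" &&& ", toyStrata$strata_name, fixed = TRUE), ]

# Split the joined name and level into a named character vector
strataValues <- setNames(
  strsplit(crossed$strata_level, " &&& ", fixed = TRUE)[[1]],
  strsplit(crossed$strata_name, " &&& ", fixed = TRUE)[[1]]
)

strataValues
```

On a real result, the same idea applies to the `strata_name` and `strata_level` columns of `summarisedResult`.
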
The analysis can also be run on several OMOP tables at the same time:

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm,
                                             c("observation_period", "drug_exposure"),
                                             recordsPerPerson = c("mean", "sd"),
                                             inObservation = FALSE,
                                             standardConcept = FALSE,
                                             sourceVocabulary = FALSE,
                                             domainId = FALSE,
                                             typeConcept = FALSE)

# group_level identifies the OMOP table each row refers to
summarisedResult |>
  select(group_level, variable_name, estimate_name, estimate_value) |>
  glimpse()
```

## Tidy the summarised object

`tableClinicalRecords()` tidies the previous results and creates a gt table.

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm,
                                             "condition_occurrence",
                                             recordsPerPerson = c("mean", "sd", "q05", "q95"),
                                             inObservation = TRUE,
                                             standardConcept = TRUE,
                                             sourceVocabulary = TRUE,
                                             domainId = TRUE,
                                             typeConcept = TRUE,
                                             sex = TRUE)

summarisedResult |>
  tableClinicalRecords()
```

# Summarise record counts

OmopSketch can also help you summarise the trend in the number of records of an OMOP table. In the example below, we use `summariseRecordCount()` to count the number of records within each year, and then `plotRecordCount()` to create a ggplot with the trend.

```{r, warning=FALSE}
summarisedResult <- summariseRecordCount(cdm, "drug_exposure", interval = "years")

summarisedResult |> print()

summarisedResult |> plotRecordCount()
```

Note that you can adjust the time interval using the `interval` argument, which can be set to either "years" or "months". The example below shows the number of records in each month:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure", interval = "months") |>
  plotRecordCount()
```

We can further stratify our counts by sex (setting the argument `sex = TRUE`) or by age (providing age groups through `ageGroup`). Notice that in both cases the function automatically creates a group called *overall* that pools all the sex groups and all the age groups.

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
                     interval = "months",
                     sex = TRUE,
                     ageGroup = list("<30" = c(0, 29),
                                     ">=30" = c(30, Inf))) |>
  plotRecordCount()
```

By default, `plotRecordCount()` does not facet or colour by any variable. This can make the plot confusing when stratifying by several variables, as seen in the previous figure. We can use the [visOmopResults](https://darwin-eu.github.io/visOmopResults/) package to find out which columns we can colour or facet by:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
                     interval = "months",
                     sex = TRUE,
                     ageGroup = list("0-29" = c(0, 29),
                                     "30-Inf" = c(30, Inf))) |>
  visOmopResults::tidyColumns()
```

Then, we can pass these columns to the `facet` and `colour` arguments of `plotRecordCount()`:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
                     interval = "months",
                     sex = TRUE,
                     ageGroup = list("0-29" = c(0, 29),
                                     "30-Inf" = c(30, Inf))) |>
  plotRecordCount(facet = omop_table ~ age_group, colour = "sex")
```

Finally, disconnect from the cdm:

```{r, warning=FALSE}
PatientProfiles::mockDisconnect(cdm = cdm)
```