--- title: "Summarise observation period" output: html_document: pandoc_args: [ "--number-offset=1,0" ] number_sections: yes toc: yes vignette: > %\VignetteIndexEntry{C-summarise_observation_period} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` In this vignette, we will explore the *OmopSketch* functions designed to provide an overview of the `observation_period` table. Specifically, there are five key functions that facilitate this: - `summariseObservationPeriod()`, `plotObservationPeriod()` and `tableObservationPeriod()`: Use them to get some overall statistics describing the `observation_period` table - `summariseInObservation()` and `plotInObservation()`: Use them to summarise the number of individuals in observation during specific intervals of time. ## Create a mock cdm Let's see an example of its functionalities. To start with, we will load essential packages and create a mock cdm using the mockOmopSketch() database. ```{r, warning=FALSE} library(dplyr) library(OmopSketch) # Connect to mock database cdm <- mockOmopSketch() ``` # Summarise observation periods Let's now use the `summariseObservationPeriod()` function from the OmopSketch package to help us have an overview of one of the `observation_period` table, including some statistics such as the `Number uf subjects` and `Duration in days` for each observation period (e.g., 1st, 2nd) ```{r, warning=FALSE} summarisedResult <- summariseObservationPeriod(cdm$observation_period) summarisedResult ``` Notice that the output is in the summarised result format. We can use the arguments to specify which statistics we want to perform. For example, use the argument `estimates` to indicate which estimates you are interested regarding the `Duration in days` of the observation period. ```{r, warning=FALSE} summarisedResult <- summariseObservationPeriod(cdm$observation_period, estimates = c("mean", "sd", "q05", "q95")) summarisedResult |> filter(variable_name == "Duration in days") |> select(group_level, variable_name, estimate_name, estimate_value) ``` Additionally, you can stratify the results by sex and age groups, and specify a date range of interest: ```{r, warning=FALSE} summarisedResult <- summariseObservationPeriod(cdm$observation_period, estimates = c("mean", "sd", "q05", "q95"), sex = TRUE, ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)), dateRange = as.Date(c("1970-01-01", "2010-01-01"))) summarisedResult |> select(group_level, variable_name, strata_level, estimate_name, estimate_value) |> glimpse() ``` Notice that, by default, the "overall" group will be also included, as well as crossed strata (that means, sex == "Female" and ageGroup == "\>35"). ## Tidy the summarised object `tableObservationPeriod()` will help you to create a table (see supported types with: visOmopResults::tableType()). By default it creates a [gt] () table. ```{r, warning=FALSE} summarisedResult <- summarisedResult <- summariseObservationPeriod(cdm$observation_period, estimates = c("mean", "sd", "q05", "q95"), sex = TRUE) summarisedResult |> tableObservationPeriod() ``` ## Visualise the results Finally, we can visualise the concept counts using `plotObservationPeriod()`. ```{r, warning=FALSE} summarisedResult <- summariseObservationPeriod(cdm$observation_period) plotObservationPeriod(summarisedResult, variableName = "Number subjects", plotType = "barplot") ``` Note that either `Number subjects` or `Duration in days` can be plotted. For `Number of subjects`, the plot type can be `barplot`, whereas for `Duration in days`, the plot type can be `barplot`, `boxplot`, or `densityplot`." Additionally, if results were stratified by sex or age group, we can further use `facet` or `colour` arguments to highlight the different results in the plot. To help us identify by which variables we can colour or facet by, we can use [visOmopResult](https://darwin-eu.github.io/visOmopResults/) package. ```{r, warning=FALSE} summarisedResult <- summariseObservationPeriod(cdm$observation_period, sex = TRUE) plotObservationPeriod(summarisedResult, variableName = "Duration in days", plotType = "boxplot", facet = "sex") summarisedResult <- summariseObservationPeriod(cdm$observation_period, sex = TRUE, ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))) plotObservationPeriod(summarisedResult, colour = "sex", facet = "age_group") ``` # Summarise in observation OmopSketch can also help you to summarise the number of individuals in observation during specific intervals of time. ```{r, warning=FALSE} summarisedResult <- summariseInObservation(cdm$observation_period, interval = "years") summarisedResult |> select(variable_name, estimate_name, estimate_value, additional_name, additional_level) ``` Note that you can adjust the time interval period using the `interval` argument, which can be set to either "years", "quarters", "months" or "overall" (default value). ```{r, warning=FALSE} summarisedResult <- summariseInObservation(cdm$observation_period, interval = "months") summarisedResult |> select(variable_name, estimate_name, estimate_value, additional_name, additional_level) ``` Along with the number of records in observation, you can also calculate the number of person-days by setting the `output` argument to c("records", "person-days"). ```{r, warning=FALSE} summarisedResult <- summariseInObservation(cdm$observation_period, output = c("records", "person-days")) summarisedResult |> select(variable_name, estimate_name, estimate_value, additional_name, additional_level) ``` We can further stratify our counts by sex (setting argument `sex = TRUE`) or by age (providing an age group). Notice that in both cases, the function will automatically create a group called *overall* with all the sex groups and all the age groups. We can also define a date range of interest to filter the `observation_period` table accordingly. ```{r, warning=FALSE} summarisedResult <- summariseInObservation(cdm$observation_period, output = c("records", "person-days"), interval = "quarters", sex = TRUE, ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)), dateRange = as.Date(c("1970-01-01", "2010-01-01"))) summarisedResult |> select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level) ``` ## Visualise the results Finally, we can visualise the concept counts using `plotInObservation()`. ```{r, warning=FALSE} summarisedResult <- summariseInObservation(cdm$observation_period, interval = "years") plotInObservation(summarisedResult) ``` Notice that either `Number records in observation` and `Number person-days` can be plotted. If both have been included in the summarised result, you will have to filter to only include one variable at time. Additionally, if results were stratified by sex or age group, we can further use `facet` or `colour` arguments to highlight the different results in the plot. To help us identify by which variables we can colour or facet by, we can use [visOmopResult](https://darwin-eu.github.io/visOmopResults/) package. ```{r, warning=FALSE} summarisedResult <- summariseInObservation(cdm$observation_period, interval = "years", output = c("records", "person-days"), sex = TRUE, ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))) plotInObservation(summarisedResult |> filter(variable_name == "Number person-days"), colour = "sex", facet = "age_group") ``` Finally, disconnect from the cdm ```{r, warning=FALSE} PatientProfiles::mockDisconnect(cdm = cdm) ```