---
title: "Summarise clinical tables records"
output: 
  html_document:
    pandoc_args: [
      "--number-offset=1,0"
      ]
    number_sections: yes
    toc: yes
vignette: >
  %\VignetteIndexEntry{A-summarise_clinical_tables_records}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In this vignette, we will explore the *OmopSketch* functions designed to provide an overview of the clinical tables within a CDM object (*observation_period*, *visit_occurrence*, *condition_occurrence*, *drug_exposure*, *procedure_occurrence*, *device_exposure*, *measurement*, *observation*, and *death*). Specifically, there are four key functions that facilitate this:

-   `summariseClinicalRecords()` and `tableClinicalRecords()`: Use them to create summary statistics with key information about a clinical table (e.g., number of records, number of concepts mapped, etc.).

-   `summariseRecordCount()` and `plotRecordCount()`: Use them to summarise the number of records within a specific time interval.

## Create a mock cdm

Let's see an example of how these functions work. To start, we load the required packages and create a mock cdm with `mockOmopSketch()`.

```{r, warning=FALSE}
library(dplyr)
library(OmopSketch)

# Connect to mock database
cdm <- mockOmopSketch()
```

# Summarise clinical tables

Let's now use `summariseClinicalRecords()` from the OmopSketch package to get an overview of one of the clinical tables of the cdm (i.e., **condition_occurrence**).

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, "condition_occurrence")

summarisedResult |> print()
```

Notice that the output is in the summarised result format.

We can use the function's arguments to specify which statistics we want to compute. For example, use the argument `recordsPerPerson` to indicate which estimates you are interested in for the number of records per person.

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, 
                                             "condition_occurrence",
                                             recordsPerPerson =  c("mean", "sd", "q05", "q95"))

summarisedResult |> 
  filter(variable_name == "records_per_person") |>
  select(variable_name, estimate_name, estimate_value)
```
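
To make these estimates concrete, here is a minimal base R sketch on a made-up table. The names `toyRecords`, `recordsPerPersonToy` and `toyEstimates` are hypothetical, and the sketch only counts people who appear in the table. It is not OmopSketch's implementation, which may, for instance, treat people without records differently.

```{r, warning=FALSE}
# Hypothetical toy table: one row per record, identified by person_id.
toyRecords <- data.frame(
  person_id = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4)
)

# Number of records for each person that appears in the toy table.
recordsPerPersonToy <- as.integer(table(toyRecords$person_id))

# The same estimate names requested above, computed with base R
# (quantile() uses its default interpolation method here).
toyEstimates <- c(
  mean = mean(recordsPerPersonToy),
  sd   = sd(recordsPerPersonToy),
  q05  = unname(quantile(recordsPerPersonToy, 0.05)),
  q95  = unname(quantile(recordsPerPersonToy, 0.95))
)

toyEstimates
```

Each requested estimate name corresponds to one row per stratum in the summarised result, which is why the filtered output above has one row for each of `mean`, `sd`, `q05` and `q95`.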

You can further specify whether to include the number of records in observation (`inObservation = TRUE`), the number of concepts mapped (`standardConcept = TRUE`), the source vocabularies the table contains (`sourceVocabulary = TRUE`), the domains of its concepts (`domainId = TRUE`), or the concept types (`typeConcept = TRUE`).

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, 
                                             "condition_occurrence",
                                             recordsPerPerson =  c("mean", "sd", "q05", "q95"),
                                             inObservation = TRUE,
                                             standardConcept = TRUE,
                                             sourceVocabulary = TRUE,
                                             domainId = TRUE,
                                             typeConcept = TRUE)

summarisedResult |> 
  select(variable_name, estimate_name, estimate_value) |> 
  glimpse()
```

You can also stratify the previous results by sex and age group:

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, 
                                             "condition_occurrence",
                                             recordsPerPerson =  c("mean", "sd", "q05", "q95"),
                                             inObservation = TRUE,
                                             standardConcept = TRUE,
                                             sourceVocabulary = TRUE,
                                             domainId = TRUE,
                                             typeConcept = TRUE,
                                             sex = TRUE,
                                             ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)))

summarisedResult |> 
  select(variable_name, strata_level, estimate_name, estimate_value) |> 
  glimpse()
```

Notice that, by default, the "overall" group is also included, as well as the crossed strata (for example, sex == "Female" and ageGroup == "\>=35").
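
As a rough illustration of how an `ageGroup` list maps ages to labels and how strata cross, consider the base R sketch below. It assumes each pair is read as an inclusive range, as the labels in the example suggest. The helper `labelAge()` and the `toyPeople` data are hypothetical and are not part of OmopSketch.

```{r, warning=FALSE}
# Same age bands as in the example above.
toyAgeGroup <- list("<35" = c(0, 34), ">=35" = c(35, Inf))

# Hypothetical helper: return the label of the first band whose
# inclusive range [lower, upper] contains the given age.
labelAge <- function(age, groups) {
  for (label in names(groups)) {
    bounds <- groups[[label]]
    if (age >= bounds[1] && age <= bounds[2]) {
      return(label)
    }
  }
  NA_character_
}

# Made-up people with a sex and an age.
toyPeople <- data.frame(
  sex = c("Female", "Male", "Female", "Male"),
  age = c(20, 34, 35, 80)
)
toyPeople$age_group <- vapply(
  toyPeople$age, labelAge, character(1), groups = toyAgeGroup
)

# Crossed strata: one cell per combination of sex and age group.
toyCrossed <- table(toyPeople$sex, toyPeople$age_group)
toyCrossed
```

In this toy data each sex and age group combination holds exactly one person. The stratified result above reports its estimates for each such combination, in addition to the single-variable strata and the overall group.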

The analysis can also be run for multiple OMOP tables at the same time:

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, 
                                             c("observation_period","drug_exposure"),
                                             recordsPerPerson =  c("mean","sd"),
                                             inObservation = FALSE,
                                             standardConcept = FALSE,
                                             sourceVocabulary = FALSE,
                                             domainId = FALSE,
                                             typeConcept = FALSE)

summarisedResult |> 
  select(group_level, variable_name, estimate_name, estimate_value) |> 
  glimpse()
```

## Tidy the summarised object

`tableClinicalRecords()` will help you to tidy the previous results and create a gt table.

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, 
                                             "condition_occurrence",
                                             recordsPerPerson =  c("mean", "sd", "q05", "q95"),
                                             inObservation = TRUE,
                                             standardConcept = TRUE,
                                             sourceVocabulary = TRUE,
                                             domainId = TRUE,
                                             typeConcept = TRUE, 
                                             sex = TRUE)

summarisedResult |> 
  tableClinicalRecords()
```

# Summarise record counts

OmopSketch can also help you to summarise the trend of the records of an OMOP table. See the example below, where we use `summariseRecordCount()` to count the number of records within each year, and then, we use `plotRecordCount()` to create a ggplot with the trend.

```{r, warning=FALSE}
summarisedResult <- summariseRecordCount(cdm, "drug_exposure", interval = "years")

summarisedResult |> print()

summarisedResult |> plotRecordCount()
```

Note that you can adjust the time interval using the `interval` argument, which can be set to either "years" or "months". The example below shows the number of records in each month:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure", interval = "months") |> 
  plotRecordCount()
```
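
The idea behind counting records per interval can be sketched in base R with a handful of made-up dates. The vector `toyStartDates` is hypothetical. Which date column OmopSketch uses for each table, and how it handles observation periods, is not shown here, so treat this only as an illustration of the bucketing.

```{r, warning=FALSE}
# Hypothetical record start dates.
toyStartDates <- as.Date(
  c("2020-01-15", "2020-01-30", "2020-03-02", "2021-07-09")
)

# Bucket each record by calendar year, then by calendar month.
countsByYear  <- table(format(toyStartDates, "%Y"))
countsByMonth <- table(format(toyStartDates, "%Y-%m"))

countsByYear
countsByMonth
```

Switching from years to months keeps the same records but spreads them over more, narrower buckets, which is why the monthly plot looks noisier than the yearly one.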

We can further stratify our counts by sex (setting argument `sex = TRUE`) or by age (providing an age group). Notice that in both cases, the function will automatically create a group called *overall* with all the sex groups and all the age groups.

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
                     interval = "months",
                     sex = TRUE,
                     ageGroup = list("<30" = c(0, 29),
                                     ">=30" = c(30, Inf))) |>
  plotRecordCount()
```

By default, `plotRecordCount()` does not facet or colour by any variable. This can be confusing when stratifying by several variables, as seen in the previous plot. We can use the [visOmopResults](https://darwin-eu.github.io/visOmopResults/) package to find out which columns we can colour or facet by:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
                     interval = "months", 
                     sex = TRUE,
                     ageGroup = list("0-29" = c(0,29),
                                     "30-Inf" = c(30,Inf)))  |>
  visOmopResults::tidyColumns()
```

Then, we can pass those columns to the `facet` and `colour` arguments of `plotRecordCount()`:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
                     interval = "months",
                     sex = TRUE,
                     ageGroup = list("0-29" = c(0,29),
                                     "30-Inf" = c(30,Inf))) |>
  plotRecordCount(facet = omop_table ~ age_group, colour = "sex")
```

Finally, disconnect from the cdm:

```{r, warning=FALSE}
PatientProfiles::mockDisconnect(cdm = cdm)
```