---
title: "Using Characterization Package"
author: "Jenna Reps"
date: '`r Sys.Date()`'
header-includes:
    - \usepackage{fancyhdr}
    - \pagestyle{fancy}
    - \fancyhead{}
    - \fancyhead[CO,CE]{Installation Guide}
    - \fancyfoot[CO,CE]{Characterization Package Version `r    utils::packageVersion("Characterization")`}
    - \fancyfoot[LE,RO]{\thepage}
    - \renewcommand{\headrulewidth}{0.4pt}
    - \renewcommand{\footrulewidth}{0.4pt}
output:
  html_document:
    number_sections: yes
    toc: yes
vignette: >
  %\VignetteIndexEntry{Using_Package}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---

# Introduction
This vignette describes how you can use the Characterization package for various descriptive studies using OMOP CDM data. The Characterization package currently contains three different types of analyses:

- Aggregate Covariates: this returns the mean feature value for a set of features specified by the user for i) the Target cohort population, ii) the Outcome cohort population, iii) the Target population patients who had the outcome during some user specified time-at-risk and iv) the Target population patients who did not have the outcome during some user specified time-at-risk.
- DechallengeRechallenge: this is mainly aimed at investigating whether a drug and event are causally related by seeing whether the drug is stopped close in time to the event occurrence (dechallenge) and then whether the drug is restarted (a rechallenge occurs) and if so, whether the event starts again (a failed rechallenge). In this analysis, the Target cohorts are the drug users of interest and the Outcome cohorts are the medical events you wish to see whether the drug may cause.  The user must also specify how close in time a drug must be stopped after the outcome to be considered a dechallenge and how close in time an Outcome must occur after restarting the drug to be considered a failed rechallenge).
- Time-to-event: this returns descriptive results showing the timing between the target cohort and outcome.  This can help identify whether the outcome often precedes the target cohort or whether it generally comes after.

# Setup

First we need to install the `Characterization` package:
```{r tidy=TRUE,eval=FALSE}
remotes::install_github("ohdsi/Characterization")
```

and then load it:
```{r tidy=TRUE,eval=TRUE}
library(Characterization)
library(dplyr)
```

In this vignette we will show working examples using a sample of the `Eunomia` R package GI Bleed simulated data. The function `exampleOmopConnectionDetails` creates a connection details object for a SQLITE database containing an example observational medical outcomes partnership (OMOP) common data model (CDM) data in a temporary location. 

```{r tidy=TRUE,eval=TRUE}
connectionDetails <- Characterization::exampleOmopConnectionDetails()
```

# Examples 
## Aggreagate Covariates

To run an 'Aggregate Covariate' analysis you need to create a setting object using `createAggregateCovariateSettings`.  This requires specifying:

- one or more targetIds (these must be pre-generated in a cohort table)
- one or more outcomeIds (these must be pre-generated in a cohort table)
- the covariate settings using `FeatureExtraction::createCovariateSettings` or by creating your own custom feature extraction code.
- the time-at-risk settings
+ riskWindowStart
+ startAnchor
+ riskWindowEnd
+ endAnchor

Using the Eunomia data were we previous generated four cohorts, we can use cohort ids 1,2 and 4 as the targetIds and cohort id 3 as the outcomeIds:  

```{r eval=TRUE}
exampleTargetIds <- c(1, 2, 4)
exampleOutcomeIds <- 3
```

If we want to get information on the sex, age at index and Charlson Comorbidity index we can create the settings using `FeatureExtraction::createCovariateSettings`:

```{r eval=TRUE}
exampleCovariateSettings <- FeatureExtraction::createCovariateSettings(
  useDemographicsGender = T,
  useDemographicsAge = T,
  useCharlsonIndex = T
)
```

There is an additional covariate setting require that is calculated for the cases (patients in the target cohort with have the outcome during the time-at-risk).  This is called caseCovariateSettings and should be created using the createDuringCovariateSettings function.  The user can pick conditions, drugs, measurements, procedures and observations.  In this example, we just include condition eras groups by vocabulary heirarchy.  We also need to specify two related variables `casePreTargetDuration` which is the number of days before target index to extract features for the cases (answers what happens shortly before the target index) and `casePostOutcomeDuration` which is the number of days after the outcome date to extract features for the cases (answers what happens after the outcome).  The case covariates are also extracted between target index and outcome (answers the question what happens during target exposure).

```{r eval=TRUE}
caseCovariateSettings <- Characterization::createDuringCovariateSettings(
  useConditionGroupEraDuring = T
)
```


If we want to create the aggregate features for all our target cohorts, our outcome cohort and each target cohort restricted to those with a record of the outcome 1 day after target cohort start date until 365 days after target cohort end date with a outcome washout of 9999 (meaning we only include outcomes that are the first occurrence in the past 9999 days) and only include targets or outcomes where the patient was observed for 365 days or more prior, we can run:

```{r eval=TRUE}
exampleAggregateCovariateSettings <- createAggregateCovariateSettings(
  targetIds = exampleTargetIds,
  outcomeIds = exampleOutcomeIds,
  riskWindowStart = 1, startAnchor = "cohort start",
  riskWindowEnd = 365, endAnchor = "cohort start",
  outcomeWashoutDays = 9999,
  minPriorObservation = 365,
  covariateSettings = exampleCovariateSettings,
  caseCovariateSettings = caseCovariateSettings,
  casePreTargetDuration = 90,
  casePostOutcomeDuration = 90
)
```

Next we need to use the `exampleAggregateCovariateSettings` as the settings to `computeAggregateCovariateAnalyses`, we need to use the Eunomia connectionDetails and in Eunomia the OMOP CDM data and cohort table are in the 'main' schema.  The cohort table name is 'cohort'.  The following code will apply the aggregated covariates analysis using the previously specified settings on the simulated Eunomia data, but we can specify the `minCharacterizationMean` to exclude covarites with mean values below 0.01, and we must specify the `outputFolder` where the csv results will be written to.

```{r eval=FALSE,results='hide',error=FALSE,warning=FALSE,message=FALSE}
runCharacterizationAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  outcomeDatabaseSchema = "main",
  outcomeTable = "cohort",
  characterizationSettings = createCharacterizationSettings(
    aggregateCovariateSettings = exampleAggregateCovariateSettings
  ),
  databaseId = "Eunomia",
  runId = 1,
  minCharacterizationMean = 0.01,
  outputDirectory = file.path(tempdir(), "example_char", "results"), 
  executionPath = file.path(tempdir(), "example_char", "execution"),
  minCellCount = 10,
  incremental = F,
  threads = 1
)
```

You can then see the results in the location `file.path(tempdir(), 'example_char', 'results')` where you will find csv files.

## Dechallenge Rechallenge

To run a 'Dechallenge Rechallenge' analysis you need to create a setting object using `createDechallengeRechallengeSettings`.  This requires specifying:

- one or more targetIds (these must be pre-generated in a cohort table)
- one or more outcomeIds (these must be pre-generated in a cohort table)
- dechallengeStopInterval
- dechallengeEvaluationWindow

Using the Eunomia data were we previous generated four cohorts, we can use cohort ids 1,2 and 4 as the targetIds and cohort id 3 as the outcomeIds:  

```{r eval=TRUE}
exampleTargetIds <- c(1, 2, 4)
exampleOutcomeIds <- 3
```

If we want to create the dechallenge rechallenge for all our target cohorts and our outcome cohort with a 30 day dechallengeStopInterval and 31 day dechallengeEvaluationWindow:

```{r eval=TRUE}
exampleDechallengeRechallengeSettings <- createDechallengeRechallengeSettings(
  targetIds = exampleTargetIds,
  outcomeIds = exampleOutcomeIds,
  dechallengeStopInterval = 30,
  dechallengeEvaluationWindow = 31
)
```

We can then run  the analysis on the Eunomia data using `computeDechallengeRechallengeAnalyses` and the settings previously specified, with `minCellCount` removing values less than the specified value:

```{r eval=FALSE}
dc <- computeDechallengeRechallengeAnalyses(
  connectionDetails = connectionDetails,
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  settings = exampleDechallengeRechallengeSettings,
  databaseId = "Eunomia",
  outcomeFolder = file.path(tempdir(), "example_char", "results"),
  minCellCount = 5
)
```

Next it is possible to compute the failed rechallenge cases

```{r eval=FALSE}
failed <- computeRechallengeFailCaseSeriesAnalyses(
  connectionDetails = connectionDetails,
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  settings = exampleDechallengeRechallengeSettings,
  outcomeDatabaseSchema = "main",
  outcomeTable = "cohort",
  databaseId = "Eunomia",
  outcomeFolder = file.path(tempdir(), "example_char", "results"),
  minCellCount = 5
)
```

## Time to Event

To run a 'Time-to-event' analysis you need to create a setting object using `createTimeToEventSettings`.  This requires specifying:

- one or more targetIds (these must be pre-generated in a cohort table)
- one or more outcomeIds (these must be pre-generated in a cohort table)

```{r eval=TRUE}
exampleTimeToEventSettings <- createTimeToEventSettings(
  targetIds = exampleTargetIds,
  outcomeIds = exampleOutcomeIds
)
```

We can then run  the analysis on the Eunomia data using `computeTimeToEventAnalyses` and the settings previously specified:

```{r eval=FALSE}
tte <- computeTimeToEventAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  settings = exampleTimeToEventSettings,
  databaseId = "Eunomia",
  outcomefolder = file.path(tempdir(), "example_char", "results"),
  minCellCount = 5
)
```

## Run Multiple 

If you want to run multiple analyses (of the three previously shown) you can use `createCharacterizationSettings`.  You need to input a list of each of the settings (or NULL if you do not want to run one type of analysis).  To run all the analyses previously shown in one function:

```{r eval=FALSE,results='hide',error=FALSE,warning=FALSE,message=FALSE}
characterizationSettings <- createCharacterizationSettings(
  timeToEventSettings = list(
    exampleTimeToEventSettings
  ),
  dechallengeRechallengeSettings = list(
    exampleDechallengeRechallengeSettings
  ),
  aggregateCovariateSettings = exampleAggregateCovariateSettings
)

# save the settings using
saveCharacterizationSettings(
  settings = characterizationSettings,
  saveDirectory = file.path(tempdir(), "saveSettings")
)

# the settings can be loaded
characterizationSettings <- loadCharacterizationSettings(
  saveDirectory = file.path(tempdir(), "saveSettings")
)

runCharacterizationAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  outcomeDatabaseSchema = "main",
  outcomeTable = "cohort",
  characterizationSettings = characterizationSettings,
  outputDirectory = file.path(tempdir(), "example", "results"),
  executionPath = file.path(tempdir(), "example", "execution"),
  csvFilePrefix = "c_",
  databaseId = "1",
  incremental = F,
  minCharacterizationMean = 0.01,
  minCellCount = 5
)
```

This will create csv files with the results in the saveDirectory.  You can run the following code to view the results in a shiny app:

```{r eval=FALSE}
viewCharacterization(
  resultFolder = file.path(tempdir(), "example", "results"),
  cohortDefinitionSet = NULL
)
```