Step 2. Obtain the sequence ratios

Introduction

In this vignette, we explore the functionality and arguments of the summariseSequenceRatios() function, which computes the sequence ratios of the sequence symmetry analysis (SSA). Because this function takes the output of the generateSequenceCohortSet() function (explained in detail in the vignette Step 1. Generate a sequence cohort), we pick up from where the previous vignette left off.

Recall that in the previous vignette (Step 1. Generate a sequence cohort), we generated cdm$aspirin and cdm$acetaminophen and then used them to create cdm$intersect as follows:

# Generate a sequence cohort
cdm <- generateSequenceCohortSet(
  cdm = cdm,
  indexTable = "aspirin",
  markerTable = "acetaminophen",
  name = "intersect",
  combinationWindow = c(0,Inf))

Obtain sequence ratios

The crude and adjusted sequence ratios, together with their corresponding confidence intervals, can be obtained using the summariseSequenceRatios() function:

summariseSequenceRatios(
  cohort = cdm$intersect
) |> 
  dplyr::glimpse()
#> Rows: 10
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name         <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name       <chr> "index_cohort_name &&& marker_cohort_name", "index_co…
#> $ group_level      <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "crude", "adjusted", "crude", "crude", "adjusted", "a…
#> $ variable_level   <chr> "sequence_ratio", "sequence_ratio", "sequence_ratio",…
#> $ estimate_name    <chr> "point_estimate", "point_estimate", "lower_CI", "uppe…
#> $ estimate_type    <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value   <chr> "1.8108504398827", "626.203086521988", "1.64970963817…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

The output is in the summarised result format. In a later vignette (Step 3. Visualise results), we will explore how to visualise these results in a more intuitive way.
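
Because the result is in the summarised result format, every value is stored as text in the estimate_value column and is labelled by variable_name and estimate_name. The base-R sketch below shows one way to pull out the point estimates as numbers. It runs on a toy data frame holding only the two complete point estimates printed above rather than on a live cdm, and the helper getPointEstimates() is hypothetical, not part of the package.

```r
# Hypothetical helper: keep the point estimates of a summarised result and
# convert estimate_value (stored as character) to numeric
getPointEstimates <- function(result) {
  keep <- result$estimate_name == "point_estimate"
  out <- result[keep, c("variable_name", "estimate_value"), drop = FALSE]
  out$estimate_value <- as.numeric(out$estimate_value)
  out
}

# Toy input: the two complete point estimates from the output above
result <- data.frame(
  variable_name  = c("crude", "adjusted"),
  variable_level = c("sequence_ratio", "sequence_ratio"),
  estimate_name  = c("point_estimate", "point_estimate"),
  estimate_type  = c("numeric", "numeric"),
  estimate_value = c("1.8108504398827", "626.203086521988"),
  stringsAsFactors = FALSE
)

getPointEstimates(result)
```

The same idea should carry over to the table returned by summariseSequenceRatios(), since it has the same column names, although dplyr::filter() would be the more idiomatic choice in a tidyverse workflow.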

Modify the cohort based on cohort_definition_id

The cohortId argument is used to subset the cohort table passed to summariseSequenceRatios(). For example, if the user only wants to include cohort_definition_id = 1 from cdm$intersect, they can do the following:

summariseSequenceRatios(cohort = cdm$intersect,
                        cohortId = 1) |> 
  dplyr::glimpse()
#> Rows: 10
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name         <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name       <chr> "index_cohort_name &&& marker_cohort_name", "index_co…
#> $ group_level      <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "crude", "adjusted", "crude", "crude", "adjusted", "a…
#> $ variable_level   <chr> "sequence_ratio", "sequence_ratio", "sequence_ratio",…
#> $ estimate_name    <chr> "point_estimate", "point_estimate", "lower_CI", "uppe…
#> $ estimate_type    <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value   <chr> "1.8108504398827", "626.203086521988", "1.64970963817…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

In this case the output is unchanged, because every entry in cdm$intersect has cohort_definition_id = 1.
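
Before choosing cohortId, it can help to check which cohort_definition_id values the cohort table actually contains. The sketch below does this in base R on a small hypothetical stand-in for cdm$intersect (the toy rows and the helper subsetCohort() are illustrative assumptions, and the filter is only conceptually similar to what cohortId does). On the database-backed table itself, cdm$intersect |> dplyr::count(cohort_definition_id) gives the same check.

```r
# Hypothetical stand-in for cdm$intersect: three toy rows, all with id 1
toyIntersect <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L),
  subject_id = c(101L, 102L, 103L)
)

# Which cohort_definition_id values are present, and how many rows each?
table(toyIntersect$cohort_definition_id)

# Keep only the requested ids
subsetCohort <- function(cohort, cohortId) {
  cohort[cohort$cohort_definition_id %in% cohortId, , drop = FALSE]
}

# Every row has id 1, so this subset keeps all three rows
nrow(subsetCohort(toyIntersect, cohortId = 1))
```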

Modify confidenceInterval

By default, summariseSequenceRatios() uses a 95% (two-sided) confidence interval. If a different level is desired, for example a 99% confidence interval, it can be set with the confidenceInterval argument:

summariseSequenceRatios(
  cohort = cdm$intersect,
  confidenceInterval = 99) |> 
  dplyr::glimpse()
#> Rows: 10
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name         <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name       <chr> "index_cohort_name &&& marker_cohort_name", "index_co…
#> $ group_level      <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "crude", "adjusted", "crude", "crude", "adjusted", "a…
#> $ variable_level   <chr> "sequence_ratio", "sequence_ratio", "sequence_ratio",…
#> $ estimate_name    <chr> "point_estimate", "point_estimate", "lower_CI", "uppe…
#> $ estimate_type    <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value   <chr> "1.8108504398827", "626.203086521988", "1.60240541369…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
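
To build intuition for what confidenceInterval changes, the sketch below computes a crude sequence ratio (the number of people who started the index drug first divided by the number who started the marker drug first) with an interval at a chosen level. The interval method, an exact binomial interval for the proportion of index-first sequences mapped to the ratio scale with p / (1 - p), is one common approach and an assumption here, not necessarily what summariseSequenceRatios() does internally. The counts and the helper crudeSequenceRatio() are hypothetical; as with the package argument, the level is given as a percentage.

```r
# Hypothetical helper: crude sequence ratio with an exact binomial interval
# nIndexFirst:  people who started the index drug before the marker drug
# nMarkerFirst: people who started the marker drug before the index drug
crudeSequenceRatio <- function(nIndexFirst, nMarkerFirst, confidenceInterval = 95) {
  n <- nIndexFirst + nMarkerFirst
  # Clopper-Pearson interval for p = nIndexFirst / n at the requested level
  ci <- stats::binom.test(nIndexFirst, n, conf.level = confidenceInterval / 100)$conf.int
  # Map the proportion scale onto the ratio scale: SR = p / (1 - p)
  c(
    point_estimate = nIndexFirst / nMarkerFirst,
    lower_CI = unname(ci[1] / (1 - ci[1])),
    upper_CI = unname(ci[2] / (1 - ci[2]))
  )
}

# Toy counts: 120 index-first vs 60 marker-first sequences
ci95 <- crudeSequenceRatio(120, 60, confidenceInterval = 95)
ci99 <- crudeSequenceRatio(120, 60, confidenceInterval = 99)
rbind(ci95, ci99)
```

Only the interval depends on the level: the point estimate stays at 2 and the 99% interval is wider than the 95% one. This mirrors the package output above, where the crude point estimate is identical in both runs while the crude lower bound drops from 1.6497… to 1.6024….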

Modify movingAverageRestriction

The moving average restriction is used only in the calculation of the null sequence ratio; please refer to Lai et al. (2017) for more details on this parameter (parameter d in the calculation of P on page 578). Following Tsiropoulos et al. (2009), the movingAverageRestriction argument defaults to 548 days (18 months).
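
The role of d is easier to see in code. The sketch below implements one common formulation of the null-effect sequence ratio in the spirit of Tsiropoulos et al. (2009). For each day i on which people start the index drug, the chance that the marker follows rather than precedes it is estimated from marker starts within d days after versus d days before day i. The null ratio then weights these probabilities by the number of index starts. In the standard SSA, the adjusted sequence ratio is the crude ratio divided by this null ratio. The daily counts and the helper nullSequenceRatio() are hypothetical, and the package's handling of same-day events and period boundaries may differ.

```r
# Hypothetical sketch of a null-effect sequence ratio.
# a[i]: number of people starting the index drug on day i
# b[j]: number of people starting the marker drug on day j
# d:    the moving average restriction, in days
nullSequenceRatio <- function(a, b, d) {
  stopifnot(length(a) == length(b), d >= 1)
  n <- length(a)
  numerator <- 0
  denominator <- 0
  for (i in seq_len(n)) {
    if (a[i] == 0) next
    after <- (i + 1):(i + d)
    after <- after[after <= n]        # marker days within d days after day i
    before <- (i - d):(i - 1)
    before <- before[before >= 1]     # marker days within d days before day i
    total <- sum(b[after]) + sum(b[before])
    if (total == 0) next              # no marker starts in the window
    p <- sum(b[after]) / total        # P(marker follows index | within d days)
    numerator <- numerator + a[i] * p
    denominator <- denominator + a[i] * (1 - p)
  }
  numerator / denominator
}

# Toy data over 21 days: 5 index starts on day 11 only
a <- c(rep(0, 10), 5, rep(0, 10))

# Flat marker incidence: no trend, so the null ratio is 1
nullSequenceRatio(a, rep(2, 21), d = 5)

# Rising marker incidence: by chance alone the marker tends to follow the
# index drug, so the null ratio exceeds 1 (here 70 / 40 = 1.75)
nullRising <- nullSequenceRatio(a, 1:21, d = 5)

# Adjusted ratio = crude ratio / null ratio (the crude value is hypothetical)
crude <- 2
crude / nullRising
```

A smaller d makes the null ratio reflect only marker trends close to each index start, while a larger d averages over a longer stretch of the study period.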

Modify minCellCount

By default, minCellCount is 5: results based on fewer than 5 events are obscured. Setting minCellCount = 0 reports all results:

summariseSequenceRatios(cohort = cdm$intersect,
                        minCellCount = 0) |> 
  dplyr::glimpse()
#> Rows: 10
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name         <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name       <chr> "index_cohort_name &&& marker_cohort_name", "index_co…
#> $ group_level      <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "crude", "adjusted", "crude", "crude", "adjusted", "a…
#> $ variable_level   <chr> "sequence_ratio", "sequence_ratio", "sequence_ratio",…
#> $ estimate_name    <chr> "point_estimate", "point_estimate", "lower_CI", "uppe…
#> $ estimate_type    <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value   <chr> "1.8108504398827", "626.203086521988", "1.64970963817…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
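
The threshold rule can be pictured with a small helper. This is a hypothetical sketch, assuming that any count strictly below minCellCount is replaced by NA; the package may apply the rule to different quantities and may treat zero counts differently, so read it as an illustration of the threshold rather than of the package's internals.

```r
# Hypothetical helper: obscure counts strictly below minCellCount
obscureSmallCounts <- function(counts, minCellCount = 5) {
  counts[counts < minCellCount] <- NA
  counts
}

counts <- c(0, 3, 5, 12)
obscureSmallCounts(counts)                    # returns NA NA 5 12
obscureSmallCounts(counts, minCellCount = 0)  # returns 0 3 5 12
```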

# Close the connection to the database
CDMConnector::cdmDisconnect(cdm = cdm)