ResultModelManager is a package that aims to standardize the handling of data models across OHDSI Hades packages. This package is designed to have minimal requirements and be easily imported by others.
At the time of writing, OHDSI analytics packages have many different and specific results data models that can change between package versions. When changes occur to these data models occurs, old versions of dependent software become incompatible with new versions. Similarly, when applications that depend on these data models encounter these changes developers are either forced in to the following assumptions:
Furthermore, as the Strategus package is being developed it has been noted that many existing analytics packages generate results that are in non standard formats. Any time data models are changed, this breaks any backwards compatibility.
Work in progress name: ResultModelManager (RMM or, phonetically, rum).
This goal of package is intended to provide common data model utilities that can be used across OHDSI packages. The initial focus will be on handling database migrations via a common interface that can be used within R packages to add new data models (where not existing), handle migrations to future versions, report the status of a given data model’s version, get migrations that are to be run and execute SQL scripts that change a data model in sequential order.
In addition, common functionality for defining and querying data models can be provided in a way that allows easier access to results data models for the purpose of post-collection data analysis.
This is similar in function to a software utility like flyway (used
successfully in OHDSI WebAPI) - however, this is geared towards
R
and the OHDSI
landscape of packages, with
results that have been generated by ohdsi analytics packages and are
intended to be explored using OHDSI open source reporting tools such as
shiny and Rmarkdown. From analysis - it is not simple to expect users to
install flyway, and would not be trivial to store and handle migrations
in this way.
This package only intends to provide utilities for R Ohdsi packages that require exploration of results in a common way.
This package will mainly be used by OHDSI package maintainers and is not intended for users of those software. Though some functionality (such as alerting users to the fact that a data model is out of date) it will be up to the package maintainers to implement the interfaces provided here. In terms of Data model migrations and querying this package will provide a series of abstract base classes/interfaces (note: this distinction is hard to enforce within R6) that can be implemented within individual packages. The packages themselves will create classes that can perform the required tasks.
The obvious initial dependencies are ParrallelLogger
,
R6
, DatabaseConnector
and
SqlRender
. Beyond this the package should be fairly
lightweight and should not require packages from outside CRAN. It is not
an initial requirement that this package is in CRAN but all development
should conform to this unless a requirement makes this impossible.
The results migrator class will be defined as an R6
class that has the following base functions:
status()
Get the status of a current data model - this
should return the ordered migrations and flag those that have or haven’t
been executedcheck()
Check that migrations conform to standards (can
be used in package tests)executeMigrations()
execute any migrations that have
not been completedDataMigrationManager <- R6::R6Class(
"DataMigrationManager",
private = list(
executeMigration = function(filePath) {
# Load, render, translate and execute sql
# Save migration in set of migrations
# Error handling - stop execution, restore transaction
}
),
public = list(
migrationPath = NULL,
migrationFolder = NULL,
resulultsDatabaseSchema = NULL,
connection = NULL,
initalize = function(connectionDetails,
resultsDatabaseSchema,
tablePrefix,
migrationsPath,
migrationRegexp = .defaultMigrationRegexp) {
# Set required variables
},
getStatus = function() {
# return data frame all migrations, including file name, order and
},
check = function() {
# Check to see if files follow pattern
},
executeMigrations = function() {
# load list of migrations
# Load list of executed migrations
# if migrations table doesn't exist, create it
# execute migrations that haven't been executed yet
}
)
)
#' @inheritParams ResultModelManager::DatabaseMigrationManager - this will probably need to be a factory
#' @export`
getMigrationManager <- function(migrationsPath = system.file("sql", "sql_server", "migrations", package = utils::packageName()), ...) {
migrationManager <- ResultModelManager::DatabaseMigrationManager$new(migrationsPath = migrationsPath, ...)
}
It should also be possible to load the manager from a directory structure outside of a package. However, this may have limitations when considering the way sql files are loaded with SqlRender, especially when including scripts that have platform specific changes.
All data migrations are assumed to be in OHSI SQL and stored within a
migration folder. For a package this should be
inst/sql/sql_server/migrations
(these will also likely have
to be platform specific,
e.g. inst/sql/postgresql/migrations
). Inside this folder
only migrations that conform to a regular expression such as
(Migration_[0-9]+)-(.+).sql
. Explicitly, this encodes
several things:
For example, the following file names will work:
Migration_2-MyMigration.sql
Migration_2-v3.2whaterver.sql
Migration_4-TEST.sql
Migration_4-2018922-vAAAA.sql
However, the following would be invalid:
MyMigration.sql # Does not include Migration_1
Migration_2v3.2whaterver.sql # missing -
-TEST_Migration_1.sql # Wrong order
Migraton_4-a.sql # Migration spelt wrong
The check()
call within the DMM should validate these.
Package/module maintainers should also ensure that migrations conform to
this by adding a unit test.
Any table names or other variables should be set using the SqlRender
parameter {DEFAULT @table_name = my_table_name}
. The script
will have the variables:
@results_schema
@table_prefix
The use of @table_prefix
when referencing tables is
crucial. However, in general use of parameters is not seen as necessary
at this time. (NOTE: I’m not sure how we would handle parameterized
migrations, or if this is even desirable. Users can manage this with the
DEFAULT
keyword in SqlRender)
Following a migration the following SQL will execute:
INSERT INTO @results_schema.@table_prexif@migration (migration_completed, migration_order)
VALUES ('<fileName>', <order>);
(Note - this should be completed automatically by the manager class)
It is desirable, but not a hard requirement for the initial package version, to have the following helper utilities.
To provide the ability to add a new migration to an existing project
Function to create a ddl csv that conforms to common standards
In principle this could be added for new projects that will generate results. However, this may be of limited use.
This package will not be able to support the following: