Package: PatientLevelPrediction 6.4.0

Egill Fridgeirsson

PatientLevelPrediction: Develop Clinical Prediction Models Using the Common Data Model

A user-friendly way to create patient-level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.

Authors: Egill Fridgeirsson [aut, cre], Jenna Reps [aut], Martijn Schuemie [aut], Marc Suchard [aut], Patrick Ryan [aut], Peter Rijnbeek [aut], Observational Health Data Science and Informatics [cph]

# Install 'PatientLevelPrediction' in R:

install.packages('PatientLevelPrediction', repos = c('https://ohdsi.r-universe.dev', 'https://cloud.r-project.org'))
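
Once the package is installed, the workflow summarised in the description (data, study population, model fit) can be sketched roughly as follows. This is a minimal illustration on synthetic data and not a recommended study design: the helper name `sketchPlpWorkflow`, the one-year risk window, the sample size, and the choice of the first simulated outcome id are all assumptions made for the example. The sketch only runs when the package is available.

```r
# Hypothetical helper: wraps a simulate -> settings -> runPlp sequence.
sketchPlpWorkflow <- function(n = 1000, saveDirectory = tempfile("plpSketch")) {
  # Load the bundled simulation profile and generate synthetic plpData.
  data("simulationProfile", package = "PatientLevelPrediction",
       envir = environment())
  plpData <- PatientLevelPrediction::simulatePlpData(simulationProfile, n = n)

  # Assumption: use the first outcome id present in the simulated outcomes.
  outcomeId <- plpData$outcomes$outcomeId[1]

  # Assumption: a one-year time-at-risk window, chosen only for illustration.
  populationSettings <- PatientLevelPrediction::createStudyPopulationSettings(
    riskWindowStart = 1,
    riskWindowEnd = 365,
    requireTimeAtRisk = FALSE
  )

  # Develop and internally evaluate a lasso logistic regression model.
  PatientLevelPrediction::runPlp(
    plpData = plpData,
    outcomeId = outcomeId,
    analysisId = "sketch",
    populationSettings = populationSettings,
    splitSettings = PatientLevelPrediction::createDefaultSplitSetting(),
    modelSettings = PatientLevelPrediction::setLassoLogisticRegression(),
    saveDirectory = saveDirectory
  )
}

# Run the sketch only if the package is installed.
if (requireNamespace("PatientLevelPrediction", quietly = TRUE)) {
  result <- sketchPlpWorkflow()
}
```

Results are written to a temporary directory here; a real analysis would pass a persistent `saveDirectory` and settings that match the clinical question.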

Bug tracker: https://github.com/ohdsi/patientlevelprediction/issues

Pkgdown site: https://ohdsi.github.io

Uses libs:

openjdk - OpenJDK Java runtime, using Hotspot JIT

Datasets:

simulationProfile - A simulation profile for generating synthetic patient level prediction data

On CRAN:

hades openjdk

10.85 score 190 stars 297 scripts 304 downloads 117 exports 65 dependencies

Last updated 13 days ago from: 94e316df85. Checks: 9 OK. Indexed: yes.

Target	Result	Latest binary
Doc / Vignettes	OK	Mar 07 2025
R-4.5-win	OK	Mar 07 2025
R-4.5-mac	OK	Mar 07 2025
R-4.5-linux	OK	Mar 07 2025
R-4.4-win	OK	Mar 07 2025
R-4.4-mac	OK	Mar 07 2025
R-4.4-linux	OK	Mar 07 2025
R-4.3-win	OK	Mar 07 2025
R-4.3-mac	OK	Mar 07 2025

Exports: averagePrecision brierScore calibrationLine computeAuc computeGridPerformance configurePython covariateSummary createCohortCovariateSettings createDatabaseDetails createDatabaseSchemaSettings createDefaultExecuteSettings createDefaultSplitSetting createExecuteSettings createExistingSplitSettings createFeatureEngineeringSettings createGlmModel createIterativeImputer createLearningCurve createLogSettings createModelDesign createNormalizer createPlpResultTables createPreprocessSettings createRandomForestFeatureSelection createRareFeatureRemover createRestrictPlpDataSettings createSampleSettings createSimpleImputer createSklearnModel createSplineSettings createStratifiedImputationSettings createStudyPopulation createStudyPopulationSettings createTempModelLoc createUnivariateFeatureSelection createValidationDesign createValidationSettings diagnoseMultiplePlp diagnosePlp evaluatePlp externalValidateDbPlp extractDatabaseToCsv fitPlp getCalibrationSummary getCohortCovariateData getDemographicSummary getEunomiaPlpData getPlpData getPredictionDistribution getThresholdSummary ici insertCsvToDatabase insertResultsToSqlite listAppend listCartesian loadPlpAnalysesJson loadPlpData loadPlpModel loadPlpResult loadPlpShareable loadPrediction MapIds migrateDataModel modelBasedConcordance outcomeSurvivalPlot pfi plotDemographicSummary plotF1Measure plotGeneralizability plotLearningCurve plotNetBenefit plotPlp plotPrecisionRecall plotPredictedPDF plotPredictionDistribution plotPreferencePDF plotSmoothCalibration plotSparseCalibration plotSparseCalibration2 plotSparseRoc plotVariableScatterplot predictCyclops predictGlm predictPlp preprocessData recalibratePlp recalibratePlpRefit runMultiplePlp runPlp savePlpAnalysesJson savePlpData savePlpModel savePlpResult savePlpShareable savePrediction setAdaBoost setCoxModel setDecisionTree setGradientBoostingMachine setIterativeHardThresholding setLassoLogisticRegression setLightGBM setMLP setNaiveBayes setPythonEnvironment setRandomForest setSVM simulatePlpData sklearnFromJson sklearnToJson splitData toSparseM validateExternal validateMultiplePlp viewDatabaseResultPlp viewMultiplePlp viewPlp

Dependencies: Andromeda backports bit bit64 blob cachem checkmate cli clipr cpp11 crayon Cyclops DatabaseConnector DBI dbplyr digest dplyr fansi fastmap FeatureExtraction generics glue hms jsonlite lattice lifecycle magrittr Matrix memoise memuse ParallelLogger pillar pkgconfig plogr plyr prettyunits pROC progress PRROC purrr R6 Rcpp RcppEigen RcppParallel readr rJava rlang RSQLite snow SqlRender stringi stringr survival tibble tidyr tidyselect triebeard tzdb urltools utf8 vctrs vroom withr xml2 zip

Adding Custom Data Splitting

Jenna Reps

Rendered from AddingCustomSplitting.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-06
Started: 2022-03-11

Adding Custom Feature Engineering Functions

Jenna Reps, Egill Fridgeirsson

Rendered from AddingCustomFeatureEngineering.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-06
Started: 2022-03-11

Adding Custom Patient-Level Prediction Algorithms

Jenna Reps, Martijn J. Schuemie, Patrick B. Ryan, Peter R. Rijnbeek

Rendered from AddingCustomModels.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-06
Started: 2022-03-11

Adding Custom Sampling Functions

Jenna Reps

Rendered from AddingCustomSamples.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-06
Started: 2022-03-11

Automatically Build Multiple Patient-Level Predictive Models

Jenna Reps, Martijn J. Schuemie, Patrick B. Ryan, Peter R. Rijnbeek

Rendered from BuildingMultiplePredictiveModels.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-11
Started: 2018-10-05

Benchmark Tasks

Jenna Reps, Ross Williams, Peter R. Rijnbeek

Rendered from BenchmarkTasks.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-06
Started: 2023-10-12

Best Practice Research

Jenna Reps, Peter R. Rijnbeek

Rendered from BestPractices.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-11
Started: 2025-02-06

Building patient-level predictive models

Jenna Reps, Martijn J. Schuemie, Patrick B. Ryan, Peter R. Rijnbeek

Rendered from BuildingPredictiveModels.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-11
Started: 2015-03-27

Clinical Models

Jenna Reps, Peter R. Rijnbeek

Rendered from ClinicalModels.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-11
Started: 2025-02-06

Constrained Predictors

Jenna Reps

Rendered from ConstrainedPredictors.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-06
Started: 2023-10-12

Creating Learning Curves

Luis H. John, Jenna M. Reps, Peter R. Rijnbeek

Rendered from CreatingLearningCurves.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-11
Started: 2020-10-01

Integration of GIS Data Into OHDSI Model Building

Jared Houghtaling

Rendered from GISExample.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-06
Started: 2025-02-06

Making patient-level predictive network study packages

Jenna Reps, Martijn J. Schuemie, Patrick B. Ryan, Peter R. Rijnbeek

Rendered from CreatingNetworkStudies.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-06
Started: 2018-05-14

Patient-Level Prediction Installation Guide

Jenna Reps, Peter R. Rijnbeek, Egill Fridgeirsson

Rendered from InstallationGuide.Rmd using knitr::rmarkdown on Mar 07 2025.

Last update: 2025-02-11
Started: 2018-05-21

Help page	Topics
Calculate the average precision	averagePrecision
brierScore	brierScore
calibrationLine	calibrationLine
Compute the area under the ROC curve	computeAuc
Computes grid performance with a specified performance function	computeGridPerformance
Sets up a python environment to use for PLP (can be conda or venv)	configurePython
covariateSummary	covariateSummary
Extracts covariates based on cohorts	createCohortCovariateSettings
Create a setting that holds the details about the cdmDatabase connection for data extraction	createDatabaseDetails
Create the PatientLevelPrediction database result schema settings	createDatabaseSchemaSettings
Creates default list of settings specifying what parts of runPlp to execute	createDefaultExecuteSettings
Create the settings for defining how the plpData are split into test/validation/train sets using default splitting functions (either random stratified by outcome, time or subject splitting)	createDefaultSplitSetting
Creates list of settings specifying what parts of runPlp to execute	createExecuteSettings
Create the settings for defining how the plpData are split into test/validation/train sets using an existing split - good to use for reproducing results from a different run	createExistingSplitSettings
Create the settings for defining any feature engineering that will be done	createFeatureEngineeringSettings
createGlmModel	createGlmModel
Create Iterative Imputer settings	createIterativeImputer
createLearningCurve	createLearningCurve
Create the settings for logging the progression of the analysis	createLogSettings
Specify settings for developing a single model	createModelDesign
Create the settings for normalizing the data (type: either "minmax" or "robust")	createNormalizer
Create the results tables to store PatientLevelPrediction models and results into a database	createPlpResultTables
Create the settings for preprocessing the trainData.	createPreprocessSettings
Create the settings for random forest based feature selection	createRandomForestFeatureSelection
Create the settings for removing rare features	createRareFeatureRemover
createRestrictPlpDataSettings define extra restriction settings when calling getPlpData	createRestrictPlpDataSettings
Create the settings for defining how the trainData from 'splitData' are sampled using default sample functions.	createSampleSettings
Create Simple Imputer settings	createSimpleImputer
Plug an existing scikit learn python model into the PLP framework	createSklearnModel
Create the settings for adding a spline for continuous variables	createSplineSettings
Create the settings for using stratified imputation.	createStratifiedImputationSettings
Create a study population	createStudyPopulation
Create the study population settings	createStudyPopulationSettings
Create a temporary model location	createTempModelLoc
Create the settings for defining any feature selection that will be done	createUnivariateFeatureSelection
createValidationDesign - Define the validation design for external validation	createValidationDesign
createValidationSettings define optional settings for performing external validation	createValidationSettings
Run a list of prediction diagnoses	diagnoseMultiplePlp
diagnostic - Investigates the prediction problem settings - use before training a model	diagnosePlp
evaluatePlp	evaluatePlp
externalValidateDbPlp - Validate a model on new databases	externalValidateDbPlp
Exports all the results from a database into csv files	extractDatabaseToCsv
fitPlp	fitPlp
Get a sparse summary of the calibration	getCalibrationSummary
Extracts covariates based on cohorts	getCohortCovariateData
Get a demographic summary	getDemographicSummary
Create a plpData object from the Eunomia database	getEunomiaPlpData
Extract the patient level prediction data from the server	getPlpData
Calculates the prediction distribution	getPredictionDistribution
Calculate all measures for sparse ROC	getThresholdSummary
Calculate the Integrated Calibration Index from Austin and Steyerberg https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8281	ici
Function to insert results into a database from csvs	insertCsvToDatabase
Create sqlite database with the results	insertResultsToSqlite
Join two lists	listAppend
Cartesian product	listCartesian
Load the multiple prediction json settings from a file	loadPlpAnalysesJson
Load the plpData from a folder	loadPlpData
Loads the plp model	loadPlpModel
Loads the evaluation dataframe	loadPlpResult
Loads the plp result saved as json/csv files for transparent sharing	loadPlpShareable
Loads the prediction dataframe from json	loadPrediction
Map covariate and row Ids so they start from 1	MapIds
Migrate Data model	migrateDataModel
Calculate the model-based concordance, which is a calculation of the expected discrimination performance of a model under the assumption the model predicts the "TRUE" outcome as detailed in van Klaveren et al. https://pubmed.ncbi.nlm.nih.gov/27251001/	modelBasedConcordance
Plot the outcome incidence over time	outcomeSurvivalPlot
Permutation Feature Importance	pfi
Plot the observed vs. expected incidence, by age and gender	plotDemographicSummary
Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame	plotF1Measure
Plot the train/test generalizability diagnostic	plotGeneralizability
plotLearningCurve	plotLearningCurve
Plot the net benefit	plotNetBenefit
Plot all the PatientLevelPrediction plots	plotPlp
Plot the precision-recall curve using the sparse thresholdSummary data frame	plotPrecisionRecall
Plot the Predicted probability density function, showing prediction overlap between true and false cases	plotPredictedPDF
Plot the side-by-side boxplots of prediction distribution, by class	plotPredictionDistribution
Plot the preference score probability density function, showing prediction overlap between true and false cases	plotPreferencePDF
Plot the smooth calibration as detailed in Van Calster et al. "A calibration hierarchy for risk models was defined: from utopia to empirical data" (2016)	plotSmoothCalibration
Plot the calibration	plotSparseCalibration
Plot the conventional calibration	plotSparseCalibration2
Plot the ROC curve using the sparse thresholdSummary data frame	plotSparseRoc
Plot the variable importance scatterplot	plotVariableScatterplot
Create predictive probabilities	predictCyclops
Predict using a logistic regression model	predictGlm
predictPlp	predictPlp
A function that wraps around FeatureExtraction::tidyCovariateData to normalise the data and remove rare or redundant features	preprocessData
Print a plpData object	print.plpData
Print a summary.plpData object	print.summary.plpData
recalibratePlp	recalibratePlp
recalibratePlpRefit	recalibratePlpRefit
Run a list of prediction analyses	runMultiplePlp
runPlp - Develop and internally evaluate a model using specified settings	runPlp
Save the modelDesignList to a json file	savePlpAnalysesJson
Save the plpData to folder	savePlpData
Saves the plp model	savePlpModel
Saves the result from runPlp into the location directory	savePlpResult
Save the plp result as json files and csv files for transparent sharing	savePlpShareable
Saves the prediction dataframe to a json file	savePrediction
Create setting for AdaBoost with python DecisionTreeClassifier base estimator	setAdaBoost
Create setting for lasso Cox model	setCoxModel
Create setting for the scikit-learn DecisionTree with python	setDecisionTree
Create setting for gradient boosting machine model using gbm_xgboost implementation	setGradientBoostingMachine
Create setting for Iterative Hard Thresholding model	setIterativeHardThresholding
Create modelSettings for lasso logistic regression	setLassoLogisticRegression
Create setting for gradient boosting machine model using lightGBM (https://github.com/microsoft/LightGBM/tree/master/R-package).	setLightGBM
Create setting for neural network model with python's scikit-learn. For bigger models, consider using 'DeepPatientLevelPrediction' package.	setMLP
Create setting for naive bayes model with python	setNaiveBayes
Use the python environment created using configurePython()	setPythonEnvironment
Create setting for random forest model using sklearn	setRandomForest
Create setting for the python sklearn SVM (SVC function)	setSVM
Generate simulated data	simulatePlpData
A simulation profile for generating synthetic patient level prediction data	simulationProfile
Loads sklearn python model from json	sklearnFromJson
Saves sklearn python model object to json in path	sklearnToJson
Split the plpData into test/train sets using splitting settings of class 'splitSettings'	splitData
Summarize a plpData object	summary.plpData
Convert the plpData in COO format into a sparse R matrix	toSparseM
validateExternal - Validate model performance on new data	validateExternal
Externally validate multiple plp models across new datasets	validateMultiplePlp
Open a local shiny app for viewing the results of a PLP analysis from a database	viewDatabaseResultPlp
Open a local shiny app for viewing the results of multiple PLP analyses	viewMultiplePlp
viewPlp - Interactively view the performance and model settings	viewPlp
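
Several help topics above concern performance measures such as the Brier score (brierScore) and the area under the ROC curve (computeAuc). As a rough illustration of what those two quantities measure, the base R sketch below computes them from first principles. It is independent of the package and is not its implementation; the function names `brierSketch` and `aucSketch` and the toy data are hypothetical.

```r
# Brier score: mean squared difference between predicted risk and observed outcome.
brierSketch <- function(prediction, outcome) {
  mean((prediction - outcome)^2)
}

# AUC via the rank-sum (Mann-Whitney) identity; tied predictions get average ranks.
aucSketch <- function(prediction, outcome) {
  ranks <- rank(prediction)
  nPos <- sum(outcome == 1)
  nNeg <- sum(outcome == 0)
  (sum(ranks[outcome == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# Toy data: four patients, two with the outcome.
outcome <- c(0, 0, 1, 1)
prediction <- c(0.1, 0.4, 0.35, 0.8)

brierSketch(prediction, outcome)  # 0.158125
aucSketch(prediction, outcome)    # 0.75
```

Lower Brier scores indicate predictions closer to the observed outcomes, while an AUC of 0.75 means a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it in 75% of pairs.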
