Package: PatientLevelPrediction 6.6.0

Egill Fridgeirsson

PatientLevelPrediction: Develop Clinical Prediction Models Using the Common Data Model

A user-friendly way to create patient-level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2018) <doi:10.1093/jamia/ocy032>.

Authors: Egill Fridgeirsson [aut, cre], Jenna Reps [aut], Martijn Schuemie [aut], Marc Suchard [aut], Patrick Ryan [aut], Peter Rijnbeek [aut], Observational Health Data Science and Informatics [cph]

PatientLevelPrediction_6.6.0.tar.gz
PatientLevelPrediction_6.6.0.zip (r-4.7) | PatientLevelPrediction_6.6.0.zip (r-4.6) | PatientLevelPrediction_6.6.0.zip (r-4.5)
PatientLevelPrediction_6.6.0.tgz (r-4.6-any) | PatientLevelPrediction_6.6.0.tgz (r-4.5-any)
PatientLevelPrediction_6.6.0.tar.gz (r-4.7-any) | PatientLevelPrediction_6.6.0.tar.gz (r-4.6-any)
PatientLevelPrediction_6.6.0.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
PatientLevelPrediction/json (API)

# Install 'PatientLevelPrediction' in R:
install.packages('PatientLevelPrediction', repos = c('https://ohdsi.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/ohdsi/patientlevelprediction/issues

Pkgdown/docs site: https://ohdsi.github.io

Uses libs:
  • openjdk – OpenJDK Java runtime, using Hotspot JIT
Datasets:
  • simulationProfile - A simulation profile for generating synthetic patient level prediction data

hades openjdk

10.86 score 219 stars 397 scripts 440 downloads 122 exports 63 dependencies

Last updated from: b510a50a69. Checks: 9 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 351
source / vignettes | OK | 268
linux-release-x86_64 | OK | 338
macos-release-arm64 | OK | 249
macos-oldrel-arm64 | OK | 156
windows-devel | OK | 355
windows-release | OK | 360
windows-oldrel | OK | 306
wasm-release | OK | 220

Exports: averagePrecision, brierScore, calibrationLine, computeAuc, computeAuprc, computeGridPerformance, configurePython, covariateSummary, createCohortCovariateSettings, createDatabaseDetails, createDatabaseSchemaSettings, createDefaultExecuteSettings, createDefaultSplitSetting, createExecuteSettings, createExistingSplitSettings, createFeatureEngineeringSettings, createGlmModel, createHyperparameterSettings, createIterativeImputer, createLearningCurve, createLogSettings, createModelDesign, createNormalizer, createPlpResultTables, createPreprocessSettings, createRandomForestFeatureSelection, createRareFeatureRemover, createRestrictPlpDataSettings, createSampleSettings, createSimpleImputer, createSklearnIterativeImputer, createSklearnModel, createSplineSettings, createStratifiedImputationSettings, createStudyPopulation, createStudyPopulationSettings, createTempModelLoc, createTuningMetric, createUnivariateFeatureSelection, createValidationDesign, createValidationSettings, diagnoseMultiplePlp, diagnosePlp, evaluatePlp, externalValidateDbPlp, extractDatabaseToCsv, fitPlp, getCalibrationSummary, getCohortCovariateData, getDemographicSummary, getEunomiaPlpData, getPlpData, getPredictionDistribution, getThresholdSummary, ici, insertCsvToDatabase, insertResultsToSqlite, listAppend, listCartesian, loadPlpAnalysesJson, loadPlpData, loadPlpModel, loadPlpResult, loadPlpShareable, loadPrediction, MapIds, migrateDataModel, modelBasedConcordance, outcomeSurvivalPlot, pfi, plotDemographicSummary, plotF1Measure, plotGeneralizability, plotLearningCurve, plotNetBenefit, plotPlp, plotPrecisionRecall, plotPredictedPDF, plotPredictionDistribution, plotPreferencePDF, plotSmoothCalibration, plotSparseCalibration, plotSparseCalibration2, plotSparseRoc, plotVariableScatterplot, predictCyclops, predictGlm, predictPlp, preprocessData, recalibratePlp, recalibratePlpRefit, runMultiplePlp, runPlp, savePlpAnalysesJson, savePlpData, savePlpModel, savePlpResult, savePlpShareable, savePrediction, setAdaBoost, setCoxModel, setDecisionTree, setGradientBoostingMachine, setIterativeHardThresholding, setLassoLogisticRegression, setLightGBM, setMLP, setNaiveBayes, setPythonEnvironment, setRandomForest, setRidgeRegression, setSVM, simulatePlpData, sklearnFromJson, sklearnToJson, splitData, toSparseM, validateExternal, validateMultiplePlp, viewDatabaseResultPlp, viewMultiplePlp, viewPlp

Dependencies: Andromeda, backports, bit, bit64, blob, cachem, checkmate, cli, clipr, cpp11, crayon, Cyclops, DatabaseConnector, DBI, dbplyr, digest, dplyr, duckdb, fastmap, FeatureExtraction, generics, glue, hms, jsonlite, lattice, lifecycle, magrittr, Matrix, memoise, memuse, ParallelLogger, pillar, pkgconfig, prettyunits, pROC, progress, PRROC, purrr, R6, Rcpp, RcppEigen, readr, rJava, rlang, RSQLite, rstudioapi, snow, SqlRender, stringi, stringr, survival, tibble, tidyr, tidyselect, triebeard, tzdb, urltools, utf8, vctrs, vroom, withr, xml2, zip

Adding Custom Patient-Level Prediction Algorithms
Introduction | Algorithm Code Structure | Set | Fit | Predict | VarImp | Algorithm Example | Variable importance | Acknowledgments

Last update: 2026-03-09
Started: 2022-03-11

Building patient-level predictive models
Introduction | Study specification | Problem definition 1: Stroke in atrial fibrillation patients | Problem definition 2: Angioedema in ACE inhibitor users | Study population definition | Model development settings | Example 1: Stroke in Atrial fibrillation patients | Study Specification | Study implementation | Cohort instantiation | ATLAS cohort builder | Custom cohorts | Study script creation | Data extraction | Additional inclusion criteria | Splitting the data into training/validation/testing datasets | Preprocessing the training data | Model Development | Example 2: Angioedema in ACE inhibitor users | Splitting the data into training/validation/testing datasets | Study package creation | Internal validation | Discrimination | Smooth Calibration | Other functionality | Demos | Acknowledgments | Appendix 1: Study population settings details

Last update: 2025-07-25
Started: 2015-03-27

Creating Learning Curves
Introduction | Creating the learning curve | Parallel processing | Demo | Publication | Acknowledgments

Last update: 2025-07-25
Started: 2020-10-01

Automatically Build Multiple Patient-Level Predictive Models
Introduction | Creating a model design | Model design example 1 | Model design example 2 | Model design example 3 | Running multiple models | Validating multiple models | Viewing the results | Acknowledgments

Last update: 2025-02-11
Started: 2018-10-05

Best Practice Research
Best practice publications using the OHDSI PatientLevelPrediction framework

Last update: 2025-02-11
Started: 2025-02-06

Clinical Models
Clinical models developed using the OHDSI PatientLevelPrediction framework

Last update: 2025-02-11
Started: 2025-02-06

Patient-Level Prediction Installation Guide
Introduction | Software Prerequisites | Windows Users | Mac/Linux Users | Installing the Package | Installing PatientLevelPrediction using remotes | Creating Python Reticulate Environment | Installation issues | Common issues | python environment Mac/linux users: | Acknowledgments

Last update: 2025-02-11
Started: 2018-05-21

Adding Custom Data Splitting
Introduction | Data Splitting Function Code Structure | Example | Create function | Implement function | Acknowledgments

Last update: 2025-02-06
Started: 2022-03-11

Adding Custom Feature Engineering Functions
Introduction | Feature Engineering Function Code Structure | Example | Create function | Implement function | Acknowledgments

Last update: 2025-02-06
Started: 2022-03-11

Adding Custom Sampling Functions
Introduction | Sample Function Code Structure | Example | Create function | Implement function | Acknowledgments

Last update: 2025-02-06
Started: 2022-03-11

Benchmark Tasks
Benchmark Tasks For Large-Scale Empirical Analyses

Last update: 2025-02-06
Started: 2023-10-12

Constrained Predictors
How to use the PhenotypeLibrary R package | The full set of predictor phenotypes

Last update: 2025-02-06
Started: 2023-10-12

Integration of GIS Data Into OHDSI Model Building
Integration of GIS Data into OHDSI Model Building | Motivation | Step-by-Step Process | Step 1: Create Target & Outcome Cohorts | Step 2: Create Generic PLP Lasso Logistic Regression Model in R | Step 3: Split plpData object to train/test, augment labels with EXPOSURE_OCCURRENCE values | Step 4: Reference augmented label objects in custom feature engineering function | Step 5: Apply new train and test datasets to runPlp and evaluate output

Last update: 2025-02-06
Started: 2025-02-06

Making patient-level predictive network study packages
Introduction | Useful publication | Main steps for running a network study | Step 1 – developing the study | Step 2 – implementing the study part 1 | Step 3 – implementing the study part 2 (make sure the package is functioning as planned and the definitions are valid across sites) | Step 4 – Publication | Package Skeleton - File Structure

Last update: 2025-02-06
Started: 2018-05-14

Readme and manuals

Help Manual

Help page | Topics
Calculate the average precision | averagePrecision
brierScore | brierScore
calibrationLine | calibrationLine
Compute the area under the ROC curve | computeAuc
Compute the area under the Precision-Recall curve | computeAuprc
Computes grid performance for a hyperparameter combination (backwards compatible) | computeGridPerformance
Sets up a python environment to use for PLP (can be conda or venv) | configurePython
covariateSummary | covariateSummary
Extracts covariates based on cohorts | createCohortCovariateSettings
Create a setting that holds the details about the cdmDatabase connection for data extraction | createDatabaseDetails
Create the PatientLevelPrediction database result schema settings | createDatabaseSchemaSettings
Creates default list of settings specifying what parts of runPlp to execute | createDefaultExecuteSettings
Create the settings for defining how the plpData are split into test/validation/train sets using default splitting functions (either random stratified by outcome, time or subject splitting) | createDefaultSplitSetting
Creates list of settings specifying what parts of runPlp to execute | createExecuteSettings
Create the settings for defining how the plpData are split into test/validation/train sets using an existing split - good to use for reproducing results from a different run | createExistingSplitSettings
Create the settings for defining any feature engineering that will be done | createFeatureEngineeringSettings
createGlmModel | createGlmModel
Create Hyperparameter Settings | createHyperparameterSettings
Create Iterative Imputer settings | createIterativeImputer
createLearningCurve | createLearningCurve
Create the settings for logging the progression of the analysis | createLogSettings
Specify settings for developing a single model | createModelDesign
Create the settings for normalizing the data (type: either "minmax" or "robust") | createNormalizer
Create the results tables to store PatientLevelPrediction models and results into a database | createPlpResultTables
Create the settings for preprocessing the trainData. | createPreprocessSettings
Create the settings for random forest based feature selection | createRandomForestFeatureSelection
Create the settings for removing rare features | createRareFeatureRemover
createRestrictPlpDataSettings - Define extra restriction settings when calling getPlpData | createRestrictPlpDataSettings
Create the settings for defining how the trainData from 'splitData' are sampled using default sample functions. | createSampleSettings
Create Simple Imputer settings | createSimpleImputer
Create scikit-learn Iterative Imputer settings | createSklearnIterativeImputer
Plug an existing scikit-learn python model into the PLP framework | createSklearnModel
Create the settings for adding a spline for continuous variables | createSplineSettings
Create the settings for using stratified imputation. | createStratifiedImputationSettings
Create a study population | createStudyPopulation
Create the study population settings | createStudyPopulationSettings
Create a temporary model location | createTempModelLoc
Create a tuning metric descriptor | createTuningMetric
Create the settings for defining any feature selection that will be done | createUnivariateFeatureSelection
createValidationDesign - Define the validation design for external validation | createValidationDesign
createValidationSettings - Define optional settings for performing external validation | createValidationSettings
Run a list of prediction diagnoses | diagnoseMultiplePlp
diagnostic - Investigates the prediction problem settings - use before training a model | diagnosePlp
evaluatePlp | evaluatePlp
externalValidateDbPlp - Validate a model on new databases | externalValidateDbPlp
Exports all the results from a database into csv files | extractDatabaseToCsv
fitPlp | fitPlp
Get a sparse summary of the calibration | getCalibrationSummary
Extracts covariates based on cohorts | getCohortCovariateData
Get a demographic summary | getDemographicSummary
Create a plpData object from the Eunomia database | getEunomiaPlpData
Extract the patient level prediction data from the server | getPlpData
Calculates the prediction distribution | getPredictionDistribution
Calculate all measures for sparse ROC | getThresholdSummary
Calculate the Integrated Calibration Index from Austin and Steyerberg https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8281 | ici
Function to insert results into a database from csvs | insertCsvToDatabase
Create sqlite database with the results | insertResultsToSqlite
Join two lists | listAppend
Cartesian product | listCartesian
Load the multiple prediction json settings from a file | loadPlpAnalysesJson
Load the plpData from a folder | loadPlpData
Loads the plp model | loadPlpModel
Loads the evaluation dataframe | loadPlpResult
Loads the plp result saved as json/csv files for transparent sharing | loadPlpShareable
Loads the prediction dataframe from json | loadPrediction
Map covariate and row Ids so they start from 1 | MapIds
Migrate Data model | migrateDataModel
Calculate the model-based concordance, which is a calculation of the expected discrimination performance of a model under the assumption the model predicts the "TRUE" outcome as detailed in van Klaveren et al. https://pubmed.ncbi.nlm.nih.gov/27251001/ | modelBasedConcordance
Plot the outcome incidence over time | outcomeSurvivalPlot
Permutation Feature Importance | pfi
Plot the Observed vs. expected incidence, by age and gender | plotDemographicSummary
Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame | plotF1Measure
Plot the train/test generalizability diagnostic | plotGeneralizability
plotLearningCurve | plotLearningCurve
Plot the net benefit | plotNetBenefit
Plot all the PatientLevelPrediction plots | plotPlp
Plot the precision-recall curve using the sparse thresholdSummary data frame | plotPrecisionRecall
Plot the Predicted probability density function, showing prediction overlap between true and false cases | plotPredictedPDF
Plot the side-by-side boxplots of prediction distribution, by class | plotPredictionDistribution
Plot the preference score probability density function, showing prediction overlap between true and false cases | plotPreferencePDF
Plot the smooth calibration as detailed in Van Calster et al. "A calibration hierarchy for risk models was defined: from utopia to empirical data" (2016) | plotSmoothCalibration
Plot the calibration | plotSparseCalibration
Plot the conventional calibration | plotSparseCalibration2
Plot the ROC curve using the sparse thresholdSummary data frame | plotSparseRoc
Plot the variable importance scatterplot | plotVariableScatterplot
Create predictive probabilities | predictCyclops
Predict using a logistic regression model | predictGlm
predictPlp | predictPlp
A function that wraps around FeatureExtraction::tidyCovariateData to normalise the data and remove rare or redundant features | preprocessData
Print a plpData object | print.plpData
Print a summary.plpData object | print.summary.plpData
recalibratePlp | recalibratePlp
recalibratePlpRefit | recalibratePlpRefit
Run a list of prediction analyses | runMultiplePlp
runPlp - Develop and internally evaluate a model using specified settings | runPlp
Save the modelDesignList to a json file | savePlpAnalysesJson
Save the plpData to folder | savePlpData
Saves the plp model | savePlpModel
Saves the result from runPlp into the location directory | savePlpResult
Save the plp result as json files and csv files for transparent sharing | savePlpShareable
Saves the prediction dataframe to a json file | savePrediction
Create setting for AdaBoost with python DecisionTreeClassifier base estimator | setAdaBoost
Create setting for lasso Cox model | setCoxModel
Create setting for the scikit-learn DecisionTree with python | setDecisionTree
Create setting for gradient boosting machine model using gbm_xgboost implementation | setGradientBoostingMachine
Create setting for Iterative Hard Thresholding model | setIterativeHardThresholding
Create modelSettings for lasso logistic regression | setLassoLogisticRegression
Create setting for gradient boosting machine model using lightGBM (https://github.com/microsoft/LightGBM/tree/master/R-package). | setLightGBM
Create setting for neural network model with python's scikit-learn. For bigger models, consider using 'DeepPatientLevelPrediction' package. | setMLP
Create setting for naive bayes model with python | setNaiveBayes
Use the python environment created using configurePython() | setPythonEnvironment
Create setting for random forest model using sklearn | setRandomForest
Create modelSettings for ridge logistic regression | setRidgeRegression
Create setting for the python sklearn SVM (SVC function) | setSVM
Generate simulated data | simulatePlpData
A simulation profile for generating synthetic patient level prediction data | simulationProfile
Loads sklearn python model from json | sklearnFromJson
Saves sklearn python model object to json in path | sklearnToJson
Split the plpData into test/train sets using splitting settings of class 'splitSettings' | splitData
Summarize a plpData object | summary.plpData
Convert the plpData in COO format into a sparse R matrix | toSparseM
validateExternal - Validate model performance on new data | validateExternal
Externally validate the multiple plp models across new datasets | validateMultiplePlp
Open a local Shiny app for viewing the results of PLP analyses from a database | viewDatabaseResultPlp
Open a local Shiny app for viewing the results of multiple PLP analyses | viewMultiplePlp
viewPlp - Interactively view the performance and model settings | viewPlp
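
Several help topics listed here concern discrimination and calibration measures, such as the Brier score and the area under the ROC curve. As a rough illustration of what those two quantities measure, the base R sketch below computes them on a toy vector of predictions. The helper names brierSketch and aucSketch are hypothetical and are not part of PatientLevelPrediction; the package's own brierScore and computeAuc may take different arguments and return additional values, so this is only a sketch under those assumptions.

```r
# Hypothetical helpers for illustration only; not the package's implementation.

# Brier score: mean squared difference between predicted risk and 0/1 outcome.
brierSketch <- function(predicted, observed) {
  mean((predicted - observed)^2)
}

# AUC via the rank-sum (Mann-Whitney) identity: the probability that a
# randomly chosen case receives a higher prediction than a randomly chosen
# non-case. rank() assigns average ranks, so ties count as one half.
aucSketch <- function(predicted, observed) {
  nPos <- sum(observed == 1)
  nNeg <- sum(observed == 0)
  ranks <- rank(predicted)
  (sum(ranks[observed == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# Toy data: two non-cases followed by two cases.
observed <- c(0, 0, 1, 1)
predicted <- c(0.1, 0.4, 0.35, 0.8)

brier <- brierSketch(predicted, observed)
auc <- aucSketch(predicted, observed)
print(c(brier = brier, auc = auc))
```

A lower Brier score indicates predictions closer to the observed outcomes, while an AUC of 0.5 corresponds to chance-level ranking and 1 to perfect ranking.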