Package: PatientLevelPrediction 6.4.0

Egill Fridgeirsson

PatientLevelPrediction: Develop Clinical Prediction Models Using the Common Data Model

A user-friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.
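The cohort, features, model workflow described above can be illustrated with a minimal base-R sketch. This is not the PatientLevelPrediction API. The simulated cohort, the hypothetical covariates `age`, `priorDiabetes` and `priorHypertension`, the plain random 75/25 split, and the use of unpenalized `glm()` logistic regression are all assumptions. They stand in for CDM data, FeatureExtraction covariates, the package's split settings, and its model settings.

```r
# Minimal sketch of the cohort -> features -> model workflow in base R.
# Assumption: simulated data stands in for a CDM cohort and glm() stands in
# for the package's model settings. This is not PatientLevelPrediction's API.
set.seed(42)
n <- 2000

# Hypothetical target cohort: one row per person with baseline features
cohort <- data.frame(
  rowId = seq_len(n),
  age = round(runif(n, 18, 90)),
  priorDiabetes = rbinom(n, 1, 0.2),
  priorHypertension = rbinom(n, 1, 0.35)
)

# Hypothetical outcome during the time-at-risk, driven by the features
linearPredictor <- -6 + 0.05 * cohort$age + 0.8 * cohort$priorDiabetes +
  0.5 * cohort$priorHypertension
cohort$outcome <- rbinom(n, 1, plogis(linearPredictor))

# Simple random 75/25 train/test split (not stratified by outcome)
testIdx <- sample(n, size = round(0.25 * n))
train <- cohort[-testIdx, ]
test <- cohort[testIdx, ]

# Fit a logistic regression model on the training set
fit <- glm(outcome ~ age + priorDiabetes + priorHypertension,
           data = train, family = binomial())

# Predicted risk for each person in the held-out test set
test$predictedRisk <- predict(fit, newdata = test, type = "response")
head(test[, c("rowId", "outcome", "predictedRisk")])
```

In the package itself, feature construction, splitting and model fitting are driven by settings objects rather than hand-written code like this.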

Authors: Egill Fridgeirsson [aut, cre], Jenna Reps [aut], Martijn Schuemie [aut], Marc Suchard [aut], Patrick Ryan [aut], Peter Rijnbeek [aut], Observational Health Data Science and Informatics [cph]


# Install 'PatientLevelPrediction' in R:
install.packages('PatientLevelPrediction', repos = c('https://ohdsi.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/ohdsi/patientlevelprediction/issues

Pkgdown site: https://ohdsi.github.io

Uses libs:
  • openjdk - OpenJDK Java runtime, using Hotspot JIT
Datasets:
  • simulationProfile - A simulation profile for generating synthetic patient level prediction data

On CRAN:

hades, openjdk

10.80 score, 189 stars, 297 scripts, 117 exports, 65 dependencies

Last updated 8 days ago from: bc4135235b. Checks: 8 OK. Indexed: yes.

Target           Result  Latest binary
Doc / Vignettes  OK      Feb 12 2025
R-4.5-win        OK      Feb 12 2025
R-4.5-mac        OK      Feb 12 2025
R-4.5-linux      OK      Feb 12 2025
R-4.4-win        OK      Feb 12 2025
R-4.4-mac        OK      Feb 12 2025
R-4.3-win        OK      Feb 12 2025
R-4.3-mac        OK      Feb 12 2025

Exports: averagePrecision, brierScore, calibrationLine, computeAuc, computeGridPerformance, configurePython, covariateSummary, createCohortCovariateSettings, createDatabaseDetails, createDatabaseSchemaSettings, createDefaultExecuteSettings, createDefaultSplitSetting, createExecuteSettings, createExistingSplitSettings, createFeatureEngineeringSettings, createGlmModel, createIterativeImputer, createLearningCurve, createLogSettings, createModelDesign, createNormalizer, createPlpResultTables, createPreprocessSettings, createRandomForestFeatureSelection, createRareFeatureRemover, createRestrictPlpDataSettings, createSampleSettings, createSimpleImputer, createSklearnModel, createSplineSettings, createStratifiedImputationSettings, createStudyPopulation, createStudyPopulationSettings, createTempModelLoc, createUnivariateFeatureSelection, createValidationDesign, createValidationSettings, diagnoseMultiplePlp, diagnosePlp, evaluatePlp, externalValidateDbPlp, extractDatabaseToCsv, fitPlp, getCalibrationSummary, getCohortCovariateData, getDemographicSummary, getEunomiaPlpData, getPlpData, getPredictionDistribution, getThresholdSummary, ici, insertCsvToDatabase, insertResultsToSqlite, listAppend, listCartesian, loadPlpAnalysesJson, loadPlpData, loadPlpModel, loadPlpResult, loadPlpShareable, loadPrediction, MapIds, migrateDataModel, modelBasedConcordance, outcomeSurvivalPlot, pfi, plotDemographicSummary, plotF1Measure, plotGeneralizability, plotLearningCurve, plotNetBenefit, plotPlp, plotPrecisionRecall, plotPredictedPDF, plotPredictionDistribution, plotPreferencePDF, plotSmoothCalibration, plotSparseCalibration, plotSparseCalibration2, plotSparseRoc, plotVariableScatterplot, predictCyclops, predictGlm, predictPlp, preprocessData, recalibratePlp, recalibratePlpRefit, runMultiplePlp, runPlp, savePlpAnalysesJson, savePlpData, savePlpModel, savePlpResult, savePlpShareable, savePrediction, setAdaBoost, setCoxModel, setDecisionTree, setGradientBoostingMachine, setIterativeHardThresholding, setLassoLogisticRegression, setLightGBM, setMLP, setNaiveBayes, setPythonEnvironment, setRandomForest, setSVM, simulatePlpData, sklearnFromJson, sklearnToJson, splitData, toSparseM, validateExternal, validateMultiplePlp, viewDatabaseResultPlp, viewMultiplePlp, viewPlp

Dependencies: Andromeda, backports, bit, bit64, blob, cachem, checkmate, cli, clipr, cpp11, crayon, Cyclops, DatabaseConnector, DBI, dbplyr, digest, dplyr, fansi, fastmap, FeatureExtraction, generics, glue, hms, jsonlite, lattice, lifecycle, magrittr, Matrix, memoise, memuse, ParallelLogger, pillar, pkgconfig, plogr, plyr, prettyunits, pROC, progress, PRROC, purrr, R6, Rcpp, RcppEigen, RcppParallel, readr, rJava, rlang, RSQLite, snow, SqlRender, stringi, stringr, survival, tibble, tidyr, tidyselect, triebeard, tzdb, urltools, utf8, vctrs, vroom, withr, xml2, zip

Adding Custom Data Splitting

Rendered from AddingCustomSplitting.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-06
Started: 2022-03-11

Adding Custom Feature Engineering Functions

Rendered from AddingCustomFeatureEngineering.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-06
Started: 2022-03-11

Adding Custom Patient-Level Prediction Algorithms

Rendered from AddingCustomModels.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-06
Started: 2022-03-11

Adding Custom Sampling Functions

Rendered from AddingCustomSamples.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-06
Started: 2022-03-11

Automatically Build Multiple Patient-Level Predictive Models

Rendered from BuildingMultiplePredictiveModels.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-11
Started: 2018-10-05

Benchmark Tasks

Rendered from BenchmarkTasks.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-06
Started: 2023-10-12

Best Practice Research

Rendered from BestPractices.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-11
Started: 2025-02-06

Building patient-level predictive models

Rendered from BuildingPredictiveModels.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-11
Started: 2015-03-27

Clinical Models

Rendered from ClinicalModels.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-11
Started: 2025-02-06

Constrained Predictors

Rendered from ConstrainedPredictors.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-06
Started: 2023-10-12

Creating Learning Curves

Rendered from CreatingLearningCurves.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-11
Started: 2020-10-01

Integration of GIS Data Into OHDSI Model Building

Rendered from GISExample.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-06
Started: 2025-02-06

Making patient-level predictive network study packages

Rendered from CreatingNetworkStudies.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-06
Started: 2018-05-14

Patient-Level Prediction Installation Guide

Rendered from InstallationGuide.Rmd using knitr::rmarkdown on Feb 12 2025.

Last update: 2025-02-11
Started: 2018-05-21

Readme and manuals

Help Manual

Help page - Topics
Calculate the average precision - averagePrecision
brierScore - brierScore
calibrationLine - calibrationLine
Compute the area under the ROC curve - computeAuc
Computes grid performance with a specified performance function - computeGridPerformance
Sets up a Python environment to use for PLP (can be conda or venv) - configurePython
covariateSummary - covariateSummary
Extracts covariates based on cohorts - createCohortCovariateSettings
Create a setting that holds the details about the cdmDatabase connection for data extraction - createDatabaseDetails
Create the PatientLevelPrediction database result schema settings - createDatabaseSchemaSettings
Creates default list of settings specifying what parts of runPlp to execute - createDefaultExecuteSettings
Create the settings for defining how the plpData are split into test/validation/train sets using default splitting functions (either random stratified by outcome, time or subject splitting) - createDefaultSplitSetting
Creates list of settings specifying what parts of runPlp to execute - createExecuteSettings
Create the settings for defining how the plpData are split into test/validation/train sets using an existing split (useful for reproducing results from a different run) - createExistingSplitSettings
Create the settings for defining any feature engineering that will be done - createFeatureEngineeringSettings
createGlmModel - createGlmModel
Create Iterative Imputer settings - createIterativeImputer
createLearningCurve - createLearningCurve
Create the settings for logging the progression of the analysis - createLogSettings
Specify settings for developing a single model - createModelDesign
Create the settings for normalizing the data; the type of normalization is either "minmax" or "robust" - createNormalizer
Create the results tables to store PatientLevelPrediction models and results into a database - createPlpResultTables
Create the settings for preprocessing the trainData - createPreprocessSettings
Create the settings for random forest based feature selection - createRandomForestFeatureSelection
Create the settings for removing rare features - createRareFeatureRemover
createRestrictPlpDataSettings: define extra restriction settings when calling getPlpData - createRestrictPlpDataSettings
Create the settings for defining how the trainData from 'splitData' are sampled using default sample functions - createSampleSettings
Create Simple Imputer settings - createSimpleImputer
Plug an existing scikit-learn Python model into the PLP framework - createSklearnModel
Create the settings for adding a spline for continuous variables - createSplineSettings
Create the settings for using stratified imputation - createStratifiedImputationSettings
Create a study population - createStudyPopulation
Create the study population settings - createStudyPopulationSettings
Create a temporary model location - createTempModelLoc
Create the settings for defining any feature selection that will be done - createUnivariateFeatureSelection
createValidationDesign: define the validation design for external validation - createValidationDesign
createValidationSettings: define optional settings for performing external validation - createValidationSettings
Run a list of prediction diagnoses - diagnoseMultiplePlp
diagnostic: investigates the prediction problem settings; use before training a model - diagnosePlp
evaluatePlp - evaluatePlp
externalValidateDbPlp: validate a model on new databases - externalValidateDbPlp
Exports all the results from a database into csv files - extractDatabaseToCsv
fitPlp - fitPlp
Get a sparse summary of the calibration - getCalibrationSummary
Extracts covariates based on cohorts - getCohortCovariateData
Get a demographic summary - getDemographicSummary
Create a plpData object from the Eunomia database - getEunomiaPlpData
Extract the patient level prediction data from the server - getPlpData
Calculates the prediction distribution - getPredictionDistribution
Calculate all measures for sparse ROC - getThresholdSummary
Calculate the Integrated Calibration Index from Austin and Steyerberg https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8281 - ici
Function to insert results into a database from csvs - insertCsvToDatabase
Create sqlite database with the results - insertResultsToSqlite
Join two lists - listAppend
Cartesian product - listCartesian
Load the multiple prediction json settings from a file - loadPlpAnalysesJson
Load the plpData from a folder - loadPlpData
Loads the plp model - loadPlpModel
Loads the evaluation dataframe - loadPlpResult
Loads the plp result saved as json/csv files for transparent sharing - loadPlpShareable
Loads the prediction dataframe from json - loadPrediction
Map covariate and row Ids so they start from 1 - MapIds
Migrate data model - migrateDataModel
Calculate the model-based concordance, which is a calculation of the expected discrimination performance of a model under the assumption the model predicts the "TRUE" outcome as detailed in van Klaveren et al. https://pubmed.ncbi.nlm.nih.gov/27251001/ - modelBasedConcordance
Plot the outcome incidence over time - outcomeSurvivalPlot
Permutation Feature Importance - pfi
Plot the observed vs. expected incidence, by age and gender - plotDemographicSummary
Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame - plotF1Measure
Plot the train/test generalizability diagnostic - plotGeneralizability
plotLearningCurve - plotLearningCurve
Plot the net benefit - plotNetBenefit
Plot all the PatientLevelPrediction plots - plotPlp
Plot the precision-recall curve using the sparse thresholdSummary data frame - plotPrecisionRecall
Plot the predicted probability density function, showing prediction overlap between true and false cases - plotPredictedPDF
Plot the side-by-side boxplots of prediction distribution, by class - plotPredictionDistribution
Plot the preference score probability density function, showing prediction overlap between true and false cases - plotPreferencePDF
Plot the smooth calibration as detailed in Van Calster et al. "A calibration hierarchy for risk models was defined: from utopia to empirical data" (2016) - plotSmoothCalibration
Plot the calibration - plotSparseCalibration
Plot the conventional calibration - plotSparseCalibration2
Plot the ROC curve using the sparse thresholdSummary data frame - plotSparseRoc
Plot the variable importance scatterplot - plotVariableScatterplot
Create predictive probabilities - predictCyclops
Predict using a logistic regression model - predictGlm
predictPlp - predictPlp
A function that wraps around FeatureExtraction::tidyCovariateData to normalise the data and remove rare or redundant features - preprocessData
Print a plpData object - print.plpData
Print a summary.plpData object - print.summary.plpData
recalibratePlp - recalibratePlp
recalibratePlpRefit - recalibratePlpRefit
Run a list of prediction analyses - runMultiplePlp
runPlp: develop and internally evaluate a model using specified settings - runPlp
Save the modelDesignList to a json file - savePlpAnalysesJson
Save the plpData to folder - savePlpData
Saves the plp model - savePlpModel
Saves the result from runPlp into the location directory - savePlpResult
Save the plp result as json files and csv files for transparent sharing - savePlpShareable
Saves the prediction dataframe to a json file - savePrediction
Create setting for AdaBoost with Python DecisionTreeClassifier base estimator - setAdaBoost
Create setting for lasso Cox model - setCoxModel
Create setting for the scikit-learn DecisionTree with Python - setDecisionTree
Create setting for gradient boosting machine model using gbm_xgboost implementation - setGradientBoostingMachine
Create setting for Iterative Hard Thresholding model - setIterativeHardThresholding
Create modelSettings for lasso logistic regression - setLassoLogisticRegression
Create setting for gradient boosting machine model using lightGBM (https://github.com/microsoft/LightGBM/tree/master/R-package) - setLightGBM
Create setting for neural network model with Python's scikit-learn. For bigger models, consider using the 'DeepPatientLevelPrediction' package - setMLP
Create setting for naive Bayes model with Python - setNaiveBayes
Use the Python environment created using configurePython() - setPythonEnvironment
Create setting for random forest model using sklearn - setRandomForest
Create setting for the Python sklearn SVM (SVC function) - setSVM
Generate simulated data - simulatePlpData
A simulation profile for generating synthetic patient level prediction data - simulationProfile
Loads sklearn Python model from json - sklearnFromJson
Saves sklearn Python model object to json in path - sklearnToJson
Split the plpData into test/train sets using splitting settings of class 'splitSettings' - splitData
Summarize a plpData object - summary.plpData
Convert the plpData in COO format into a sparse R matrix - toSparseM
validateExternal: validate model performance on new data - validateExternal
Externally validate the multiple plp models across new datasets - validateMultiplePlp
Open a local shiny app for viewing the result of a PLP analysis from a database - viewDatabaseResultPlp
Open a local shiny app for viewing the result of multiple PLP analyses - viewMultiplePlp
viewPlp: interactively view the performance and model settings - viewPlp
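Several help topics above name evaluation measures (averagePrecision, brierScore, computeAuc, ici, modelBasedConcordance). The base-R sketch below shows the standard definitions behind them. It is a hedged illustration, not the package's code. The helper names (`brierSketch`, `aucSketch`, `averagePrecisionSketch`, `iciSketch`, `mbcSketch`) are hypothetical, and tie handling and the loess smoothing choice for the ICI are assumptions that may differ from the package's implementations.

```r
# Base-R sketches of evaluation measures named in the help topics above.
# Assumption: textbook definitions only; these are NOT the package's
# implementations and may differ in tie handling and smoothing choices.

# Brier score: mean squared difference between predicted risk and outcome
brierSketch <- function(p, y) mean((p - y)^2)

# AUC via the Mann-Whitney rank statistic (tied predictions get average ranks)
aucSketch <- function(p, y) {
  r <- rank(p)
  nPos <- sum(y == 1)
  nNeg <- sum(y == 0)
  (sum(r[y == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# Average precision: mean of the precision at the rank of each positive case
# (assumes no tied predictions)
averagePrecisionSketch <- function(p, y) {
  ySorted <- y[order(p, decreasing = TRUE)]
  precisionAtK <- cumsum(ySorted) / seq_along(ySorted)
  sum(precisionAtK[ySorted == 1]) / sum(ySorted)
}

# Integrated Calibration Index: mean absolute difference between a
# loess-smoothed observed outcome rate and the predicted risk
iciSketch <- function(p, y) {
  smoothedObserved <- fitted(loess(y ~ p))
  mean(abs(smoothedObserved - p))
}

# Model-based concordance: expected c-statistic if the predictions were the
# true risks. Builds n x n matrices, so only suitable for small n.
mbcSketch <- function(p) {
  w <- outer(p, 1 - p)   # w[i, j] = P(i has the outcome and j does not)
  diag(w) <- 0           # a person is never paired with themself
  credit <- outer(p, p, ">") + 0.5 * outer(p, p, "==")  # 1 concordant, 0.5 tie
  sum(w * credit) / sum(w)
}

# Usage on a tiny hand-checkable example
p <- c(0.1, 0.4, 0.35, 0.8)
y <- c(0, 0, 1, 1)
brierSketch(p, y)             # 0.158125
aucSketch(p, y)               # 0.75
averagePrecisionSketch(p, y)  # 0.8333333
mbcSketch(p)

# ICI needs enough points for loess, so use simulated, well-calibrated data
set.seed(1)
pSim <- runif(500, 0.05, 0.95)
ySim <- rbinom(500, 1, pSim)
iciSketch(pSim, ySim)
```

The AUC uses the rank form because it counts concordant positive/negative pairs without building every pair explicitly. The model-based concordance, by contrast, weights every pair by its predicted probability of being an event/non-event pair, which is why it needs no observed outcomes.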