| Title: | Generator of Synthetic Patient Data for the OMOP Common Data Model |
|---|---|
| Description: | Tools to generate synthetic patient-level test datasets in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). Includes a chat-driven generator backed by large language models and an interactive 'shiny' designer for editing CDM test sets. |
| Authors: | Cesar Barboza [aut, cre] (ORCID: <https://orcid.org/0009-0002-4453-3071>), Ger Inberg [aut] (ORCID: <https://orcid.org/0000-0001-8993-8748>), Adam Black [aut] (ORCID: <https://orcid.org/0000-0001-5576-8701>) |
| Maintainer: | Cesar Barboza <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 0.1.4 |
| Built: | 2026-06-08 08:11:35 UTC |
| Source: | https://github.com/ohdsi/patientgenerator |
availableModels() returns the models available from the LLM provider, provided a valid API key is set in the system environment. OpenAI is currently the only supported provider.
availableModels()
A list of strings with the IDs of the available models.
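availableModels() only works when an API key is configured. A minimal pre-check, assuming the key is read from the OPEN_AI_KEY environment variable named in the patientChat documentation; the helper hasLlmKey() is hypothetical, not part of the package, and it only checks that a key is present, not that it is valid.

```r
# Hypothetical helper (not part of the package): reports whether an API key
# is set in the environment. Assumes the variable name OPEN_AI_KEY; values in
# ~/.Renviron are loaded when R starts.
hasLlmKey <- function(var = "OPEN_AI_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (hasLlmKey()) {
  message("API key found; availableModels() can query the provider.")
} else {
  message("Set OPEN_AI_KEY in ~/.Renviron and restart R first.")
}
```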
Reusable Shiny module (UI and server) for searching the vocabulary through Hecate and selecting a concept. The UI shows a button that opens a modal with a text input, a search button, and the results in a DT table. An optional callback runs when a concept is selected.
conceptSearchUI(id, buttonLabel = "Concept search")
conceptSearchServer(id, onConceptSelected = NULL, placeholderText = "")
| Argument | Description |
|---|---|
| id | Module namespace id. |
| buttonLabel | Label for the trigger button (default "Concept search"). |
| onConceptSelected | Optional function of one argument |
| placeholderText | Character string used as placeholder text in the search input field. |
conceptSearchUI(): UI for the concept search: a single button that opens the search modal.
conceptSearchServer(): Server for the concept search modal (text input, search, DT, close).
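The optional onConceptSelected callback follows the usual Shiny module pattern. The sketch below is a hypothetical stand-in, not the package's implementation: pickerServer() and the selected input are invented for illustration, and it only shows how a module can forward a selection to a caller-supplied function. shiny::testServer() exercises it without opening a browser.

```r
library(shiny)

# Hypothetical module server (not the package's code): when the input
# `selected` changes, pass its value to the optional callback.
pickerServer <- function(id, onConceptSelected = NULL) {
  moduleServer(id, function(input, output, session) {
    observeEvent(input$selected, {
      # Only call the callback if the caller supplied a function
      if (is.function(onConceptSelected)) {
        onConceptSelected(input$selected)
      }
    })
  })
}

# Simulate a selection without launching an app
picked <- NULL
testServer(
  pickerServer,
  args = list(onConceptSelected = function(concept) picked <<- concept),
  {
    session$setInputs(selected = 42L)  # placeholder value, not a real concept ID
  }
)
print(picked)
```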
Lists the test sets (JSON files) available in the testCases folder.
getTestSets(path = NULL)
| Argument | Description |
|---|---|
| path | Optional directory containing JSON test sets. If NULL, the package resolves a default path with testthat integration. |
A list.
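A minimal sketch of what listing test sets can involve, assuming the test sets are .json files stored directly in the folder. The helper listJsonTestSets() is hypothetical and does not reproduce getTestSets() itself or its default path resolution; it returns a list of file paths named after each test set.

```r
# Hypothetical helper: list JSON test sets in a folder, keyed by file name
# without the .json extension.
listJsonTestSets <- function(path) {
  files <- list.files(path, pattern = "\\.json$", full.names = TRUE)
  sets <- as.list(files)
  names(sets) <- tools::file_path_sans_ext(basename(files))
  sets
}

# Usage with a temporary folder
dir <- file.path(tempdir(), "testCases")
dir.create(dir, showWarnings = FALSE)
writeLines("{}", file.path(dir, "diabetes.json"))
writeLines("{}", file.path(dir, "hypertension.json"))
str(listJsonTestSets(dir))
```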
Create a Hecate API client
hecateClient(baseUrl = NULL, timeoutMs = NULL, apiKey = NULL)
| Argument | Description |
|---|---|
| baseUrl | Base URL of the Hecate API (default from config). |
| timeoutMs | Timeout in milliseconds (default from config). |
| apiKey | Optional API key for authorization (default from config). |
A client object with class hecate_client.
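The client is described only as an object of class hecate_client whose settings default to the package configuration. A minimal S3 sketch of such a constructor, assuming settings are held in a plain list: newHecateClient(), its config argument, and the placeholder URL are hypothetical, since the real configuration source is not documented here.

```r
# Hypothetical constructor mirroring hecateClient()'s arguments.
# Explicit arguments win; otherwise values come from `config`.
newHecateClient <- function(baseUrl = NULL, timeoutMs = NULL, apiKey = NULL,
                            config = list(baseUrl = "https://hecate.example.org",
                                          timeoutMs = 10000, apiKey = NULL)) {
  # Return x unless it is NULL, otherwise y
  pick <- function(x, y) if (is.null(x)) y else x
  structure(
    list(
      baseUrl = pick(baseUrl, config$baseUrl),
      timeoutMs = pick(timeoutMs, config$timeoutMs),
      apiKey = pick(apiKey, config$apiKey)
    ),
    class = "hecate_client"
  )
}

client <- newHecateClient(timeoutMs = 5000)
inherits(client, "hecate_client")
client$timeoutMs
```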
Search Hecate concepts and return results as a data frame
hecateSearch(query, vocabularyId = NULL, standardConcept = NULL, domainId = NULL, conceptClassId = NULL, limit = 20, client = hecateClient())
| Argument | Description |
|---|---|
| query | Character(1); search query (required). |
| vocabularyId | Character(1) or NULL; optional vocabulary filter (comma-separated). |
| standardConcept | Character(1) or NULL; e.g. |
| domainId | Character(1) or NULL; optional domain filter (comma-separated). |
| conceptClassId | Character(1) or NULL; optional concept class filter. |
| limit | Integer(1); maximum number of results (default 20, max 150). |
| client | Hecate client (default hecateClient()). |
Data frame of search results, or NULL if an error occurred (API error or bad response shape).
```r
## Not run:
# Simple search
df <- hecateSearch("diabetes")

# Search with filters
df <- hecateSearch("hypertension", domainId = "Condition", limit = 10)
## End(Not run)
```
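hecateSearch() combines one required query with optional filters and a capped result limit. A sketch of how such a request could be assembled, assuming NULL filters are simply omitted and limits above 150 are capped; buildHecateQuery() is hypothetical, and the parameter names sent to the real Hecate API (q, domain_id, and so on) are assumptions.

```r
# Hypothetical helper: build a URL query string from hecateSearch()-style
# arguments. NULL filters are dropped; limit is capped at 150.
buildHecateQuery <- function(query, vocabularyId = NULL, standardConcept = NULL,
                             domainId = NULL, conceptClassId = NULL, limit = 20) {
  stopifnot(is.character(query), length(query) == 1, nzchar(query))
  limit <- min(as.integer(limit), 150L)
  params <- list(q = query, vocabulary_id = vocabularyId,
                 standard_concept = standardConcept, domain_id = domainId,
                 concept_class_id = conceptClassId, limit = limit)
  # list() keeps NULL entries, so remove the unused filters explicitly
  params <- Filter(Negate(is.null), params)
  encoded <- vapply(params,
                    function(v) utils::URLencode(as.character(v), reserved = TRUE),
                    character(1))
  paste(names(encoded), encoded, sep = "=", collapse = "&")
}

buildHecateQuery("type 2 diabetes", domainId = "Condition", limit = 500)
# "q=type%202%20diabetes&domain_id=Condition&limit=150"
```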
Null coalescing operator
x %||% y
| Argument | Description |
|---|---|
| x | First value (any type). |
| y | Fallback value when x is NULL. |
x if not NULL, otherwise y.
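The documented behaviour fits in one line of R. This sketch matches the description above but is not necessarily the package's source; note that only NULL triggers the fallback, not NA or a zero-length vector.

```r
# Null-coalescing operator: return x unless it is NULL, otherwise y.
# Base R has provided an equivalent `%||%` since version 4.4.0.
`%||%` <- function(x, y) {
  if (is.null(x)) y else x
}

NULL %||% "default"     # "default"
"value" %||% "default"  # "value"
NA %||% "default"       # NA: only NULL triggers the fallback
```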
patientChat generates synthetic patients in the OMOP-CDM using an LLM API. It requires an OPEN_AI_KEY in ~/.Renviron; after that, send a prompt and call save() to store the results. The resulting JSON file can be used as an OMOP-CDM patient test set.
It accepts a prompt as input, produces a test set that follows a structured JSON schema, and uses tools such as CodelistGenerator or Hecate to look up concept IDs. Subsequent prompts can modify the existing test set, which the LLM uses as context.
The class supports testing patient sets created by the LLM, prompt engineering, and the integration of search tools, as well as creating patient sets for testing analytical packages.
A JSON response that includes the natural-language answer from the LLM and a JSON test set of patients that follows the provided schema.
| Field | Description |
|---|---|
| chat | An ellmer chat instance. |
| json_schema_path | JSON schema used to structure the output. |
| response | Output from the LLM. |
| codelist | A codelist with details used to search for concept IDs. |
new()
Creates a new chat for generating JSON test sets for the OMOP-CDM.
patientChat$new(system_prompt = NULL, model = "gpt-5.4", jsonSchemaPath = NULL, echo = c("none", "output", "all"), codelist_data = NULL)
| Argument | Description |
|---|---|
| system_prompt | Initial system prompt that defines the LLM's behaviour. |
| model | Model ID, such as "gpt-5.3". For a complete list, call patientChat$availableModels(). |
| jsonSchemaPath | The JSON schema used to structure the LLM output. |
| echo | How the output is displayed in the console. |
| codelist_data | A codelist with details used to search for concept IDs. |
A new patientChat object.
prompt()
Sends a prompt to the LLM API to request data.
patientChat$prompt(prompt)
| Argument | Description |
|---|---|
| prompt | A query, as a character string. |
json_response()
Returns the output in JSON format.
patientChat$json_response()
output()
Returns the chat object
patientChat$output()
retrieveCodelist()
Retrieves and filters data from codelist_data
patientChat$retrieveCodelist(concept_label = "Stage 1", domain = "Measurement")
| Argument | Description |
|---|---|
| concept_label | Value used to filter concept_name in the codelist with details. |
| domain | Value used to filter the domain in the codelist with details. |
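retrieveCodelist() filters the codelist by concept name and domain. A sketch of that kind of filter on a data frame, assuming columns named concept_name and domain_id and a case-insensitive substring match on the name; the helper filterCodelist(), the column names, and the toy rows with placeholder IDs are all hypothetical.

```r
# Hypothetical filter: keep rows whose concept_name contains `concept_label`
# (case-insensitive, literal match) and whose domain_id equals `domain`.
filterCodelist <- function(codelist, concept_label = "Stage 1", domain = "Measurement") {
  keep <- grepl(tolower(concept_label), tolower(codelist$concept_name), fixed = TRUE) &
    codelist$domain_id == domain
  codelist[keep, , drop = FALSE]
}

codelist <- data.frame(
  concept_id = c(1L, 2L, 3L),  # placeholder IDs, not real OMOP concepts
  concept_name = c("Hypertension stage 1", "Hypertension stage 2", "Stage 1 tumour"),
  domain_id = c("Measurement", "Measurement", "Condition")
)
filterCodelist(codelist)  # keeps only the first row
```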
save()
Saves the JSON test set to disk.
patientChat$save(name = "patient-chat-test", path = NULL)
| Argument | Description |
|---|---|
| name | Name of the file. |
| path | Directory in which to save the file. If NULL, the package first tries testthat::test_path("testCases"), then checks options(PatientGenerator.testSetDir = "..."), and finally falls back to the package user data directory. |
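The fallback order for path = NULL can be sketched in base R. This is a minimal sketch, assuming the testthat step is skipped whenever testthat::test_path() fails or the folder does not exist, and that the user data directory comes from tools::R_user_dir() with the package name "PatientGenerator"; the helper resolveTestSetDir() is hypothetical.

```r
# Hypothetical helper following the documented order:
# 1) testthat::test_path("testCases"), 2) the PatientGenerator.testSetDir option,
# 3) a per-user data directory.
resolveTestSetDir <- function(path = NULL) {
  if (!is.null(path)) return(path)

  if (requireNamespace("testthat", quietly = TRUE)) {
    # test_path() errors outside a package's tests, so treat errors as "not found"
    candidate <- tryCatch(testthat::test_path("testCases"), error = function(e) NULL)
    if (!is.null(candidate) && dir.exists(candidate)) return(candidate)
  }

  opt <- getOption("PatientGenerator.testSetDir")
  if (!is.null(opt)) return(opt)

  tools::R_user_dir("PatientGenerator", which = "data")
}

# An explicit path always wins
resolveTestSetDir("out/testCases")

# Otherwise the option is used when no testthat testCases folder is found
old <- options(PatientGenerator.testSetDir = file.path(tempdir(), "testsets"))
resolveTestSetDir()
options(old)
```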
availableModels()
Retrieves available models from the LLM API.
patientChat$availableModels()
clone()
The objects of this class are cloneable with this method.
patientChat$clone(deep = FALSE)
| Argument | Description |
|---|---|
| deep | Whether to make a deep clone. |
```r
## Not run:
generator <- patientChat$new()
generator$prompt("Give me 5 patients")
generator$save("my_test")
## End(Not run)
```
patientChatNaive() is a wrapper around the ellmer package that sends a prompt, optionally together with an existing test set, to an LLM and returns the resulting test set. It requires a valid API key. Priorities:
- Accepts a prompt as input.
- Produces a test set that follows the provided JSON schema.
- Uses tools such as CodelistGenerator or Hecate to look up concept IDs.
- Accepts a subsequent prompt with a test set that the LLM uses as context.
Having a single function for these tasks allows us to:
- Test the test sets created by the LLM.
- Test prompt engineering.
- Test the integration of tool functionality.
- Quickly create a small set of patients to test analytical packages.
patientChatNaive(prompt = "### Give me a sample of five patients", model = "gpt-5.2", jsonSchemaPath = NULL)
| Argument | Description |
|---|---|
| prompt | A prompt to the LLM, either as a character string or as a JSON response. |
| model | The model used by the LLM. Currently only OpenAI models are accepted. |
| jsonSchemaPath | Path to a JSON schema used to structure the response. |
A JSON response that includes the natural-language answer from the LLM and a JSON test set of patients that follows the provided schema.
patientDesigner() is a D3-based visual interface for constructing test datasets for the OMOP-CDM.
patientDesigner(path = NULL)
| Argument | Description |
|---|---|
| path | Optional folder containing JSON test sets. If NULL, the default path is resolved with testthat integration. |
A Shiny app.