# OHDSI CDM Data Loader
This repository provides scripts to load Common Data Model (CDM) data from OHDSI's standardized vocabularies (version 5.4 or 5.3) into CDM tables in a relational database. It is designed for the OHDSI community and those working with OHDSI's Common Data Model for large-scale observational research.
Although this project has been tested primarily with PostgreSQL, it may also work with other OHDSI-supported databases (testing is ongoing).
The loader connects on the database's default port, so please make sure your database is listening on it (5432 for PostgreSQL).
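As a quick pre-flight check, the sketch below tests whether anything is listening on a default port. The port table lists well-known vendor defaults and is not taken from this package's source; the dictionary keys and the helper name `port_is_open` are hypothetical and chosen only for illustration.

```python
import socket

# Well-known vendor default ports (illustrative keys, not this package's db_type values).
DEFAULT_PORTS = {
    "postgresql": 5432,
    "sql server": 1433,
    "oracle": 1521,
    "redshift": 5439,
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    port = DEFAULT_PORTS["postgresql"]
    status = "reachable" if port_is_open("localhost", port) else "not reachable"
    print(f"PostgreSQL default port {port} on localhost is {status}")
```

A successful TCP connection only shows that a service is listening; it does not confirm that the credentials or the database name are valid.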
## Requirements
### Python Requirements
- Python 3.7 or later
- PostgreSQL database (used for testing; other OHDSI-supported databases are still being checked)
- Required Python libraries (listed in `requirements.txt`)
### R Requirements
Some of the processes and dependencies in the OHDSI environment may require specific R packages to interact with the OHDSI CDM and tools. Ensure the following R packages are installed:
```r
# Install OHDSI-specific R packages
install.packages("devtools")
install.packages("DatabaseConnector")
install.packages("SqlRender")
devtools::install_github("OHDSI/CommonDataModel") # For working with CDM-related functionality
# some other packages (not OHDSI specific)
install.packages("lubridate")
install.packages("dplyr")
install.packages("readr")
```
## Classes
### 1. `DatabaseHandler`
This class manages connections to a PostgreSQL database and can be adapted to other OHDSI-supported databases.
#### Key Features:
- Establishes a connection to the CDM database (primarily tested with PostgreSQL).
- Executes SQL commands and handles transactions for the CDM tables.
#### Example (Python, using the default port):
```python
from ohdsi_cdm_loader.db_connector import DatabaseHandler
database_connector = DatabaseHandler(
    db_type="postgresql",         # Database type (e.g., postgresql)
    host="localhost",             # Database host
    user="postgres",              # Database user
    password="your_password",     # Database password
    database="ohdsi_cdm",         # OHDSI CDM database
    driver_path="path_to_driver"  # Path to the driver for the selected database
)
db_conn = database_connector.connect_to_db()
if db_conn:
    print("Connected to the database successfully!")
else:
    print("Failed to connect to the database.")
```
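To illustrate what handling transactions means in practice, here is a minimal sketch of an all-or-nothing batch. It uses Python's built-in `sqlite3` module as a stand-in database, not `DatabaseHandler` itself, whose internals are not shown in this README; the helper `run_in_transaction` and the two-column `concept` table are hypothetical.

```python
import sqlite3


def run_in_transaction(conn: sqlite3.Connection, statements: list) -> bool:
    """Execute all statements atomically; return False if any statement fails."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            for sql in statements:
                conn.execute(sql)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT)")

ok = run_in_transaction(conn, [
    "INSERT INTO concept VALUES (1, 'Example concept')",
    "INSERT INTO concept VALUES (1, 'Duplicate key')",  # violates the primary key
])
rows = conn.execute("SELECT COUNT(*) FROM concept").fetchone()[0]
print(f"batch succeeded: {ok}, rows in table: {rows}")
```

Because the second insert fails, the first one is rolled back as well, so a partially loaded table is never left behind.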
### 2. `CSVLoader`
This class loads the OHDSI CDM vocabularies (version 5.3 or 5.4) from CSV files into the CDM tables in the database.
#### Key Features:
- Loads all CSV files for the standardized vocabularies from the specified directory into the corresponding tables of the created database. The tables can be created with the `executeDdl` function from the OHDSI CommonDataModel R package; for convenience, the same step is also available here through `execute_ddl`:
```python
from ohdsi_cdm_loader.db_connector import DatabaseHandler
# Initialize the database connection
database_connector = DatabaseHandler(
    db_type="postgresql",
    host="localhost",
    user="postgres",
    password="your_password",
    database="ohdsi_cdm",
    driver_path="path_to_driver"
)
# Connect to the CDM database
db_conn = database_connector.connect_to_db()
# Generate the CDM tables in the database
database_connector.execute_ddl(cdm_version="value", cdm_database_schema="schema name")
```
- Uses the active database connection and CDM-compliant table structure.
#### Example (Python):
#### Note: Please download the latest vocabulary from the [OHDSI vocabulary list](https://athena.ohdsi.org/vocabulary/list)
```python
from ohdsi_cdm_loader.load_csv import CSVLoader
csv_loader = CSVLoader(
    db_connection=db_conn,                # Active database connection
    database_handler=database_connector,  # DatabaseHandler instance
    schema="schema"                       # CDM schema
)
csv_loader.load_all_csvs("path_to_downloaded_csv_directory")
```
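Before loading, it can help to confirm which files the download directory actually contains. The sketch below is a standalone check and not part of `CSVLoader`: it lists the `.csv` files in a directory and reads each header row. It assumes the Athena download layout, in which files such as `CONCEPT.csv` are tab-delimited despite the `.csv` extension; the function name `summarize_vocabulary_dir` is hypothetical.

```python
import csv
from pathlib import Path


def summarize_vocabulary_dir(directory: str, delimiter: str = "\t") -> dict:
    """Map each CSV file's lowercase stem (e.g. 'concept') to its header columns."""
    summary = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # Read only the first row; an empty file yields an empty header.
            header = next(csv.reader(handle, delimiter=delimiter), [])
        summary[path.stem.lower()] = header
    return summary


if __name__ == "__main__":
    found = summarize_vocabulary_dir("path_to_downloaded_csv_directory")
    for table, columns in found.items():
        print(f"{table}: {len(columns)} columns")
```

Comparing the returned names against the tables created in the schema is a cheap way to spot a missing or misnamed file before a long load starts.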
### 3. `main.py`
This is the main entry point of the application. It integrates the database connection and CSV loading functionality specifically for OHDSI's CDM.
#### Usage:
#### Workflow in Main Script:
```python
from ohdsi_cdm_loader.db_connector import DatabaseHandler
from ohdsi_cdm_loader.load_csv import CSVLoader
# Initialize the database connection
database_connector = DatabaseHandler(
    db_type="postgresql",
    host="localhost",
    user="postgres",
    password="your_password",
    database="ohdsi_cdm",
    driver_path="path_to_driver"
)
# Connect to the CDM database
db_conn = database_connector.connect_to_db()
# Load CSVs if the connection is successful
if db_conn:
    csv_loader = CSVLoader(
        db_connection=db_conn,
        database_handler=database_connector,
        schema="schema"  # CDM schema
    )
    csv_loader.load_all_csvs("path_to_your_csv_directory")
else:
    print("Database connection failed.")
```
## Environment Variables
To ensure security and flexibility, it is recommended to store database credentials as environment variables rather than hardcoding them into the script.
Here’s an example of how to set environment variables:
```bash
export DB_HOST='your_host'
export DB_NAME='ohdsi_cdm'
export DB_USER='your_username'
export DB_PASSWORD='your_password'
```
Update the script to read these variables using `os.getenv`:
```python
import os
from ohdsi_cdm_loader.db_connector import DatabaseHandler
database_connector = DatabaseHandler(
    db_type="postgresql",
    host=os.getenv('DB_HOST'),
    user=os.getenv('DB_USER'),
    password=os.getenv('DB_PASSWORD'),
    database=os.getenv('DB_NAME'),
    driver_path="path_to_driver"
)
```
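Note that `os.getenv` returns `None` for an unset variable, so a missing credential would only surface later as a connection error. A minimal sketch of failing early instead; the helper `read_db_settings` is hypothetical, and the variable names are the four exported above.

```python
import os

REQUIRED_VARS = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD")


def read_db_settings(environ=None) -> dict:
    """Return the four database settings, raising if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to `DatabaseHandler`, for example `host=settings["DB_HOST"]`, in place of the individual `os.getenv` calls.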
## Credits
This project is designed to work with OHDSI's Common Data Model (CDM) and standardized vocabularies. The tools and processes used here are compatible with OHDSI standards, and the database loader has been tested specifically for PostgreSQL, though it should work with other databases supported by OHDSI.
<a href="https://ohdsi.org">
<img src="https://res.cloudinary.com/dc29czhf9/image/upload/v1729287157/h243-ohdsi-logo-with-text_hhymri.png" alt="OHDSI" width="100"/>
</a>
**OHDSI** (Observational Health Data Sciences and Informatics) is a multi-stakeholder, interdisciplinary collaborative that aims to bring out the value of observational health data through large-scale analytics. Learn more about OHDSI and the CDM on the [official OHDSI website](https://ohdsi.org).
<a href="https://ehealth4cancer.org">
<img src="https://res.cloudinary.com/dc29czhf9/image/upload/v1729287084/download_umxgmo.jpg" alt="eHealth Hub Limerick" width="100"/>
</a>
This project was also supported by **eHealth Hub Limerick**, contributing to the development and deployment of health data tools for innovative healthcare solutions. Learn more about eHealth Hub Limerick at [eHealth Hub Limerick's official website](https://ehealth4cancer.org).
## License
This project is licensed under the MIT License. See the `LICENSE` file for more details.
Raw data
{
"_id": null,
"home_page": "https://github.com/DavidIkechi/ohdsi_cdm_loader.git",
"name": "cdm-csv-loader",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": null,
"author": "David Chibuike Ikechi Akwuru",
"author_email": "akwuru.david@ul.ie",
"download_url": "https://files.pythonhosted.org/packages/0b/c0/5bad7f7d1530490670badcb74011be6b9599435e132990c8f7012bc353ef/cdm_csv_loader-0.1.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A package for loading OHDSI CDM CSV files into a relational database.",
"version": "0.1.3",
"project_urls": {
"Homepage": "https://github.com/DavidIkechi/ohdsi_cdm_loader.git"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "daf33d3bf94f00e38141e8382f4d846913953098190ab9f4abb814e68497caff",
"md5": "38de8736c545af9b6d436030e5090cb5",
"sha256": "e98af88adadaeb73be847fc0eb94edba27178b9d8d41b244a421c3140714c611"
},
"downloads": -1,
"filename": "cdm_csv_loader-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "38de8736c545af9b6d436030e5090cb5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 7826,
"upload_time": "2024-10-19T07:40:37",
"upload_time_iso_8601": "2024-10-19T07:40:37.942445Z",
"url": "https://files.pythonhosted.org/packages/da/f3/3d3bf94f00e38141e8382f4d846913953098190ab9f4abb814e68497caff/cdm_csv_loader-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0bc05bad7f7d1530490670badcb74011be6b9599435e132990c8f7012bc353ef",
"md5": "5717bf0922cb63d7d6864a921bf29992",
"sha256": "75f06ccdcf4e3de1c1d44fba5a323b732db1e21039301cd287f9b4d4acf299da"
},
"downloads": -1,
"filename": "cdm_csv_loader-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "5717bf0922cb63d7d6864a921bf29992",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 7656,
"upload_time": "2024-10-19T07:40:39",
"upload_time_iso_8601": "2024-10-19T07:40:39.598933Z",
"url": "https://files.pythonhosted.org/packages/0b/c0/5bad7f7d1530490670badcb74011be6b9599435e132990c8f7012bc353ef/cdm_csv_loader-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-19 07:40:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "DavidIkechi",
"github_project": "ohdsi_cdm_loader",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "cdm-csv-loader"
}