<h1 style="width: 100%; text-align: center; margin-bottom: 20px; border-bottom: 0px;">BeETL: Extensible Python/Polars-based ETL Framework</h1>
<p style="text-align: center; margin-bottom: 30px;"><img src="./images/beetl.jpg" style="max-width: 400px;" alt=" "><img src="./docs/images/beetl.jpg" style="max-width: 400px;" alt=" "><br/></p>
BeETL was born from a job as an Integration Developer, where most of the integrations we built followed the same pattern: get data here, transform it a little, put it there (with the middle step frequently missing altogether).

After building our 16th integration between the same two systems from yet another manual template, we decided to build BeETL. BeETL is currently limited to one source datasource and one destination datasource per sync; this will be expanded in the future. A single configuration can contain multiple syncs.

Note: Although most of the configuration examples below are written as Python dictionaries, you can also use YAML or JSON.
## Todo
- [ ] Soft Delete/Hard Delete
- [ ] Output table at end
- [ ] Automatic column specification from data
## TOC
- [Installation](#installation)
  - [From PyPI](#from-pypi)
  - [From Source](#from-source)
- [Quick Start](#quick-start)
  - [Secrets from Environment Variables](#secrets-from-environment-variables)
- [Documentation](https://beetl.hoglan.dev/en/latest/)
- [Source Code](https://github.com/hoglandets-it/beetl)
## Installation
### From PyPI
```bash
pip3 install beetl
```
### From Source
```bash
git clone https://
python3 setup.py install
```
## Quick Start
The following is the minimum configuration needed to get started with a simple sync, written as a Python dictionary:
```python
from beetl.beetl import Beetl, BeetlConfig

sync_config = {
    # The version of the config file, currently V1
    "version": "V1",

    # The datasources to move data between
    "sources": [
        {
            # The identifier for the datasource
            "name": "mysql_db",
            # The type (e.g. Sqlserver, Rest, Itop)
            "type": "Mysql",
            # The connection settings for the datasource (connection string or host/user/password)
            "connection": {
                "settings": {
                    "connection_string": "mysql://user:password@host:3306/database"
                }
            }
        },
        {
            "name": "postgres_db",
            "type": "Postgres",
            "connection": {
                "settings": {
                    "connection_string": "postgresql://user:password@host:5432/database"
                }
            }
        }
    ],

    # The configuration for the sync(s) to run
    "sync": [
        {
            # The source and destination identifiers
            "source": "mysql_db",
            "destination": "postgres_db",

            # The configuration for source/destination
            "sourceConfig": {
                # The query with data to fetch
                "query": "SELECT field1, field2, field3 FROM table1",
                # The column descriptions for the query
                "columns": [
                    {
                        # The name of the column/field
                        "name": "field1",
                        # The data type
                        "type": "Int32",
                        # Whether the column is considered unique
                        # (unique cols will be used for comparison)
                        "unique": True
                    },
                    {
                        "name": "field2",
                        "type": "Utf8",
                        "unique": False
                    },
                    {
                        "name": "field3",
                        "type": "Utf8",
                        "unique": False
                    }
                ]
            },
            "destinationConfig": {
                # The table to insert data into
                "table": "table1",
                # The columns to insert data into
                "columns": [
                    {
                        # The name of the column/field
                        "name": "field1",
                        # The data type
                        "type": "Int32",
                        # Whether the column is considered unique
                        # (unique cols will be used for comparison)
                        "unique": True
                    },
                    {
                        "name": "field2",
                        "type": "Utf8",
                        "unique": False
                    },
                    {
                        "name": "field3",
                        "type": "Utf8",
                        "unique": False,
                        # Will be created on insert, but not updated
                        "skip_update": True
                    }
                ]
            },
            "sourceTransformers": {},
            "insertionTransformers": {}
        }
    ]
}
```
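The `unique` flag marks the columns used to match source rows against destination rows, and `skip_update` keeps a column out of updates. The sketch below illustrates that idea, key-based diffing, in plain Python on lists of dicts. It is an assumption about the general technique rather than BeETL's actual implementation (BeETL is Polars-based, and delete handling is still on the Todo list); the function name `diff_rows` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Column = Dict[str, Any]


def diff_rows(
    source: List[Row], destination: List[Row], columns: List[Column]
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Hypothetical sketch: split source rows into inserts, updates and deletes.

    Rows are matched on the columns flagged "unique"; columns flagged
    "skip_update" are written on insert but never included in updates.
    """
    key_cols = [c["name"] for c in columns if c.get("unique")]
    update_cols = [
        c["name"]
        for c in columns
        if not c.get("unique") and not c.get("skip_update")
    ]

    def key_of(row: Row) -> Tuple[Any, ...]:
        return tuple(row[name] for name in key_cols)

    dest_by_key = {key_of(row): row for row in destination}
    seen_keys = set()
    inserts: List[Row] = []
    updates: List[Row] = []

    for row in source:
        key = key_of(row)
        seen_keys.add(key)
        existing = dest_by_key.get(key)
        if existing is None:
            # Unknown key: insert all configured columns, skip_update ones included
            inserts.append({c["name"]: row[c["name"]] for c in columns})
        elif any(row[name] != existing[name] for name in update_cols):
            # Known key with changed data: send the key plus updatable columns only
            update = {name: row[name] for name in key_cols}
            update.update({name: row[name] for name in update_cols})
            updates.append(update)

    # Keys that exist only in the destination are candidates for deletion
    deletes = [row for key, row in dest_by_key.items() if key not in seen_keys]
    return inserts, updates, deletes


# Column settings mirroring the destinationConfig above
columns = [
    {"name": "field1", "type": "Int32", "unique": True},
    {"name": "field2", "type": "Utf8", "unique": False},
    {"name": "field3", "type": "Utf8", "unique": False, "skip_update": True},
]
source_rows = [
    {"field1": 1, "field2": "a", "field3": "x"},
    {"field1": 2, "field2": "b-new", "field3": "y-new"},
    {"field1": 3, "field2": "c", "field3": "z"},
]
destination_rows = [
    {"field1": 2, "field2": "b", "field3": "y"},
    {"field1": 4, "field2": "d", "field3": "w"},
]
inserts, updates, deletes = diff_rows(source_rows, destination_rows, columns)
```

Here row 2 is reported as an update containing only `field1` and `field2`, because `field3` has `skip_update` set, while row 4 exists only in the destination and is merely flagged as a deletion candidate.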
### Secrets from Environment Variables
If you would rather keep secrets in environment variables than in the configuration file, you can store the datasource definitions as JSON in an environment variable and replace the `sources` section with the `sourcesFromEnv` setting, whose value is the name of that variable.

Note that the `sources` and `sourcesFromEnv` options are mutually exclusive.
```python
sync_config = {
    # The version of the config file, currently V1
    "version": "V1",
    # Fetch source configuration from environment variable BEETL_SOURCES
    "sourcesFromEnv": "BEETL_SOURCES",
    # The configuration for the sync(s) to run
    "sync": [
        .....
```
```yaml
version: "V1"
sourcesFromEnv: "BEETL_SOURCES"
sync:
- ......
```
```json
{
    "version": "V1",
    "sourcesFromEnv": "BEETL_SOURCES",
    "sync": [
        ......
```
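The Python, YAML and JSON variants above describe the same structure. As a sketch of how a file in either format can be turned into the dictionary form used in this README, the hypothetical `load_config` helper below dispatches on the file extension, using PyYAML's `yaml.safe_load` (PyYAML is among BeETL's dependencies) and the standard `json` module. It does not assume BeETL exposes a loader with this name or signature.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def load_config(source: Union[str, Path, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: return a BeETL-style configuration as a dict.

    Accepts an already-built dict, a .json file or a .yaml/.yml file.
    """
    if isinstance(source, dict):
        # Python dictionaries are used as-is
        return source
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix not in (".json", ".yaml", ".yml"):
        # Reject unknown formats before touching the filesystem
        raise ValueError(f"Unsupported configuration format: {suffix!r}")
    text = path.read_text(encoding="utf-8")
    if suffix == ".json":
        return json.loads(text)
    # Imported lazily so that JSON-only use does not require PyYAML
    import yaml

    return yaml.safe_load(text)
```

With this helper, `load_config("sync.yaml")`, `load_config("sync.json")` and `load_config(sync_config)` would all yield the same kind of dictionary.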
The environment variable uses the same structure as the `sources` section, encoded as JSON (so without the comments shown below):
```python
[
    {
        # The identifier for the datasource
        "name": "mysql_db",
        # The type (e.g. Sqlserver, Rest, Itop)
        "type": "Mysql",
        # The connection settings for the datasource (connection string or host/user/password)
        "connection": {
            "settings": {
                "connection_string": "mysql://user:password@host:3306/database"
            }
        }
    },
    {
        "name": "postgres_db",
        "type": "Postgres",
        "connection": {
            "settings": {
                "connection_string": "postgresql://user:password@host:5432/database"
            }
        }
    }
]
```
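Because the variable must contain JSON, one way to produce its value is `json.dumps` on the list above. The sketch below does that and adds a hypothetical `resolve_sources` helper that applies the two rules from this section: the variable named by `sourcesFromEnv` holds the datasource list as JSON, and `sources` and `sourcesFromEnv` are mutually exclusive. It illustrates those rules only; it is not BeETL's internal code, and the chosen error types are assumptions.

```python
import json
import os
from typing import Any, Dict, List


def resolve_sources(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical: return the datasource list from "sources" or "sourcesFromEnv"."""
    has_inline = "sources" in config
    has_env = "sourcesFromEnv" in config
    if has_inline and has_env:
        raise ValueError('"sources" and "sourcesFromEnv" are mutually exclusive')
    if has_inline:
        return config["sources"]
    if has_env:
        var_name = config["sourcesFromEnv"]
        raw = os.environ.get(var_name)
        if raw is None:
            raise KeyError(f"Environment variable {var_name} is not set")
        # The variable holds the same list as "sources", encoded as JSON
        return json.loads(raw)
    raise ValueError('Either "sources" or "sourcesFromEnv" is required')


# Serialize the datasource list (JSON has no comments, so none are included)
sources = [
    {
        "name": "mysql_db",
        "type": "Mysql",
        "connection": {"settings": {"connection_string": "mysql://user:password@host:3306/database"}},
    },
    {
        "name": "postgres_db",
        "type": "Postgres",
        "connection": {"settings": {"connection_string": "postgresql://user:password@host:5432/database"}},
    },
]
os.environ["BEETL_SOURCES"] = json.dumps(sources)

config = {"version": "V1", "sourcesFromEnv": "BEETL_SOURCES", "sync": []}
resolved = resolve_sources(config)
```

In a real deployment the variable would usually be set outside Python, for example with `export BEETL_SOURCES='[...]'` in the shell or through your CI secret store, rather than via `os.environ`.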
Raw data
{
"_id": null,
"home_page": "https://github.com/Hoglandets-IT/beetl",
"name": "beetl",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "python template package module cli",
"author": "Lars Scheibling",
"author_email": "lars.scheibling@hoglandet.se",
"download_url": "https://files.pythonhosted.org/packages/5f/0e/afb05256e2e7bd9a2f786ce0cc2c86cfd9adaea398faf62dc242b34837d8/beetl-0.4.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GnuPG 3.0",
"summary": "BeETL is a Python package for extracting data from one datasource,",
"version": "0.4.10",
"project_urls": {
"Homepage": "https://github.com/Hoglandets-IT/beetl"
},
"split_keywords": [
"python",
"template",
"package",
"module",
"cli"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "cab182b45849d9b9bc1de8fac76f165f8e143cad22409584e9e73d6d536ce3e2",
"md5": "531fc1c6671ca18df9e72355ca86d8f3",
"sha256": "d6802f102ccb99775068f81b297d5e662b835d5ab8549dd417bce505a745efe3"
},
"downloads": -1,
"filename": "beetl-0.4.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "531fc1c6671ca18df9e72355ca86d8f3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 33326,
"upload_time": "2024-09-19T15:13:43",
"upload_time_iso_8601": "2024-09-19T15:13:43.416449Z",
"url": "https://files.pythonhosted.org/packages/ca/b1/82b45849d9b9bc1de8fac76f165f8e143cad22409584e9e73d6d536ce3e2/beetl-0.4.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5f0eafb05256e2e7bd9a2f786ce0cc2c86cfd9adaea398faf62dc242b34837d8",
"md5": "048647c6b355b599749825509d8fd5c0",
"sha256": "9f3e4d7464916ba8add9f000e323814d9304b4af3389ed5c99785067ada8b990"
},
"downloads": -1,
"filename": "beetl-0.4.10.tar.gz",
"has_sig": false,
"md5_digest": "048647c6b355b599749825509d8fd5c0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 25923,
"upload_time": "2024-09-19T15:13:45",
"upload_time_iso_8601": "2024-09-19T15:13:45.161983Z",
"url": "https://files.pythonhosted.org/packages/5f/0e/afb05256e2e7bd9a2f786ce0cc2c86cfd9adaea398faf62dc242b34837d8/beetl-0.4.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-19 15:13:45",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Hoglandets-IT",
"github_project": "beetl",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "requests",
"specs": []
},
{
"name": "polars",
"specs": [
[
"==",
"0.17.5"
]
]
},
{
"name": "sqlalchemy",
"specs": []
},
{
"name": "faker",
"specs": []
},
{
"name": "pandas",
"specs": []
},
{
"name": "pyyaml",
"specs": []
},
{
"name": "pymysql",
"specs": []
},
{
"name": "pyodbc",
"specs": []
},
{
"name": "pymssql",
"specs": []
},
{
"name": "mysql-connector-python",
"specs": []
},
{
"name": "alive-progress",
"specs": []
},
{
"name": "tabulate",
"specs": []
}
],
"lcname": "beetl"
}