# Datalake CLI
This project provides a Command Line Interface (CLI) tool that facilitates the migration of data from Sage ERP systems into a structured datalake and data-warehouse architecture on Google Cloud. Aimed at enhancing data management and analytics capabilities, the tool supports project-specific datalake environments identified by unique tags.
## Getting Started
1. Configuration Creation:
Install the tool:
```sh
pip3 install shopcloud-datalake
```
Set up your configuration directory:
```sh
mkdir config-dir
```
Create a new Datalake configuration:
```sh
datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config create
```
2. Configuration Synchronization:
Sync your configuration files to the project bucket:
```sh
datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config sync
```
3. Data Migration Execution:
Run the data migration process with or without specifying a table:
```sh
datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run --partition-date=YYYY-MM-DD
datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run <table> --partition-date=YYYY-MM-DD
```
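For regular migrations, the `run` command above can be wrapped in a small shell loop. The sketch below is a dry run: the table names are invented examples (not part of the tool), and each command is only `echo`ed so it can be reviewed before execution.

```sh
# Dry run of a batch migration: print one `datalake run` command per table.
# The table names are illustrative examples, not defined by the tool.
PARTITION_DATE="2024-03-26"
for TABLE in sales customers invoices; do
  echo datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run "$TABLE" --partition-date="$PARTITION_DATE"
done
```

Dropping the `echo` turns the dry run into the actual migration.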
## Architecture
```mermaid
flowchart LR
subgraph Data-Lake
Sage[(Sage)] --> datalake-cli
GCS_SCHEMA[(Storage)] --> |gs://shopcloud-datalake-sage-schema| datalake-cli
datalake-cli --> |gs://shopcloud-datalake-sage-data| GCS_DATA[(Storage)]
end
subgraph Data-Warehouse
GCS_DATA[(Storage)] --> SCDS[(BigQuery)]
end
```
## FAQs
- __Where are the configurations stored?__ Configurations are stored in a Google Cloud Storage bucket associated with each project.
- __What is the structure of the Datalake?__ Each project has a dedicated Google Cloud Project for data storage.
- __What file format is used?__ Data is stored in Parquet format for efficiency and performance.
- __How is data partitioned?__ Data is partitioned using BigQuery's TimePartitioning feature.
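To make the Parquet and partitioning answers concrete, here is a sketch of how a single date partition might be addressed in the data bucket. Only the bucket name `gs://shopcloud-datalake-sage-data` comes from the architecture diagram above; the `<table>/<date>/data.parquet` object layout is an assumption for illustration, not documented behavior.

```sh
# Hypothetical object URI for one table partition in the data bucket.
# The <table>/<date>/data.parquet layout is an assumption, not documented behavior.
TABLE="sales"
PARTITION_DATE="2024-03-26"
PARTITION_URI="gs://shopcloud-datalake-sage-data/${TABLE}/${PARTITION_DATE}/data.parquet"
echo "$PARTITION_URI"
```

One Parquet object per table and partition date keeps each BigQuery time partition loadable independently.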
## Development
```sh
# run unit tests
$ python3 -m unittest
# run unit tests with coverage
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage html -d coverage_report
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage xml
```