Name | tauro |
Version | 0.1.0 |
home_page | None |
Summary | Enhanced Tauro - Data Pipeline Execution System with Auto-Discovery |
upload_time | 2025-08-29 00:32:16 |
maintainer | None |
docs_url | None |
author | Faustino Lopez Ramos |
requires_python | <4.0,>=3.9 |
license | MIT |
keywords | data, pipeline, etl, automation, cli |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Tauro
Tauro is a powerful, flexible framework for running and managing data pipelines, designed to be accessible to both non-technical users and advanced developers. It provides a unified interface for:
- Running batch jobs
- Managing streaming pipelines (real-time processing)
- File-based configuration (YAML/JSON/Python)
- Generating projects from predefined templates
- Support for the Medallion architecture (Bronze → Silver → Gold)
## Project Architecture
Tauro is organized into the following main modules:
### 🔧 CLI (`tauro.cli`)
- Main command-line interface
- Configuration management and auto-discovery
- Security validation and path handling
- Centralized logging
### ⚙️ Config (`tauro.config`)
- Cohesive configuration management
- Support for multiple formats (YAML/JSON/Python)
- Variable interpolation
- Configuration validation
- Spark session management
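The source does not document the interpolation syntax used by `tauro.config`, so the following is only a minimal sketch of the general idea, built on Python's standard `string.Template`. The `${env}` placeholder style, the `interpolate` helper, and the sample keys are assumptions made for illustration and are not Tauro's actual API.

```python
from string import Template


def interpolate(config: dict, variables: dict) -> dict:
    """Return a copy of config with ${name} placeholders filled in string values."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse into nested sections.
            resolved[key] = interpolate(value, variables)
        elif isinstance(value, str):
            # safe_substitute leaves unknown placeholders untouched instead of raising.
            resolved[key] = Template(value).safe_substitute(variables)
        else:
            resolved[key] = value
    return resolved


# Hypothetical configuration fragment with one placeholder.
raw = {"input": {"path": "/data/${env}/bronze"}, "retries": 3}
print(interpolate(raw, {"env": "dev"}))
```

Using `safe_substitute` keeps a missing variable visible in the output, which tends to be easier to debug than a silent empty string.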
### 🔄 Exec (`tauro.exec`)
- Pipeline execution
- Dependency resolution
- Pipeline validation
- Execution status and monitoring
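How `tauro.exec` resolves dependencies internally is not described in the source. As a rough illustration of what dependency resolution between pipeline nodes usually means, here is a sketch using the standard-library `graphlib` module (available from Python 3.9, the minimum version Tauro requires). The node names and the `execution_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical nodes: each key maps to the set of nodes it depends on.
dependencies = {
    "load_raw": set(),
    "clean": {"load_raw"},
    "aggregate_sales": {"clean"},
}


def execution_order(deps: dict) -> list:
    """Return the nodes in an order where every dependency runs first."""
    return list(TopologicalSorter(deps).static_order())


print(execution_order(dependencies))

# A circular dependency cannot be ordered and is reported as an error.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError as exc:
    print("cycle detected:", exc.args[1])
```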
### 📝 IO (`tauro.io`)
- Unified input/output handling
- Support for multiple formats
- Data validation
- Factories for readers/writers
### 🌊 Streaming (`tauro.streaming`)
- Real-time pipeline management
- Query handling
- Streaming-specific validation
- Specialized readers and writers
## Requirements
- Python 3.9+
- pyspark (optional, for Spark processing)
- Databricks Connect (optional, for Databricks/Distributed mode)
Tauro helps you run data pipelines without needing to be a developer. Think of it as a “remote control” to:
- Run batch jobs (for files or tables that update on a schedule)
- Start and monitor streaming jobs (for real‑time data)
- Use a simple folder of configuration files to keep things organized
- Generate a ready‑to‑use project template (Medallion: Bronze → Silver → Gold)
This guide explains how to use Tauro from your terminal in clear, practical steps.
---
## What can I do with Tauro?
- Create a new project from a template with one command
- Run a pipeline for a specific environment (dev, pre_prod, prod)
- Run a single step (node) of a pipeline if you need to re‑run just part of it
- Start a streaming pipeline and check its status or stop it
- See which pipelines exist and view basic details
- Validate your setup before running
You do not need to write code to use these features. If you later want to customize pipeline logic, a developer can edit the generated sample files.
---
## Before you start
- You need Python 3.9 or later
- Open a terminal (Command Prompt/PowerShell on Windows, Terminal on macOS/Linux)
- Install required packages (you’ll get a ready “requirements.txt” in the template)
If Tauro is already installed in your environment, you can skip template generation and use your team’s existing project.
---
## Quick Start in 10 Minutes
Follow these steps to try Tauro with a new sample project.
1) Create a new project
- YAML format (default):
```
tauro --template medallion_basic --project-name demo_project
```
- JSON format:
```
tauro --template medallion_basic --project-name demo_project --format json
```
2) Go into your project and install requirements
```
cd demo_project
pip install -r requirements.txt
```
3) Run your first batch pipeline (Bronze ingestion)
- Development environment (“dev”):
```
tauro --env dev --pipeline bronze_batch_ingestion
```
4) Run your first streaming pipeline (Bronze streaming)
- Start (async mode, runs in background):
```
tauro --streaming --streaming-command run \
--streaming-config ./settings_json.json \
--streaming-pipeline bronze_streaming_ingestion \
--streaming-mode async
```
- Check status (all running jobs):
```
tauro --streaming --streaming-command status --streaming-config ./settings_json.json
```
- Stop a streaming job (replace <ID> with the execution id from status):
```
tauro --streaming --streaming-command stop \
--streaming-config ./settings_json.json \
--execution-id <ID>
```
Tip: If you generated the project in YAML (the default) instead of JSON, your settings file will be settings_yml.json. Pass that file to --streaming-config in the commands above.
---
## Everyday tasks
Choose an environment
- Environments help you separate development, testing, and production.
- Supported: base, dev, pre_prod, prod
- Example:
```
tauro --env pre_prod --pipeline silver_transform
```
Run only one step (node) of a pipeline
- Useful if a particular step failed and you want to re‑run just that part.
```
tauro --env dev --pipeline gold_aggregation --node aggregate_sales
```
Preview without actually running (dry run)
- Shows what would happen, but makes no changes.
```
tauro --env dev --pipeline bronze_batch_ingestion --dry-run
```
Validate your setup (no execution)
- Checks the configuration structure and paths.
```
tauro --env dev --pipeline bronze_batch_ingestion --validate-only
```
See available pipelines
```
tauro --list-pipelines
```
Get basic info about a pipeline
```
tauro --pipeline-info gold_aggregation
```
Clear cached discovery results
```
tauro --clear-cache
```
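If you want to launch these batch commands from a script rather than typing them, one option is to assemble the argument list in Python. The sketch below uses only flags shown in this section; the `build_batch_command` helper is hypothetical, and whether Tauro accepts every combination of these flags together is an assumption.

```python
import shlex
from typing import List, Optional


def build_batch_command(
    env: str,
    pipeline: str,
    node: Optional[str] = None,
    dry_run: bool = False,
    validate_only: bool = False,
) -> List[str]:
    """Assemble the argument list for a batch run (hypothetical helper)."""
    command = ["tauro", "--env", env, "--pipeline", pipeline]
    if node is not None:
        command += ["--node", node]
    if dry_run:
        command.append("--dry-run")
    if validate_only:
        command.append("--validate-only")
    return command


# Print the command lines instead of running them.
print(shlex.join(build_batch_command("dev", "gold_aggregation", node="aggregate_sales")))
print(shlex.join(build_batch_command("dev", "bronze_batch_ingestion", dry_run=True)))
```

To execute a command, pass the returned list to `subprocess.run(command, check=True)`; passing a list avoids shell quoting problems.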
---
## Understanding the configuration (plain English)
Your project has:
- One “settings” file at the project root (for example, settings_json.json)
  - This file points Tauro to the right config files for each environment
- A “config/” folder with the actual settings:
  - global_settings: general options (project name, defaults)
  - pipelines: list of pipeline names and which steps (nodes) they include
  - nodes: what each step does and in which order
  - input: where data comes from (files, tables, streams)
  - output: where results go (tables, folders, streams)
You don’t need to edit these to try Tauro, but your team may customize them later.
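The source does not show what the settings file actually contains. Purely as an illustration of the idea that it points each environment at the five config sections listed above, here is a sketch with an invented layout. The JSON structure, the file paths, and the `missing_sections` helper are all assumptions; the files generated by Tauro may look different.

```python
import json

# The five sections named in the list above.
REQUIRED_SECTIONS = ("global_settings", "pipelines", "nodes", "input", "output")

# Hypothetical settings content; a real generated file may be laid out differently.
settings_text = """
{
  "dev": {
    "global_settings": "config/global_settings.json",
    "pipelines": "config/pipelines.json",
    "nodes": "config/nodes.json",
    "input": "config/input.json",
    "output": "config/output.json"
  }
}
"""


def missing_sections(settings: dict, env: str) -> list:
    """List the required sections that have no entry for the given environment."""
    env_settings = settings.get(env, {})
    return [name for name in REQUIRED_SECTIONS if name not in env_settings]


settings = json.loads(settings_text)
print(missing_sections(settings, "dev"))   # sections missing for dev
print(missing_sections(settings, "prod"))  # prod is not defined in this sample
```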
---
## Dates and time windows
Some pipelines work with date ranges.
- Use ISO format: YYYY-MM-DD
- Example:
```
tauro --env dev --pipeline bronze_batch_ingestion \
--start-date 2025-01-01 --end-date 2025-01-31
```
- Tauro checks that the start date is not after the end date.
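The two rules above (dates in YYYY-MM-DD form, start not after end) can be checked before you call Tauro. This is a minimal sketch of such a check with the standard library; the function names are hypothetical and this is not Tauro's own validation code.

```python
from datetime import date, datetime
from typing import Tuple


def parse_iso_date(value: str) -> date:
    """Parse a YYYY-MM-DD string; raises ValueError if it cannot be parsed."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def validate_range(start: str, end: str) -> Tuple[date, date]:
    """Parse both dates and make sure the start is not after the end."""
    start_date = parse_iso_date(start)
    end_date = parse_iso_date(end)
    if start_date > end_date:
        raise ValueError(f"start date {start} is after end date {end}")
    return start_date, end_date


print(validate_range("2025-01-01", "2025-01-31"))

# A reversed range is rejected.
try:
    validate_range("2025-02-01", "2025-01-31")
except ValueError as exc:
    print("rejected:", exc)
```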
---
## Logging (making output quieter or more detailed)
- Default level is INFO (balanced)
- Make it very detailed:
```
tauro --env dev --pipeline bronze_batch_ingestion --verbose
```
- Show only errors:
```
tauro --env dev --pipeline bronze_batch_ingestion --quiet
```
- Send logs to a custom file:
```
tauro --env dev --pipeline bronze_batch_ingestion --log-file ./my_run.log
```
A default log file is also saved in logs/tauro.log.
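For readers curious how flags like these typically map onto Python's `logging` module, here is a small sketch. The source only says that INFO is the default, that --verbose is "very detailed", and that --quiet shows "only errors"; mapping them to `DEBUG` and `ERROR` is an assumption, and both helper functions are hypothetical rather than part of Tauro.

```python
import logging
from typing import Optional


def resolve_log_level(verbose: bool = False, quiet: bool = False) -> int:
    """Map the two switches onto a logging level (assumed mapping)."""
    if verbose and quiet:
        raise ValueError("choose either verbose or quiet, not both")
    if verbose:
        return logging.DEBUG
    if quiet:
        return logging.ERROR
    return logging.INFO


def configure_logging(verbose: bool = False, quiet: bool = False,
                      log_file: Optional[str] = None) -> None:
    """Log to the console and, if requested, to an extra file."""
    handlers = [logging.StreamHandler()]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file))
    # force=True replaces any handlers configured earlier in the process.
    logging.basicConfig(level=resolve_log_level(verbose, quiet),
                        handlers=handlers, force=True)


configure_logging(verbose=True)
logging.getLogger("demo").debug("visible because verbose was requested")
```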
---
## Streaming (simple view)
- Run: starts the streaming job (sync waits until it finishes, async continues in background)
- Status: tells you if your streaming job is running and its identifier
- Stop: stops the job safely
You always need to point to your settings file with --streaming-config.
Examples:
- Run async:
```
tauro --streaming --streaming-command run \
--streaming-config ./settings_json.json \
--streaming-pipeline bronze_streaming_ingestion \
--streaming-mode async
```
- Status (all):
```
tauro --streaming --streaming-command status --streaming-config ./settings_json.json
```
- Stop by id:
```
tauro --streaming --streaming-command stop \
--streaming-config ./settings_json.json \
--execution-id <ID>
```
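The three streaming commands above share most of their arguments, so a script can build them from one helper. The sketch below uses only the flags shown in this section. The `build_streaming_command` function is hypothetical, and the rules it enforces (run needs a pipeline, stop needs an execution id) are assumptions inferred from the examples, not documented behavior.

```python
from typing import List, Optional

# The three commands shown in the examples above.
VALID_COMMANDS = ("run", "status", "stop")


def build_streaming_command(
    command: str,
    config: str,
    pipeline: Optional[str] = None,
    mode: Optional[str] = None,
    execution_id: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for a streaming command (hypothetical helper)."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"unknown streaming command: {command}")
    argv = ["tauro", "--streaming", "--streaming-command", command,
            "--streaming-config", config]
    if command == "run":
        if pipeline is None:
            raise ValueError("run needs a pipeline name")
        argv += ["--streaming-pipeline", pipeline]
        if mode is not None:
            argv += ["--streaming-mode", mode]
    elif command == "stop":
        if execution_id is None:
            raise ValueError("stop needs the execution id reported by status")
        argv += ["--execution-id", execution_id]
    return argv


print(build_streaming_command("run", "./settings_json.json",
                              pipeline="bronze_streaming_ingestion", mode="async"))
print(build_streaming_command("status", "./settings_json.json"))
```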
---
## Tips and common fixes
- “Config not found”
  - Make sure you are inside your project folder (cd demo_project)
  - The settings file should be visible in your current folder: settings_json.json (or settings_yml.json)
  - Try:
    ```
    tauro --list-configs
    ```
- “Invalid date format”
  - Use YYYY-MM-DD, for example 2025-03-15
- “Import” or “module not found” in custom code (if your team customized nodes)
  - Make sure code files are inside your project (for example under pipelines/ or src/)
  - Ask a developer to check Python package setup if needed
- Want to see what Tauro would do without changes?
  - Use --dry-run
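For the “Config not found” case, a quick way to confirm whether you are in the right folder is to look for one of the two settings file names mentioned above. This is a small standalone sketch; the `find_settings_file` helper is hypothetical and does not reproduce Tauro's own discovery logic.

```python
from pathlib import Path
from typing import Optional

# The two settings file names mentioned in this guide.
SETTINGS_CANDIDATES = ("settings_json.json", "settings_yml.json")


def find_settings_file(folder: Path) -> Optional[Path]:
    """Return the first settings file found directly inside folder, if any."""
    for name in SETTINGS_CANDIDATES:
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None


found = find_settings_file(Path.cwd())
if found is not None:
    print(f"Settings file found: {found}")
else:
    print("No settings file here; cd into your project folder first.")
```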
---
## Frequently Asked Questions
- Do I need admin rights?
  - No, you just need Python and the project files.
- Does Tauro change my original data?
  - Only if a pipeline writes to an output location. You can always use --dry-run to preview.
- Can I use Tauro on Windows/macOS/Linux?
  - Yes. Commands are the same. Paths and permissions may differ by system.
---
## Where to get help
- Check the README created inside your generated project (it includes next steps)
- Use:
```
tauro --list-pipelines
tauro --pipeline-info <name>
```
- If you still need help, share the error message and the log file (logs/tauro.log) with your data team.
You’re ready to go. Start with bronze_batch_ingestion in dev, then explore the rest!
Raw data
{
"_id": null,
"home_page": null,
"name": "tauro",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "data, pipeline, etl, automation, cli",
"author": "Faustino Lopez Ramos",
"author_email": "faustinolopezramos@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b1/04/bf51279bb0396d06dde66468eb8f1d8211d9e9907959c859aa99a2e434c1/tauro-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Enhanced Tauro - Data Pipeline Execution System with Auto-Discovery",
"version": "0.1.0",
"project_urls": {
"Documentation": "https://github.com/tu-usuario/enhanced-tauro",
"Homepage": "https://github.com/tu-usuario/enhanced-tauro",
"Repository": "https://github.com/tu-usuario/enhanced-tauro"
},
"split_keywords": [
"data",
" pipeline",
" etl",
" automation",
" cli"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "38240457b3a076626920059391cb1210e6edc9b6440bc07bcc13af32d7653f5d",
"md5": "0085cd333f6dad4137191709f707da7c",
"sha256": "885f85d8aebead29d76c52c347c3e9b513dc2bf910c1d58627b786adc31ac733"
},
"downloads": -1,
"filename": "tauro-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0085cd333f6dad4137191709f707da7c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 134065,
"upload_time": "2025-08-29T00:32:14",
"upload_time_iso_8601": "2025-08-29T00:32:14.532878Z",
"url": "https://files.pythonhosted.org/packages/38/24/0457b3a076626920059391cb1210e6edc9b6440bc07bcc13af32d7653f5d/tauro-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b104bf51279bb0396d06dde66468eb8f1d8211d9e9907959c859aa99a2e434c1",
"md5": "27bfa77c5c69f933b907feabb13c986f",
"sha256": "e30393faec09bc8016c03dedfd49ebf5f129c04bf43e09cb71d42017e96ea0ed"
},
"downloads": -1,
"filename": "tauro-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "27bfa77c5c69f933b907feabb13c986f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 114712,
"upload_time": "2025-08-29T00:32:16",
"upload_time_iso_8601": "2025-08-29T00:32:16.989718Z",
"url": "https://files.pythonhosted.org/packages/b1/04/bf51279bb0396d06dde66468eb8f1d8211d9e9907959c859aa99a2e434c1/tauro-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-29 00:32:16",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "tu-usuario",
"github_project": "enhanced-tauro",
"github_not_found": true,
"lcname": "tauro"
}