tauro


Name: tauro
Version: 0.1.0
Home page: None
Summary: Enhanced Tauro - Data Pipeline Execution System with Auto-Discovery
Upload time: 2025-08-29 00:32:16
Maintainer: None
Docs URL: None
Author: Faustino Lopez Ramos
Requires Python: <4.0,>=3.9
License: MIT
Keywords: data, pipeline, etl, automation, cli
Requirements: No requirements were recorded.

# Tauro

Tauro is a powerful, flexible framework for running and managing data pipelines, designed to be accessible to non-technical users and advanced developers alike. It provides a unified interface for:

- Running batch jobs (batch processing)
- Managing streaming pipelines (real-time processing)
- File-based configuration (YAML/JSON/Python)
- Generating projects from predefined templates
- Medallion architecture support (Bronze → Silver → Gold)

## Project Architecture

Tauro is organized into these main modules:

### 🔧 CLI (`tauro.cli`)
- Main command-line interface
- Configuration management and auto-discovery
- Security validation and path handling
- Centralized logging

### ⚙️ Config (`tauro.config`)
- Cohesive configuration management
- Support for multiple formats (YAML/JSON/Python)
- Variable interpolation
- Configuration validation
- Spark session management

### 🔄 Exec (`tauro.exec`)
- Pipeline execution
- Dependency resolution
- Pipeline validation
- Execution state and monitoring

### 📝 IO (`tauro.io`)
- Unified input/output handling
- Support for multiple formats
- Data validation
- Factories for readers/writers

### 🌊 Streaming (`tauro.streaming`)
- Real-time pipeline management
- Query management
- Streaming-specific validation
- Specialized readers and writers

## Requirements

- Python 3.9+
- pyspark (optional, for Spark processing)
- Databricks Connect (optional, for Databricks/Distributed mode)
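
For a typical local setup, installation looks something like this (pyspark and databricks-connect are only needed if you use the Spark or Databricks modes; a generated project also ships its own requirements.txt):

```
pip install tauro

# Optional: only needed for local Spark processing
pip install pyspark

# Optional: only needed for Databricks/Distributed mode
pip install databricks-connect
```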
---

Tauro helps you run data pipelines without needing to be a developer. Think of it as a “remote control” to:
- Run batch jobs (for files or tables that update on a schedule)
- Start and monitor streaming jobs (for real‑time data)
- Use a simple folder of configuration files to keep things organized
- Generate a ready‑to‑use project template (Medallion: Bronze → Silver → Gold)

This guide explains how to use Tauro from your terminal in clear, practical steps.

---

## What can I do with Tauro?

- Create a new project from a template with one command
- Run a pipeline for a specific environment (dev, pre_prod, prod)
- Run a single step (node) of a pipeline if you need to re‑run just part of it
- Start a streaming pipeline and check its status or stop it
- See which pipelines exist and view basic details
- Validate your setup before running

You do not need to write code to use these features. If you later want to customize pipeline logic, a developer can edit the generated sample files.

---

## Before you start

- You need Python 3.9 or later
- Open a terminal (Command Prompt/PowerShell on Windows, Terminal on macOS/Linux)
- Install required packages (you’ll get a ready “requirements.txt” in the template)

If Tauro is already installed in your environment, you can skip template generation and use your team’s existing project.
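
A quick way to confirm that Tauro is installed (and, when run from inside a project folder, that it can see your pipelines) is:

```
pip show tauro
tauro --list-pipelines
```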

---

## Quick Start in 10 Minutes

Follow these steps to try Tauro with a new sample project.

1) Create a new project
- YAML format (default):
  ```
  tauro --template medallion_basic --project-name demo_project
  ```
- JSON format:
  ```
  tauro --template medallion_basic --project-name demo_project --format json
  ```

2) Go into your project and install requirements
```
cd demo_project
pip install -r requirements.txt
```

3) Run your first batch pipeline (Bronze ingestion)
- Development environment (“dev”):
  ```
  tauro --env dev --pipeline bronze_batch_ingestion
  ```

4) Run your first streaming pipeline (Bronze streaming)
- Start (async mode, runs in background):
  ```
  tauro --streaming --streaming-command run \
        --streaming-config ./settings_json.json \
        --streaming-pipeline bronze_streaming_ingestion \
        --streaming-mode async
  ```
- Check status (all running jobs):
  ```
  tauro --streaming --streaming-command status --streaming-config ./settings_json.json
  ```
- Stop a streaming job (replace <ID> with the execution id from status):
  ```
  tauro --streaming --streaming-command stop \
        --streaming-config ./settings_json.json \
        --execution-id <ID>
  ```

Tip: If you generated YAML instead of JSON, your settings file will be settings_yml.json. Use that in --streaming-config.

---

## Everyday tasks

Choose an environment
- Environments help you separate development, testing, and production.
- Supported: base, dev, pre_prod, prod
- Example:
  ```
  tauro --env pre_prod --pipeline silver_transform
  ```

Run only one step (node) of a pipeline
- Useful if a particular step failed and you want to re‑run just that part.
  ```
  tauro --env dev --pipeline gold_aggregation --node aggregate_sales
  ```

Preview without actually running (dry run)
- Shows what would happen, but makes no changes.
  ```
  tauro --env dev --pipeline bronze_batch_ingestion --dry-run
  ```

Validate your setup (no execution)
- Checks the configuration structure and paths.
  ```
  tauro --env dev --pipeline bronze_batch_ingestion --validate-only
  ```

See available pipelines
```
tauro --list-pipelines
```

Get basic info about a pipeline
```
tauro --pipeline-info gold_aggregation
```

Clear cached discovery results
```
tauro --clear-cache
```

---

## Understanding the configuration (plain English)

Your project has:
- One “settings” file at the project root (for example, settings_json.json)
  - This file points Tauro to the right config files for each environment
- A “config/” folder with the actual settings:
  - global_settings: general options (project name, defaults)
  - pipelines: list of pipeline names and which steps (nodes) they include
  - nodes: what each step does and in which order
  - input: where data comes from (files, tables, streams)
  - output: where results go (tables, folders, streams)

You don’t need to edit these to try Tauro, but your team may customize them later.
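
As a rough sketch (exact file names and extensions depend on the format you chose when generating the project), the layout usually looks like this:

```
demo_project/
├── settings_json.json   # or settings_yml.json; points to the config files per environment
├── requirements.txt
├── config/
│   ├── global_settings  # general options (project name, defaults)
│   ├── pipelines        # pipeline names and the nodes they include
│   ├── nodes            # what each step does and in which order
│   ├── input            # where data comes from
│   └── output           # where results go
└── logs/
    └── tauro.log        # default log file, created when you run pipelines
```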

---

## Dates and time windows

Some pipelines work with date ranges.

- Use ISO format: YYYY-MM-DD
- Example:
  ```
  tauro --env dev --pipeline bronze_batch_ingestion \
        --start-date 2025-01-01 --end-date 2025-01-31
  ```
- Tauro checks that the start date is not after the end date.

---

## Logging (making output quieter or more detailed)

- Default level is INFO (balanced)
- Make it very detailed:
  ```
  tauro --env dev --pipeline bronze_batch_ingestion --verbose
  ```
- Show only errors:
  ```
  tauro --env dev --pipeline bronze_batch_ingestion --quiet
  ```
- Send logs to a custom file:
  ```
  tauro --env dev --pipeline bronze_batch_ingestion --log-file ./my_run.log
  ```

A default log file is also saved in logs/tauro.log.
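
If you want to follow a run live, you can watch that file from a second terminal with a standard shell command (on Windows PowerShell, use Get-Content -Wait instead):

```
tail -f logs/tauro.log
```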

---

## Streaming (simple view)

- Run: starts the streaming job (sync waits until it finishes, async continues in background)
- Status: tells you if your streaming job is running and its identifier
- Stop: stops the job safely

You always need to point to your settings file with --streaming-config.

Examples:
- Run async:
  ```
  tauro --streaming --streaming-command run \
        --streaming-config ./settings_json.json \
        --streaming-pipeline bronze_streaming_ingestion \
        --streaming-mode async
  ```
- Status (all):
  ```
  tauro --streaming --streaming-command status --streaming-config ./settings_json.json
  ```
- Stop by id:
  ```
  tauro --streaming --streaming-command stop \
        --streaming-config ./settings_json.json \
        --execution-id <ID>
  ```

---

## Tips and common fixes

- “Config not found”
  - Make sure you are inside your project folder (cd demo_project)
  - The settings file should be visible in your current folder: settings_json.json (or settings_yml.json)
  - Try:
    ```
    tauro --list-configs
    ```
- “Invalid date format”
  - Use YYYY-MM-DD, for example 2025-03-15
- “Import” or “module not found” in custom code (if your team customized nodes)
  - Make sure code files are inside your project (for example under pipelines/ or src/)
  - Ask a developer to check Python package setup if needed
- Want to see what Tauro would do without changes?
  - Use --dry-run

---

## Frequently Asked Questions

- Do I need admin rights?
  - No, you just need Python and the project files.
- Does Tauro change my original data?
  - Only if a pipeline writes to an output location. You can always use --dry-run to preview.
- Can I use Tauro on Windows/macOS/Linux?
  - Yes. Commands are the same. Paths and permissions may differ by system.

---

## Where to get help

- Check the README created inside your generated project (it includes next steps)
- Use:
  ```
  tauro --list-pipelines
  tauro --pipeline-info <name>
  ```
- If you still need help, share the error message and the log file (logs/tauro.log) with your data team.

You’re ready to go. Start with bronze_batch_ingestion in dev, then explore the rest!


            
