# digout

- **Name**: digout
- **Version**: 0.1.1
- **Summary**: Pipeline framework to dump analysis-ready data from LHCb grid-based files
- **Author email**: anthonyc <anthony.correia@cern.ch>
- **Requires Python**: >=3.11
- **License**: Apache License (2.0)
- **Keywords**: dag, cern, computation, graph, grid, lhcb, pipeline, workflow
- **Uploaded**: 2025-08-04 01:33:35

<picture align="center">
  <img alt="Digout logo" src="https://gitlab.cern.ch/particlepredatorinvasion/digout/raw/master/docs/source/_static/digout.svg">
</picture>

<p align="center">
  <a href="https://gitlab.cern.ch/particlepredatorinvasion/digout/-/pipelines/">
    <img alt="Pipeline Status" src="https://gitlab.cern.ch/particlepredatorinvasion/digout/badges/master/pipeline.svg" />
  </a>
  <a href="https://gitlab.cern.ch/particlepredatorinvasion/digout/-/blob/master/LICENSE">
    <img alt="License" src="https://img.shields.io/pypi/l/digout" />
  </a>
  <a href="https://gitlab.cern.ch/particlepredatorinvasion/digout/-/releases">
    <img alt="Latest Release" src="https://gitlab.cern.ch/particlepredatorinvasion/digout/-/badges/release.svg" />
  </a>
  <a href="https://pypi.org/project/digout/">
    <img alt="PyPI - Version" src="https://img.shields.io/pypi/v/digout" />
  </a>
  <a href="https://pypi.org/project/digout/">
    <img alt="Python Version" src="https://img.shields.io/pypi/pyversions/digout" />
  </a>
  <a href="https://digout.docs.cern.ch">
    <img alt="Documentation Status" src="https://img.shields.io/badge/documentation-view-blue.svg" />
  </a>
  <a href="https://digout.docs.cern.ch/master/development/contributing.html">
    <img alt="Contributing Guide" src="https://img.shields.io/badge/contributing-guide-blue.svg" />
  </a>
</p>

`digout` is a Python library purpose-built to execute the multi-stage workflow
of converting raw LHCb `DIGI` files into analysis-ready `parquet` dataframes
of particles and hits.

To manage this process in a scalable and reproducible manner,
it implements a workflow framework organized around configurable **steps**
(e.g., `digi2root`, `root2df`).
The framework operates on a two-phase execution model:
a **stream phase** runs once to prepare the dataset from a bookkeeping path,
and a **chunk phase** processes each input file in parallel.
This parallel execution is managed by swappable **schedulers**
(such as `local` for single-machine processing or `htcondor` for cluster submission),
and the entire workflow is defined in YAML configuration files
to ensure complete reproducibility.
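
As an illustration of this configuration-driven model, a workflow file might
look like the sketch below. The key names (`scheduler`, `stream`, `steps`,
`bookkeeping_path`) are assumptions made for illustration, not digout's
documented schema; see the [Full Documentation](https://digout.docs.cern.ch)
for the real format.

```yaml
# Hypothetical workflow file -- key names are illustrative assumptions,
# not digout's documented schema.
scheduler: htcondor          # swappable: "local" or "htcondor"

stream:                      # stream phase: runs once to prepare the dataset
  bookkeeping_path: /MC/...  # LHCb bookkeeping path to resolve

steps:                       # chunk phase: applied to each input file in parallel
  - digi2root                # DIGI -> ROOT
  - root2df                  # ROOT -> parquet dataframes
```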

## Resources

| Link                                                                                          | Description                                                                  |
|:----------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------|
| 📖 **[Full Documentation](https://digout.docs.cern.ch)**                                      | The complete guide to installation, configuration, and concepts.             |
| 🚀 **[Quickstart Guide](https://digout.docs.cern.ch/master/getstarted/quickstart.html)**      | The fastest way to get a working example running.                            |
| 💡 **[Contributing Guide](https://digout.docs.cern.ch/master/development/contributing.html)** | Learn how to set up a development environment and contribute to the project. |
| 🐛 **[Report a Bug](https://gitlab.cern.ch/particlepredatorinvasion/digout/-/issues)**        | Found an issue? Let us know by creating a bug report.                        |
| 📜 **[Changelog](https://gitlab.cern.ch/particlepredatorinvasion/digout/-/releases)**         | See the latest changes on the releases page.                                  |

## Core Features

- **Automated Metadata Discovery**:
  Automatically queries the LHCb bookkeeping system to retrieve necessary
  metadata (`dddb_tag`, `conddb_tag`, etc.), eliminating manual lookup.
- **Scalable Parallel Processing**:
  Built-in support for processing large datasets in parallel on a local machine
  or on a distributed cluster like HTCondor.
- **Configuration-Driven and Reproducible**:
  Define your entire workflow in YAML files.
  `digout` saves the final, resolved configuration for every run,
  ensuring any result can be reproduced.
- **Idempotent Execution**:
  Automatically detects and skips steps that have already been completed.
- **Extensible Architecture**: Easily define new steps or schedulers;
  a conceptual sketch follows this list.
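
To make the extensibility and idempotence points concrete, here is a
conceptual sketch of what a custom step could look like. The `name`,
`is_done`, and `run` members are purely hypothetical; digout's actual step
interface is described in the documentation.

```python
from pathlib import Path


class Digi2RootStep:
    """Conceptual sketch of a custom step -- the attribute and method
    names here are hypothetical, not digout's actual interface."""

    name = "digi2root"

    def is_done(self, output: Path) -> bool:
        # Idempotent execution: a step whose output already exists is skipped.
        return output.exists()

    def run(self, input_file: Path, output: Path) -> None:
        if self.is_done(output):
            return  # already completed on a previous run
        # ... the real DIGI-to-ROOT conversion would happen here ...
        output.touch()  # placeholder side effect for the sketch
```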

## Main Workflows

- **DIGI to DataFrame Conversion**:
  Produce analysis-ready `parquet` dataframes from LHCb `DIGI` files.
  The available output dataframes are detailed
  on the [DataFrames Page](https://digout.docs.cern.ch/master/concepts/dataframes.html);
  a short loading example follows after this list.

- **DIGI to MDF Conversion**:
  Convert LHCb `DIGI` files into the `.mdf` format required as input
  for the [Allen framework](https://gitlab.cern.ch/lhcb/Allen).
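
Once a run has finished, the resulting dataframes can be loaded with any
standard parquet reader, for example `pandas`. The file names below are
placeholders; the actual outputs are listed on the
[DataFrames Page](https://digout.docs.cern.ch/master/concepts/dataframes.html).

```python
import pandas as pd

# Placeholder file names -- substitute the outputs of your digout run.
particles = pd.read_parquet("particles.parquet")
hits = pd.read_parquet("hits.parquet")

# Inspect a few rows of each dataframe.
print(particles.head())
print(hits.head())
```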

            
