mcap-etl


* Name: mcap-etl
* Version: 0.1.4
* Home page: https://github.com/SensorSurf/mcap-etl
* Summary: Transform mcap (or rosbag) files into databases or other files
* Upload time: 2023-06-07 20:40:58
* Author: SensorSurf
* Keywords: ros, ros2, rosbag, mcap, timescale, etl, timeseries, database
* Requirements: No requirements were recorded.

# mcap_etl

`.mcap` (MCAP) and `.bag` (ROS Bag) files pose challenges for data engineers due to their large size, complex structure, and lack of standardization. Managing and transferring these massive files requires substantial storage capacity and bandwidth, leading to slow processing times. Extracting and interpreting the desired information becomes time-consuming and error-prone due to the mix of timestamped messages and the need for custom parsing and processing pipelines.

mcap_etl addresses these pain points by transforming the contents of MCAP files into a structured database or into alternative file formats, which makes the captured data easier to analyze and visualize.

Presently, mcap_etl supports the following conversions:
* From `.mcap` to `.bag`
* From `.mcap` to TimescaleDB

We are actively seeking to extend this list and invite community contributions. Specifically, we aim to include transformations to InfluxDB, Timestream, and Parquet.

## Installation and Usage

### Installation

mcap_etl can be installed easily via pip:
```shell
pip install mcap-etl
```

### Usage

Loading data into Timescale requires a running Timescale database. Once one is available, you can run a job against any file. The database connection parameters shown below are optional.
```shell
mcap_etl timescale \
    --host localhost \
    --port 5432 \
    --user postgres \
    --password password \
    --name postgres \
    /path/to/file.mcap
```

You can now perform queries against your database.
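
For batch jobs, the command above can also be assembled programmatically. The following is a minimal sketch, not part of mcap_etl itself: `build_timescale_command` and `ingest_directory` are hypothetical helper names, and the only flags used are the ones shown in the example above. Options left unset are omitted from the command, since the connection parameters are optional.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_timescale_command(
    mcap_path: str,
    host: Optional[str] = None,
    port: Optional[int] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
    name: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `mcap_etl timescale`, skipping unset options."""
    command = ["mcap_etl", "timescale"]
    options = {
        "--host": host,
        "--port": port,
        "--user": user,
        "--password": password,
        "--name": name,
    }
    for flag, value in options.items():
        if value is not None:
            command += [flag, str(value)]
    # The file path is the final positional argument, as in the shell example.
    command.append(mcap_path)
    return command


def ingest_directory(directory: str, **connection) -> None:
    """Run one ingestion job per .mcap file in a directory (hypothetical helper)."""
    for path in sorted(Path(directory).glob("*.mcap")):
        # check=True raises CalledProcessError if a job exits with a non-zero status.
        subprocess.run(build_timescale_command(str(path), **connection), check=True)
```

Calling `ingest_directory("/path/to/recordings", host="localhost", port=5432)` would then run one job per file, assuming `mcap_etl` is on the `PATH` and the database is reachable.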

## Future Development: Hosted Solution

We are in the process of developing a hosted solution, offering:

* Managed services for data ingestion, database, and infrastructure for integrations, including S3 and Grafana.
* A tool to convert data back from Timescale to `.mcap` and `.bag` formats.
* Vector search capabilities for unstructured data types, such as imagery and audio.
* A web interface to monitor and share data with your team.

## Key Design Considerations

mcap_etl has been designed with the following key principles:

* __Timescale Hypertables__: Due to the large number of messages, we use Timescale's hypertables and take advantage of their compression features.
* __No ROS Dependency__: mcap_etl operates without any ROS dependencies, avoiding the complexity of ROS installations for simple data extraction from `.bag` or `.mcap` files. Instead, we utilize the `rosbags` project, which allows for dynamic loading of message schemas at runtime.
* __MCAP to ROS bag Transformation__: mcap_etl first converts MCAP files into ROS bags and then performs the data transformations on the resulting bag. This lets both formats share one ingestion flow instead of requiring a rewrite for MCAP.
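
To illustrate the hypertable and compression point above, here is a minimal sketch of the kind of SQL involved. It is not mcap_etl's actual schema: the table layout, the column names (`time`, `topic`, `data`), and the helper `hypertable_setup_sql` are assumptions made for illustration. The TimescaleDB calls used are `create_hypertable`, the `timescale.compress` table options, and `add_compression_policy`.

```python
from typing import List


def hypertable_setup_sql(
    table: str,
    time_column: str = "time",
    segment_by: str = "topic",
    compress_after: str = "7 days",
) -> List[str]:
    """Return SQL statements that create a compressed hypertable.

    The schema is hypothetical. Identifiers are interpolated directly,
    so only pass trusted, hard-coded names.
    """
    return [
        # Plain PostgreSQL table holding one row per message.
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"{time_column} TIMESTAMPTZ NOT NULL, {segment_by} TEXT NOT NULL, data JSONB);",
        # Turn the table into a hypertable partitioned on the time column.
        f"SELECT create_hypertable('{table}', '{time_column}', if_not_exists => TRUE);",
        # Enable native compression, grouping compressed rows by topic.
        f"ALTER TABLE {table} SET (timescale.compress, "
        f"timescale.compress_segmentby = '{segment_by}');",
        # Compress chunks automatically once they are older than the interval.
        f"SELECT add_compression_policy('{table}', INTERVAL '{compress_after}');",
    ]


if __name__ == "__main__":
    for statement in hypertable_setup_sql("messages"):
        print(statement)
```

Segmenting by topic is one plausible choice here because queries against recorded robot data typically filter on a single topic; other schemas may call for a different segment column.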

            

        