# mcap_etl
`.mcap` (MCAP) and `.bag` (ROS Bag) files pose challenges for data engineers due to their large size, complex structure, and lack of standardization. Managing and transferring these massive files requires substantial storage and bandwidth, and processing them is slow. Extracting and interpreting the information you need is time-consuming and error-prone, because the mix of timestamped messages typically requires custom parsing and processing pipelines.
mcap_etl alleviates these pain points by offering a suite of options to transform the contents of MCAP files into a structured database or into alternative file formats, streamlining the data engineering process and making it easier to analyze and visualize the captured data.
Presently, mcap_etl supports the following conversions:
* From `.mcap` to `.bag`
* From `.mcap` to TimescaleDB
We are actively seeking to extend this list and invite community contributions. Specifically, we aim to include transformations to InfluxDB, Timestream, and Parquet.
## Installation and Usage
### Installation
mcap_etl can be installed easily via pip:
```shell
pip install mcap-etl
```
### Usage
mcap_etl requires a running TimescaleDB instance. Once the database is up, you can run a job against any file. Note that the database connection flags are optional.
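If you don't already have a TimescaleDB instance available, one way to start one locally is with Docker. This is a minimal sketch and not part of mcap_etl itself; the image tag and password are placeholders and should match the values you pass to the command below.
```shell
# Placeholder credentials -- keep them in sync with the mcap_etl flags below.
docker run -d --name timescaledb \
    -p 5432:5432 \
    -e POSTGRES_PASSWORD=password \
    timescale/timescaledb:latest-pg15
```
With the database running, point the `timescale` subcommand at your file: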
```shell
mcap_etl timescale \
    --host localhost \
    --port 5432 \
    --user postgres \
    --password password \
    --name postgres \
    /path/to/file.mcap
```
You can now perform queries against your database.
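For example, you can inspect the ingested tables and aggregate messages with plain SQL via `psql`. The connection string below reuses the placeholder credentials from the command above, and the table name `some_topic_table` and timestamp column `time` are hypothetical; mcap_etl derives the actual names from the topics in your file, so list your schema first.
```shell
# List the tables created during ingestion (names are derived from topics).
psql "postgresql://postgres:password@localhost:5432/postgres" -c '\dt'

# Hypothetical aggregation: message counts per minute for one topic table.
# Replace some_topic_table and the time column with names from your schema.
psql "postgresql://postgres:password@localhost:5432/postgres" -c "
    SELECT time_bucket('1 minute', time) AS minute, count(*) AS messages
    FROM some_topic_table
    GROUP BY minute
    ORDER BY minute;"
```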
## Future Development: Hosted Solution
We are in the process of developing a hosted solution, offering:
* Managed services for data ingestion, database hosting, and infrastructure for integrations such as S3 and Grafana.
* A tool to convert data back from Timescale to `.mcap` and `.bag` formats.
* Vector search capabilities for unstructured data types, such as imagery and audio.
* A web interface to monitor and share data with your team.
## Key Design Considerations
mcap_etl has been designed with the following key principles:
* __Timescale Hypertables__: Due to the large number of messages, we use Timescale's hypertables and take advantage of their compression features; a quick way to inspect the resulting hypertables is shown after this list.
* __No ROS Dependency__: mcap_etl operates without any ROS dependencies, avoiding the complexity of ROS installations for simple data extraction from `.bag` or `.mcap` files. Instead, we utilize the `rosbags` project, which allows for dynamic loading of message schemas at runtime.
* __MCAP to ROS bag Transformation__: mcap_etl first converts MCAP files into ROS bags before performing data transformations. This approach avoids the need to rewrite the ingestion flow.
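As a quick check of the hypertable setup mentioned above, you can ask TimescaleDB which tables it manages. This assumes TimescaleDB 2.x and reuses the placeholder credentials from the usage example.
```shell
# Lists hypertables known to TimescaleDB (2.x information views).
psql "postgresql://postgres:password@localhost:5432/postgres" -c "
    SELECT hypertable_schema, hypertable_name
    FROM timescaledb_information.hypertables;"
```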