| Name | pit30m |
| Version | 0.0.2 |
| home_page | https://github.com/pit30m/pit30m |
| Summary | Development kit for the Pit30M large scale localization dataset |
| upload_time | 2023-06-16 02:27:17 |
| docs_url | None |
| author | Andrei Bârsan |
| requires_python | >=3.9,<4.0 |
| license | MIT |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Pit30M Development Kit
[CI](https://github.com/pit30m/pit30m/actions/workflows/ci.yaml)
[License](./LICENSE)
[PyPI](https://pypi.org/project/pit30m/)
## Overview
This is the Python software development kit for the Pit30M benchmark for large-scale global localization. The devkit is currently in a pre-release state and many features are coming soon!
Consider checking out [the original paper](https://arxiv.org/abs/2012.12437). You can also follow some of the authors on social media, e.g., [Julieta](https://twitter.com/yoknapathawa) or [Andrei](https://twitter.com/andreib), to be among the first to hear of any updates!
## Getting Started
The recommended way to interact with the dataset is with the `pip` package, which you can install with:
`pip install pit30m`
The devkit lets you efficiently access data on the fly. Here is a "hello world" command which renders a demo video from a random log segment. Note that it assumes `ffmpeg` is installed:
`python -m pit30m.cli multicam_demo --out-dir .`
To preview data more interactively, check out the [tutorial notebook](examples/tutorial_00_introduction.ipynb).
[](https://studiolab.sagemaker.aws/import/github/pit30m/pit30m/blob/main/examples/tutorial_00_introduction.ipynb)
More tutorials coming soon.
### Torch Data Loading
We provide basic log-based PyTorch dataloaders. Visual-localization-specific ones are coming soon. For an example of how to use one of these dataloaders, have a look at `demo_dataloader` in `torch/dataset.py`.
An example command:
```
python -m pit30m.torch.dataset --root-uri s3://pit30m/ --logs 00682fa6-2183-4a0d-dcfe-bc38c448090f,021286dc-5fe5-445f-e5fa-f875f2eb3c57,1c915eda-c18a-46d5-e1ec-e4f624605ff0 --num-workers 16 --batch-size 64
```
## Features
* Framework-agnostic multiprocessing-safe log reader objects
* PyTorch dataloaders
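The "multiprocessing-safe" property usually comes from opening handles lazily, once per worker process, rather than sharing them across forks. A minimal, hypothetical sketch of that pattern (`LogReader` here is an illustration, not the actual pit30m API):

```python
import os


class LogReader:
    """Sketch of a multiprocessing-safe reader: the underlying handle is
    opened lazily, per process, so forked workers never share one."""

    def __init__(self, uri: str):
        self._uri = uri
        self._handle = None
        self._owner_pid = None

    def _ensure_handle(self):
        # Re-open if we were forked: a PID mismatch means we are in a new process.
        if self._handle is None or self._owner_pid != os.getpid():
            self._handle = {"uri": self._uri}  # stand-in for a real S3/file client
            self._owner_pid = os.getpid()
        return self._handle

    def read(self):
        return self._ensure_handle()["uri"]


reader = LogReader("s3://pit30m/some-log")
print(reader.read())  # → s3://pit30m/some-log
```

Because no open handle is pickled or inherited, such an object can be passed to any framework's worker pool safely.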
### In-progress
* More lightweight package with fewer dependencies.
* Very efficient native S3 support through [AWS-authored PyTorch-optimized S3 DataPipes](https://aws.amazon.com/blogs/machine-learning/announcing-the-amazon-s3-plugin-for-pytorch/).
* Support for non-S3 data sources, for users who wish to copy the dataset, or parts of it, to their own storage.
* Tons of examples and tutorials. See `examples/` for more information.
## Development
Package development, testing, and releasing are performed with `poetry`. If you just want to use the `pit30m` package, you don't need to care about this section; just have a look at "Getting Started" above!
1. [Install poetry](https://python-poetry.org/docs/)
2. Setup/update your `dev` virtual environments with `poetry install --with=dev` in the project root
- If you encounter strange keyring/credential errors, you may need `PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring poetry install --with=dev`
3. Develop away
- run commands like `poetry run python -m pit30m.cli`
4. Test with `poetry run pytest`
- Advanced command: `poetry run pytest -ff --new-first --quiet --color=yes --maxfail=3 -n 4`
- This command will run tests, then wait for new changes and test them automatically. Test execution will run in parallel thanks to the `-n 4` argument.
- The command lets you get feedback on whether your code change fixed or broke a particular test within seconds.
5. Make sure you lint. Refer to `ci.yaml` for exact commands we run in CI.
- format with black: `poetry run black .`
- type check with mypy: `poetry run mypy . --ignore-missing-imports`
6. Remember to run `poetry install` after pulling and/or updating dependencies.
Note that during the pre-release period, `torch` will remain a "dev" dependency, since it's necessary for all tests to pass.
### Publishing
#### First Time
1. [Configure poetry](https://www.digitalocean.com/community/tutorials/how-to-publish-python-packages-to-pypi-using-poetry-on-ubuntu-22-04) with a PyPI account which has access to edit the package. You need to make sure poetry is configured with your API key.
#### New Release
1. Decide on the commit with the desired changes.
2. Bump the version number in `pyproject.toml`.
3. `git tag vA.B.C` and `git push origin <tag>`
4. `poetry publish --build`
5. Create the release on GitHub.
## Citation
```bibtex
@inproceedings{martinez2020pit30m,
title={Pit30m: A benchmark for global localization in the age of self-driving cars},
author={Martinez, Julieta and Doubov, Sasha and Fan, Jack and B{\^a}rsan, Ioan Andrei and Wang, Shenlong and M{\'a}ttyus, Gell{\'e}rt and Urtasun, Raquel},
booktitle={{IROS}},
pages={4477--4484},
year={2020},
organization={IEEE}
}
```
## Additional Details
### Images
#### Compression
The images in the dataset are stored using lossy WebP compression at quality level 85. We picked this as a sweet spot between saving storage space and network bandwidth (about 10x smaller than equivalent PNGs) and maintaining very good image quality for tasks such as SLAM, 3D reconstruction, and visual localization. The images were saved using `Pillow 9.2.0`.
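As a rough illustration, re-encoding a frame the same way might look like this (a sketch using a synthetic image; only the `quality=85` setting is taken from the text above, the rest is illustrative):

```python
import io

import numpy as np
from PIL import Image

# Synthetic 64x64 RGB frame standing in for a dataset image.
rng = np.random.default_rng(0)
img = Image.fromarray(rng.integers(0, 255, (64, 64, 3), dtype=np.uint8))

# Lossy WebP at quality level 85, as used for the released images.
webp_buf = io.BytesIO()
img.save(webp_buf, format="WEBP", quality=85)

# Lossless PNG of the same frame, for a size comparison.
png_buf = io.BytesIO()
img.save(png_buf, format="PNG")

print(len(webp_buf.getvalue()), len(png_buf.getvalue()))
```

Note that the roughly 10x size advantage holds for natural camera images; synthetic noise like the above compresses much less favorably.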
The `s3://pit30m/raw` prefix contains lossless image data for a small subset of the logs present in the bucket root. This can be used as a reference by those curious about which artifacts are induced by the lossy compression, and which are inherent in the raw data.
#### Known Issues
A fraction of the images in the dataset exhibit artifacts such as a strong purple tint or missing data (white images). An even smaller fraction of the purple images also shows strong blocky compression artifacts. This is a known (and, at this scale, difficult-to-avoid) problem that was already present in the original raw logs from which we generated the public-facing benchmark. Perfectly blank images can be detected quite reliably in a data loader or ETL script by checking whether `np.mean(img) > 250`.
One example of a log with many blank (whiteout) images is `8438b1ba-44e2-4456-f83b-207351a99865`.
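The blank-image check described above can be sketched as a small helper for a dataloader or ETL script (synthetic arrays stand in for decoded frames):

```python
import numpy as np


def is_blank(img: np.ndarray, threshold: float = 250.0) -> bool:
    """Flag near-white (whiteout) frames so they can be skipped or logged."""
    return float(np.mean(img)) > threshold


blank = np.full((8, 8, 3), 255, dtype=np.uint8)   # whiteout frame
normal = np.full((8, 8, 3), 128, dtype=np.uint8)  # ordinary mid-gray frame
print(is_blank(blank), is_blank(normal))  # → True False
```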
### Ground Truth Sanitization
Poses belonging to the test queries are not available; they have been replaced with zeros / NaNs / blank submap IDs in the corresponding pose files and indexes. The long-term plan is to use this held-out test-query ground truth for a public leaderboard. More information will come in the second half of 2023. In the meantime, there should be a ton of data to iterate on using the publicly available train and val splits.
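When iterating over poses, the sanitized (held-out) entries can be filtered out by checking for NaNs or all-zero rows. This is a hedged sketch that assumes poses arrive as a NumPy array; the actual pose file schema may differ:

```python
import numpy as np


def valid_pose_mask(poses: np.ndarray) -> np.ndarray:
    """True for rows that are neither NaN-filled nor all-zero (i.e., not held out)."""
    has_nan = np.isnan(poses).any(axis=1)
    all_zero = (poses == 0).all(axis=1)
    return ~(has_nan | all_zero)


poses = np.array([
    [1.0, 2.0, 3.0],           # valid pose
    [np.nan, np.nan, np.nan],  # held-out (NaN-filled) test-query pose
    [0.0, 0.0, 0.0],           # held-out (zeroed) test-query pose
])
print(valid_pose_mask(poses))  # → [ True False False]
```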