pit30m

Name: pit30m
Version: 0.0.2
Home page: https://github.com/pit30m/pit30m
Summary: Development kit for the Pit30M large scale localization dataset
Author: Andrei Bârsan
Requires Python: >=3.9,<4.0
License: MIT
Upload time: 2023-06-16 02:27:17

# Pit30M Development Kit

[![Python CI Status](https://github.com/pit30m/pit30m/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/pit30m/pit30m/actions/workflows/ci.yaml)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](./LICENSE)
[![PyPI](https://img.shields.io/pypi/v/pit30m)](https://pypi.org/project/pit30m/)
[![Public on the AWS Open Data Registry](https://shields.io/badge/Open%20Data%20Registry-public-green?logo=amazonaws&style=flat)](#)

## Overview
This is the Python software development kit for the Pit30M benchmark for large-scale global localization. The devkit is currently in a pre-release state and many features are coming soon!

Consider checking out [the original paper](https://arxiv.org/abs/2012.12437). You can also follow some of the authors on social media, e.g., [Julieta](https://twitter.com/yoknapathawa) or [Andrei](https://twitter.com/andreib), to be among the first to hear about any updates!

## Getting Started

The recommended way to interact with the dataset is with the `pip` package, which you can install with:

`pip install pit30m`

The devkit lets you efficiently access data on the fly. Here is a "hello world" command which renders a demo video from a random log segment. Note that it assumes `ffmpeg` is installed:

`python -m pit30m.cli multicam_demo --out-dir .`
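
For orientation, here is a minimal sketch of listing one log's files straight from the public S3 bucket with `boto3`, independently of the devkit's own readers. The bucket name and log ID come from the dataloader example further down; the assumption that each log lives under a top-level prefix named after its UUID is ours and may not match the exact bucket layout:

```python
# Minimal sketch: anonymous S3 listing of one log's objects. The per-log
# prefix layout is an assumption, not documented devkit behavior.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned requests work for public Open Data Registry buckets; depending on
# your setup you may also need to pass the bucket's AWS region via region_name.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

log_id = "00682fa6-2183-4a0d-dcfe-bc38c448090f"  # from the dataloader example below
response = s3.list_objects_v2(Bucket="pit30m", Prefix=f"{log_id}/", MaxKeys=10)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```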

To preview data more interactively, check out the [tutorial notebook](examples/tutorial_00_introduction.ipynb).
[![Open In Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/pit30m/pit30m/blob/main/examples/tutorial_00_introduction.ipynb)

More tutorials coming soon.

### Torch Data Loading

We provide basic log-based PyTorch dataloaders. Visual-localization-specific ones are coming soon. For an example of how to use one of these dataloaders, have a look at `demo_dataloader` in `torch/dataset.py`.

An example command:

```
python -m pit30m.torch.dataset --root-uri s3://pit30m/ --logs 00682fa6-2183-4a0d-dcfe-bc38c448090f,021286dc-5fe5-445f-e5fa-f875f2eb3c57,1c915eda-c18a-46d5-e1ec-e4f624605ff0 --num-workers 16 --batch-size 64
```
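
The `--num-workers` and `--batch-size` flags map onto the standard `torch.utils.data.DataLoader` arguments. The sketch below shows the general pattern with a toy stand-in dataset; the real class lives in `pit30m/torch/dataset.py` and its constructor and sample format may differ:

```python
# Schematic only: ToyLogDataset is a stand-in for the real log-based dataset
# in pit30m/torch/dataset.py, whose constructor and sample format may differ.
import torch
from torch.utils.data import DataLoader, Dataset


class ToyLogDataset(Dataset):
    """Pretends each log contributes a fixed number of frames."""

    def __init__(self, log_ids, frames_per_log=128):
        self.index = [(log, i) for log in log_ids for i in range(frames_per_log)]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        log_id, frame_idx = self.index[idx]
        # A real dataset would load the image/pose for this frame here.
        return {"log_id": log_id, "frame_idx": frame_idx, "image": torch.zeros(3, 64, 64)}


if __name__ == "__main__":
    logs = ["00682fa6-2183-4a0d-dcfe-bc38c448090f", "021286dc-5fe5-445f-e5fa-f875f2eb3c57"]
    loader = DataLoader(ToyLogDataset(logs), batch_size=64, num_workers=16, shuffle=True)
    batch = next(iter(loader))
    print(batch["image"].shape)  # torch.Size([64, 3, 64, 64])
```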

## Features

 * Framework-agnostic multiprocessing-safe log reader objects
 * PyTorch dataloaders

### In-progress
 * More lightweight package with fewer dependencies.
 * Very efficient native S3 support through [AWS-authored PyTorch-optimized S3 DataPipes](https://aws.amazon.com/blogs/machine-learning/announcing-the-amazon-s3-plugin-for-pytorch/).
 * Support for non-S3 data sources, for users who wish to copy the dataset, or parts of it, to their own storage.
 * Tons of examples and tutorials. See `examples/` for more information.


## Development

Package development, testing, and releasing are performed with `poetry`. If you just want to use the `pit30m` package, you don't need to worry about this section; just have a look at "Getting Started" above!

 1. [Install poetry](https://python-poetry.org/docs/)
 2. Set up or update your `dev` virtual environment with `poetry install --with=dev` in the project root
    - If you encounter strange keyring/credential errors, you may need `PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring poetry install --with=dev`
 3. Develop away
    - run commands like `poetry run python -m pit30m.cli`
 4. Test with `poetry run pytest`
    - Advanced command: `poetry run pytest -ff --new-first --quiet --color=yes --maxfail=3 -n 4`
    - This command will run tests, then wait for new changes and test them automatically. Test execution will run in parallel thanks to the `-n 4` argument.
    - The command lets you get feedback on whether your code change fixed or broke a particular test within seconds.
 5. Make sure you lint. Refer to `ci.yaml` for exact commands we run in CI.
    - format with black: `poetry run black .`
    - type check with mypy: `poetry run mypy . --ignore-missing-imports`
 6. Remember to run `poetry install` after pulling and/or updating dependencies.


Note that during the pre-release period, `torch` will be a "dev" dependency, since it is necessary for all tests to pass.

### Publishing

#### First Time
 1. [Configure poetry](https://www.digitalocean.com/community/tutorials/how-to-publish-python-packages-to-pypi-using-poetry-on-ubuntu-22-04) with a PyPI account that has access to edit the package, and make sure poetry is configured with your API token.

#### New Release
 1. Decide on the commit with the desired changes.
 2. Bump the version number in `pyproject.toml`.
 3. `git tag vA.B.C` and `git push origin <tag>`
 4. `poetry publish --build`
 5. Create the release on GitHub.


## Citation

```bibtex
@inproceedings{martinez2020pit30m,
  title={Pit30m: A benchmark for global localization in the age of self-driving cars},
  author={Martinez, Julieta and Doubov, Sasha and Fan, Jack and B{\^a}rsan, Ioan Andrei and Wang, Shenlong and M{\'a}ttyus, Gell{\'e}rt and Urtasun, Raquel},
  booktitle={{IROS}},
  pages={4477--4484},
  year={2020},
  organization={IEEE}
}
```

## Additional Details

### Images

#### Compression
The images in the dataset are stored using lossy WebP compression at quality level 85. We picked this as a sweet spot between saving space and network bandwidth (the files are about 10x smaller than equivalent PNGs) and maintaining very good image quality for tasks such as SLAM, 3D reconstruction, and visual localization. The images were saved using `Pillow 9.2.0`.
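
For reference, this is roughly what writing and reading back an image at these settings looks like with Pillow; the file names below are placeholders, not actual dataset paths:

```python
# Rough illustration of the compression settings described above; the input
# frame and output path are placeholders, not actual dataset files.
from PIL import Image

frame = Image.open("example_frame.png")
frame.save("example_frame.webp", format="WEBP", quality=85)  # lossy WebP, quality 85

# Decoding for downstream use (SLAM, reconstruction, localization, ...):
decoded = Image.open("example_frame.webp").convert("RGB")
print(decoded.size)
```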

The `s3://pit30m/raw` prefix contains lossless image data for a small subset of the logs present in the bucket root. It can serve as a reference for those interested in understanding which artifacts are induced by the lossy compression and which are inherent in the raw data.

#### Known Issues

A fraction of the images in the dataset exhibit artifacts such as a strong purple tint or missing data (white images). An even smaller fraction of these purple images also shows strong blocky compression artifacts. This is a known (and, at this scale, difficult-to-avoid) problem that was already present in the original raw logs from which we generated the public-facing benchmark. Perfectly blank images can be detected quite reliably in a data loader or ETL script by checking whether `np.mean(img) > 250`.
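
In code, that check might look like the sketch below; `some_frame.webp` is a placeholder path and the 250 threshold is just the heuristic mentioned above:

```python
# Sketch of the whiteout check described above; the image path is a placeholder.
import numpy as np
from PIL import Image


def is_blank(img: np.ndarray, threshold: float = 250.0) -> bool:
    """Flags near-white (whiteout) frames via the mean-intensity heuristic."""
    return float(np.mean(img)) > threshold


img = np.asarray(Image.open("some_frame.webp").convert("RGB"))
if is_blank(img):
    print("Skipping blank frame")
```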

One example of a log with many blank (whiteout) images is `8438b1ba-44e2-4456-f83b-207351a99865`.

### Ground Truth Sanitization

Poses belonging to the test queries are not available; they have been replaced with zeroes / NaNs / blank submap IDs in the corresponding pose files and indexes. The long-term plan is to use this held-out test-query ground truth for a public leaderboard. More information will come in the second half of 2023. In the meantime, there should be a ton of data to iterate on using the publicly available train and val splits.
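
Pipelines consuming these files therefore need to drop the sanitized entries. Below is a minimal sketch, assuming the poses have already been loaded into a NumPy array with a parallel list of submap IDs (the actual file schema may differ):

```python
# Minimal sketch of dropping sanitized (held-out) entries; the array layout is
# an assumption, not the actual Pit30M pose file schema.
import numpy as np


def filter_valid_poses(poses: np.ndarray, submap_ids: list[str]):
    """Keeps rows whose pose is finite and non-zero and whose submap ID is non-blank."""
    submap_ids = np.asarray(submap_ids, dtype=object)
    finite = np.all(np.isfinite(poses), axis=1)
    nonzero = np.any(poses != 0.0, axis=1)
    has_submap = np.array([len(str(s).strip()) > 0 for s in submap_ids])
    keep = finite & nonzero & has_submap
    return poses[keep], submap_ids[keep]
```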