mbari-aidata


Name: mbari-aidata
Version: 1.56.1
Summary: Command line tool to do extract, transform, load and download operations on AI data for a number of projects at MBARI that require detection, clustering or classification workflows.
Upload time: 2025-08-10 02:12:55
Author: Danelle Cline
Requires Python: <3.12,>=3.10
License: Apache
Requirements: No requirements were recorded.

[![MBARI](https://www.mbari.org/wp-content/uploads/2014/11/logo-mbari-3b.png)](http://www.mbari.org)
[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/semantic-release/semantic-release)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python](https://img.shields.io/badge/language-Python-blue.svg)](https://www.python.org/downloads/)

*mbari-aidata* is a command-line tool for extract, transform, load (ETL) and download operations
on AI data. It supports a number of projects at MBARI that require detection, clustering, or
classification workflows.

More documentation and examples are available at [https://docs.mbari.org/internal/ai/data](https://docs.mbari.org/internal/ai/data/).
 
## 🚀 Features
* 🧠 Object Detection/Clustering Integration: Loads detection/classification/clustering output from SDCAT-formatted results.
* Flexible Data Export: Downloads from Tator into machine learning formats like COCO, CIFAR, or PASCAL VOC.
* Real-Time Uploads: Pushes localizations to [Tator](https://www.tator.io/) via [Redis](https://redis.io/glossary/redis-queue/) queues for real-time workflows.
* Metadata Extraction: Parses image metadata such as GPS/time/date through a plugin-based system (extractors).
* Duplicate Detection: Supports duplicate media load checks with the `--check-duplicates` flag.
* Flexible Media References: Images or video are made accessible through a web server without needing to upload or move them from your internal NFS project mounts (e.g. Thalassa).
* Augmentation Support: Augment VOC datasets with [Albumentations](https://albumentations.ai/) to boost your object detection model performance. See examples in the [docs](https://docs.mbari.org/internal/ai/data/commands/transform/?h=aug#transform-voc-to-yolo-with-augmentations).
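
The web-server access described above can be pictured with the `mounts` entries from the example configuration in the Installation section. The sketch below assumes that a file under a mount's `path` is served at `host` + `nginx_root` + the path relative to the mount. That mapping is an assumption made for illustration, and `media_url` is a hypothetical helper, not part of the mbari-aidata API.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Values copied from the "image" mount in the example configuration.
IMAGE_MOUNT = {
    "name": "image",
    "path": "/mnt/CFElab",
    "host": "https://mantis.shore.mbari.org",
    "nginx_root": "/CFElab",
}


def media_url(local_path: str, mount: dict) -> str:
    """Map a file under mount['path'] to a URL (assumed mapping, see lead-in)."""
    # relative_to raises ValueError if local_path is not under the mount path.
    relative = PurePosixPath(local_path).relative_to(mount["path"])
    host = mount["host"].rstrip("/")
    root = mount["nginx_root"].rstrip("/")
    # quote() percent-encodes spaces and other unsafe characters, keeping "/".
    return f"{host}{root}/{quote(str(relative))}"


print(media_url("/mnt/CFElab/2024/img001.jpg", IMAGE_MOUNT))
# https://mantis.shore.mbari.org/CFElab/2024/img001.jpg
```

The file path `2024/img001.jpg` is made up for the example. A path outside the mount raises `ValueError`, which is a convenient early check before loading media.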

## Requirements
- Python 3.10 or 3.11 (the package declares `requires_python` as `<3.12,>=3.10`)
- A Tator API token and (optional) Redis password for the .env file. Contact the MBARI AI team for access.
- 🐳 Docker is required only for development and testing; it can also be used instead of a local Python installation.
- For local installation, you will need to install the required Python packages listed in the `requirements.txt` file, [ffmpeg](https://ffmpeg.org/), and the mp4dump tool from https://www.bento4.com/

## 📦 Installation 
Install as a Python package:

```shell
pip install mbari-aidata
```
 
Create a `.env` file in the root directory of the project with the following contents, setting `ENVIRONMENT` to either `testing` or `production`:

```text
TATOR_TOKEN=your_api_token
REDIS_PASSWORD=your_redis_password
ENVIRONMENT=testing or production
```
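
Before running any command, it can help to confirm that the `.env` placeholders were actually replaced. The following is a minimal sketch using only the standard library. `parse_env` and `check_env` are hypothetical helpers, and treating `TATOR_TOKEN` and `ENVIRONMENT` as mandatory is an assumption based on the requirements above (the Redis password is described as optional). It does not reflect how mbari-aidata itself reads the file.

```python
# Assumed mandatory keys; REDIS_PASSWORD is optional according to the requirements.
REQUIRED_KEYS = ("TATOR_TOKEN", "ENVIRONMENT")
ALLOWED_ENVIRONMENTS = {"testing", "production"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the file looks fine."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    environment = values.get("ENVIRONMENT")
    if environment and environment not in ALLOWED_ENVIRONMENTS:
        problems.append(
            f"ENVIRONMENT must be one of {sorted(ALLOWED_ENVIRONMENTS)}, got {environment!r}"
        )
    return problems


# The template above, unedited, fails the check because of the placeholder value.
template = (
    "TATOR_TOKEN=your_api_token\n"
    "REDIS_PASSWORD=your_redis_password\n"
    "ENVIRONMENT=testing or production\n"
)
print(check_env(parse_env(template)))
```

To check a real file, pass its contents instead of the template, for example `check_env(parse_env(open(".env").read()))`.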

Create a configuration file in the root directory of the project:
```bash
touch config_cfe.yml
```
This file configures the project data, such as mounts, plugins, and database connections.

Alternatively, use a project-specific configuration from our docs server at
https://docs.mbari.org/internal/ai/projects/ by passing its URL to `--config`:
```bash
aidata download --version Baseline --labels "Diatoms, Copepods" --config https://docs.mbari.org/internal/ai/projects/uav-901902/config_uav.yml
```

⚙️ Example configuration file:
```yaml
# config_cfe.yml
# Config file for CFE project production
mounts:
  - name: "image"
    path: "/mnt/CFElab"
    host: "https://mantis.shore.mbari.org"
    nginx_root: "/CFElab"

  - name: "video"
    path: "/mnt/CFElab"
    host: "https://mantis.shore.mbari.org"
    nginx_root: "/CFElab"


plugins:
  - name: "extractor"
    module: "mbari_aidata.plugins.extractors.tap_cfe_media"
    function: "extract_media"

redis:
  host: "doris.shore.mbari.org"
  port: 6382

vss:
  project: "902111-CFE"
  model: "google/vit-base-patch16-224"

tator:
  project: "902111-CFE"
  host: "https://mantis.shore.mbari.org"
  image:
    attributes:
      iso_datetime: #<-------Required for images
        type: datetime
      depth:
        type: float
  video:
    attributes:
      iso_start_datetime:  #<-------Required for videos
        type: datetime
  box:
    attributes:
      Label:
        type: string
      score:
        type: float
      cluster:
        type: string
      saliency:
        type: float
      area:
        type: int
      exemplar:
        type: bool
```
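
Two parts of this configuration lend themselves to a quick sanity check: the attributes marked as required in the comments, and the `module`/`function` pair under `plugins`. The sketch below works on an already-parsed configuration (a plain dictionary, as a YAML loader would return). Both helper functions are hypothetical, and resolving the plugin with `importlib` is an assumption about how such a pair can be used, not a description of mbari-aidata's internals. A standard-library function stands in for the extractor so the example runs without the package installed.

```python
import importlib

# Attribute names flagged as required by the comments in the example configuration.
REQUIRED_ATTRIBUTES = {"image": "iso_datetime", "video": "iso_start_datetime"}


def missing_required_attributes(config: dict) -> list[str]:
    """Return 'media_type.attribute' for each required attribute that is absent."""
    tator = config.get("tator") or {}
    missing = []
    for media_type, attribute in REQUIRED_ATTRIBUTES.items():
        attributes = (tator.get(media_type) or {}).get("attributes") or {}
        if attribute not in attributes:
            missing.append(f"{media_type}.{attribute}")
    return missing


def resolve_plugin(plugin: dict):
    """Import plugin['module'] and return the callable named by plugin['function']."""
    module = importlib.import_module(plugin["module"])
    function = getattr(module, plugin["function"])
    if not callable(function):
        raise TypeError(f"{plugin['module']}.{plugin['function']} is not callable")
    return function


# Hypothetical parsed configuration: the video section is deliberately left out,
# and os.path.basename stands in for a real extractor function.
config = {
    "plugins": [{"name": "extractor", "module": "os.path", "function": "basename"}],
    "tator": {"image": {"attributes": {"iso_datetime": {"type": "datetime"}}}},
}

print(missing_required_attributes(config))  # ['video.iso_start_datetime']
extractor = resolve_plugin(config["plugins"][0])
print(extractor("/mnt/CFElab/a.jpg"))  # a.jpg
```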

## 🐳 Docker usage
A Docker image is also available as `mbari/aidata:latest` or `mbari/aidata:latest:cuda-124`.
For example, to download data using the docker image:

```shell
docker run -it --rm -v $(pwd):/mnt mbari/aidata:latest aidata download --version Baseline --labels "Diatoms, Copepods" --config config_cfe.yml
```

## Commands

* `aidata download --help` - Download data, such as images and boxes, into machine learning formats such as COCO, CIFAR, or PASCAL VOC. Augmentation with Albumentations is supported for VOC exports.
* `aidata load --help` - Load data, such as images, boxes, or clusters, into either a Postgres or Redis database.
* `aidata db --help` - Commands related to database management.
* `aidata transform --help` - Commands related to transforming downloaded data.
* `aidata -h` - Print help message and exit.
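
When downloads are scripted from Python, building the argument list explicitly avoids shell-quoting mistakes with labels that contain spaces. This is a minimal sketch that uses only the `--version`, `--labels`, and `--config` options shown in this README. `build_download_command` is a hypothetical helper, and joining labels with `", "` simply mirrors the README example rather than a documented requirement.

```python
import shlex


def build_download_command(version: str, labels: list[str], config: str) -> list[str]:
    """Assemble an argv list for `aidata download` from the options shown above."""
    return [
        "aidata", "download",
        "--version", version,
        "--labels", ", ".join(labels),  # separator copied from the README example
        "--config", config,
    ]


command = build_download_command("Baseline", ["Diatoms", "Copepods"], "config_cfe.yml")
# shlex.join renders the list as a copy-pasteable shell command.
print(shlex.join(command))
# aidata download --version Baseline --labels 'Diatoms, Copepods' --config config_cfe.yml
```

With the package installed, the list can be passed directly to `subprocess.run(command, check=True)`, which bypasses the shell entirely.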
 
Source code is available at [github.com/mbari-org/aidata](https://github.com/mbari-org/aidata/). 

## Development
See the [Development Guide](https://github.com/mbari-org/aidata/blob/main/DEVELOPMENT.md) for how to set up the development environment, or consult the [justfile](justfile).
 
🗓️ Last updated: 2025-06-13
            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "mbari-aidata",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.12,>=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Danelle Cline",
    "author_email": "dcline@mbari.org",
    "download_url": "https://files.pythonhosted.org/packages/56/b2/5f69d787459f6e3e5cf5149c4c52e7631fc4e34df1112aa093258b0d6a2d/mbari_aidata-1.56.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache",
    "summary": "Command line tool to do extract, transform, load and download operations on AI data for a number of projects at MBARI that require detection, clustering or classification workflows.",
    "version": "1.56.1",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "001b126aee83b297ce4b869c2b1afcb14dfea5d4ca5a692584d9b94b11632bb1",
                "md5": "403d595d4df79b4f7a0d6a56d3f869f6",
                "sha256": "75b803eb7436af50fd9904f372ec66a86a9fc5cfd0c1d3afbd7e725a56cb490f"
            },
            "downloads": -1,
            "filename": "mbari_aidata-1.56.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "403d595d4df79b4f7a0d6a56d3f869f6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.12,>=3.10",
            "size": 64562,
            "upload_time": "2025-08-10T02:12:53",
            "upload_time_iso_8601": "2025-08-10T02:12:53.922190Z",
            "url": "https://files.pythonhosted.org/packages/00/1b/126aee83b297ce4b869c2b1afcb14dfea5d4ca5a692584d9b94b11632bb1/mbari_aidata-1.56.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "56b25f69d787459f6e3e5cf5149c4c52e7631fc4e34df1112aa093258b0d6a2d",
                "md5": "47e3ff8fc0e49cc559739dc9c77453dc",
                "sha256": "fd4bc1e230015ddf81a133b2c3e6f0db40414eee58c8bf2aec00ef2b6ce1e334"
            },
            "downloads": -1,
            "filename": "mbari_aidata-1.56.1.tar.gz",
            "has_sig": false,
            "md5_digest": "47e3ff8fc0e49cc559739dc9c77453dc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.12,>=3.10",
            "size": 47057,
            "upload_time": "2025-08-10T02:12:55",
            "upload_time_iso_8601": "2025-08-10T02:12:55.306885Z",
            "url": "https://files.pythonhosted.org/packages/56/b2/5f69d787459f6e3e5cf5149c4c52e7631fc4e34df1112aa093258b0d6a2d/mbari_aidata-1.56.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-10 02:12:55",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "mbari-aidata"
}
        