hirundo


Name: hirundo
Version: 0.1.8
home_page: None
Summary: This package is used to interface with Hirundo's platform. It provides a simple API to optimize your ML datasets.
upload_time: 2024-09-18 14:14:41
maintainer: None
docs_url: None
author: None
requires_python: >=3.9
license: MIT License Copyright (c) 2024, Hirundo Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords: dataset, machine learning, data science, data engineering
requirements: annotated-types anyio backports-tarfile certifi charset-normalizer click docutils exceptiongroup h11 httpcore httpx httpx-sse idna importlib-metadata jaraco-classes jaraco-context jaraco-functools keyring markdown-it-py mdurl more-itertools nh3 numpy pandas pkginfo pydantic pydantic-core pygments python-dateutil python-dotenv pytz pyyaml readme-renderer requests requests-toolbelt rfc3986 rich shellingham six sniffio stamina tenacity tqdm twine typer types-pyyaml types-requests typing-extensions tzdata urllib3 zipp
# Hirundo

This package provides access to Hirundo's APIs for optimizing machine learning datasets.

Dataset optimization is currently available for datasets labelled for classification and object detection.


Supported dataset storage integrations include:
   - Google Cloud (GCP) Storage
   - Amazon Web Services (AWS) S3
   - Git LFS (Large File Storage) repositories (e.g. GitHub or HuggingFace)

## Optimizing a classification dataset

Currently ``hirundo`` requires a CSV file with the following columns (all columns are required):
   - ``image_path``: The location of the image within the dataset ``root``
   - ``label``: The label of the image, i.e. the class that was annotated for this image

It outputs a CSV with the same columns, plus:
   - ``suspect_level``: mislabel suspect level
   - ``suggested_label``: suggested label
   - ``suggested_label_conf``: suggested label confidence
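
The input format above can be sketched with Python's standard `csv` module. This is only an illustration and not part of the `hirundo` API: the helper names `write_classification_csv` and `missing_columns` and the sample rows are hypothetical.

```python
import csv
import io

# Columns the README lists as required for a classification dataset.
REQUIRED_COLUMNS = ["image_path", "label"]


def write_classification_csv(rows, stream):
    """Write (image_path, label) pairs as a metadata CSV with a header row."""
    writer = csv.writer(stream)
    writer.writerow(REQUIRED_COLUMNS)
    writer.writerows(rows)


def missing_columns(stream, required=REQUIRED_COLUMNS):
    """Return the required columns that are absent from the CSV header."""
    header = next(csv.reader(stream), [])
    return [name for name in required if name not in header]


# Hypothetical sample rows; paths are relative to the dataset root.
sample_rows = [
    ("train/apple/0001.png", "apple"),
    ("train/bicycle/0002.png", "bicycle"),
]

buffer = io.StringIO()
write_classification_csv(sample_rows, buffer)
buffer.seek(0)
print(missing_columns(buffer))  # prints [] because both required columns are present
```

Checking the header before uploading is a cheap way to catch a misspelled column name locally rather than after a run has been submitted.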

## Optimizing an object detection (OD) dataset

Currently ``hirundo`` requires a CSV file with the following columns (all columns are required):
   - ``image_path``: The location of the image within the dataset ``root``
   - ``bbox_id``: The index of the bounding box within the dataset. Used to indicate label suspects.
   - ``label``: The label of the image, i.e. the class that was annotated for this image
   - ``x1``, ``y1``, ``x2``, ``y2``: The bounding box coordinates of the object within the image

It outputs a CSV with the same columns, plus:
   - ``suspect_level``: object mislabel suspect level
   - ``suggested_label``: suggested object label
   - ``suggested_label_conf``: suggested object label confidence
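
As a rough sketch of how the output CSV might be consumed, the snippet below filters rows by ``suspect_level``. It relies on assumptions that the text above does not confirm: that ``suspect_level`` parses as a number, that larger values mean a more suspicious annotation, and that 0.5 is a sensible cut-off. The helper `suspect_boxes` and the sample rows are made up for illustration.

```python
import csv
import io

# Columns the README lists for object detection input and output.
INPUT_COLUMNS = ["image_path", "bbox_id", "label", "x1", "y1", "x2", "y2"]
OUTPUT_COLUMNS = ["suspect_level", "suggested_label", "suggested_label_conf"]


def suspect_boxes(stream, threshold):
    """Return output rows whose suspect_level is at or above the threshold.

    Assumption: suspect_level is numeric and larger means more suspicious.
    """
    reader = csv.DictReader(stream)
    return [row for row in reader if float(row["suspect_level"]) >= threshold]


# Hypothetical output rows, invented for this example only.
sample_output = io.StringIO(
    ",".join(INPUT_COLUMNS + OUTPUT_COLUMNS) + "\n"
    + "val/0001.jpg,0,car,10,20,110,220,0.91,truck,0.88\n"
    + "val/0001.jpg,1,person,30,40,60,120,0.05,person,0.97\n"
)

for row in suspect_boxes(sample_output, threshold=0.5):
    print(row["bbox_id"], row["label"], "->", row["suggested_label"])
# prints: 0 car -> truck
```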

Note: This Python package must be used alongside a Hirundo server, either the SaaS platform, a custom VPC deployment or an on-premises installation.


## Installation

Run `pip install hirundo` to install the latest version of this package. To install from the Git repository, or to get a specific version or branch, clone the repository, check out the relevant commit and run `pip install .`. The full list of dependencies is in `requirements.txt`; either command installs them automatically.
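
To confirm which version ended up in the current environment, a small check with the standard library's `importlib.metadata` can help. This is a generic sketch, not something the package ships; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("hirundo")
if found is None:
    print("hirundo is not installed; run `pip install hirundo`")
else:
    print(f"hirundo {found} is installed")
```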

## Usage

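Classification example:

```
import json
import os

from hirundo.dataset_optimization import OptimizationDataset
from hirundo.enum import LabellingType
# Assumption: StorageGCP is importable from hirundo.storage like the other storage classes.
from hirundo.storage import StorageGCP, StorageIntegration, StorageLink, StorageTypes

# Truncated placeholder: replace with the full list of CIFAR-100 class names.
cifar100_classes = ["apple", "aquarium_fish", "baby"]

test_dataset = OptimizationDataset(
    name="TEST-GCP cifar 100 classification dataset",
    labelling_type=LabellingType.SingleLabelClassification,
    dataset_storage=StorageLink(
        storage_integration=StorageIntegration(
            name="cifar100bucket",
            type=StorageTypes.GCP,
            gcp=StorageGCP(
                bucket_name="cifar100bucket",
                project="Hirundo-global",
                # The service account credentials are read from an environment variable.
                credentials_json=json.loads(os.environ["GCP_CREDENTIALS"]),
            ),
        ),
        path="/pytorch-cifar/data",
    ),
    dataset_metadata_path="cifar100.csv",
    classes=cifar100_classes,
)

# Launch the optimization run, then check on it and print what comes back.
test_dataset.run_optimization()
results = test_dataset.check_run()
print(results)
```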


Object detection example:

```
from hirundo.dataset_optimization import OptimizationDataset
from hirundo.enum import LabellingType
# Assumption: GitRepo lives in hirundo.git and StorageGit in hirundo.storage.
from hirundo.git import GitRepo
from hirundo.storage import StorageGit, StorageIntegration, StorageLink, StorageTypes

# Hypothetical suffix used to keep dataset and integration names unique.
unique_id = "-example"

test_dataset = OptimizationDataset(
    name=f"TEST-HuggingFace-BDD-100k-validation-OD-validation-dataset{unique_id}",
    labelling_type=LabellingType.ObjectDetection,
    dataset_storage=StorageLink(
        storage_integration=StorageIntegration(
            name=f"BDD-100k-validation-dataset{unique_id}",
            type=StorageTypes.GIT,
            git=StorageGit(
                repo=GitRepo(
                    name=f"BDD-100k-validation-dataset{unique_id}",
                    repository_url="https://git@hf.co/datasets/hirundo-io/bdd100k-validation-only",
                ),
                branch="main",
            ),
        ),
        path="/BDD100K Val from Hirundo.zip/bdd100k",
    ),
    dataset_metadata_path="bdd100k.csv",
)

# Launch the optimization run, then check on it and print what comes back.
test_dataset.run_optimization()
results = test_dataset.check_run()
print(results)
```

Note: Currently we only support the main CPython releases 3.9, 3.10 and 3.11. PyPy support may be introduced in the future.
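
A script that depends on this package could guard against unsupported interpreters along the following lines. This is a sketch that uses only the standard library; `SUPPORTED_VERSIONS` and `is_supported` are hypothetical names, and the version set simply mirrors the note above.

```python
import platform
import sys

# Versions the note above lists as supported (CPython only).
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11)}


def is_supported(implementation, major_minor):
    """Return True for a CPython interpreter in the supported version set."""
    return implementation == "CPython" and tuple(major_minor) in SUPPORTED_VERSIONS


current = (sys.version_info.major, sys.version_info.minor)
implementation = platform.python_implementation()
if not is_supported(implementation, current):
    print(
        f"Warning: {implementation} {current[0]}.{current[1]} "
        "is outside the supported set"
    )
```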

## Further documentation

To learn more about how to use this library, please visit the [documentation](http://docs.hirundo.io/) or see the Google Colab examples.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "hirundo",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "dataset, machine learning, data science, data engineering",
    "author": null,
    "author_email": "Hirundo <dev@hirundo.io>",
    "download_url": "https://files.pythonhosted.org/packages/09/89/3b1d1a2b290400688a498753402fe8f80d0adf96b6c6e615542e3f04c47f/hirundo-0.1.8.tar.gz",
    "platform": null,
    "description": "# Hirundo\n\nThis package exposes access to Hirundo APIs for dataset optimization for Machine Learning.\n\nDataset optimization is currently available for datasets labelled for classification and object detection.\n\n\nSupport dataset storage integrations include:\n   - Google Cloud (GCP) Storage\n   - Amazon Web Services (AWS) S3\n   - Git LFS (Large File Storage) repositories (e.g. GitHub or HuggingFace)\n\nOptimizing a classification dataset\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently ``hirundo`` requires a CSV file with the following columns (all columns are required):\n   - ``image_path``: The location of the image within the dataset ``root``\n   - ``label``: The label of the image, i.e. which the class that was annotated for this image\n\nAnd outputs a CSV with the same columns and:\n   - ``suspect_level``: mislabel suspect level\n   - ``suggested_label``: suggested label\n   - ``suggested_label_conf``: suggested label confidence\n\nOptimizing an object detection (OD) dataset\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently ``hirundo`` requires a CSV file with the following columns (all columns are required):\n   - ``image_path``: The location of the image within the dataset ``root``\n   - ``bbox_id``: The index of the bounding box within the dataset. Used to indicate label suspects\n   - ``label``: The label of the image, i.e. 
which the class that was annotated for this image\n   - ``x1``, ``y1``, ``x2``, ``y2``: The bounding box coordinates of the object within the image\n\nAnd outputs a CSV with the same columns and:\n   - ``suspect_level``: object mislabel suspect level\n   - ``suggested_label``: suggested object label\n   - ``suggested_label_conf``: suggested object label confidence\n\nNote: This Python package must be used alongside a Hirundo server, either the SaaS platform, a custom VPC deployment or an on-premises installation.\n\n\n## Installation\n\nYou can install the codebase with a simple `pip install hirundo` to install the latest version of this package. If you prefer to install from the Git repository and/or need a specific version or branch, you can simply clone the repository, check out the relevant commit and then run `pip install .` to install that version. A full list of dependencies can be found in `requirements.txt`, but these will be installed automatically by either of these commands.\n\n## Usage\n\nClassification example:\n```\nfrom hirundo.dataset_optimization import OptimizationDataset\nfrom hirundo.enum import LabellingType\nfrom hirundo.storage import StorageIntegration, StorageLink, StorageTypes\n\ntest_dataset = OptimizationDataset(\n    name=\"TEST-GCP cifar 100 classification dataset\",\n    labelling_type=LabellingType.SingleLabelClassification,\n    dataset_storage=StorageLink(\n        storage_integration=StorageIntegration(\n            name=\"cifar100bucket\",\n            type=StorageTypes.GCP,\n            gcp=StorageGCP(\n                bucket_name=\"cifar100bucket\",\n                project=\"Hirundo-global\",\n                credentials_json=json.loads(os.environ[\"GCP_CREDENTIALS\"]),\n            ),\n        ),\n        path=\"/pytorch-cifar/data\",\n    ),\n    dataset_metadata_path=\"cifar100.csv\",\n    classes=cifar100_classes,\n)\n\ntest_dataset.run_optimization()\nresults = test_dataset.check_run()\nprint(results)\n```\n\n\nObject 
detection example:\n\n```\nfrom hirundo.dataset_optimization import OptimizationDataset\nfrom hirundo.enum import LabellingType\nfrom hirundo.storage import StorageIntegration, StorageLink, StorageTypes\n\ntest_dataset = OptimizationDataset(\n    name=f\"TEST-HuggingFace-BDD-100k-validation-OD-validation-dataset{unique_id}\",\n    labelling_type=LabellingType.ObjectDetection,\n    dataset_storage=StorageLink(\n        storage_integration=StorageIntegration(\n            name=f\"BDD-100k-validation-dataset{unique_id}\",\n            type=StorageTypes.GIT,\n            git=StorageGit(\n                repo=GitRepo(\n                    name=f\"BDD-100k-validation-dataset{unique_id}\",\n                    repository_url=\"https://git@hf.co/datasets/hirundo-io/bdd100k-validation-only\",\n                ),\n                branch=\"main\",\n            ),\n        ),\n        path=\"/BDD100K Val from Hirundo.zip/bdd100k\",\n    ),\n    dataset_metadata_path=\"bdd100k.csv\",\n)\n\ntest_dataset.run_optimization()\nresults = test_dataset.check_run()\nprint(results)\n```\n\nNote: Currently we only support the main CPython release 3.9, 3.10 and 3.11. PyPy support may be introduced in the future.\n\n## Further documentation\n\nTo learn about mroe how to use this library, please visit the [http://docs.hirundo.io/](documentation) or see the Google Colab examples.\n",
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2024, Hirundo  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "This package is used to interface with Hirundo's platform. It provides a simple API to optimize your ML datasets.",
    "version": "0.1.8",
    "project_urls": {
        "Homepage": "https://github.com/Hirundo-io/hirundo-client"
    },
    "split_keywords": [
        "dataset",
        " machine learning",
        " data science",
        " data engineering"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fc23fbe6a19bf7d3ffc01e022480c395411732ee6c555dd064c0ed4783b7825f",
                "md5": "7e71afbe290fc6198353899452844664",
                "sha256": "1e7b454a41d1888f23c08ae4f46684eda4180e1d255b6f5b6efa46f7220219ed"
            },
            "downloads": -1,
            "filename": "hirundo-0.1.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7e71afbe290fc6198353899452844664",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 19973,
            "upload_time": "2024-09-18T14:14:39",
            "upload_time_iso_8601": "2024-09-18T14:14:39.716334Z",
            "url": "https://files.pythonhosted.org/packages/fc/23/fbe6a19bf7d3ffc01e022480c395411732ee6c555dd064c0ed4783b7825f/hirundo-0.1.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "09893b1d1a2b290400688a498753402fe8f80d0adf96b6c6e615542e3f04c47f",
                "md5": "2ac70a5b80fd75b60ea3954d104bbf87",
                "sha256": "dcc828dbc18b327f557a4a9975d3016106ceeb48cd65eaae239040c6d1466f0e"
            },
            "downloads": -1,
            "filename": "hirundo-0.1.8.tar.gz",
            "has_sig": false,
            "md5_digest": "2ac70a5b80fd75b60ea3954d104bbf87",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 19096,
            "upload_time": "2024-09-18T14:14:41",
            "upload_time_iso_8601": "2024-09-18T14:14:41.435211Z",
            "url": "https://files.pythonhosted.org/packages/09/89/3b1d1a2b290400688a498753402fe8f80d0adf96b6c6e615542e3f04c47f/hirundo-0.1.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-18 14:14:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Hirundo-io",
    "github_project": "hirundo-client",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "annotated-types",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "anyio",
            "specs": [
                [
                    "==",
                    "4.4.0"
                ]
            ]
        },
        {
            "name": "backports-tarfile",
            "specs": [
                [
                    "==",
                    "1.2.0"
                ]
            ]
        },
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2024.7.4"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.3.2"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "docutils",
            "specs": [
                [
                    "==",
                    "0.21.2"
                ]
            ]
        },
        {
            "name": "exceptiongroup",
            "specs": [
                [
                    "==",
                    "1.2.2"
                ]
            ]
        },
        {
            "name": "h11",
            "specs": [
                [
                    "==",
                    "0.14.0"
                ]
            ]
        },
        {
            "name": "httpcore",
            "specs": [
                [
                    "==",
                    "1.0.5"
                ]
            ]
        },
        {
            "name": "httpx",
            "specs": [
                [
                    "==",
                    "0.27.0"
                ]
            ]
        },
        {
            "name": "httpx-sse",
            "specs": [
                [
                    "==",
                    "0.4.0"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.7"
                ]
            ]
        },
        {
            "name": "importlib-metadata",
            "specs": [
                [
                    "==",
                    "8.0.0"
                ]
            ]
        },
        {
            "name": "jaraco-classes",
            "specs": [
                [
                    "==",
                    "3.4.0"
                ]
            ]
        },
        {
            "name": "jaraco-context",
            "specs": [
                [
                    "==",
                    "5.3.0"
                ]
            ]
        },
        {
            "name": "jaraco-functools",
            "specs": [
                [
                    "==",
                    "4.0.1"
                ]
            ]
        },
        {
            "name": "keyring",
            "specs": [
                [
                    "==",
                    "25.2.1"
                ]
            ]
        },
        {
            "name": "markdown-it-py",
            "specs": [
                [
                    "==",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "mdurl",
            "specs": [
                [
                    "==",
                    "0.1.2"
                ]
            ]
        },
        {
            "name": "more-itertools",
            "specs": [
                [
                    "==",
                    "10.3.0"
                ]
            ]
        },
        {
            "name": "nh3",
            "specs": [
                [
                    "==",
                    "0.2.18"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "2.0.1"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.2"
                ]
            ]
        },
        {
            "name": "pkginfo",
            "specs": [
                [
                    "==",
                    "1.10.0"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    "==",
                    "2.8.2"
                ]
            ]
        },
        {
            "name": "pydantic-core",
            "specs": [
                [
                    "==",
                    "2.20.1"
                ]
            ]
        },
        {
            "name": "pygments",
            "specs": [
                [
                    "==",
                    "2.18.0"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.9.0.post0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    "==",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2024.1"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    "==",
                    "6.0.1"
                ]
            ]
        },
        {
            "name": "readme-renderer",
            "specs": [
                [
                    "==",
                    "44.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.32.3"
                ]
            ]
        },
        {
            "name": "requests-toolbelt",
            "specs": [
                [
                    "==",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "rfc3986",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    "==",
                    "13.7.1"
                ]
            ]
        },
        {
            "name": "shellingham",
            "specs": [
                [
                    "==",
                    "1.5.4"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "sniffio",
            "specs": [
                [
                    "==",
                    "1.3.1"
                ]
            ]
        },
        {
            "name": "stamina",
            "specs": [
                [
                    "==",
                    "24.2.0"
                ]
            ]
        },
        {
            "name": "tenacity",
            "specs": [
                [
                    "==",
                    "8.5.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.66.5"
                ]
            ]
        },
        {
            "name": "twine",
            "specs": [
                [
                    "==",
                    "5.1.1"
                ]
            ]
        },
        {
            "name": "typer",
            "specs": [
                [
                    "==",
                    "0.12.3"
                ]
            ]
        },
        {
            "name": "types-pyyaml",
            "specs": [
                [
                    "==",
                    "6.0.12.20240311"
                ]
            ]
        },
        {
            "name": "types-requests",
            "specs": [
                [
                    "==",
                    "2.32.0.20240712"
                ]
            ]
        },
        {
            "name": "typing-extensions",
            "specs": [
                [
                    "==",
                    "4.12.2"
                ]
            ]
        },
        {
            "name": "tzdata",
            "specs": [
                [
                    "==",
                    "2024.1"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.2.2"
                ]
            ]
        },
        {
            "name": "zipp",
            "specs": [
                [
                    "==",
                    "3.19.2"
                ]
            ]
        }
    ],
    "lcname": "hirundo"
}
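
The `digests` entries in the raw data can be used to check a downloaded file. Below is a minimal sketch with the standard `hashlib` module. The expected value is copied from the `sha256` field of the wheel entry above; the helper names and any local file path you pass in are hypothetical.

```python
import hashlib

# sha256 digest listed above for hirundo-0.1.8-py3-none-any.whl.
EXPECTED_SHA256 = "1e7b454a41d1888f23c08ae4f46684eda4180e1d255b6f5b6efa46f7220219ed"


def sha256_of_file(path, chunk_size=65536):
    """Compute the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected value."""
    return sha256_of_file(path) == expected
```

For example, `matches_expected("hirundo-0.1.8-py3-none-any.whl")` would compare a wheel saved under that local name against the digest listed above.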
        