hf-fastup


Name: hf-fastup
Version: 0.0.7
Home page: https://github.com/kkoutini/hf-fastup
Summary: Fast upload in parallel large datasets to HuggingFace Datasets hub.
Upload time: 2024-02-16 13:33:09
Author: Khaled Koutini
Requires Python: >=3.7
License: Apache-2.0
Requirements: No requirements were recorded.

# HF-fastup

Pushes an HF dataset to the HF Hub as a Parquet dataset, which allows streaming.
The dataset is split into shards that are uploaded in parallel. This is useful for large datasets, for example datasets with embedded data.
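
The shard-and-upload-in-parallel idea can be illustrated with a small standard-library sketch. This is not the package's implementation: `shard_bounds`, `fake_upload`, and `upload_in_parallel` are hypothetical names, and `fake_upload` only stands in for writing a Parquet shard and sending it to the Hub.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_bounds(num_rows, num_shards):
    """Split range(num_rows) into contiguous, near-equal (start, stop) pairs."""
    base, extra = divmod(num_rows, num_shards)
    bounds, start = [], 0
    for i in range(num_shards):
        # The first `extra` shards take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def fake_upload(shard_index, rows):
    # Hypothetical stand-in: a real uploader would write `rows` to a
    # Parquet file and push that file to the Hub here.
    return f"train-{shard_index:05d}.parquet", len(rows)


def upload_in_parallel(rows, num_shards=4, max_workers=4):
    """Submit one upload task per shard and collect results in shard order."""
    bounds = shard_bounds(len(rows), num_shards)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(fake_upload, i, rows[start:stop])
            for i, (start, stop) in enumerate(bounds)
        ]
        return [future.result() for future in futures]


results = upload_in_parallel(list(range(10)), num_shards=4)
for filename, row_count in results:
    print(filename, row_count)
```

Threads fit this pattern because uploads spend most of their time waiting on the network, so several shards can be in flight at once.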

## Usage

Make sure `hf_transfer` is installed and the `HF_HUB_ENABLE_HF_TRANSFER` environment variable is set to `1`.

```python
import hffastup
import datasets
datasets.logging.set_verbosity_info()

# load any HF dataset
dataset = datasets.load_dataset("my_large_dataset.py")

hffastup.upload_to_hf_hub(dataset, "Org/repo")  # upload to the HF Hub
hffastup.push_dataset_card(dataset, "Org/repo")  # make a dataset card and push it to the HF Hub
```
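
The prerequisite above can also be checked from Python before running the upload. A minimal sketch, assuming the flag is set in the same process: to my understanding `huggingface_hub` reads `HF_HUB_ENABLE_HF_TRANSFER` when it is imported, so the variable should be set before that import happens. The variable name `hf_transfer_installed` is only illustrative.

```python
import importlib.util
import os

# Set the flag early, before huggingface_hub (or anything importing it) is loaded.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# find_spec returns None when the package cannot be imported in this environment.
hf_transfer_installed = importlib.util.find_spec("hf_transfer") is not None
if not hf_transfer_installed:
    print("hf_transfer is missing; install it with: pip install hf_transfer")
```

Exporting the variable in the shell before starting Python achieves the same thing and avoids any import-order concerns.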

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/kkoutini/hf-fastup",
    "name": "hf-fastup",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "Khaled Koutini",
    "author_email": "first.last@jku.at",
    "download_url": "https://files.pythonhosted.org/packages/ec/a5/39d0568aae1a34384011294f041d4e1b967d1c97f2f65f5734f2c58d5bac/hf-fastup-0.0.7.tar.gz",
    "platform": null,
    "description": "# HF-fastup\n\nPushes a HF dataset to the HF hub as a Parquet dataset, allowing streaming.\nThe dataset is processed to shards and uploaded in parallel. It useful for large datasets, for example, with embedded data.\n\n## Usage\n\nMake sure hf_transfer is installed and `HF_HUB_ENABLE_HF_TRANSFER` is set to `1`.\n\n```python\nimport hffastup\nimport datasets\ndatasets.logging.set_verbosity_info()\n\n# load any HF dataset\ndataset = datasets.load_dataset(\"my_large_dataset.py\")\n\nhffastup.upload_to_hf_hub(dataset, \"Org/repo\") # upload to HF Hub\nhffastup.push_dataset_card(dataset, \"Org/repo\") # Makes a dataset card and pushes it to HF Hub\n\n```\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Fast upload in parallel large datasets to HuggingFace Datasets hub.",
    "version": "0.0.7",
    "project_urls": {
        "Bug Tracker": "https://github.com/kkoutini/hf-fastup/issues",
        "Homepage": "https://github.com/kkoutini/hf-fastup",
        "Source Code": "https://github.com/kkoutini/hf-fastup"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2466c04abf09fa7a2945f83d449b7a51bc6658852563af91e77787ef24604287",
                "md5": "10ea5a9042bb85627075d5084cfef120",
                "sha256": "861a57cc1b690de39ffdbdda1d77b3c3f28beb180a7e88df560cf5e51eb87e6f"
            },
            "downloads": -1,
            "filename": "hf_fastup-0.0.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "10ea5a9042bb85627075d5084cfef120",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 6231,
            "upload_time": "2024-02-16T13:33:07",
            "upload_time_iso_8601": "2024-02-16T13:33:07.890965Z",
            "url": "https://files.pythonhosted.org/packages/24/66/c04abf09fa7a2945f83d449b7a51bc6658852563af91e77787ef24604287/hf_fastup-0.0.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "eca539d0568aae1a34384011294f041d4e1b967d1c97f2f65f5734f2c58d5bac",
                "md5": "7abaa48912c08f4419535fce1fd85d33",
                "sha256": "fda4046498680ab173ed5147d847b85657434331900e787d9da73f297c3bca10"
            },
            "downloads": -1,
            "filename": "hf-fastup-0.0.7.tar.gz",
            "has_sig": false,
            "md5_digest": "7abaa48912c08f4419535fce1fd85d33",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 5944,
            "upload_time": "2024-02-16T13:33:09",
            "upload_time_iso_8601": "2024-02-16T13:33:09.138322Z",
            "url": "https://files.pythonhosted.org/packages/ec/a5/39d0568aae1a34384011294f041d4e1b967d1c97f2f65f5734f2c58d5bac/hf-fastup-0.0.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-16 13:33:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kkoutini",
    "github_project": "hf-fastup",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "hf-fastup"
}
        