# Monkey patch for HuggingFace Hub to download Git-LFS blobs from Storj
This patch demonstrates the transfer speed that the `huggingface_hub` Python library can achieve when it downloads from [Storj Decentralized Cloud Storage](https://storj.io).
HuggingFace Hub stores all large files in Git-LFS.
![image](https://github.com/storj/huggingface-hub-storj-patch/assets/468091/b3c8d6d6-14fd-43c2-9396-91d4d3eba62f)
When the `huggingface_hub` Python library requests to download such a file, the download request is redirected to the Git-LFS CDN hosted at `cdn-lfs.huggingface.co`.
This monkey patch modifies the `huggingface_hub` library to redirect Git-LFS downloads to the Storj Linksharing service hosted at `link.storjshare.io`.
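The redirect described above can be sketched with a stand-in object, since the exact `huggingface_hub` function that the patch wraps is not stated here. In this minimal sketch, `fake_hub`, `redirect_to_storj`, and `apply_patch` are hypothetical names, and keeping the CDN path unchanged under the Storj prefix is an assumption about the bucket layout, not a documented fact.

```python
import types
from urllib.parse import urlsplit

# Stand-in for a library module. This is NOT the real huggingface_hub API.
fake_hub = types.SimpleNamespace()
fake_hub.download = lambda url: f"GET {url}"

# Default prefix taken from this README (the replicated StarCoder bucket).
STORJ_PREFIX = "https://link.storjshare.io/raw/juzlwaj7ovnst5gtkv2km3rkriha/lfs-huggingface"


def redirect_to_storj(url, prefix=STORJ_PREFIX):
    """Point a Git-LFS CDN URL at the Storj prefix.

    Assumption: the object path under the prefix mirrors the CDN path.
    """
    parts = urlsplit(url)
    if parts.netloc != "cdn-lfs.huggingface.co":
        return url  # leave every other URL untouched
    return prefix.rstrip("/") + parts.path


def apply_patch(module):
    """Monkey patch: wrap module.download so it rewrites the URL first."""
    original = module.download

    def patched(url):
        new_url = redirect_to_storj(url)
        print(f"Downloading from {new_url}")
        return original(new_url)

    module.download = patched


apply_patch(fake_hub)
result = fake_hub.download("https://cdn-lfs.huggingface.co/repos/ab/cd/0123abcd")
```

The wrapper keeps a reference to the original function, so callers of `fake_hub.download` need no changes, which is the point of a monkey patch.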
## Prerequisites
The Git-LFS blobs of the AI model you want to download must be replicated to a Storj bucket, and that bucket must be shared through the [Storj Linksharing Service](https://docs.storj.io/dcs/api-reference/linksharing-service).
We have already replicated the Git-LFS blobs of the [StarCoder](https://huggingface.co/bigcode/starcoder) model to a Storj bucket and shared it: https://link.storjshare.io/raw/juzlwaj7ovnst5gtkv2km3rkriha/lfs-huggingface
If you want to use another AI model, you need to use your own Storj bucket and then configure the patch to use it. See [Configuration](#hf_hub_storj_url_prefix) for more details.
## Installation
First, install the patch module:
```sh
pip install huggingface-hub-storj-patch
```
Then add the following import statement at the very top of your Python script, before any other import:
```python
import huggingface_hub_storj_patch
```
Now you can run your script. If the patch is applied successfully, it prints the URLs from which the `huggingface_hub` library is downloading.
![image](https://github.com/storj/huggingface-hub-storj-patch/assets/468091/ad50968c-7959-4a6a-8f63-540eb70372ba)
## Configuration
The following environment variables configure the behavior of the patch.
### HF_HUB_NO_STORJ
If set to `true`, downloads are not redirected to the Storj Linksharing Service, as if the patch were not applied.
### HF_HUB_STORJ_PARALLELISM
Configures how many parallel download connections are opened to the Storj Linksharing Service. The default value is `16`.
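To illustrate what a parallelism setting can mean for a single large file, here is a minimal sketch that divides a download into byte ranges, one per connection, each of which could be fetched with an HTTP `Range` header. This is an assumption about a common technique, not a description of how the patch itself splits downloads; `split_byte_ranges` and `range_header` are hypothetical helpers.

```python
def split_byte_ranges(total_size, parallelism=16):
    """Split [0, total_size) into at most `parallelism` contiguous ranges.

    Returns a list of inclusive (start, end) byte offsets.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    if total_size <= 0:
        return []
    parts = min(parallelism, total_size)  # never create empty ranges
    chunk, remainder = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `remainder` ranges get one extra byte each.
        length = chunk + (1 if i < remainder else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Build the HTTP header for one inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}


ranges = split_byte_ranges(100, parallelism=4)
headers = [range_header(start, end) for start, end in ranges]
```

Each range is independent of the others, so the requests can run concurrently and the results can be written to the target file at their own offsets.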
### HF_HUB_STORJ_URL_PREFIX
Configures the URL to the shared Storj bucket that replicates the Git-LFS blobs of the AI model. The default value is the bucket that replicates the StarCoder model: https://link.storjshare.io/raw/juzlwaj7ovnst5gtkv2km3rkriha/lfs-huggingface
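The variables above can also be set from Python instead of the shell. The sketch below sets them before the patch module is imported; whether the patch reads them at import time or at download time is not stated here, so setting them first is the cautious choice. The URL is a hypothetical placeholder for your own shared bucket, and `32` is an arbitrary example value.

```python
import os

# Hypothetical placeholder: replace with the linksharing URL of your own bucket.
os.environ["HF_HUB_STORJ_URL_PREFIX"] = (
    "https://link.storjshare.io/raw/<your-access-key>/<your-bucket>"
)

# Example value only; the documented default is 16.
os.environ["HF_HUB_STORJ_PARALLELISM"] = "32"

# Make sure the redirect is not disabled by a leftover setting.
os.environ.pop("HF_HUB_NO_STORJ", None)

# After this point, import the patch before any other import:
#   import huggingface_hub_storj_patch
```

Environment variable values are always strings, so the parallelism is written as `"32"` rather than `32`.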
Raw data
{
"_id": null,
"home_page": "https://github.com/storj/huggingface-hub-storj-patch",
"name": "huggingface-hub-storj-patch",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7.0",
"maintainer_email": "",
"keywords": "model-hub machine-learning models natural-language-processing deep-learning pytorch pretrained-models storj patch linksharing decentralized cloud storage",
"author": "Kaloyan Raev",
"author_email": "kaloyan@storj.io",
"download_url": "https://files.pythonhosted.org/packages/94/73/9fc2b2ae0298aa8ba0ef396416dc525df74b2f1696b410b66d0183e678da/huggingface_hub_storj_patch-0.0.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache",
"summary": "Monkey patch for huggingface_hub to download Git-LFS blobs from Storj",
"version": "0.0.6",
"project_urls": {
"Homepage": "https://github.com/storj/huggingface-hub-storj-patch"
},
"split_keywords": [
"model-hub",
"machine-learning",
"models",
"natural-language-processing",
"deep-learning",
"pytorch",
"pretrained-models",
"storj",
"patch",
"linksharing",
"decentralized",
"cloud",
"storage"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a195bd9663e71e10f26f8d588a05177efb0d2236a3f3030c123b2717ccdb3496",
"md5": "6d2e1225290dfea6bf1eac25906bd090",
"sha256": "657c5a450673d6c04ec58d028901024323fbddb8dfd797cb00076841581ef125"
},
"downloads": -1,
"filename": "huggingface_hub_storj_patch-0.0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6d2e1225290dfea6bf1eac25906bd090",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7.0",
"size": 8664,
"upload_time": "2023-06-08T17:22:43",
"upload_time_iso_8601": "2023-06-08T17:22:43.819865Z",
"url": "https://files.pythonhosted.org/packages/a1/95/bd9663e71e10f26f8d588a05177efb0d2236a3f3030c123b2717ccdb3496/huggingface_hub_storj_patch-0.0.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "94739fc2b2ae0298aa8ba0ef396416dc525df74b2f1696b410b66d0183e678da",
"md5": "353774b72c1a3fcb4a5e4d8073d6b652",
"sha256": "5da5c0ffe5bffe9d9745a7534da493a7d6a6033e2ca849d83c32b1b795b09cf7"
},
"downloads": -1,
"filename": "huggingface_hub_storj_patch-0.0.6.tar.gz",
"has_sig": false,
"md5_digest": "353774b72c1a3fcb4a5e4d8073d6b652",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7.0",
"size": 8336,
"upload_time": "2023-06-08T17:22:46",
"upload_time_iso_8601": "2023-06-08T17:22:46.116371Z",
"url": "https://files.pythonhosted.org/packages/94/73/9fc2b2ae0298aa8ba0ef396416dc525df74b2f1696b410b66d0183e678da/huggingface_hub_storj_patch-0.0.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-08 17:22:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "storj",
"github_project": "huggingface-hub-storj-patch",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "huggingface-hub-storj-patch"
}