hf-hub-ctranslate2


| Field | Value |
| --- | --- |
| Name | hf-hub-ctranslate2 |
| Version | 2.13.1 |
| Summary | Connecting Transformers on HuggingfaceHub with CTranslate2. |
| Upload time | 2024-06-26 19:43:35 |
| Author | michaelfeil |
| Requires Python | `<3.13,>=3.8` |
| License | MIT |

hf_hub_ctranslate2
==============================

Connecting Transformers on the Hugging Face Hub with CTranslate2: a small utility that keeps a CTranslate2 model and its tokenizer together when loading from the Hugging Face Hub.

[![codecov](https://codecov.io/gh/michaelfeil/hf-hub-ctranslate2/branch/main/graph/badge.svg?token=U9VIEFEELS)](https://codecov.io/gh/michaelfeil/hf-hub-ctranslate2)![CI pytest](https://github.com/michaelfeil/hf-hub-ctranslate2/actions/workflows/test_release.yml/badge.svg)

[Read the docs](https://michaelfeil.github.io/hf-hub-ctranslate2/)

<!-- PROJECT SHIELDS -->
[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![MIT License][license-shield]][license-url]
[![LinkedIn][linkedin-shield]][linkedin-url]

--------
## Usage:

### PyPI install
```bash
pip install hf-hub-ctranslate2
```
--------

## Decoder-only Transformer:
```python
# Download ctranslate2.Generator repos from the Hugging Face Hub (GPT-J, ...)
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

model_name_1 = "michaelfeil/ct2fast-pythia-160m"
model = GeneratorCT2fromHfHub(
    # load in int8 on CPU
    model_name_or_path=model_name_1, device="cpu", compute_type="int8"
)
outputs = model.generate(
    text=["How do you call a fast Flan-ingo?", "User: How are you doing?"]
    # add arguments specific to ctranslate2.Generator here
)
print(outputs)
```
## Encoder-Decoder:
```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub

# Download ctranslate2.Translator repos from the Hugging Face Hub (T5, ...)
model_name_2 = "michaelfeil/ct2fast-flan-alpaca-base"
model = TranslatorCT2fromHfHub(
    # load in int8_float16 on CUDA
    model_name_or_path=model_name_2, device="cuda", compute_type="int8_float16"
)
outputs = model.generate(
    text=["How do you call a fast Flan-ingo?", "Translate to German: How are you doing?"],
    # arguments specific to ctranslate2.Translator go below:
    min_decoding_length=8,
    max_decoding_length=16,
    max_input_length=512,
    beam_size=3
)
print(outputs)
```
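The two examples above use `compute_type="int8"` on CPU and `compute_type="int8_float16"` on CUDA. A minimal sketch of keeping that choice in one place follows. `default_compute_type` and `loader_kwargs` are hypothetical helper names, not part of `hf_hub_ctranslate2`, and the mapping only mirrors the values used in this README rather than anything the library prescribes.

```python
# Hypothetical helpers (not part of hf_hub_ctranslate2): they only mirror the
# device / compute_type pairs used in the examples of this README.
_README_DEFAULTS = {"cpu": "int8", "cuda": "int8_float16"}


def default_compute_type(device: str) -> str:
    """Return the compute_type this README pairs with the given device."""
    try:
        return _README_DEFAULTS[device]
    except KeyError:
        raise ValueError(f"unsupported device: {device!r}") from None


def loader_kwargs(model_name_or_path: str, device: str = "cpu") -> dict:
    """Build the keyword arguments passed to the loader classes above."""
    return {
        "model_name_or_path": model_name_or_path,
        "device": device,
        "compute_type": default_compute_type(device),
    }


kwargs = loader_kwargs("michaelfeil/ct2fast-flan-alpaca-base", device="cuda")
print(kwargs)
```

The resulting dictionary can then be unpacked into a loader, for example `TranslatorCT2fromHfHub(**kwargs)`.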
## Encoder-Decoder for multilingual translations (m2m-100):
```python
from transformers import AutoTokenizer

from hf_hub_ctranslate2 import MultiLingualTranslatorCT2fromHfHub

model = MultiLingualTranslatorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-m2m100_418M", device="cpu", compute_type="int8",
    # the tokenizer comes from the original (non-converted) model repo
    tokenizer=AutoTokenizer.from_pretrained("facebook/m2m100_418M")
)

outputs = model.generate(
    ["How do you call a fast Flamingo?", "Wie geht es dir?"],
    src_lang=["en", "de"],
    tgt_lang=["de", "fr"]
)
print(outputs)
```
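In the call above, `src_lang` and `tgt_lang` are lists with one entry per input text, which suggests the first sentence is translated from `en` to `de` and the second from `de` to `fr`. The sketch below makes that pairing explicit and checks that the three lists have equal length before the model is called. `pair_translation_inputs` is a hypothetical helper, and whether the library performs a similar check itself is not stated here.

```python
from typing import List, Tuple


def pair_translation_inputs(
    texts: List[str], src_lang: List[str], tgt_lang: List[str]
) -> List[Tuple[str, str, str]]:
    """Zip texts with their source/target language codes, one triple per text."""
    if not (len(texts) == len(src_lang) == len(tgt_lang)):
        raise ValueError(
            "texts, src_lang and tgt_lang must have the same length, got "
            f"{len(texts)}, {len(src_lang)} and {len(tgt_lang)}"
        )
    return list(zip(texts, src_lang, tgt_lang))


jobs = pair_translation_inputs(
    ["How do you call a fast Flamingo?", "Wie geht es dir?"],
    src_lang=["en", "de"],
    tgt_lang=["de", "fr"],
)
for text, src, tgt in jobs:
    print(f"{src} -> {tgt}: {text}")
```

Running the check up front turns a mismatched language list into an immediate, readable error rather than a failure inside tokenization or decoding.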
## Encoder-only Sentence Transformers
Feel free to try out Infinity, a newer repo that uses CTranslate2 for vector embeddings:
https://github.com/michaelfeil/infinity

```python
from hf_hub_ctranslate2 import CT2SentenceTransformer
model_name_pytorch = "intfloat/e5-small"
model = CT2SentenceTransformer(
    model_name_pytorch, compute_type="int8", device="cuda", 
)
embeddings = model.encode(
    ["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
    batch_size=32,
    convert_to_numpy=True,
    normalize_embeddings=True,
)
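print(embeddings.shape, embeddings)
# embeddings are normalized, so the dot product is the cosine similarity (scaled by 100)
scores = (embeddings @ embeddings.T) * 100
print(scores)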
```
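Because `normalize_embeddings=True` scales every embedding to unit length, the matrix product `embeddings @ embeddings.T` is the pairwise cosine similarity, and the factor 100 only rescales it. A minimal pure-Python sketch of the same computation on made-up 2-D vectors (not real model output) follows; the helper names are illustrative.

```python
import math
from typing import List


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_scores(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise dot products of unit vectors (cosine similarity), times 100."""
    unit = [normalize(v) for v in vectors]
    return [[100 * sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


# Toy vectors chosen by hand; real embeddings have far more dimensions.
toy = [[3.0, 4.0], [4.0, 3.0], [-4.0, 3.0]]
scores = similarity_scores(toy)
for row in scores:
    print([round(s, 1) for s in row])
```

Each row compares one vector with all the others: the diagonal is the self-similarity, nearly parallel vectors score close to 100, and orthogonal vectors score close to 0.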

## Encoder-only (no longer recommended)
```python
from hf_hub_ctranslate2 import EncoderCT2fromHfHub
model_name = "michaelfeil/ct2fast-e5-small"
model = EncoderCT2fromHfHub(
    # load in int8_float16 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
    max_length=64,
)
```




<!-- MARKDOWN LINKS & IMAGES -->
<!-- https://www.markdownguide.org/basic-syntax/#reference-style-links -->
[contributors-shield]: https://img.shields.io/github/contributors/michaelfeil/hf-hub-ctranslate2.svg?style=for-the-badge
[contributors-url]: https://github.com/michaelfeil/hf-hub-ctranslate2/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/michaelfeil/hf-hub-ctranslate2.svg?style=for-the-badge
[forks-url]: https://github.com/michaelfeil/hf-hub-ctranslate2/network/members
[stars-shield]: https://img.shields.io/github/stars/michaelfeil/hf-hub-ctranslate2.svg?style=for-the-badge
[stars-url]: https://github.com/michaelfeil/hf-hub-ctranslate2/stargazers
[issues-shield]: https://img.shields.io/github/issues/michaelfeil/hf-hub-ctranslate2.svg?style=for-the-badge
[issues-url]: https://github.com/michaelfeil/hf-hub-ctranslate2/issues
[license-shield]: https://img.shields.io/github/license/michaelfeil/hf-hub-ctranslate2.svg?style=for-the-badge
[license-url]: https://github.com/michaelfeil/hf-hub-ctranslate2/blob/master/LICENSE.txt
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
[linkedin-url]: https://linkedin.com/in/michael-feil


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "hf-hub-ctranslate2",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "michaelfeil",
    "author_email": "no-reply@michaelfeil.eu",
    "download_url": "https://files.pythonhosted.org/packages/b9/bb/2f1850ca94b95dee5bb0221d85d93c99196a8452e40b354b0d9562ecb4ef/hf_hub_ctranslate2-2.13.1.tar.gz",
    "platform": null,
    "description": "hf_hub_ctranslate2\n==============================\n\nConnecting Transformers on HuggingfaceHub with Ctranslate2 - a small utility for keeping tokenizer and model around Huggingface Hub.\n\n[![codecov](https://codecov.io/gh/michaelfeil/hf-hub-ctranslate2/branch/main/graph/badge.svg?token=U9VIEFEELS)](https://codecov.io/gh/michaelfeil/hf-hub-ctranslate2)![CI pytest](https://github.com/michaelfeil/hf-hub-ctranslate2/actions/workflows/test_release.yml/badge.svg)\n\n[Read the docs](https://michaelfeil.github.io/hf-hub-ctranslate2/)\n\n<!-- PROJECT SHIELDS -->\n[![Contributors][contributors-shield]][contributors-url]\n[![Forks][forks-shield]][forks-url]\n[![Stargazers][stars-shield]][stars-url]\n[![Issues][issues-shield]][issues-url]\n[![MIT License][license-shield]][license-url]\n[![LinkedIn][linkedin-shield]][linkedin-url]\n\n--------\n## Usage:\n\n### PYPI Install\n```bash\npip install hf-hub-ctranslate2\n```\n--------\n\n## Decoder-only Transformer:\n```python\n# download ctranslate.Generator repos from Huggingface Hub (GPT-J, ..)\nfrom hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub\n\nmodel_name_1=\"michaelfeil/ct2fast-pythia-160m\"\nmodel = GeneratorCT2fromHfHub(\n    # load in int8 on CPU\n    model_name_or_path=model_name_1, device=\"cpu\", compute_type=\"int8\"\n)\noutputs = model.generate(\n    text=[\"How do you call a fast Flan-ingo?\", \"User: How are you doing?\"]\n    # add arguments specifically to ctranslate2.Generator here\n)\n```\n## Encoder-Decoder:\n```python\nfrom hf_hub_ctranslate2 import TranslatorCT2fromHfHub\n# download ctranslate.Translator repos from Huggingface Hub (T5, ..)\nmodel_name_2 = \"michaelfeil/ct2fast-flan-alpaca-base\"\nmodel = TranslatorCT2fromHfHub(\n        # load in int8 on CUDA\n        model_name_or_path=model_name_2, device=\"cuda\", compute_type=\"int8_float16\"\n)\noutputs = model.generate(\n    text=[\"How do you call a fast Flan-ingo?\", \"Translate to german: How are you 
doing?\"],\n    # use arguments specifically to ctranslate2.Translator below:\n    min_decoding_length=8,\n    max_decoding_length=16,\n    max_input_length=512,\n    beam_size=3\n)\nprint(outputs)\n```\n## Encoder-Decoder for multilingual translations (m2m-100):\n```python\nfrom hf_hub_ctranslate2 import MultiLingualTranslatorCT2fromHfHub\nmodel = MultiLingualTranslatorCT2fromHfHub(\n    model_name_or_path=\"michaelfeil/ct2fast-m2m100_418M\", device=\"cpu\", compute_type=\"int8\",\n    tokenizer=AutoTokenizer.from_pretrained(f\"facebook/m2m100_418M\")\n)\n\noutputs = model.generate(\n    [\"How do you call a fast Flamingo?\", \"Wie geht es dir?\"],\n    src_lang=[\"en\", \"de\"],\n    tgt_lang=[\"de\", \"fr\"]\n)\n```\n## Encoder-only Sentence Transformers\nFeel free to try out a new repo, using CTranslate2 for vector-embeddings: \nhttps://github.com/michaelfeil/infinity\n\n```python\nfrom hf_hub_ctranslate2 import CT2SentenceTransformer\nmodel_name_pytorch = \"intfloat/e5-small\"\nmodel = CT2SentenceTransformer(\n    model_name_pytorch, compute_type=\"int8\", device=\"cuda\", \n)\nembeddings = model.encode(\n    [\"I like soccer\", \"I like tennis\", \"The eiffel tower is in Paris\"],\n    batch_size=32,\n    convert_to_numpy=True,\n    normalize_embeddings=True,\n)\nprint(embeddings.shape, embeddings)\nscores = (embeddings @ embeddings.T) * 100\n```\n\n## Encoder-only -> no longer recommended\n```python\nfrom hf_hub_ctranslate2 import EncoderCT2fromHfHub\nmodel_name = \"michaelfeil/ct2fast-e5-small\"\nmodel = EncoderCT2fromHfHub(\n        # load in int8 on CUDA\n        model_name_or_path=model_name,\n        device=\"cuda\",\n        compute_type=\"int8_float16\",\n)\noutputs = model.generate(\n    text=[\"I like soccer\", \"I like tennis\", \"The eiffel tower is in Paris\"],\n    max_length=64,\n)\n```\n\n\n\n\n<!-- MARKDOWN LINKS & IMAGES -->\n<!-- https://www.markdownguide.org/basic-syntax/#reference-style-links -->\n[contributors-shield]: 
https://img.shields.io/github/contributors/michaelfeil/hf-hub-ctranslate2.svg?style=for-the-badge\n[contributors-url]: https://github.com/michaelfeil/hf-hub-ctranslate2/graphs/contributors\n[forks-shield]: https://img.shields.io/github/forks/michaelfeil/hf-hub-ctranslate2.svg?style=for-the-badge\n[forks-url]: https://github.com/michaelfeil/hf-hub-ctranslate2/network/members\n[stars-shield]: https://img.shields.io/github/stars/michaelfeil/hf-hub-ctranslate2.svg?style=for-the-badge\n[stars-url]: https://github.com/michaelfeil/hf-hub-ctranslate2/stargazers\n[issues-shield]: https://img.shields.io/github/issues/michaelfeil/hf-hub-ctranslate2.svg?style=for-the-badge\n[issues-url]: https://github.com/michaelfeil/hf-hub-ctranslate2/issues\n[license-shield]: https://img.shields.io/github/license/michaelfeil/hf-hub-ctranslate2.svg?style=for-the-badge\n[license-url]: https://github.com/michaelfeil/hf-hub-ctranslate2/blob/master/LICENSE.txt\n[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555\n[linkedin-url]: https://linkedin.com/in/michael-feil\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Connecting Transfromers on HuggingfaceHub with CTranslate2.",
    "version": "2.13.1",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a73c24980d9f448fc04e7f6371499d6cf29919238c7566f34000604074351e89",
                "md5": "ac75293a5a9978d779168647b8342c78",
                "sha256": "389ed6102f18d3a84cb466bddd07e1e5e532bc91ddc47d3639c9eb11cdd43a86"
            },
            "downloads": -1,
            "filename": "hf_hub_ctranslate2-2.13.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ac75293a5a9978d779168647b8342c78",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.8",
            "size": 10638,
            "upload_time": "2024-06-26T19:43:33",
            "upload_time_iso_8601": "2024-06-26T19:43:33.556456Z",
            "url": "https://files.pythonhosted.org/packages/a7/3c/24980d9f448fc04e7f6371499d6cf29919238c7566f34000604074351e89/hf_hub_ctranslate2-2.13.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b9bb2f1850ca94b95dee5bb0221d85d93c99196a8452e40b354b0d9562ecb4ef",
                "md5": "12a6c7c6e5dd7b0162ebb5c06c700dcf",
                "sha256": "e3d41d859fdea39a745a1ae4f754f24c124d9d96b0dd415317ee71141042faee"
            },
            "downloads": -1,
            "filename": "hf_hub_ctranslate2-2.13.1.tar.gz",
            "has_sig": false,
            "md5_digest": "12a6c7c6e5dd7b0162ebb5c06c700dcf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.8",
            "size": 10002,
            "upload_time": "2024-06-26T19:43:35",
            "upload_time_iso_8601": "2024-06-26T19:43:35.185646Z",
            "url": "https://files.pythonhosted.org/packages/b9/bb/2f1850ca94b95dee5bb0221d85d93c99196a8452e40b354b0d9562ecb4ef/hf_hub_ctranslate2-2.13.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-26 19:43:35",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "hf-hub-ctranslate2"
}
        