| Field | Value |
| --- | --- |
| Name | llm-rs-metal |
| Version | 0.2.15 |
| Summary | Unofficial python bindings for llm-rs. 🐍❤️🦀 |
| Author | Lukas Kreussel |
| Maintainer | None |
| Home page | None |
| Docs URL | None |
| License | None |
| Requires Python | >=3.8 |
| Keywords | llm, transformers |
| Upload time | 2023-08-19 14:26:45 |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |
# llm-rs-python: Python Bindings for Rust's llm Library
[![PyPI](https://img.shields.io/pypi/v/llm-rs)](https://pypi.org/project/llm-rs/)
[![PyPI - License](https://img.shields.io/pypi/l/llm-rs)](https://pypi.org/project/llm-rs/)
[![Downloads](https://static.pepy.tech/badge/llm-rs)](https://pepy.tech/project/llm-rs)
Welcome to `llm-rs`, an unofficial Python interface for the Rust-based [llm](https://github.com/rustformers/llm) library, made possible through [PyO3](https://github.com/PyO3/pyo3). Our package combines the convenience of Python with the performance of Rust to offer an efficient tool for your machine learning projects. 🐍❤️🦀
With `llm-rs`, you can run a variety of Large Language Models (LLMs), including LLaMA and GPT-NeoX, directly on your CPU or GPU.
For a detailed overview of all the supported architectures, visit the [llm](https://github.com/rustformers/llm) project page.
### Integrations:
* 🦜️🔗 [LangChain](https://github.com/hwchase17/langchain)
* 🌾🔱 [Haystack](https://github.com/deepset-ai/haystack)
## Installation
Simply install it via pip: `pip install llm-rs`
<details>
<summary>Installation with GPU Acceleration Support</summary>
<br>
`llm-rs` incorporates support for various GPU-accelerated backends to enable faster inference. To enable GPU acceleration, set the `use_gpu` parameter of your `SessionConfig` to `True`. The [llm documentation](https://github.com/rustformers/llm/blob/main/doc/acceleration-support.md#supported-accelerated-models) lists all model architectures that are currently accelerated. We distribute prebuilt binaries for the following operating systems and graphics APIs:
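As a minimal sketch of enabling acceleration (assuming the `SessionConfig` and `AutoModel` interfaces shown later in this README; the `session_config` keyword and the model path are assumptions, not confirmed here), it might look like this:

```python
from llm_rs import AutoModel, KnownModels, SessionConfig

# enable GPU offloading for supported architectures
session_config = SessionConfig(use_gpu=True)

model = AutoModel.from_pretrained(
    "path/to/model.bin",            # placeholder path
    model_type=KnownModels.Llama,
    session_config=session_config,  # assumed keyword; see the project docs
)
```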
### macOS (Using Metal)
For macOS users, the Metal-enabled version of `llm-rs` can be installed via pip:

```bash
pip install llm-rs-metal
```
### Windows/Linux (Using CUDA for Nvidia GPUs)
Due to their significant file size, CUDA-enabled packages cannot be uploaded to PyPI directly. To install them, download the appropriate `*.whl` file from the latest [Release](https://github.com/LLukas22/llm-rs-python/releases/latest) and install it using pip as follows:
```bash
pip install [wheelname].whl
```
### Windows/Linux (Using OpenCL for All GPUs)
For universal GPU support on Windows and Linux, we offer an OpenCL-supported version. It can be installed via pip:
```bash
pip install llm-rs-opencl
```
</details>
## Usage
### Running local GGML models:
Models can be loaded via the `AutoModel` interface.
```python
from llm_rs import AutoModel, KnownModels

# load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# generate
print(model.generate("The meaning of life is"))
```
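Generation can also be customized; the following is a hedged sketch assuming a `GenerationConfig` class as described in the project documentation (parameter names and values here are illustrative, not confirmed by this README):

```python
from llm_rs import AutoModel, KnownModels, GenerationConfig

# illustrative sampling settings; field names are assumptions from the docs
generation_config = GenerationConfig(temperature=0.7, max_new_tokens=128)

model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)
print(model.generate("The meaning of life is", generation_config=generation_config))
```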
### Streaming Text
Text can be yielded from a generator via the `stream` function:
```python
from llm_rs import AutoModel, KnownModels

# load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# generate token by token
for token in model.stream("The meaning of life is"):
    print(token)
```
### Running GGML models from the Hugging Face Hub
GGML converted models can be directly downloaded and run from the hub.
```python
from llm_rs import AutoModel

model = AutoModel.from_pretrained("rustformers/mpt-7b-ggml", model_file="mpt-7b-q4_0-ggjt.bin")
```
If a repository contains multiple model files, `model_file` has to be specified.
If you want to load repositories that were not created through this library, you have to specify the `model_type` parameter, as the metadata files needed to infer the architecture are missing.
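As an illustration (the repository and file names below are placeholders, and `KnownModels.Mpt` is assumed to be a valid variant), loading such a repository might look like this:

```python
from llm_rs import AutoModel, KnownModels

# the repo lacks this library's metadata files, so the architecture
# is passed explicitly via model_type
model = AutoModel.from_pretrained(
    "some-user/some-ggml-repo",   # placeholder repository id
    model_file="model-q4_0.bin",  # placeholder file name
    model_type=KnownModels.Mpt,
)
```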
### Running PyTorch Transformer models from the Hugging Face Hub
`llm-rs` supports automatic conversion of all supported transformer architectures on the Hugging Face Hub.
To run conversions, additional dependencies are needed, which can be installed via `pip install llm-rs[convert]`.
The models can then be loaded and automatically converted via the `from_pretrained` function.
```python
from llm_rs import AutoModel
model = AutoModel.from_pretrained("mosaicml/mpt-7b")
```
### Convert Hugging Face Hub Models
The following example shows how a [Pythia](https://huggingface.co/EleutherAI/pythia-410m) model can be converted, quantized and run.
```python
from llm_rs.convert import AutoConverter
from llm_rs import AutoModel, AutoQuantizer
import sys

# define the model which should be converted and an output directory
export_directory = "path/to/directory"
base_model = "EleutherAI/pythia-410m"

# convert the model
converted_model = AutoConverter.convert(base_model, export_directory)

# quantize the model (this step is optional)
quantized_model = AutoQuantizer.quantize(converted_model)

# load the quantized model
model = AutoModel.load(quantized_model, verbose=True)

# generate text, streaming tokens to stdout via a callback
def callback(text):
    print(text, end="")
    sys.stdout.flush()

model.generate("The meaning of life is", callback=callback)
```
## 🦜️🔗 LangChain Usage
Utilizing `llm-rs-python` through LangChain requires additional dependencies. You can install these using `pip install llm-rs[langchain]`. Once installed, you gain access to the `RustformersLLM` model through the `llm_rs.langchain` module. This model offers features for text generation and embeddings.
Consider the example below, demonstrating a straightforward LLMChain implementation with MPT-Instruct:
```python
from llm_rs.langchain import RustformersLLM
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
Answer:"""

prompt = PromptTemplate(input_variables=["instruction"], template=template)

llm = RustformersLLM(
    model_path_or_repo_id="rustformers/mpt-7b-ggml",
    model_file="mpt-7b-instruct-q5_1-ggjt.bin",
    callbacks=[StreamingStdOutCallbackHandler()],
)

chain = LLMChain(llm=llm, prompt=prompt)

chain.run("Write a short post congratulating rustformers on their new release of their langchain integration.")
```
## 🌾🔱 Haystack Usage
Utilizing `llm-rs-python` through Haystack requires additional dependencies. You can install these using `pip install llm-rs[haystack]`. Once installed, you gain access to the `RustformersInvocationLayer` through the `llm_rs.haystack` module, which offers features for text generation.
Consider the example below, demonstrating a straightforward Haystack pipeline implementation with OpenLLaMA-3B:
```python
from haystack.nodes import PromptNode, PromptModel
from llm_rs.haystack import RustformersInvocationLayer

model = PromptModel(
    "rustformers/open-llama-ggml",
    max_length=1024,
    invocation_layer_class=RustformersInvocationLayer,
    model_kwargs={"model_file": "open_llama_3b-q5_1-ggjt.bin"},
)

pn = PromptNode(
    model,
    max_length=1024,
)

pn("Write me a short story about a llama riding a crab.", stream=True)
```
## Documentation
For in-depth information on customizing the loading and generation processes, refer to our detailed [documentation](https://llukas22.github.io/llm-rs-python/).