| Field | Value |
| --- | --- |
| Name | llm-rs-metal |
| Version | 0.2.15 |
| Summary | Unofficial python bindings for llm-rs. 🐍❤️🦀 |
| Author | Lukas Kreussel |
| Maintainer | None |
| Home page | None |
| Docs URL | None |
| License | None |
| Requires Python | >=3.8 |
| Keywords | llm, transformers |
| Upload time | 2023-08-19 14:26:45 |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |
# llm-rs-python: Python Bindings for Rust's llm Library
[![PyPI](https://img.shields.io/pypi/v/llm-rs)](https://pypi.org/project/llm-rs/)
[![PyPI - License](https://img.shields.io/pypi/l/llm-rs)](https://pypi.org/project/llm-rs/)
[![Downloads](https://static.pepy.tech/badge/llm-rs)](https://pepy.tech/project/llm-rs)
Welcome to `llm-rs`, an unofficial Python interface for the Rust-based [llm](https://github.com/rustformers/llm) library, made possible through [PyO3](https://github.com/PyO3/pyo3). Our package combines the convenience of Python with the performance of Rust to offer an efficient tool for your machine learning projects. 🐍❤️🦀
With `llm-rs`, you can run a variety of Large Language Models (LLMs), including LLaMA and GPT-NeoX, directly on your CPU or GPU.
For a detailed overview of all the supported architectures, visit the [llm](https://github.com/rustformers/llm) project page.
### Integrations:
* 🦜️🔗 [LangChain](https://github.com/hwchase17/langchain)
* 🌾🔱 [Haystack](https://github.com/deepset-ai/haystack)
## Installation
Simply install it via pip: `pip install llm-rs`
<details>
<summary>Installation with GPU Acceleration Support</summary>
<br>
`llm-rs` incorporates support for various GPU-accelerated backends to enable faster inference. To enable GPU acceleration, set the `use_gpu` parameter of your `SessionConfig` to `True`. The [llm documentation](https://github.com/rustformers/llm/blob/main/doc/acceleration-support.md#supported-accelerated-models) lists all model architectures that are currently accelerated. We distribute prebuilt binaries for the following operating systems and graphics APIs:
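As a minimal sketch of enabling acceleration (assuming the `SessionConfig` and `AutoModel` interfaces shown later in this README; the `session_config` keyword and the model path are assumptions, not confirmed here), it might look like this:

```python
from llm_rs import AutoModel, KnownModels, SessionConfig

# enable GPU offloading for supported architectures
session_config = SessionConfig(use_gpu=True)

model = AutoModel.from_pretrained(
    "path/to/model.bin",            # placeholder path
    model_type=KnownModels.Llama,
    session_config=session_config,  # assumed keyword; see the project docs
)
```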
### macOS (Using Metal)
For macOS users, the Metal-enabled version of `llm-rs` can be installed via pip:

```bash
pip install llm-rs-metal
```
### Windows/Linux (Using CUDA for Nvidia GPUs)
Due to their significant file size, CUDA-enabled packages cannot be uploaded to PyPI directly. To install them, download the appropriate `*.whl` file from the latest [Release](https://github.com/LLukas22/llm-rs-python/releases/latest) and install it using pip as follows:
```bash
pip install [wheelname].whl
```
### Windows/Linux (Using OpenCL for All GPUs)
For universal GPU support on Windows and Linux, we offer an OpenCL-supported version. It can be installed via pip:
```bash
pip install llm-rs-opencl
```
</details>
## Usage
### Running local GGML models:
Models can be loaded via the `AutoModel` interface.
```python
from llm_rs import AutoModel, KnownModels

# load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# generate
print(model.generate("The meaning of life is"))
```
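Generation can also be customized; the following is a hedged sketch assuming a `GenerationConfig` class as described in the project documentation (parameter names and values here are illustrative, not confirmed by this README):

```python
from llm_rs import AutoModel, KnownModels, GenerationConfig

# illustrative sampling settings; field names are assumptions from the docs
generation_config = GenerationConfig(temperature=0.7, max_new_tokens=128)

model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)
print(model.generate("The meaning of life is", generation_config=generation_config))
```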
### Streaming Text
Text can be yielded from a generator via the `stream` function:
```python
from llm_rs import AutoModel, KnownModels

# load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# generate token by token
for token in model.stream("The meaning of life is"):
    print(token)
```
### Running GGML models from the Hugging Face Hub
GGML converted models can be directly downloaded and run from the hub.
```python
from llm_rs import AutoModel

model = AutoModel.from_pretrained("rustformers/mpt-7b-ggml", model_file="mpt-7b-q4_0-ggjt.bin")
```
If a repository contains multiple model files, `model_file` has to be specified.
If you want to load repositories that were not created through this library, you have to specify the `model_type` parameter, as the metadata files needed to infer the architecture are missing.
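As an illustration (the repository and file names below are placeholders, and `KnownModels.Mpt` is assumed to be a valid variant), loading such a repository might look like this:

```python
from llm_rs import AutoModel, KnownModels

# the repo lacks this library's metadata files, so the architecture
# is passed explicitly via model_type
model = AutoModel.from_pretrained(
    "some-user/some-ggml-repo",   # placeholder repository id
    model_file="model-q4_0.bin",  # placeholder file name
    model_type=KnownModels.Mpt,
)
```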
### Running PyTorch Transformer models from the Hugging Face Hub
`llm-rs` supports automatic conversion of all supported transformer architectures on the Hugging Face Hub.
To run conversions, additional dependencies are needed, which can be installed via `pip install llm-rs[convert]`.
The models can then be loaded and automatically converted via the `from_pretrained` function.
```python
from llm_rs import AutoModel
model = AutoModel.from_pretrained("mosaicml/mpt-7b")
```
### Convert Hugging Face Hub Models
The following example shows how a [Pythia](https://huggingface.co/EleutherAI/pythia-410m) model can be converted, quantized and run.
```python
from llm_rs.convert import AutoConverter
from llm_rs import AutoModel, AutoQuantizer
import sys

# define the model which should be converted and an output directory
export_directory = "path/to/directory"
base_model = "EleutherAI/pythia-410m"

# convert the model
converted_model = AutoConverter.convert(base_model, export_directory)

# quantize the model (this step is optional)
quantized_model = AutoQuantizer.quantize(converted_model)

# load the quantized model
model = AutoModel.load(quantized_model, verbose=True)

# generate text, streaming tokens to stdout via a callback
def callback(text):
    print(text, end="")
    sys.stdout.flush()

model.generate("The meaning of life is", callback=callback)
```
## 🦜️🔗 LangChain Usage
Utilizing `llm-rs-python` through LangChain requires additional dependencies. You can install these using `pip install llm-rs[langchain]`. Once installed, you gain access to the `RustformersLLM` model through the `llm_rs.langchain` module. This model offers features for text generation and embeddings.
Consider the example below, demonstrating a straightforward LLMChain implementation with MPT-Instruct:
```python
from llm_rs.langchain import RustformersLLM
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
Answer:"""

prompt = PromptTemplate(input_variables=["instruction"], template=template)

llm = RustformersLLM(
    model_path_or_repo_id="rustformers/mpt-7b-ggml",
    model_file="mpt-7b-instruct-q5_1-ggjt.bin",
    callbacks=[StreamingStdOutCallbackHandler()],
)

chain = LLMChain(llm=llm, prompt=prompt)

chain.run("Write a short post congratulating rustformers on their new release of their langchain integration.")
```
## 🌾🔱 Haystack Usage
Utilizing `llm-rs-python` through Haystack requires additional dependencies. You can install these using `pip install llm-rs[haystack]`. Once installed, you gain access to the `RustformersInvocationLayer` through the `llm_rs.haystack` module, which offers features for text generation.
Consider the example below, demonstrating a straightforward Haystack pipeline implementation with OpenLLaMA-3B:
```python
from haystack.nodes import PromptNode, PromptModel
from llm_rs.haystack import RustformersInvocationLayer

model = PromptModel(
    "rustformers/open-llama-ggml",
    max_length=1024,
    invocation_layer_class=RustformersInvocationLayer,
    model_kwargs={"model_file": "open_llama_3b-q5_1-ggjt.bin"},
)

pn = PromptNode(
    model,
    max_length=1024,
)

pn("Write me a short story about a llama riding a crab.", stream=True)
```
## Documentation
For in-depth information on customizing the loading and generation processes, refer to our detailed [documentation](https://llukas22.github.io/llm-rs-python/).