# llm-rs-opencl

* **Version:** 0.2.15
* **Summary:** Unofficial Python bindings for llm-rs. 🐍❤️🦀
* **Author:** Lukas Kreussel
* **Requires Python:** >=3.8
* **Keywords:** llm, transformers
* **Uploaded:** 2023-08-19 14:26:25
# llm-rs-python: Python Bindings for Rust's llm Library

[![PyPI](https://img.shields.io/pypi/v/llm-rs)](https://pypi.org/project/llm-rs/)
[![PyPI - License](https://img.shields.io/pypi/l/llm-rs)](https://pypi.org/project/llm-rs/)
[![Downloads](https://static.pepy.tech/badge/llm-rs)](https://pepy.tech/project/llm-rs)

Welcome to `llm-rs`, an unofficial Python interface for the Rust-based [llm](https://github.com/rustformers/llm) library, made possible through [PyO3](https://github.com/PyO3/pyo3). Our package combines the convenience of Python with the performance of Rust to offer an efficient tool for your machine learning projects. πŸβ€οΈπŸ¦€

With `llm-rs`, you can run a variety of Large Language Models (LLMs), including LLaMA and GPT-NeoX, directly on your CPU or GPU.

For a detailed overview of all the supported architectures, visit the [llm](https://github.com/rustformers/llm) project page. 

### Integrations:
* πŸ¦œοΈπŸ”— [LangChain](https://github.com/hwchase17/langchain)
* πŸŒΎπŸ”± [Haystack](https://github.com/deepset-ai/haystack)

## Installation

Simply install it via pip: `pip install llm-rs`

<details>
<summary>Installation with GPU Acceleration Support</summary>
<br>

`llm-rs` supports several GPU-accelerated backends for faster inference. To enable GPU acceleration, set the `use_gpu` parameter of your `SessionConfig` to `True` (see the sketch at the end of this section). The [llm documentation](https://github.com/rustformers/llm/blob/main/doc/acceleration-support.md#supported-accelerated-models) lists the model architectures that are currently accelerated. We distribute prebuilt binaries for the following operating systems and graphics APIs:

### macOS (Using Metal)
On macOS, the Metal-enabled version of `llm-rs` can be installed via pip:

```
pip install llm-rs-metal
```

### Windows/Linux (Using CUDA for NVIDIA GPUs)
Due to their significant file size, CUDA-enabled packages cannot be uploaded to PyPI directly. To install them, download the appropriate `*.whl` file from the latest [Release](https://github.com/LLukas22/llm-rs-python/releases/latest) and install it with pip:

```
pip install [wheelname].whl
```

### Windows/Linux (Using OpenCL for All GPUs)

For universal GPU support on Windows and Linux, we offer an OpenCL-supported version. It can be installed via pip:

```
pip install llm-rs-opencl
```
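
Once an accelerated build is installed, GPU usage is enabled at load time. Here is a minimal sketch (this assumes `from_pretrained` accepts the session configuration via a `session_config` keyword; consult the documentation for the exact signature):

```python
from llm_rs import AutoModel, KnownModels, SessionConfig

# build a session configuration with GPU acceleration enabled
session_config = SessionConfig(use_gpu=True)

# load the model with the session configuration applied
model = AutoModel.from_pretrained(
    "path/to/model.bin",
    model_type=KnownModels.Llama,
    session_config=session_config,
)

print(model.generate("The meaning of life is"))
```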
</details>


## Usage
### Running local GGML models:
Models can be loaded via the `AutoModel` interface.

```python
from llm_rs import AutoModel, KnownModels

# load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# generate text
print(model.generate("The meaning of life is"))
```

### Streaming Text
Text can be yielded from a generator via the `stream` function:
```python
from llm_rs import AutoModel, KnownModels

# load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# stream tokens as they are generated
for token in model.stream("The meaning of life is"):
    print(token)
```

### Running GGML models from the Hugging Face Hub
GGML-converted models can be downloaded and run directly from the Hub.
```python
from llm_rs import AutoModel

model = AutoModel.from_pretrained("rustformers/mpt-7b-ggml", model_file="mpt-7b-q4_0-ggjt.bin")
```
If a repository contains multiple model files, `model_file` has to be specified.
If you want to load a repository that was not created through this library, you also have to specify the `model_type` parameter, since the metadata files needed to infer the architecture are missing.
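
A sketch of such a call (the repository id and file name below are hypothetical placeholders):

```python
from llm_rs import AutoModel, KnownModels

# hypothetical repository that was not created with this library, so it lacks
# the metadata files needed to infer the architecture; `model_type` is required
model = AutoModel.from_pretrained(
    "some-user/some-llama-ggml",   # hypothetical repository id
    model_file="model-q4_0.bin",   # hypothetical model file
    model_type=KnownModels.Llama,
)
```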

### Running PyTorch transformer models from the Hugging Face Hub
`llm-rs` supports automatic conversion of all supported transformer architectures on the Hugging Face Hub.

To run conversions, additional dependencies are needed, which can be installed via `pip install llm-rs[convert]`.

The models can then be loaded and automatically converted via the `from_pretrained` function.

```python
from llm_rs import AutoModel

model = AutoModel.from_pretrained("mosaicml/mpt-7b")
```

### Converting Hugging Face Hub Models

The following example shows how a [Pythia](https://huggingface.co/EleutherAI/pythia-410m) model can be converted, quantized, and run.

```python
from llm_rs.convert import AutoConverter
from llm_rs import AutoModel, AutoQuantizer
import sys

# define the model to convert and an output directory
export_directory = "path/to/directory"
base_model = "EleutherAI/pythia-410m"

# convert the model
converted_model = AutoConverter.convert(base_model, export_directory)

# quantize the model (this step is optional)
quantized_model = AutoQuantizer.quantize(converted_model)

# load the quantized model
model = AutoModel.load(quantized_model, verbose=True)

# stream generated text to stdout via a callback
def callback(text):
    print(text, end="")
    sys.stdout.flush()

model.generate("The meaning of life is", callback=callback)
```

## πŸ¦œοΈπŸ”— LangChain Usage
Using `llm-rs-python` through LangChain requires additional dependencies, which can be installed via `pip install llm-rs[langchain]`. Once installed, the `RustformersLLM` model becomes available through the `llm_rs.langchain` module. It provides text generation and embeddings.

Consider the example below, demonstrating a straightforward `LLMChain` implementation with MPT-Instruct:

```python
from llm_rs.langchain import RustformersLLM
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
Answer:"""

prompt = PromptTemplate(input_variables=["instruction"], template=template)

llm = RustformersLLM(
    model_path_or_repo_id="rustformers/mpt-7b-ggml",
    model_file="mpt-7b-instruct-q5_1-ggjt.bin",
    callbacks=[StreamingStdOutCallbackHandler()],
)

chain = LLMChain(llm=llm, prompt=prompt)

chain.run("Write a short post congratulating rustformers on their new release of their langchain integration.")
```


## πŸŒΎπŸ”± Haystack Usage
Using `llm-rs-python` through Haystack requires additional dependencies, which can be installed via `pip install llm-rs[haystack]`. Once installed, the `RustformersInvocationLayer` becomes available through the `llm_rs.haystack` module. It provides text generation.

Consider the example below, demonstrating a straightforward Haystack pipeline with OpenLLaMA-3B:

```python
from haystack.nodes import PromptNode, PromptModel
from llm_rs.haystack import RustformersInvocationLayer

# wrap the GGML model in a Haystack PromptModel
model = PromptModel(
    "rustformers/open-llama-ggml",
    max_length=1024,
    invocation_layer_class=RustformersInvocationLayer,
    model_kwargs={"model_file": "open_llama_3b-q5_1-ggjt.bin"},
)

pn = PromptNode(model, max_length=1024)

pn("Write me a short story about a llama riding a crab.", stream=True)
```


## Documentation

For in-depth information on customizing the loading and generation processes, refer to our detailed [documentation](https://llukas22.github.io/llm-rs-python/).
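
As a taste of what can be customized, here is a hedged sketch of adjusting sampling parameters (the `GenerationConfig` fields and the `generation_config` keyword shown here are assumptions; consult the documentation for the authoritative names):

```python
from llm_rs import AutoModel, GenerationConfig, KnownModels

# assumed sampling parameters; see the documentation for the full list
generation_config = GenerationConfig(
    temperature=0.8,
    top_k=40,
    top_p=0.95,
    max_new_tokens=256,
)

model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)
print(model.generate("The meaning of life is", generation_config=generation_config))
```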


            
