llama-index-llms-llama-cpp

Name: llama-index-llms-llama-cpp
Version: 0.2.3
Summary: llama-index llms llama cpp integration
Upload time: 2024-10-08 22:31:16
Author: Your Name
Requires Python: <4.0,>=3.8.1
License: MIT
# LlamaIndex LLMs Integration: LlamaCPP

## Installation

1. Install the required Python packages:

   ```bash
   pip install llama-index-embeddings-huggingface
   pip install llama-index-llms-llama-cpp
   pip install llama-index
   ```
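By default, `llama-cpp-python` (installed as a dependency) is built for CPU only. If you want the GPU offloading used by `n_gpu_layers` later in this README, you may need to reinstall it with the appropriate build flags. A hedged sketch, since the exact flag depends on your `llama-cpp-python` version and hardware:

```bash
# CUDA build (newer llama-cpp-python releases; older ones used -DLLAMA_CUBLAS=on)
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

# Metal build on Apple Silicon
CMAKE_ARGS="-DGGML_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```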

## Basic Usage

### Import Required Libraries

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)
```

### Initialize LlamaCPP

Set up the model URL and initialize the LlamaCPP LLM. Note that current llama.cpp builds only load models in GGUF format, so a GGUF checkpoint is used here rather than the older GGML one:

```python
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"
llm = LlamaCPP(
    model_url=model_url,  # downloaded and cached locally on first use
    temperature=0.1,
    max_new_tokens=256,  # number of tokens to generate per call
    context_window=3900,  # slightly below the model's 4096-token limit
    generate_kwargs={},  # passed through to llama.cpp at inference time
    model_kwargs={"n_gpu_layers": 1},  # increase (or use -1) to offload more layers to GPU
    messages_to_prompt=messages_to_prompt,  # format chat messages in Llama 2 style
    completion_to_prompt=completion_to_prompt,  # wrap plain completions in the same template
    verbose=True,
)
```
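If you already have a GGUF file on disk, you can pass `model_path` instead of `model_url`; the path below is a placeholder:

```python
# Load a locally downloaded model instead of fetching one by URL.
llm = LlamaCPP(
    model_path="/path/to/llama-2-13b-chat.Q4_0.gguf",  # placeholder: point at your own GGUF file
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": 1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)
```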

### Generate Completions

Use the `complete` method to generate a response:

```python
response = llm.complete("Hello! Can you tell me a poem about cats and dogs?")
print(response.text)
```

### Stream Completions

You can also stream completions for a prompt:

```python
response_iter = llm.stream_complete("Can you write me a poem about fast cars?")
for response in response_iter:
    print(response.delta, end="", flush=True)
```
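The chat interface streams as well; reusing the `messages` list from the chat sketch above, each chunk exposes the incremental text as `delta`:

```python
for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="", flush=True)
```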

### Set the Global Tokenizer

Change the global tokenizer to match the LLM, so that LlamaIndex counts tokens with the same vocabulary the model uses when fitting text into the context window:

```python
from llama_index.core import set_global_tokenizer
from transformers import AutoTokenizer

set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)
```

### Use Hugging Face Embeddings

Set up the embedding model and load documents (the relative path below assumes the LlamaIndex repository's example data; substitute your own directory):

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
documents = SimpleDirectoryReader(
    "../../../examples/paul_graham_essay/data"
).load_data()
```

### Create Vector Store Index

Create a vector store index from the loaded documents:

```python
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
```
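The index lives in memory by default. To reuse it across runs without re-embedding, you can persist it with LlamaIndex's standard storage APIs (the `./storage` directory here is an arbitrary choice):

```python
from llama_index.core import StorageContext, load_index_from_storage

# Save the index to disk.
index.storage_context.persist(persist_dir="./storage")

# Later: reload it instead of rebuilding from documents.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, embed_model=embed_model)
```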

### Set Up Query Engine

Set up the query engine with the LlamaCPP LLM:

```python
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What did the author do growing up?")
print(response)
```
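For long answers, the query engine can also stream tokens as they are generated, using the standard `streaming=True` option:

```python
streaming_engine = index.as_query_engine(llm=llm, streaming=True)
streaming_response = streaming_engine.query("What did the author do growing up?")
streaming_response.print_response_stream()
```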

### LLM Implementation Example

See the full example notebook in the LlamaIndex documentation:
https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp/