| Name | llama-index-llms-llamafile |
| Version | 0.4.0 |
| download | |
| home_page | None |
| Summary | llama-index llms llamafile integration |
| upload_time | 2025-07-30 20:54:15 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | <4.0,>=3.9 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# LlamaIndex Llms Integration: llamafile
## Setup Steps
### 1. Download a LlamaFile
Use the following command to download a LlamaFile from Hugging Face:
```bash
wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile
```
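If `wget` is not available (for example on a stock macOS install), `curl` works just as well; `-L` is needed because Hugging Face serves the file through a redirect:

```bash
# Equivalent download with curl; -L follows redirects, -o names the output file
curl -L -o TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile \
  https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile
```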
### 2. Make the File Executable
On Unix-like systems, run the following command:
```bash
chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile
```
On Windows, instead rename the file so that its name ends with `.exe`, as sketched below.
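For reference, this is an ordinary file rename (a minimal sketch; adjust the path to wherever you downloaded the file):

```bash
# From a POSIX-style shell on Windows such as Git Bash or WSL; in cmd.exe the
# equivalent is: ren TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile.exe
mv TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile.exe
```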
### 3. Start the Model Server
Run the following command to start the model server, which will listen on `http://localhost:8080` by default:
```bash
./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding
```
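Before moving on, you can check that the server is reachable. The sketch below assumes the `/health` endpoint exposed by the llama.cpp server that llamafile embeds; if your build predates it, getting an HTTP response from any endpoint is an equally good signal:

```bash
# Should return a small JSON status payload once the model has loaded
curl http://localhost:8080/health
```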
## Using LlamaIndex
To use the integration, install this package; in a notebook environment such as Google Colab, also install the core `llama-index` package:
```bash
%pip install llama-index-llms-llamafile
!pip install llama-index
```
### Import Required Libraries
```python
from llama_index.llms.llamafile import Llamafile
from llama_index.core.llms import ChatMessage
```
### Initialize the LLM
Create an instance of the `Llamafile` LLM:
```python
llm = Llamafile(temperature=0, seed=0)
```
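The constructor also accepts connection options. As a hedged sketch (parameter names taken from the integration's defaults; `base_url` defaults to `http://localhost:8080`, matching the server started above):

```python
llm = Llamafile(
    base_url="http://localhost:8080",  # change if your server runs elsewhere
    temperature=0,  # deterministic sampling
    seed=0,         # fixed seed for reproducible outputs
)
```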
### Generate Completions
To generate a completion for a prompt, use the `complete` method:
```python
resp = llm.complete("Who is Octavia Butler?")
print(resp)
```
### Call Chat with a List of Messages
You can also interact with the LLM using a list of messages:
```python
messages = [
    ChatMessage(
        role="system",
        content="Pretend you are a pirate with a colorful personality.",
    ),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.chat(messages)
print(resp)
```
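Printing the response renders the whole message; if you want individual fields, the returned `ChatResponse` wraps a `ChatMessage` (a small sketch based on the core response types):

```python
# resp.message is the assistant's ChatMessage
print(resp.message.role)     # e.g. MessageRole.ASSISTANT
print(resp.message.content)  # the reply text only
```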
### Streaming Responses
To stream a completion as it is generated, call the `stream_complete` method:
```python
response = llm.stream_complete("Who is Octavia Butler?")
for r in response:
    print(r.delta, end="")
```
You can also stream chat responses:
```python
messages = [
    ChatMessage(
        role="system",
        content="Pretend you are a pirate with a colorful personality.",
    ),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
```
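The same instance can also be installed as the default LLM for the rest of LlamaIndex through the global `Settings` object (a minimal sketch; building an index would additionally require an embedding model, which is outside the scope of this README):

```python
from llama_index.core import Settings
from llama_index.llms.llamafile import Llamafile

# Every LlamaIndex component that needs an LLM will now use the local llamafile server
Settings.llm = Llamafile(temperature=0, seed=0)
```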
### LLM Implementation Example

A complete notebook walkthrough is available in the LlamaIndex documentation:
https://docs.llamaindex.ai/en/stable/examples/llm/llamafile/
## Raw data

```json
{
    "_id": null,
    "home_page": null,
    "name": "llama-index-llms-llamafile",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Your Name <you@example.com>",
    "download_url": "https://files.pythonhosted.org/packages/dc/83/5f23a0a0daf78c63c1648a800b268a936057f54ae678840139697eb24c51/llama_index_llms_llamafile-0.4.0.tar.gz",
    "platform": null,
"description": "# LlamaIndex Llms Integration: llamafile\n\n## Setup Steps\n\n### 1. Download a LlamaFile\n\nUse the following command to download a LlamaFile from Hugging Face:\n\n```bash\nwget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile\n```\n\n### 2. Make the File Executable\n\nOn Unix-like systems, run the following command:\n\n```bash\nchmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile\n```\n\nFor Windows, simply rename the file to end with `.exe`.\n\n### 3. Start the Model Server\n\nRun the following command to start the model server, which will listen on `http://localhost:8080` by default:\n\n```bash\n./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding\n```\n\n## Using LlamaIndex\n\nIf you are using Google Colab or want to interact with LlamaIndex, you will need to install the necessary packages:\n\n```bash\n%pip install llama-index-llms-llamafile\n!pip install llama-index\n```\n\n### Import Required Libraries\n\n```python\nfrom llama_index.llms.llamafile import Llamafile\nfrom llama_index.core.llms import ChatMessage\n```\n\n### Initialize the LLM\n\nCreate an instance of the LlamaFile LLM:\n\n```python\nllm = Llamafile(temperature=0, seed=0)\n```\n\n### Generate Completions\n\nTo generate a completion for a prompt, use the `complete` method:\n\n```python\nresp = llm.complete(\"Who is Octavia Butler?\")\nprint(resp)\n```\n\n### Call Chat with a List of Messages\n\nYou can also interact with the LLM using a list of messages:\n\n```python\nmessages = [\n ChatMessage(\n role=\"system\",\n content=\"Pretend you are a pirate with a colorful personality.\",\n ),\n ChatMessage(role=\"user\", content=\"What is your name?\"),\n]\nresp = llm.chat(messages)\nprint(resp)\n```\n\n### Streaming Responses\n\nTo use the streaming capabilities, you can call the `stream_complete` method:\n\n```python\nresponse = llm.stream_complete(\"Who is Octavia Butler?\")\nfor r in response:\n print(r.delta, end=\"\")\n```\n\nYou can also stream chat responses:\n\n```python\nmessages = [\n ChatMessage(\n role=\"system\",\n content=\"Pretend you are a pirate with a colorful personality.\",\n ),\n ChatMessage(role=\"user\", content=\"What is your name?\"),\n]\nresp = llm.stream_chat(messages)\nfor r in resp:\n print(r.delta, end=\"\")\n```\n\n### LLM Implementation example\n\nhttps://docs.llamaindex.ai/en/stable/examples/llm/llamafile/\n",
"bugtrack_url": null,
"license": null,
"summary": "llama-index llms llamafile integration",
"version": "0.4.0",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "a70298782fd7982b69fe5d729397a75d42a2e499001f011093bb3ff30b6784ce",
"md5": "ca5393e0f688b0514c59080a0b731cd6",
"sha256": "fab8acfaebbdfbef41433fa3f40a8ff358a15ef14bbb50bc7ed0a7134602c6b5"
},
"downloads": -1,
"filename": "llama_index_llms_llamafile-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ca5393e0f688b0514c59080a0b731cd6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 5459,
"upload_time": "2025-07-30T20:54:14",
"upload_time_iso_8601": "2025-07-30T20:54:14.739643Z",
"url": "https://files.pythonhosted.org/packages/a7/02/98782fd7982b69fe5d729397a75d42a2e499001f011093bb3ff30b6784ce/llama_index_llms_llamafile-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "dc835f23a0a0daf78c63c1648a800b268a936057f54ae678840139697eb24c51",
"md5": "e1cd2afc29a1fd9f6bf143918b043479",
"sha256": "91dabf1f0620d2edb2215fb2833eb4ee2e9a64c7c927a5c7ff87cac90afbafd8"
},
"downloads": -1,
"filename": "llama_index_llms_llamafile-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "e1cd2afc29a1fd9f6bf143918b043479",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 5816,
"upload_time": "2025-07-30T20:54:15",
"upload_time_iso_8601": "2025-07-30T20:54:15.630425Z",
"url": "https://files.pythonhosted.org/packages/dc/83/5f23a0a0daf78c63c1648a800b268a936057f54ae678840139697eb24c51/llama_index_llms_llamafile-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-30 20:54:15",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llama-index-llms-llamafile"
}