# picoLLM Inference Engine Python Demos
Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
## picoLLM Inference Engine
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:
- Accurate; picoLLM Compression improves on GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
- Free for open-weight models
## Compatibility
- Python 3.8+
- Runs on Linux (x86_64), macOS (arm64, x86_64), Windows (x86_64, arm64), and Raspberry Pi (5 and 4).
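The requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `describe_host` are illustrative and not part of picoLLM. It verifies the Python version and prints the OS and architecture for manual comparison with the list, since architecture strings vary by platform.

```python
import platform
import sys

MIN_PYTHON = (3, 8)  # minimum version taken from the Compatibility list


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter meets the documented minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def describe_host():
    """Return 'OS (architecture)' for manual comparison with the supported platforms."""
    return f"{platform.system()} ({platform.machine()})"


if __name__ == "__main__":
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
    print(f"Host: {describe_host()}")
```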
## Installation
```console
pip3 install picollmdemo
```
## Models
picoLLM Inference Engine supports the following open-weight models. Model files are available for download from
[Picovoice Console](https://console.picovoice.ai/).
- Gemma
  - `gemma-2b`
  - `gemma-2b-it`
  - `gemma-7b`
  - `gemma-7b-it`
- Llama-2
  - `llama-2-7b`
  - `llama-2-7b-chat`
  - `llama-2-13b`
  - `llama-2-13b-chat`
  - `llama-2-70b`
  - `llama-2-70b-chat`
- Llama-3
  - `llama-3-8b`
  - `llama-3-8b-instruct`
  - `llama-3-70b`
  - `llama-3-70b-instruct`
- Llama-3.2
  - `llama3.2-1b-instruct`
  - `llama3.2-3b-instruct`
- Mistral
  - `mistral-7b-v0.1`
  - `mistral-7b-instruct-v0.1`
  - `mistral-7b-instruct-v0.2`
- Mixtral
  - `mixtral-8x7b-v0.1`
  - `mixtral-8x7b-instruct-v0.1`
- Phi-2
  - `phi2`
- Phi-3
  - `phi3`
## AccessKey
AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Anyone who
uses Picovoice needs a valid AccessKey, and you must keep your AccessKey secret. You need internet connectivity to
validate your AccessKey with Picovoice license servers, even though LLM inference runs 100% offline and is completely
free for open-weight models. Everyone who signs up for
[Picovoice Console](https://console.picovoice.ai/) receives a unique AccessKey.
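One way to keep the AccessKey out of scripts and shell history is to read it from an environment variable. The sketch below assumes a variable named `PICOVOICE_ACCESS_KEY`; that name and the `load_access_key` helper are hypothetical and not defined by picoLLM.

```python
import os

# Hypothetical variable name chosen for this example; picoLLM does not define it.
ENV_VAR = "PICOVOICE_ACCESS_KEY"


def load_access_key(env=None, name=ENV_VAR):
    """Return the AccessKey stored in the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set {name} to your AccessKey from Picovoice Console.")
    return key
```

The returned value can then be passed to the `--access_key` option of either demo.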
## Usage
There are two demos available: completion and chat. The completion demo accepts a prompt and a set of optional
parameters and generates a single completion. It can run any model, whether instruction-tuned or not. The chat demo
runs instruction-tuned (chat) models such as `llama-3-8b-instruct` and `phi2`, and enables a back-and-forth
conversation with the LLM, similar to ChatGPT.
### Completion Demo
Run the demo by entering the following in the terminal:
```console
picollm_demo_completion --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH} --prompt "${PROMPT}"
```
Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model
file downloaded from Picovoice Console, and `${PROMPT}` with a prompt string.
To get information about all the available options in the demo, run the following:
```console
picollm_demo_completion --help
```
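When launching a demo from a Python script, passing the arguments as a list avoids shell quoting problems with prompts that contain spaces. The sketch below uses only the command names and options shown in this README; `build_demo_command` and `run_demo` are illustrative helpers, not part of the package.

```python
import shlex
import subprocess


def build_demo_command(demo, access_key, model_path, prompt=None):
    """Build the argument list for the completion or chat demo command."""
    if demo not in ("completion", "chat"):
        raise ValueError("demo must be 'completion' or 'chat'")
    argv = [
        f"picollm_demo_{demo}",
        "--access_key", access_key,
        "--model_path", model_path,
    ]
    if demo == "completion":
        if prompt is None:
            raise ValueError("the completion demo requires a prompt")
        # The prompt stays a single list element, so spaces need no quoting.
        argv += ["--prompt", prompt]
    return argv


def run_demo(argv):
    """Run the demo; requires picollmdemo to be installed and on PATH."""
    return subprocess.run(argv, check=True)


def as_shell_line(argv):
    """Render the argument list as a copy-pasteable, safely quoted shell line."""
    return shlex.join(argv)
```

For example, `as_shell_line(build_demo_command("completion", "KEY", "/path/to/model_file", "Tell me a joke"))` renders the prompt in single quotes, where `KEY` and `/path/to/model_file` are placeholders.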
### Chat Demo
To run an instruction-tuned model for chat, run the following in the terminal:
```console
picollm_demo_chat --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH}
```
Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console and `${MODEL_PATH}` with the path to a
model file downloaded from Picovoice Console.
To get information about all the available options in the demo, run the following:
```console
picollm_demo_chat --help
```
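The back-and-forth behavior of a chat session can be pictured as a loop that appends each human turn and each model reply to a shared history. The following is a conceptual sketch only, not the demo's actual implementation: `ChatSession` and `echo_model` are hypothetical, and `echo_model` is a stand-in that echoes input instead of calling an LLM.

```python
class ChatSession:
    """Keep the conversation history and pass it to a reply generator."""

    def __init__(self, generate):
        # generate: callable taking a list of (role, text) turns, returning reply text
        self._generate = generate
        self.turns = []

    def ask(self, user_text):
        """Record the human turn, obtain a reply, record it, and return it."""
        self.turns.append(("human", user_text))
        reply = self._generate(list(self.turns))
        self.turns.append(("llm", reply))
        return reply


def echo_model(turns):
    """Stand-in for an LLM: echoes the text of the latest human turn."""
    _, text = turns[-1]
    return f"You said: {text}"


if __name__ == "__main__":
    session = ChatSession(echo_model)
    print(session.ask("Hello"))
    print(session.ask("How are you?"))
```

Because every call receives the full history, a real generator could condition each reply on earlier turns, which is what distinguishes chat from single-shot completion.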
Raw data
{
"_id": null,
"home_page": "https://github.com/Picovoice/picollm",
"name": "picollmdemo",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "Large Language Model, LLM, Generative AI, GenAI, Llama, Mistral, Mixtral, Gemma, Phi",
"author": "Picovoice",
"author_email": "hello@picovoice.ai",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "picoLLM Inference Engine demos",
"version": "1.2.4",
"project_urls": {
"Homepage": "https://github.com/Picovoice/picollm"
},
"split_keywords": [
"large language model",
" llm",
" generative ai",
" genai",
" llama",
" mistral",
" mixtral",
" gemma",
" phi"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e45db81e3ee36ece1e3acd2c84fe03c903efe1bb1bd33d916365ce010bb2aeff",
"md5": "3f9cef7694e11ada5fd564e6d789bd33",
"sha256": "a023ff03408fbe2792079ea9dee3900b256c9b9bee36a441595990cc957ccf89"
},
"downloads": -1,
"filename": "picollmdemo-1.2.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3f9cef7694e11ada5fd564e6d789bd33",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 12250,
"upload_time": "2025-01-14T19:09:33",
"upload_time_iso_8601": "2025-01-14T19:09:33.956549Z",
"url": "https://files.pythonhosted.org/packages/e4/5d/b81e3ee36ece1e3acd2c84fe03c903efe1bb1bd33d916365ce010bb2aeff/picollmdemo-1.2.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-14 19:09:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Picovoice",
"github_project": "picollm",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "picollmdemo"
}