| Field | Value |
| --- | --- |
| Name | locallm |
| Version | 0.5.3 |
| download | |
| home_page | https://github.com/emencia/locallm |
| Summary | An api to query local language models using different backends |
| upload_time | 2024-01-16 15:29:18 |
| maintainer | |
| docs_url | None |
| author | emencia |
| requires_python | |
| license | MIT |
| keywords | python |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Locallm
[locallm on PyPI](https://pypi.org/project/locallm/)

An API to query local language models using different backends. Supported backends:

- [Llama.cpp Python](https://github.com/abetlen/llama-cpp-python): the local Python bindings for Llama.cpp
- [Kobold.cpp](https://github.com/LostRuins/koboldcpp): the Koboldcpp API server
- [Ollama](https://github.com/jmorganca/ollama): the Ollama API server
## Quickstart
```bash
pip install locallm
```
### Local
```python
from locallm import LocalLm, InferenceParams, LmParams

lm = LocalLm(
    LmParams(
        models_dir="/home/me/my/models/dir"
    )
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template=template,
        temperature=0.2,
        stream=True,
        max_tokens=512,
    ),
)
```
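`load_model` also accepts an optional third argument for the number of layers to offload to the GPU (see `load_model` in the Api section below). Continuing the example above, a minimal sketch with an illustrative layer count:

```python
# Reload the model with 32 layers offloaded to the GPU
# (32 is an illustrative value; adjust it to your hardware)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192, 32)
```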
### Koboldcpp
```python
from locallm import KoboldcppLm, LmParams, InferenceParams

lm = KoboldcppLm(
    LmParams(is_verbose=True)
)
lm.load_model("", 8192)  # sets the context window size to 8192 tokens
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template=template,
        stream=True,
        max_tokens=512,
    ),
)
```
### Ollama
```python
from locallm import OllamaLm, LmParams, InferenceParams

lm = OllamaLm(
    LmParams(is_verbose=True)
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        stream=True,
        template=template,
        temperature=0.5,
    ),
)
```
## Examples
Providers:

- [Llama.cpp Python](https://github.com/emencia/locallm/tree/master/examples/local) provider
- [Kobold.cpp](https://github.com/emencia/locallm/tree/master/examples/koboldcpp) provider
- [Ollama](https://github.com/emencia/locallm/tree/master/examples/ollama) provider

Other:

- [Cli](https://github.com/emencia/locallm/tree/master/examples/cli): a Python terminal client
- [Autodoc](https://github.com/emencia/locallm/tree/master/examples/autodoc): generate docstrings from code
## Api

## LmProvider

An abstract base class to describe a language model provider. All the providers implement this API.
### Attributes
- **llm** `Optional[Llama]`: the language model.
- **models_dir** `str`: the directory where the models are stored.
- **api_key** `str`: the API key for the language model.
- **server_url** `str`: the URL of the language model server.
- **is_verbose** `bool`: whether to print more information.
- **threads** `Optional[int]`: the number of threads to use.
- **gpu_layers** `Optional[int]`: the number of layers to offload to the GPU.
- **embedding** `Optional[bool]`: whether to use embeddings.
- **on_token** `OnTokenType`: the function to be called when a token is generated. Default: outputs the token to the terminal.
- **on_start_emit** `OnStartEmitType`: the function to be called when the model starts emitting tokens.
### Example
```python
lm = OllamaLm(LmParams(is_verbose=True))
```
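The server-backed providers can also be pointed at a specific endpoint through the `server_url` parameter of `LmParams` (documented below). A minimal sketch, assuming a Koboldcpp server reachable at a local URL of your choosing:

```python
from locallm import KoboldcppLm, LmParams

# The URL below is an assumption for illustration; use your own server address
lm = KoboldcppLm(LmParams(server_url="http://localhost:5001", is_verbose=True))
```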
Methods:
### `__init__`
Constructs all the necessary attributes for the LmProvider object.
#### Parameters
- **params** `LmParams`: the parameters for the language model.
#### Example
```python
lm = KoboldcppLm(LmParams())
```
### `load_model`
Loads a language model.
#### Parameters
- **model\_name** `str`: The name of the model to load.
- **ctx** `int`: The context window size for the model.
- **gpu\_layers** `Optional[int]`: The number of layers to offload to the GPU for the model.
#### Example
```python
lm.load_model("my_model.gguf", 2048, 32)
```
### `infer`
Run an inference query.
#### Parameters
- **prompt** `str`: the prompt to generate text from.
- **params** `InferenceParams`: the parameters for the inference query.
#### Returns
- **result** `InferenceResult`: the generated text and stats
#### Example
```python
>>> lm.infer("<s>[INST] List the planets in the solar system [/INST]")
The planets in the solar system are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.
```
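With the default callbacks the tokens are streamed to the terminal, but the return value can also be used directly. Continuing the example above, a minimal sketch that relies only on the documented return type (the exact fields of `InferenceResult` are not listed here, so the sketch prints the whole object):

```python
result = lm.infer(
    "<s>[INST] List the planets in the solar system [/INST]",
    InferenceParams(stream=False, max_tokens=128),
)
print(result)  # InferenceResult: the generated text and stats
```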
## Types
## InferenceParams
Parameters for inference.
### Args
- **stream** `bool, Optional`: Whether to stream the output.
- **template** `str, Optional`: The template to use for the inference.
- **threads** `int, Optional`: The number of threads to use for the inference.
- **max\_tokens** `int, Optional`: The maximum number of tokens to generate.
- **temperature** `float, Optional`: The temperature for the model.
- **top\_p** `float, Optional`: The cumulative probability cutoff for nucleus (top p) sampling.
- **top\_k** `int, Optional`: The number of highest probability tokens to sample from (top k sampling).
- **min\_p** `float, Optional`: The minimum probability for a token to be considered.
- **stop** `List[str], Optional`: A list of words to stop the model from generating.
- **frequency\_penalty** `float, Optional`: The frequency penalty for the model.
- **presence\_penalty** `float, Optional`: The presence penalty for the model.
- **repeat\_penalty** `float, Optional`: The repeat penalty for the model.
- **tfs** `float, Optional`: The tail free sampling parameter.
- **grammar** `str, Optional`: A GBNF grammar to constrain the model's output.
### Example
```python
InferenceParams(stream=True, template="<s>[INST] {prompt} [/INST]")
{
    "stream": True,
    "template": "<s>[INST] {prompt} [/INST]"
}
```
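As a further illustration, here is a hedged sketch combining several of the parameters above in an `infer` call, reusing an `lm` provider from the quickstart (all values are illustrative):

```python
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template="<s>[INST] {prompt} [/INST]",
        temperature=0.2,  # low temperature for a more deterministic answer
        top_p=0.9,        # nucleus sampling cutoff
        max_tokens=256,   # cap the length of the completion
        stop=["</s>"],    # stop when the end-of-sequence marker is emitted
        stream=True,
    ),
)
```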
## LmParams
Parameters for language model.
### Args
- **models\_dir** `str, Optional`: The directory containing the language model.
- **api\_key** `str, Optional`: The API key for the language model.
- **server\_url** `str, Optional`: The server URL for the language model.
- **is\_verbose** `bool, Optional`: Whether to enable verbose output.
- **on\_token** `Callable[[str], None], Optional`: A callback function called on each generated token. If not provided, the default prints tokens to the command line as they arrive (see the callback sketch after the example below).
- **on\_start\_emit** `Callable[[Optional[Any]], None], Optional`: A callback function called when the model starts emitting tokens.
### Example
```python
LmParams(
    models_dir="/home/me/models",
    api_key="abc123",
)
```
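As a sketch of the callback hooks, the documented `Callable[[str], None]` signature of `on_token` can be used to collect the streamed tokens instead of printing them. The backend, model and prompt below are placeholders only:

```python
from locallm import InferenceParams, KoboldcppLm, LmParams

tokens: list[str] = []

lm = KoboldcppLm(
    LmParams(
        on_token=lambda token: tokens.append(token),  # collect instead of printing
    )
)
lm.load_model("", 8192)
lm.infer("<s>[INST] Say hello [/INST]", InferenceParams(stream=True, max_tokens=32))
print("".join(tokens))
```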
## Tests
To configure the tests, create a `tests/localconf.py` file containing some local config info:
```py
# absolute path to your models dir
MODELS_DIR = "/home/me/my/models/dir"
# the model to use in the tests
MODEL = "q5_1-gguf-mamba-gpt-3B_v4.gguf"
# the context window size for the tests
CTX = 2048
```
Be sure to have the corresponding backend up before running a test.
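For illustration, a hypothetical test module using that configuration (the repository's actual tests may be organised differently):

```python
# tests/test_inference.py -- hypothetical example, not the project's actual test file
from locallm import InferenceParams, LmParams, LocalLm

from localconf import CTX, MODEL, MODELS_DIR  # the local config described above


def test_basic_inference():
    lm = LocalLm(LmParams(models_dir=MODELS_DIR))
    lm.load_model(MODEL, CTX)
    result = lm.infer(
        "<s>[INST] List the planets in the solar system [/INST]",
        InferenceParams(temperature=0, max_tokens=64),
    )
    assert result is not None
```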
Raw data

```json
{
    "_id": null,
    "home_page": "https://github.com/emencia/locallm",
    "name": "locallm",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "Python",
    "author": "emencia",
    "author_email": "contact@emencia.com",
    "download_url": "https://files.pythonhosted.org/packages/f6/59/5f54585f386a4b72a375d6bfb23da02329d62f31a23d3ecc6fcc5ff242a3/locallm-0.5.3.tar.gz",
    "platform": null,
"description": "# Locallm\n\n[](https://pypi.org/project/locallm/)\n\nAn api to query local language models using different backends. Supported backends:\n\n- [Llama.cpp Python](https://github.com/abetlen/llama-cpp-python): the local Python bindings for Llama.cpp\n- [Kobold.cpp](https://github.com/LostRuins/koboldcpp): the Koboldcpp api server\n- [Ollama](https://github.com/jmorganca/ollama): the Ollama api server\n\n## Quickstart\n\n```bash\npip install locallm\n```\n\n### Local\n\n```python\nfrom locallm import LocalLm, InferenceParams, LmParams\n\nlm = LocalLm(\n LmParams(\n models_dir=\"/home/me/my/models/dir\"\n )\n)\nlm.load_model(\"mistral-7b-instruct-v0.1.Q4_K_M.gguf\", 8192)\ntemplate = \"<s>[INST] {prompt} [/INST]\"\nlm.infer(\n \"list the planets in the solar system\",\n InferenceParams(\n template=template,\n temperature=0.2,\n stream=True,\n max_tokens=512,\n ),\n)\n```\n\n### Koboldcpp\n\n```python\nfrom locallm import KoboldcppLm, LmParams, InferenceParams\n\nlm = KoboldcppLm(\n LmParams(is_verbose=True)\n)\nlm.load_model(\"\", 8192) # sets the context window size to 8196 tokens\ntemplate = \"<s>[INST] {prompt} [/INST]\"\nlm.infer(\n \"list the planets in the solar system\",\n InferenceParams(\n template=template,\n stream=True,\n max_tokens=512,\n ),\n)\n```\n\n### Ollama\n\n```python\nfrom locallm import OllamaLm, LmParams, InferenceParams\n\nlm = Ollama(\n LmParams(is_verbose=True)\n)\nlm.load_model(\"mistral-7b-instruct-v0.1.Q4_K_M.gguf\", 8192)\ntemplate = \"<s>[INST] {prompt} [/INST]\"\nlm.infer(\n \"list the planets in the solar system\",\n InferenceParams(\n stream=True,\n template=template,\n temperature=0.5,\n ),\n)\n```\n\n## Examples\n\nProviders:\n\n- [Llama.cpp Python](https://github.com/abetlen/llama-cpp-python/examples/local) provider\n- [Kobold.cpp](https://github.com/LostRuins/koboldcpp/examples/koboldcpp) provider\n- [Ollama](https://github.com/jmorganca/ollama/ollama) provider\n\nOther:\n\n- [Cli](https://github.com/abetlen/llama-cpp-python/examples/cli): a Python terminal client\n- [Autodoc](https://github.com/emencia/locallm/tree/master/examples/autodoc): generate docstrings from code\n\n## Api\n\n## LmProvider\n\nAn abstract base class to describe a language model provider. All the\nproviders implement this api\n\n### Attributes\n\n- **llm** `Optional[Llama]`: the language model.\n- **models_dir** `str`: the directory where the models are stored.\n- **api_key** `str`: the API key for the language model.\n- **server_url** `str`: the URL of the language model server.\n- **is_verbose** `bool`: whether to print more information.\n- **threads** `Optional[int]`: the numbers of threads to use.\n- **gpu_layers** `Optional[int]`: the numbers of layers to offload to the GPU.\n- **embedding** `Optional[bool]`: use embeddings or not.\n- **on_token** `OnTokenType`: the function to be called when a token is generated. 
Default: outputs the token to the terminal.\n- **on_start_emit** `OnStartEmitType`: the function to be called when the model starts emitting tokens.\n\n### Example\n\n```python\nlm = OllamaLm(LmParams(is_verbose=True))\n```\n\nMethods:\n\n### `__init__`\n\nConstructs all the necessary attributes for the LmProvider object.\n\n#### Parameters\n\n- **params** `LmParams`: the parameters for the language model.\n\n#### Example\n\n```python\nlm = KoboldcppLm(LmParams())\n```\n\n### `load_model`\n\nLoads a language model.\n\n#### Parameters\n\n- **model\\_name** `str`: The name of the model to load.\n- **ctx** `int`: The context window size for the model.\n- **gpu\\_layers** `Optional[int]`: The number of layers to offload to the GPU for the model.\n\n#### Example\n\n```python\nlm.load_model(\"my_model.gguf\", 2048, 32)\n```\n\n### `infer`\n\nRun an inference query.\n\n#### Parameters\n\n- **prompt** `str`: the prompt to generate text from.\n- **params** `InferenceParams`: the parameters for the inference query.\n\n#### Returns\n\n- **result** `InferenceResult`: the generated text and stats\n\n#### Example\n\n```python\n>>> lm.infer(\"<s>[INST] List the planets in the solar system [/INST>\")\nThe planets in the solar system are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.\n```\n\n## Types\n\n## InferenceParams\n\nParameters for inference.\n\n### Args\n\n- **stream** `bool, Optional`: Whether to stream the output.\n- **template** `str, Optional`: The template to use for the inference.\n- **threads** `int, Optional`: The number of threads to use for the inference.\n- **max\\_tokens** `int, Optional`: The maximum number of tokens to generate.\n- **temperature** `float, Optional`: The temperature for the model.\n- **top\\_p** `float, Optional`: The probability cutoff for the top k tokens.\n- **top\\_k** `int, Optional`: The top k tokens to generate.\n- **min\\_p** `float, Optional`: The minimum probability for a token to be considered.\n- **stop** `List[str], Optional`: A list of words to stop the model from generating.\n- **frequency\\_penalty** `float, Optional`: The frequency penalty for the model.\n- **presence\\_penalty** `float, Optional`: The presence penalty for the model.\n- **repeat\\_penalty** `float, Optional`: The repeat penalty for the model.\n- **tfs** `float, Optional`: The temperature for the model.\n- **grammar** `str, Optional`: A gbnf grammar to constraint the model's output\n\n### Example\n\n```python\nInferenceParams(stream=True, template=\"<s>[INST] {prompt} [/INST>\")\n{\n \"stream\": True,\n \"template\": \"<s>[INST] {prompt} [/INST>\"\n}\n```\n\n## LmParams\n\nParameters for language model.\n\n### Args\n\n- **models\\_dir** `str, Optional`: The directory containing the language model.\n- **api\\_key** `str, Optional`: The API key for the language model.\n- **server\\_url** `str, Optional`: The server URL for the language model.\n- **is\\_verbose** `bool, Optional`: Whether to enable verbose output.\n- **on\\_token** `Callable[[str], None], Optional`: A callback function to be called on each token generated. 
If not provided the default will output tokens to the command line as they arrive\n- **on\\_start\\_emit** `Callable[[Optional[Any]], None], Optional`: A callback function to be called on the start of the emission.\n\n### Example\n\n```python\nLmParams(\n models_dir=\"/home/me/models\",\n api_key=\"abc123\",\n)\n```\n\n## Tests\n\nTo configure the tests create a `tests/localconf.py` containing the some local config info to\nrun the tests:\n\n```py\n# absolute path to your models dir\nMODELS_DIR = \"/home/me/my/models/dir\"\n# the model to use in the tests\nMODEL = \"q5_1-gguf-mamba-gpt-3B_v4.gguf\"\n#\u00a0the context window size for the tests\nCTX = 2048\n```\n\nBe sure to have the corresponding backend up before running a test.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "An api to query local language models using different backends",
"version": "0.5.3",
"project_urls": {
"Homepage": "https://github.com/emencia/locallm",
"Issue Tracker": "https://github.com/emencia/locallm/issues",
"Source Code": "https://github.com/emencia/locallm"
},
"split_keywords": [
"python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "201c0cec07e807e70a207f2e6427cd1f132847d4cf1c57518c42ca9fa351a953",
"md5": "8162f49967c6dbde218e9df9f59a9f9a",
"sha256": "ee8c89c42438f2cd936a7c935cd418d62d135c7e4dcdd61c6a0a137dc29a76c9"
},
"downloads": -1,
"filename": "locallm-0.5.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8162f49967c6dbde218e9df9f59a9f9a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 14392,
"upload_time": "2024-01-16T15:29:17",
"upload_time_iso_8601": "2024-01-16T15:29:17.163556Z",
"url": "https://files.pythonhosted.org/packages/20/1c/0cec07e807e70a207f2e6427cd1f132847d4cf1c57518c42ca9fa351a953/locallm-0.5.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f6595f54585f386a4b72a375d6bfb23da02329d62f31a23d3ecc6fcc5ff242a3",
"md5": "a7e57145339428332ae2e1afbc8d56c8",
"sha256": "615fb753ed0c69c662fa8c592b6a163bb98fb775c44a47c2fc320a9320d1b6ca"
},
"downloads": -1,
"filename": "locallm-0.5.3.tar.gz",
"has_sig": false,
"md5_digest": "a7e57145339428332ae2e1afbc8d56c8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 12268,
"upload_time": "2024-01-16T15:29:18",
"upload_time_iso_8601": "2024-01-16T15:29:18.488621Z",
"url": "https://files.pythonhosted.org/packages/f6/59/5f54585f386a4b72a375d6bfb23da02329d62f31a23d3ecc6fcc5ff242a3/locallm-0.5.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-16 15:29:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "emencia",
"github_project": "locallm",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "locallm"
}