locallm

- Name: locallm
- Version: 0.5.3
- Home page: https://github.com/emencia/locallm
- Summary: An API to query local language models using different backends
- Author: emencia
- License: MIT
- Keywords: python
- Upload time: 2024-01-16 15:29:18
# Locallm

[![pub package](https://img.shields.io/pypi/v/locallm)](https://pypi.org/project/locallm/)

An API to query local language models using different backends. Supported backends:

- [Llama.cpp Python](https://github.com/abetlen/llama-cpp-python): the local Python bindings for Llama.cpp
- [Kobold.cpp](https://github.com/LostRuins/koboldcpp): the Koboldcpp API server
- [Ollama](https://github.com/jmorganca/ollama): the Ollama API server

## Quickstart

```bash
pip install locallm
```

### Local

```python
from locallm import LocalLm, InferenceParams, LmParams

lm = LocalLm(
    LmParams(
        models_dir="/home/me/my/models/dir"
    )
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template=template,
        temperature=0.2,
        stream=True,
        max_tokens=512,
    ),
)
```

### Koboldcpp

```python
from locallm import KoboldcppLm, LmParams, InferenceParams

lm = KoboldcppLm(
    LmParams(is_verbose=True)
)
lm.load_model("", 8192) # sets the context window size to 8196 tokens
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template=template,
        stream=True,
        max_tokens=512,
    ),
)
```

### Ollama

```python
from locallm import OllamaLm, LmParams, InferenceParams

lm = OllamaLm(
    LmParams(is_verbose=True)
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        stream=True,
        template=template,
        temperature=0.5,
    ),
)
```

## Examples

Providers:

- [Llama.cpp Python](https://github.com/abetlen/llama-cpp-python/examples/local) provider
- [Kobold.cpp](https://github.com/LostRuins/koboldcpp/examples/koboldcpp) provider
- [Ollama](https://github.com/jmorganca/ollama/ollama) provider

Other:

- [Cli](https://github.com/abetlen/llama-cpp-python/examples/cli): a Python terminal client
- [Autodoc](https://github.com/emencia/locallm/tree/master/examples/autodoc): generate docstrings from code

## API

## LmProvider

An abstract base class describing a language model provider. All providers
implement this API.

### Attributes

- **llm** `Optional[Llama]`: the language model.
- **models_dir** `str`: the directory where the models are stored.
- **api_key** `str`: the API key for the language model.
- **server_url** `str`: the URL of the language model server.
- **is_verbose** `bool`: whether to print more information.
- **threads** `Optional[int]`: the number of threads to use.
- **gpu_layers** `Optional[int]`: the number of layers to offload to the GPU.
- **embedding** `Optional[bool]`: whether to use embeddings.
- **on_token** `OnTokenType`: the function to be called when a token is generated. Default: outputs the token to the terminal.
- **on_start_emit** `OnStartEmitType`: the function to be called when the model starts emitting tokens.

### Example

```python
lm = OllamaLm(LmParams(is_verbose=True))
```
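
The same pattern applies to the other provider classes: any attribute backed by `LmParams`, such as `server_url` for the API server backends, can be set at construction time. A minimal sketch, assuming a Koboldcpp server running locally (the URL is illustrative):

```python
from locallm import KoboldcppLm, LmParams

# point the provider at a specific Koboldcpp server (URL is illustrative)
lm = KoboldcppLm(
    LmParams(
        server_url="http://localhost:5001",
        is_verbose=True,
    )
)
```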

Methods:

### `__init__`

Constructs all the necessary attributes for the LmProvider object.

#### Parameters

- **params** `LmParams`: the parameters for the language model.

#### Example

```python
lm = KoboldcppLm(LmParams())
```

### `load_model`

Loads a language model.

#### Parameters

- **model\_name** `str`: The name of the model to load.
- **ctx** `int`: The context window size for the model.
- **gpu\_layers** `Optional[int]`: The number of layers to offload to the GPU for the model.

#### Example

```python
lm.load_model("my_model.gguf", 2048, 32)
```

### `infer`

Run an inference query.

#### Parameters

- **prompt** `str`: the prompt to generate text from.
- **params** `InferenceParams`: the parameters for the inference query.

#### Returns

- **result** `InferenceResult`: the generated text and stats; a sketch of capturing it follows the example below.

#### Example

```python
>>> lm.infer("<s>[INST] List the planets in the solar system [/INST]")
The planets in the solar system are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.
```
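
When `stream` is disabled, the return value can be captured instead of relying on the streamed output. The exact fields of `InferenceResult` are not detailed here, so this sketch, continuing from the quickstart example, simply prints the whole result:

```python
# capture the InferenceResult (generated text and stats) instead of streaming
result = lm.infer(
    "<s>[INST] List the planets in the solar system [/INST]",
    InferenceParams(stream=False, max_tokens=512),
)
print(result)
```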

## Types

## InferenceParams

Parameters for inference.

### Args

- **stream** `bool, Optional`: Whether to stream the output.
- **template** `str, Optional`: The template to use for the inference.
- **threads** `int, Optional`: The number of threads to use for the inference.
- **max\_tokens** `int, Optional`: The maximum number of tokens to generate.
- **temperature** `float, Optional`: The temperature for the model.
- **top\_p** `float, Optional`: The cumulative probability cutoff for nucleus (top p) sampling.
- **top\_k** `int, Optional`: The number of highest-probability tokens to sample from.
- **min\_p** `float, Optional`: The minimum probability for a token to be considered.
- **stop** `List[str], Optional`: A list of stop sequences that end the generation.
- **frequency\_penalty** `float, Optional`: The frequency penalty for the model.
- **presence\_penalty** `float, Optional`: The presence penalty for the model.
- **repeat\_penalty** `float, Optional`: The repeat penalty for the model.
- **tfs** `float, Optional`: The tail free sampling parameter.
- **grammar** `str, Optional`: A GBNF grammar to constrain the model's output.

### Example

```python
InferenceParams(stream=True, template="<s>[INST] {prompt} [/INST]")
{
    "stream": True,
    "template": "<s>[INST] {prompt} [/INST]"
}
```
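
A fuller set of parameters can be combined in a single `InferenceParams`; the values below are illustrative, not recommended defaults:

```python
from locallm import InferenceParams

# illustrative values covering the most common sampling parameters
params = InferenceParams(
    template="<s>[INST] {prompt} [/INST]",
    temperature=0.2,
    top_p=0.9,
    top_k=40,
    max_tokens=256,
    stop=["</s>"],
    repeat_penalty=1.15,
    stream=True,
)
```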

## LmParams

Parameters for language model.

### Args

- **models\_dir** `str, Optional`: The directory containing the language model.
- **api\_key** `str, Optional`: The API key for the language model.
- **server\_url** `str, Optional`: The server URL for the language model.
- **is\_verbose** `bool, Optional`: Whether to enable verbose output.
- **on\_token** `Callable[[str], None], Optional`: A callback function called on each generated token. If not provided, the default outputs tokens to the terminal as they arrive (a sketch of a custom callback follows the example below).
- **on\_start\_emit** `Callable[[Optional[Any]], None], Optional`: A callback function called when the model starts emitting tokens.

### Example

```python
LmParams(
    models_dir="/home/me/models",
    api_key="abc123",
)
```
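
Since `on_token` takes a `Callable[[str], None]`, the default terminal output can be replaced with a custom callback, for example to collect the tokens in memory. A minimal sketch:

```python
from locallm import LmParams

# collect generated tokens in a list instead of printing them
tokens: list[str] = []

def collect_token(token: str) -> None:
    tokens.append(token)

params = LmParams(
    models_dir="/home/me/models",
    on_token=collect_token,
)
```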

## Tests

To configure the tests, create a `tests/localconf.py` file containing some local configuration used by
the tests:

```py
# absolute path to your models dir
MODELS_DIR = "/home/me/my/models/dir"
# the model to use in the tests
MODEL = "q5_1-gguf-mamba-gpt-3B_v4.gguf"
# the context window size for the tests
CTX = 2048
```

Be sure to have the corresponding backend up before running a test.
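
As an illustration, a test module could read this configuration and run a short inference against the local backend; the file name, import path, and assertion below are hypothetical:

```python
# tests/test_local.py (hypothetical sketch)
from locallm import InferenceParams, LmParams, LocalLm

# import path may differ depending on how the test suite is invoked
from localconf import CTX, MODEL, MODELS_DIR


def test_basic_inference():
    lm = LocalLm(LmParams(models_dir=MODELS_DIR))
    lm.load_model(MODEL, CTX)
    result = lm.infer(
        "<s>[INST] Say hello [/INST]",
        InferenceParams(stream=False, max_tokens=16),
    )
    assert result is not None
```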
