| Field | Value |
|---|---|
| Name | zensols-lmtask |
| Version | 0.1.0 |
| home_page | None |
| Summary | A large language model (LLM) API to train and inference specifically for tasks. |
| upload_time | 2025-08-06 20:54:33 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | <3.14,>=3.11 |
| license | MIT |
| keywords | tooling |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Inferencing and Training Large Language Model Tasks
[![PyPI][pypi-badge]][pypi-link]
[![Python 3.11][python311-badge]][python311-link]
A large language model (LLM) API to train and run inference for specific
tasks. The API provides utility classes and configuration that streamline a
project's access to LLM responses that can be machine readable, such as
querying the LLM to produce JSON output, even if that output is partial because
of output token limits. The package provides an API to train and interface with
LLMs as both pretrained embeddings and instruct models.
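To make the idea of accepting partial JSON output concrete, here is a minimal sketch of how complete elements might be salvaged from a JSON array that was cut off by a token limit. It uses only the standard library; the function name `parse_partial_json_array` is hypothetical and this is not the package's own implementation.

```python
import json
from typing import Any, List


def parse_partial_json_array(text: str) -> List[Any]:
    """Return the complete elements of a JSON array that may be cut off."""
    decoder = json.JSONDecoder()
    items: List[Any] = []
    # skip to the opening bracket of the array
    pos = text.find('[')
    if pos < 0:
        return items
    pos += 1
    while pos < len(text):
        # skip whitespace and element separators
        while pos < len(text) and text[pos] in ' \t\r\n,':
            pos += 1
        if pos >= len(text) or text[pos] == ']':
            break
        try:
            # decode one element and advance to the end of it
            item, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # the remaining text is an incomplete element; stop here
            break
        items.append(item)
    return items


# a response truncated in the middle of the third element
truncated = ('[{"index": 0, "label": "+"}, {"index": 1, "label": "-"}, '
             '{"index": 2, "la')
print(parse_partial_json_array(truncated))
```

One caveat of this approach: a trailing bare number or literal that was itself truncated (for example `35` cut to `3`) would still be accepted, so it fits best when the elements are objects or strings.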
Features:
* Create new LLMs with configuration without having to write code.
* [Three examples](resources/tasks.yml) of "code-less" models: sentiment
  analysis, NER tagging and generation.
* Cache LLM responses to avoid recomputing potentially costly prompts
  (optional feature).
* [Command-line](#command-line) interface for inference, pre-training and
  post-training of LLMs.
* [Advanced API](#python-api) to read responses and accept partial output for
  max token cutoffs.
* Chat template integration when supported.
* Extensible interfaces to LLMs with built-in support for Llama 3.
* [Easy to configure datasets](#datasets) processed by model trainers.
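The response caching feature listed above can be pictured with the following sketch, which stores each response on disk under a hash of its prompt. The `PromptCache` class and the `fake_model` function are hypothetical and exist only for illustration; the package's own caching is configured differently and is not shown here.

```python
import hashlib
import json
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Callable, List


class PromptCache:
    """Cache responses on disk, keyed by a hash of the prompt."""

    def __init__(self, cache_dir: Path, generate: Callable[[str], str]):
        self.cache_dir = cache_dir
        self.generate = generate
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, prompt: str) -> Path:
        key = hashlib.sha256(prompt.encode('utf-8')).hexdigest()
        return self.cache_dir / f'{key}.json'

    def __call__(self, prompt: str) -> str:
        path = self._path(prompt)
        if path.is_file():
            # cache hit: skip the potentially costly model call
            return json.loads(path.read_text())['response']
        response = self.generate(prompt)
        path.write_text(json.dumps({'prompt': prompt, 'response': response}))
        return response


calls: List[str] = []


def fake_model(prompt: str) -> str:
    # hypothetical stand-in for an expensive LLM call
    calls.append(prompt)
    return prompt.upper()


with TemporaryDirectory() as tmp:
    cached = PromptCache(Path(tmp) / 'cache', fake_model)
    first = cached('hello')
    second = cached('hello')

# the model is only invoked once for the repeated prompt
print(first, second, len(calls))
```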
## Documentation
See the [full documentation](https://plandes.github.io/lmtask/index.html).
The [API reference](https://plandes.github.io/lmtask/api.html) is also
available.
## Obtaining
The library can be installed with pip from the [PyPI] repository:
```bash
pip3 install zensols.lmtask
```
A Conda environment can also be created with the
[environment.yml](src/python/environment.yml):
```bash
conda env create -f src/python/environment.yml
```
## Usage
The package can be used from the command line, to both run inference and train
new models, or as an API.
### Command Line
For inferencing, the command-line program can first list the available tasks:
```bash
lmtask task
```
Generate text by inferencing using the Llama base model:
```bash
lmtask stream base_generate 'in a world long long away' \
  --override=lmtask_model_generate_args.temperature=0.9
```
Use named entity recognition (NER):
```bash
lmtask instruct ner 'UIC is in Chicago.'
model_output_json:
  label: ORG
  span: [0, 3]
  text: UIC
  label: O
  span: [4, 6]
  text: is
  label: O
  span: [7, 9]
  text: in
  label: LOC
  span: [10, 15]
  text: Chicago
```
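Character spans produced by a model are not guaranteed to line up with the input. In the output above, the span `[10, 15]` for `Chicago` covers only `Chica` when used as a Python slice. The sketch below checks each reported span against the original sentence. The list-of-dictionaries layout is an assumption about how the output might be loaded, and `find_span_mismatches` is a hypothetical helper, not part of the package.

```python
from typing import Dict, List, Tuple

text = 'UIC is in Chicago.'

# entities transcribed from the command output above; the list-of-dicts
# layout is an assumption
entities: List[Dict] = [
    {'label': 'ORG', 'span': [0, 3], 'text': 'UIC'},
    {'label': 'O', 'span': [4, 6], 'text': 'is'},
    {'label': 'O', 'span': [7, 9], 'text': 'in'},
    {'label': 'LOC', 'span': [10, 15], 'text': 'Chicago'},
]


def find_span_mismatches(text: str,
                         entities: List[Dict]) -> List[Tuple[str, str]]:
    """Return (reported text, text covered by span) pairs that differ."""
    mismatches: List[Tuple[str, str]] = []
    for ent in entities:
        start, end = ent['span']
        covered = text[start:end]
        if covered != ent['text']:
            mismatches.append((ent['text'], covered))
    return mismatches


print(find_span_mismatches(text, entities))
# → [('Chicago', 'Chica')]
```

A check like this is cheap to run before spans are used for highlighting or evaluation.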
The command-line program can be used to train new models with just
configuration. See the [trainconf](trainconf) directory for example
configuration files. Before you train, you might want to get a sample of the
configured dataset:
```bash
lmtask -c trainconf/imdb.yml sample -m 1
________________________________________
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
You are a sentiment classifier.<|eot_id|><|start_header_id|>user<|end_header_id|>
Only output the sentiment.
### Review:We always watch American movies with their particular...
### Sentiment:```positive```<|eot_id|>
```
To train a new sentiment model on the IMDB dataset:
```bash
lmtask -c trainconf/imdb.yml train
```
### Python API
The Python API can be used to access tasks directly.
```python
>>> import json
>>> from zensols.lmtask import ApplicationFactory
>>> from zensols.lmtask import InstructTaskRequest
# create the task factory
>>> fac = ApplicationFactory.get_task_factory()
# list configured tasks
>>> fac.write(short=True)
base_generate (base generate text)
instruct_generate (base generate text)
ner (tags named entities)
sentiment (classifies sentiment)
# create a sentiment analysis task
>>> task = fac.create('sentiment')
# inference
>>> sents = 'I love football.\nI hate olives.\nEarth is big.'
>>> res = task.process(InstructTaskRequest(instruction=sents))
# print the JSON result as formatted text
>>> print(json.dumps(res.model_output_json, indent=4))
[
    {
        "index": 0,
        "sentence": "I love football.",
        "label": "+"
    },
    {
        "index": 1,
        "sentence": "I hate olives.",
        "label": "-"
    },
    {
        "index": 2,
        "sentence": "Earth is big.",
        "label": "n"
    }
]
```
```
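Because the result is plain JSON, it can be post-processed with the standard library. The following sketch tallies the labels and groups the sentences by label. It works on a literal copy of the output shown above so that it runs without a model, and the variable names are illustrative only.

```python
import json
from collections import Counter
from typing import Dict, List

# a literal copy of the JSON printed in the session above
raw = '''
[
    {"index": 0, "sentence": "I love football.", "label": "+"},
    {"index": 1, "sentence": "I hate olives.", "label": "-"},
    {"index": 2, "sentence": "Earth is big.", "label": "n"}
]
'''
results: List[Dict] = json.loads(raw)

# count how often each label occurs
counts = Counter(r['label'] for r in results)

# group the sentences by their label
by_label: Dict[str, List[str]] = {}
for r in results:
    by_label.setdefault(r['label'], []).append(r['sentence'])

print(dict(counts))
print(by_label['+'])
```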
## Datasets
The package features easy-to-configure datasets and preprocessing of their
data. For example, the following is taken from the [IMDB training
configuration](trainconf/imdb.yml) example:
```yaml
# a dataset factory instance used by the trainer (lmtask_trainer_hf)
lmtask_imdb_source:
  class_name: zensols.lmtask.dataset.LoadedTaskDatasetFactory
  # the dataset name (downloaded if not already); this can be a `pathlib.Path`,
  # Pandas dataframe or Zensols Stash
  source: stanfordnlp/imdb
  # use only the training split
  load_args:
    split: train
  # the task that consumes the data, which will format each datapoint
  # specifically for that task's model
  task: 'instance: lmtask_task_imdb'
  # preprocessing Python source code to add labels and subset the data (ds.select)
  pre_process: |-
    ds = ds.map(lambda x: {'output': 'positive' if x['label'] == 1 else 'negative'})
    ds = ds.rename_column('text', 'instruction')
    ds = ds.shuffle(seed=0)
    # 7K takes 55m
    ds = ds.select(range(7_000))
```
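To show what the `pre_process` steps do to each row, the sketch below mirrors them on a plain list of dictionaries so that it runs without the Hugging Face `datasets` library. The `pre_process` function here is a hypothetical stand-in, and its shuffle order will differ from what `ds.shuffle(seed=0)` produces.

```python
import random
from typing import Dict, List


def pre_process(rows: List[Dict], limit: int, seed: int = 0) -> List[Dict]:
    """Mirror the configured steps: label, rename, shuffle and subset."""
    out: List[Dict] = []
    for row in rows:
        out.append({
            # ds.rename_column('text', 'instruction')
            'instruction': row['text'],
            'label': row['label'],
            # ds.map(...): add the textual label the model should output
            'output': 'positive' if row['label'] == 1 else 'negative',
        })
    # ds.shuffle(seed=0): a seeded shuffle keeps the subset reproducible
    random.Random(seed).shuffle(out)
    # ds.select(range(n)): keep only the first `limit` rows
    return out[:limit]


# made-up rows using the column names from the configuration above
rows = [
    {'text': 'A wonderful film.', 'label': 1},
    {'text': 'Dull and far too long.', 'label': 0},
    {'text': 'I would watch it again.', 'label': 1},
]
subset = pre_process(rows, limit=2)
for row in subset:
    print(row['output'], '<-', row['instruction'])
```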
## Changelog
An extensive changelog is available [here](CHANGELOG.md).
## Community
Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback and any other input are welcome.
## License
[MIT License](LICENSE.md)
Copyright (c) 2025 Paul Landes
<!-- links -->
[pypi]: https://pypi.org/project/zensols.lmtask/
[pypi-link]: https://pypi.python.org/pypi/zensols.lmtask
[pypi-badge]: https://img.shields.io/pypi/v/zensols.lmtask.svg
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
Raw data
{
"_id": null,
"home_page": null,
"name": "zensols-lmtask",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.14,>=3.11",
"maintainer_email": null,
"keywords": "tooling",
"author": null,
"author_email": "Paul Landes <landes@mailc.net>",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A large language model (LLM) API to train and inference specifically for tasks.",
"version": "0.1.0",
"project_urls": {
"Changelog": "https://github.com/plandes/lmtask/blob/master/CHANGELOG.md",
"Documentation": "https://plandes.github.io/lmtask",
"Homepage": "https://github.com/plandes/lmtask",
"Issues": "https://github.com/plandes/lmtask/issues",
"Repository": "https://github.com/plandes/lmtask.git"
},
"split_keywords": [
"tooling"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "ac40e8d498886bed155575bd61912b60c6b60cd48ecb25370672435be5d330d8",
"md5": "9dc0503548a1e64fdbbbbcab0cf78f68",
"sha256": "bd31cd36b58e621ace5bd865841e22882d8bfd27bc9c3f1eca12f41a9082ec6c"
},
"downloads": -1,
"filename": "zensols_lmtask-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9dc0503548a1e64fdbbbbcab0cf78f68",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.14,>=3.11",
"size": 33824,
"upload_time": "2025-08-06T20:54:33",
"upload_time_iso_8601": "2025-08-06T20:54:33.352603Z",
"url": "https://files.pythonhosted.org/packages/ac/40/e8d498886bed155575bd61912b60c6b60cd48ecb25370672435be5d330d8/zensols_lmtask-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-06 20:54:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "plandes",
"github_project": "lmtask",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "zensols-lmtask"
}