zensols-lmtask

Name	zensols-lmtask
Version	0.1.0
home_page	None
Summary	A large language model (LLM) API to train and inference specifically for tasks.
upload_time	2025-08-06 20:54:33
maintainer	None
docs_url	None
author	None
requires_python	<3.14,>=3.11
license	MIT
keywords	tooling

# Inferencing and Training Large Language Model Tasks

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.11][python311-badge]][python311-link]

A large language model (LLM) API for training and inferencing task-specific
models.  The API provides utility classes and configuration that streamline a
project's access to LLM responses that can be machine readable, such as
querying the LLM to produce JSON output, even if that output is partial because
of output token limits.  The package provides an API to train and interface
with LLMs as both pretrained embeddings and instruct models.

Features:

* Create new LLMs with configuration without having to write code.
* [Three examples](resources/tasks.yml) of "code-less" models: sentiment
  analysis, NER tagging and generation.
* Cache LLM responses to avoid recomputing potentially costly prompts
  (optional feature).
* [Command-line](#command-line) interface to inference, pre-train and
  post-train LLM models.
* [Advanced API](#python-api) to read responses and accept partial output for
  max token cutoffs.
* Chat template integration when supported.
* Extensible interfaces to LLMs with built-in support for Llama 3.
* [Easy-to-configure datasets](#datasets) processed by model trainers.
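
The response caching listed above can be pictured with a minimal sketch.  This
is not the package's implementation: `make_cached` and the `fake_llm` stand-in
are hypothetical names used only for illustration, and the cache is an
in-memory dictionary keyed by a hash of the prompt.

```python
import hashlib
from typing import Callable, Dict, List


def make_cached(generate: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a prompt -> response callable with an in-memory cache."""
    cache: Dict[str, str] = {}

    def cached(prompt: str) -> str:
        # key on a digest so long prompts still make compact keys
        key = hashlib.sha256(prompt.encode('utf-8')).hexdigest()
        if key not in cache:
            cache[key] = generate(prompt)
        return cache[key]

    return cached


# hypothetical stand-in for an expensive model call
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return prompt.upper()


ask = make_cached(fake_llm)
ask('hello')
ask('hello')
# the second call is served from the cache, so the model ran once
print(len(calls))
```

A persistent cache would swap the dictionary for a file or database, but the
keying idea stays the same.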


## Documentation

See the [full documentation](https://plandes.github.io/lmtask/index.html).
The [API reference](https://plandes.github.io/lmtask/api.html) is also
available.


## Obtaining

The library can be installed with pip from the [pypi] repository:
```bash
pip3 install zensols.lmtask
```

A Conda environment can also be created with the
[environment.yml](src/python/environment.yml):
```bash
conda env create -f src/python/environment.yml
```


## Usage

The package can be used either from the command line, to both run inference
and train new models, or as an API.


### Command Line

The command line can be used to list the tasks available for inferencing:

```bash
lmtask task
```

Generate text by inferencing using the Llama base model:

```bash
lmtask stream base_generate 'in a world long long away' \
  --override=lmtask_model_generate_args.temperature=0.9
```

Use named entity recognition (NER):
```bash
lmtask instruct ner 'UIC is in Chicago.' 
model_output_json:
    label: ORG
    span: [0, 3]
    text: UIC
    label: O
    span: [4, 6]
    text: is
    label: O
    span: [7, 9]
    text: in
    label: LOC
    span: [10, 15]
    text: Chicago
```
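
Because the spans come from the model, it can be worth checking them against
the input before using them.  The following is a minimal sketch, assuming the
spans are half-open character offsets (the first three entries above are
consistent with that) and assuming the entities are available as a list of
dictionaries; `mismatched_spans` is a hypothetical helper, not part of the
package.

```python
from typing import Any, Dict, List

text = 'UIC is in Chicago.'
# entities transcribed from the output above; the list-of-dicts shape is an
# assumption made for this sketch
entities: List[Dict[str, Any]] = [
    {'label': 'ORG', 'span': [0, 3], 'text': 'UIC'},
    {'label': 'O', 'span': [4, 6], 'text': 'is'},
    {'label': 'O', 'span': [7, 9], 'text': 'in'},
    {'label': 'LOC', 'span': [10, 15], 'text': 'Chicago'},
]


def mismatched_spans(text: str,
                     entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the entities whose span does not slice out their text."""
    bad = []
    for ent in entities:
        start, end = ent['span']
        if text[start:end] != ent['text']:
            bad.append(ent)
    return bad


for ent in mismatched_spans(text, entities):
    start, end = ent['span']
    # show the entity text, its span and what the span actually covers
    print(ent['text'], ent['span'], repr(text[start:end]))
```

Under those assumptions the check flags the last entity, whose span covers only
`Chica`.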

The command-line program can be used to train new models with just
configuration.  See the [trainconf](trainconf) directory for examples of
configuration files.  Before you train, you might want to get a sample of the
configured dataset:

```bash
lmtask -c trainconf/imdb.yml sample -m 1
________________________________________
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

You are a sentiment classifier.<|eot_id|><|start_header_id|>user<|end_header_id|>

Only output the sentiment.

### Review:We always watch American movies with their particular...
### Sentiment:```positive```<|eot_id|>
```
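
To make the layout of the sample easier to read, the sketch below assembles a
string with the same shape.  It is a hand-written approximation of what is
printed above, not the template the package or the model's tokenizer uses;
`render_sample` and its parameters are hypothetical names.

```python
def render_sample(system: str, instruction: str,
                  review: str, sentiment: str) -> str:
    """Lay out one training sample the way the printed sample appears."""
    # three backticks wrap the label in the printed sample
    fence = '`' * 3
    return (
        '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n'
        f'{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n'
        f'{instruction}\n\n'
        f'### Review:{review}\n'
        f'### Sentiment:{fence}{sentiment}{fence}<|eot_id|>'
    )


print(render_sample(
    system='You are a sentiment classifier.',
    instruction='Only output the sentiment.',
    review='We always watch American movies with their particular...',
    sentiment='positive'))
```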

To train a new sentiment model on the IMDB dataset:

```bash
lmtask -c trainconf/imdb.yml train
```


### Python API

The Python API can be used to access tasks directly.

```python
>>> import json
>>> from zensols.lmtask import ApplicationFactory
>>> from zensols.lmtask import InstructTaskRequest

# create the task factory
>>> fac = ApplicationFactory.get_task_factory()

# list configured tasks
>>> fac.write(short=True)
base_generate (base generate text)
instruct_generate (base generate text)
ner (tags named entities)
sentiment (classifies sentiment)

# create a sentiment analysis task
>>> task = fac.create('sentiment')

# inference
>>> sents = 'I love football.\nI hate olives.\nEarth is big.'
>>> res = task.process(InstructTaskRequest(instruction=sents))

# print the JSON result as formatted text
>>> print(json.dumps(res.model_output_json, indent=4))
[
    {
        "index": 0,
        "sentence": "I love football.",
        "label": "+"
    },
    {
        "index": 1,
        "sentence": "I hate olives.",
        "label": "-"
    },
    {
        "index": 2,
        "sentence": "Earth is big.",
        "label": "n"
    }
]
```
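
When a response is cut off by the token limit, the JSON may end in the middle
of an element.  The sketch below shows one way to keep the complete leading
elements of a truncated array using only the standard library.  It illustrates
the idea and is not the package's own parser; `parse_partial_array` is a
hypothetical helper.

```python
import json
from typing import Any, List


def parse_partial_array(text: str) -> List[Any]:
    """Return the complete leading elements of a possibly truncated array."""
    decoder = json.JSONDecoder()
    items: List[Any] = []
    pos = text.find('[')
    if pos < 0:
        return items
    pos += 1
    while True:
        # skip whitespace and the commas that separate elements
        while pos < len(text) and text[pos] in ' \t\r\n,':
            pos += 1
        if pos >= len(text) or text[pos] == ']':
            break
        try:
            item, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # the remaining text is an incomplete element, so stop here
            break
        items.append(item)
    return items


truncated = ('[{"index": 0, "label": "+"}, '
             '{"index": 1, "label": "-"}, {"index": 2, "lab')
print(parse_partial_array(truncated))
```

A scalar cut mid-value, such as a number missing its last digits, would still
parse, so this approach is best suited to arrays of objects like the one above.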


## Datasets

The package features easy-to-configure datasets and preprocessing of their
data.  For example, the following is taken from the [IMDB training
configuration](trainconf/imdb.yml):

```yaml
# a dataset factory instance used by the trainer (lmtask_trainer_hf)
lmtask_imdb_source:
  class_name: zensols.lmtask.dataset.LoadedTaskDatasetFactory
  # the dataset name (downloaded if not already); this can be a `pathlib.Path`,
  # Pandas dataframe or Zensols Stash
  source: stanfordnlp/imdb
  # use only the training split
  load_args:
    split: train
  # the task that consumes the data, which will format each datapoint
  # specifically for that task's model
  task: 'instance: lmtask_task_imdb'
  # preprocessing Python source code to add labels and subset the data (ds.select)
  pre_process: |-
    ds = ds.map(lambda x: {'output': 'positive' if x['label'] == 1 else 'negative'})
    ds = ds.rename_column('text', 'instruction')
    ds = ds.shuffle(seed=0)
    # 7K takes 55m
    ds = ds.select(range(7_000))
```
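
To see what the `pre_process` steps do without downloading the dataset, the
same four operations can be mimicked on a plain list of dictionaries.  This is
a standard-library stand-in, not the code the trainer runs: the function name
`pre_process` and the three rows are made up for illustration, and the shuffle
order will differ from the one the dataset library produces.

```python
import random
from typing import Any, Dict, List


def pre_process(rows: List[Dict[str, Any]], limit: int,
                seed: int = 0) -> List[Dict[str, Any]]:
    """Mimic the map, rename, shuffle and select steps on plain rows."""
    # add a text label derived from the integer label
    rows = [{**r, 'output': 'positive' if r['label'] == 1 else 'negative'}
            for r in rows]
    # rename the 'text' field to 'instruction'
    rows = [{('instruction' if k == 'text' else k): v for k, v in r.items()}
            for r in rows]
    # shuffle reproducibly, then keep the first `limit` rows
    random.Random(seed).shuffle(rows)
    return rows[:limit]


# made-up rows in the shape of the IMDB columns used above
rows = [
    {'text': 'Loved it.', 'label': 1},
    {'text': 'Dull and slow.', 'label': 0},
    {'text': 'A classic.', 'label': 1},
]
for row in pre_process(rows, limit=2):
    print(row)
```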


## Changelog

An extensive changelog is available [here](CHANGELOG.md).


## Community

Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback and any input are welcome.


## License

[MIT License](LICENSE.md)

Copyright (c) 2025 Paul Landes


<!-- links -->
[pypi]: https://pypi.org/project/zensols.lmtask/
[pypi-link]: https://pypi.python.org/pypi/zensols.lmtask
[pypi-badge]: https://img.shields.io/pypi/v/zensols.lmtask.svg
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "zensols-lmtask",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.14,>=3.11",
    "maintainer_email": null,
    "keywords": "tooling",
    "author": null,
    "author_email": "Paul Landes <landes@mailc.net>",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A large language model (LLM) API to train and inference specifically for tasks.",
    "version": "0.1.0",
    "project_urls": {
        "Changelog": "https://github.com/plandes/lmtask/blob/master/CHANGELOG.md",
        "Documentation": "https://plandes.github.io/lmtask",
        "Homepage": "https://github.com/plandes/lmtask",
        "Issues": "https://github.com/plandes/lmtask/issues",
        "Repository": "https://github.com/plandes/lmtask.git"
    },
    "split_keywords": [
        "tooling"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ac40e8d498886bed155575bd61912b60c6b60cd48ecb25370672435be5d330d8",
                "md5": "9dc0503548a1e64fdbbbbcab0cf78f68",
                "sha256": "bd31cd36b58e621ace5bd865841e22882d8bfd27bc9c3f1eca12f41a9082ec6c"
            },
            "downloads": -1,
            "filename": "zensols_lmtask-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9dc0503548a1e64fdbbbbcab0cf78f68",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.14,>=3.11",
            "size": 33824,
            "upload_time": "2025-08-06T20:54:33",
            "upload_time_iso_8601": "2025-08-06T20:54:33.352603Z",
            "url": "https://files.pythonhosted.org/packages/ac/40/e8d498886bed155575bd61912b60c6b60cd48ecb25370672435be5d330d8/zensols_lmtask-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-06 20:54:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "plandes",
    "github_project": "lmtask",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "zensols-lmtask"
}
