llmprogram

Name	llmprogram JSON
Version	0.1.1 JSON
	download
home_page	None
Summary	A Python package for creating and running LLM programs.
upload_time	2025-08-11 15:41:09
maintainer	None
docs_url	None
author	Dipankar Sarkar
requires_python	>=3.9
license	MIT
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# LLM Program

`llmprogram` is a Python package that provides a structured and powerful way to create and run programs that use Large Language Models (LLMs). It uses a YAML-based configuration to define the behavior of your LLM programs, making them easy to create, manage, and share.

## How is `llmprogram` different?

There are many libraries and frameworks available for working with LLMs. Here’s what makes `llmprogram` different:

* **Focus on Programmatic LLM-Chains:** `llmprogram` is designed to create self-contained, reusable "programs" that can be chained together to build more complex applications. The YAML-based configuration makes it easy to define and version these programs.
* **Data Quality and Validation:** The built-in input and output validation using JSON schemas ensures that your programs are robust and that the data flowing through them is correct. This is crucial for building reliable LLM-powered applications.
* **Dataset Generation as a First-Class Citizen:** `llmprogram` is designed with the entire lifecycle of an LLM application in mind, from development to production and fine-tuning. The automatic logging to a SQLite database makes it incredibly easy to create high-quality datasets for fine-tuning your own models.
* **Simplicity and Intuitiveness:** The YAML configuration is easy to read and write, and the Python API is simple and intuitive. This makes it easy to get started and to build complex applications without a steep learning curve.

## Features

* **YAML-based Configuration:** Define your LLM programs using simple and intuitive YAML files.
* **Input/Output Validation:** Use JSON schemas to validate the inputs and outputs of your programs, ensuring data integrity.
* **Jinja2 Templating:** Use the power of Jinja2 templates to create dynamic prompts for your LLMs.
* **Caching:** Built-in support for Redis caching to save time and reduce costs.
* **Execution Logging:** Automatically log program executions to a SQLite database for analysis and debugging.
* **Streaming:** Support for streaming responses from the LLM.
* **Extensible with Tools:** Extend the functionality of your programs by adding custom tools (functions) that the LLM can call.
* **Batch Processing:** Process multiple inputs in parallel for improved performance.
* **CLI for Dataset Generation:** A command-line interface to generate instruction datasets for LLM fine-tuning from your logged data.

## Getting Started

### Installation

```bash
pip install llmprogram
```

### Usage

1. **Set your OpenAI API Key:**

```bash
export OPENAI_API_KEY='your-api-key'
```

2. **Create a program YAML file:**

Create a file named `sentiment_analysis.yaml`:

```yaml
name: sentiment_analysis
description: Analyzes the sentiment of a given text.
version: 1.0.0

model:
provider: openai
name: gpt-4.1-mini
temperature: 0.5
max_tokens: 100
response_format: json_object

system_prompt: |
You are a sentiment analysis expert. Analyze the sentiment of the given text and return a JSON response with the following format:
- sentiment (string): "positive", "negative", or "neutral"
- score (number): A score from -1 (most negative) to 1 (most positive)

input_schema:
type: object
required:
- text
properties:
text:
type: string
description: The text to analyze.

output_schema:
type: object
required:
- sentiment
- score
properties:
sentiment:
type: string
enum: ["positive", "negative", "neutral"]
score:
type: number
minimum: -1
maximum: 1

template: |
Analyze the following text:
{{text}}
```

3. **Run the program:**

Create a file named `run_sentiment_analysis.py`:

```python
import asyncio
from llmprogram import LLMProgram

async def main():
program = LLMProgram('sentiment_analysis.yaml')
result = await program(text='I love this new product! It is amazing.')
print(result)

if __name__ == '__main__':
asyncio.run(main())
```

Run the script:

```bash
python run_sentiment_analysis.py
```

## Configuration

The behavior of each LLM program is defined in a YAML file. Here are the key sections:

* `name`, `description`, `version`: Basic metadata for your program.
* `model`: Defines the LLM provider, model name, and other parameters like `temperature` and `max_tokens`.
* `system_prompt`: The instructions that are given to the LLM to guide its behavior.
* `input_schema`: A JSON schema that defines the expected input for the program. The program will validate the input against this schema before execution.
* `output_schema`: A JSON schema that defines the expected output from the LLM. The program will validate the LLM's output against this schema.
* `template`: A Jinja2 template that is used to generate the prompt that is sent to the LLM. The template is rendered with the input variables.

## Using with other OpenAI-compatible endpoints

You can use `llmprogram` with any OpenAI-compatible endpoint, such as [Ollama](https://ollama.ai/). To do this, you can pass the `api_key` and `base_url` to the `LLMProgram` constructor:

```python
program = LLMProgram(
'your_program.yaml',
api_key='your-api-key', # optional, defaults to OPENAI_API_KEY env var
base_url='http://localhost:11434/v1' # example for Ollama
)
```

## Caching

`llmprogram` supports caching of LLM responses to Redis to improve performance and reduce costs. To enable caching, you need to have a Redis server running.

By default, caching is enabled. You can disable it or configure the Redis connection and cache TTL (time-to-live) when you create an `LLMProgram` instance:

```python
program = LLMProgram(
'your_program.yaml',
enable_cache=True,
redis_url="redis://localhost:6379",
cache_ttl=3600 # in seconds
)
```

## Logging and Dataset Generation

`llmprogram` automatically logs every execution of a program to a SQLite database. The database file is created in the same directory as the program YAML file, with a `.db` extension.

This logging feature is not just for debugging; it's also a powerful tool for creating high-quality datasets for fine-tuning your own LLMs. Each record in the log contains:

* `function_input`: The input given to the program.
* `function_output`: The output received from the LLM.
* `llm_input`: The prompt sent to the LLM.
* `llm_output`: The raw response from the LLM.

### Generating a Dataset

You can use the built-in CLI to generate an instruction dataset from the logged data. The dataset is created in JSONL format, which is commonly used for fine-tuning.

```bash
llmprogram generate-dataset /path/to/your_program.db /path/to/your_dataset.jsonl
```

Each line in the output file will be a JSON object with the following keys:

* `instruction`: The system prompt and the user prompt, combined to form the instruction for the LLM.
* `output`: The output from the LLM.

## Command-Line Interface (CLI)

`llmprogram` comes with a command-line interface for common tasks.

### `generate-dataset`

Generate an instruction dataset for LLM fine-tuning from a SQLite log file.

**Usage:**

```bash
llmprogram generate-dataset <database_path> <output_path>
```

**Arguments:**

* `database_path`: The path to the SQLite database file.
* `output_path`: The path to write the generated dataset to.

## Examples

You can find more examples in the `examples` directory:

* **Sentiment Analysis:** A simple program to analyze the sentiment of a piece of text. (`examples/sentiment_analysis.yaml`)
* **Code Generator:** A program that generates Python code from a natural language description. (`examples/code_generator.yaml`)
* **Email Generator:** A program that generates a professional email based on a few inputs. (`examples/email_generator.yaml`)

To run the examples, navigate to the `examples` directory and run the corresponding `run_*.py` script.

## Development

To run the tests for this package, you will need to install `pytest`:

```bash
pip install pytest
```

Then, you can run the tests from the root directory of the project:

```bash
pytest
```

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "llmprogram",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": "Dipankar Sarkar",
    "author_email": "me@dipankar.name",
    "download_url": "https://files.pythonhosted.org/packages/89/e4/8983ac46eac5614b797b0f1fe4ba1c72495762b4ab658786a7a9b96f27d1/llmprogram-0.1.1.tar.gz",
    "platform": null,
    "description": "# LLM Program\n\n`llmprogram` is a Python package that provides a structured and powerful way to create and run programs that use Large Language Models (LLMs). It uses a YAML-based configuration to define the behavior of your LLM programs, making them easy to create, manage, and share.\n\n## How is `llmprogram` different?\n\nThere are many libraries and frameworks available for working with LLMs. Here\u2019s what makes `llmprogram` different:\n\n*   **Focus on Programmatic LLM-Chains:** `llmprogram` is designed to create self-contained, reusable \"programs\" that can be chained together to build more complex applications. The YAML-based configuration makes it easy to define and version these programs.\n*   **Data Quality and Validation:** The built-in input and output validation using JSON schemas ensures that your programs are robust and that the data flowing through them is correct. This is crucial for building reliable LLM-powered applications.\n*   **Dataset Generation as a First-Class Citizen:** `llmprogram` is designed with the entire lifecycle of an LLM application in mind, from development to production and fine-tuning. The automatic logging to a SQLite database makes it incredibly easy to create high-quality datasets for fine-tuning your own models.\n*   **Simplicity and Intuitiveness:** The YAML configuration is easy to read and write, and the Python API is simple and intuitive. This makes it easy to get started and to build complex applications without a steep learning curve.\n\n## Features\n\n*   **YAML-based Configuration:** Define your LLM programs using simple and intuitive YAML files.\n*   **Input/Output Validation:** Use JSON schemas to validate the inputs and outputs of your programs, ensuring data integrity.\n*   **Jinja2 Templating:** Use the power of Jinja2 templates to create dynamic prompts for your LLMs.\n*   **Caching:** Built-in support for Redis caching to save time and reduce costs.\n*   **Execution Logging:** Automatically log program executions to a SQLite database for analysis and debugging.\n*   **Streaming:** Support for streaming responses from the LLM.\n*   **Extensible with Tools:** Extend the functionality of your programs by adding custom tools (functions) that the LLM can call.\n*   **Batch Processing:** Process multiple inputs in parallel for improved performance.\n*   **CLI for Dataset Generation:** A command-line interface to generate instruction datasets for LLM fine-tuning from your logged data.\n\n## Getting Started\n\n### Installation\n\n```bash\npip install llmprogram\n```\n\n### Usage\n\n1.  **Set your OpenAI API Key:**\n\n    ```bash\n    export OPENAI_API_KEY='your-api-key'\n    ```\n\n2.  **Create a program YAML file:**\n\n    Create a file named `sentiment_analysis.yaml`:\n\n    ```yaml\n    name: sentiment_analysis\n    description: Analyzes the sentiment of a given text.\n    version: 1.0.0\n\n    model:\n      provider: openai\n      name: gpt-4.1-mini\n      temperature: 0.5\n      max_tokens: 100\n      response_format: json_object\n\n    system_prompt: |\n      You are a sentiment analysis expert. Analyze the sentiment of the given text and return a JSON response with the following format:\n      - sentiment (string): \"positive\", \"negative\", or \"neutral\"\n      - score (number): A score from -1 (most negative) to 1 (most positive)\n\n    input_schema:\n      type: object\n      required:\n        - text\n      properties:\n        text:\n          type: string\n          description: The text to analyze.\n\n    output_schema:\n      type: object\n      required:\n        - sentiment\n        - score\n      properties:\n        sentiment:\n          type: string\n          enum: [\"positive\", \"negative\", \"neutral\"]\n        score:\n          type: number\n          minimum: -1\n          maximum: 1\n\n    template: |\n      Analyze the following text:\n      {{text}}\n    ```\n\n3.  **Run the program:**\n\n    Create a file named `run_sentiment_analysis.py`:\n\n    ```python\n    import asyncio\n    from llmprogram import LLMProgram\n\n    async def main():\n        program = LLMProgram('sentiment_analysis.yaml')\n        result = await program(text='I love this new product! It is amazing.')\n        print(result)\n\n    if __name__ == '__main__':\n        asyncio.run(main())\n    ```\n\n    Run the script:\n\n    ```bash\n    python run_sentiment_analysis.py\n    ```\n\n## Configuration\n\nThe behavior of each LLM program is defined in a YAML file. Here are the key sections:\n\n*   `name`, `description`, `version`: Basic metadata for your program.\n*   `model`: Defines the LLM provider, model name, and other parameters like `temperature` and `max_tokens`.\n*   `system_prompt`: The instructions that are given to the LLM to guide its behavior.\n*   `input_schema`: A JSON schema that defines the expected input for the program. The program will validate the input against this schema before execution.\n*   `output_schema`: A JSON schema that defines the expected output from the LLM. The program will validate the LLM's output against this schema.\n*   `template`: A Jinja2 template that is used to generate the prompt that is sent to the LLM. The template is rendered with the input variables.\n\n## Using with other OpenAI-compatible endpoints\n\nYou can use `llmprogram` with any OpenAI-compatible endpoint, such as [Ollama](https://ollama.ai/). To do this, you can pass the `api_key` and `base_url` to the `LLMProgram` constructor:\n\n```python\nprogram = LLMProgram(\n    'your_program.yaml',\n    api_key='your-api-key',  # optional, defaults to OPENAI_API_KEY env var\n    base_url='http://localhost:11434/v1'  # example for Ollama\n)\n```\n\n## Caching\n\n`llmprogram` supports caching of LLM responses to Redis to improve performance and reduce costs. To enable caching, you need to have a Redis server running.\n\nBy default, caching is enabled. You can disable it or configure the Redis connection and cache TTL (time-to-live) when you create an `LLMProgram` instance:\n\n```python\nprogram = LLMProgram(\n    'your_program.yaml',\n    enable_cache=True,\n    redis_url=\"redis://localhost:6379\",\n    cache_ttl=3600  # in seconds\n)\n```\n\n## Logging and Dataset Generation\n\n`llmprogram` automatically logs every execution of a program to a SQLite database. The database file is created in the same directory as the program YAML file, with a `.db` extension.\n\nThis logging feature is not just for debugging; it's also a powerful tool for creating high-quality datasets for fine-tuning your own LLMs. Each record in the log contains:\n\n*   `function_input`: The input given to the program.\n*   `function_output`: The output received from the LLM.\n*   `llm_input`: The prompt sent to the LLM.\n*   `llm_output`: The raw response from the LLM.\n\n### Generating a Dataset\n\nYou can use the built-in CLI to generate an instruction dataset from the logged data. The dataset is created in JSONL format, which is commonly used for fine-tuning.\n\n```bash\nllmprogram generate-dataset /path/to/your_program.db /path/to/your_dataset.jsonl\n```\n\nEach line in the output file will be a JSON object with the following keys:\n\n*   `instruction`: The system prompt and the user prompt, combined to form the instruction for the LLM.\n*   `output`: The output from the LLM.\n\n## Command-Line Interface (CLI)\n\n`llmprogram` comes with a command-line interface for common tasks.\n\n### `generate-dataset`\n\nGenerate an instruction dataset for LLM fine-tuning from a SQLite log file.\n\n**Usage:**\n\n```bash\nllmprogram generate-dataset <database_path> <output_path>\n```\n\n**Arguments:**\n\n*   `database_path`: The path to the SQLite database file.\n*   `output_path`: The path to write the generated dataset to.\n\n## Examples\n\nYou can find more examples in the `examples` directory:\n\n*   **Sentiment Analysis:** A simple program to analyze the sentiment of a piece of text. (`examples/sentiment_analysis.yaml`)\n*   **Code Generator:** A program that generates Python code from a natural language description. (`examples/code_generator.yaml`)\n*   **Email Generator:** A program that generates a professional email based on a few inputs. (`examples/email_generator.yaml`)\n\nTo run the examples, navigate to the `examples` directory and run the corresponding `run_*.py` script.\n\n## Development\n\nTo run the tests for this package, you will need to install `pytest`:\n\n```bash\npip install pytest\n```\n\nThen, you can run the tests from the root directory of the project:\n\n```bash\npytest\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python package for creating and running LLM programs.",
    "version": "0.1.1",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "dc2e3ed26157b2bf0306c3812014c6cc1f1f3d99146b5226dbd5bcea51521c00",
                "md5": "1564cea7fe0c0ed149f622bdd9fc8be4",
                "sha256": "bbe1125d889597d40b89c544c3ee88f35fa67bde006548dcb148c46535fb0466"
            },
            "downloads": -1,
            "filename": "llmprogram-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1564cea7fe0c0ed149f622bdd9fc8be4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 12525,
            "upload_time": "2025-08-11T15:41:08",
            "upload_time_iso_8601": "2025-08-11T15:41:08.472004Z",
            "url": "https://files.pythonhosted.org/packages/dc/2e/3ed26157b2bf0306c3812014c6cc1f1f3d99146b5226dbd5bcea51521c00/llmprogram-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "89e48983ac46eac5614b797b0f1fe4ba1c72495762b4ab658786a7a9b96f27d1",
                "md5": "9c6ef1ef202e1d4d492c98b67a82728a",
                "sha256": "9a920adb952faae2778e7b8d0e4cfe59156c543ea9ee9e502dd2312957e80091"
            },
            "downloads": -1,
            "filename": "llmprogram-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "9c6ef1ef202e1d4d492c98b67a82728a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 13735,
            "upload_time": "2025-08-11T15:41:09",
            "upload_time_iso_8601": "2025-08-11T15:41:09.669307Z",
            "url": "https://files.pythonhosted.org/packages/89/e4/8983ac46eac5614b797b0f1fe4ba1c72495762b4ab658786a7a9b96f27d1/llmprogram-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-11 15:41:09",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "llmprogram"
}

Dipankar Sarkar