llms-py

- Name: llms-py
- Version: 2.0.10
- Home page: https://github.com/ServiceStack/llms
- Summary: A lightweight CLI tool and OpenAI-compatible server for querying multiple Large Language Model (LLM) providers
- Upload time: 2025-10-07 04:17:01
- Author: ServiceStack
- Requires Python: >=3.7
- License: not specified
- Keywords: llm, ai, openai, anthropic, google, gemini, groq, mistral, ollama, cli, server, chat, completion
- Requirements: aiohttp

# llms.py

Lightweight CLI and OpenAI-compatible server for querying multiple Large Language Model (LLM) providers.

Configure additional providers and models in [llms.json](llms.json)
 - Mix and match local models with models from different API providers
 - Requests are automatically routed to available providers that support the requested model (in the defined order)
 - Define free/cheapest/local providers first to save on costs
 - Any failures are automatically retried on the next available provider

## Features

- **Lightweight**: Single [llms.py](llms.py) Python file with a single `aiohttp` dependency
- **Multi-Provider Support**: OpenRouter, Ollama, Anthropic, Google, OpenAI, Grok, Groq, Qwen, Z.ai, Mistral
- **OpenAI-Compatible API**: Works with any client that supports OpenAI's chat completion API
- **Configuration Management**: Easy provider enable/disable and configuration management
- **CLI Interface**: Simple command-line interface for quick interactions
- **Server Mode**: Run an OpenAI-compatible HTTP server at `http://localhost:{PORT}/v1/chat/completions`
- **Image Support**: Process images through vision-capable models
- **Audio Support**: Process audio through audio-capable models
- **Custom Chat Templates**: Configurable chat completion request templates for different modalities
- **Auto-Discovery**: Automatically discover available Ollama models
- **Unified Models**: Define custom model names that map to different provider-specific names
- **Multi-Model Support**: Support for 160+ different LLMs

## llms.py UI

Simple ChatGPT-like UI to access ALL Your LLMs, Locally or Remotely!

[![llms.py UI](https://servicestack.net/img/posts/llms-py-ui/bg.webp)](https://servicestack.net/posts/llms-py-ui)

Read the [Introductory Blog Post](https://servicestack.net/posts/llms-py-ui).

## Installation

### Option 1: Install from PyPI

```bash
pip install llms-py
```

### Option 2: Download directly

1. Download [llms.py](llms.py)

```bash
curl -O https://raw.githubusercontent.com/ServiceStack/llms/main/llms.py
chmod +x llms.py
mv llms.py ~/.local/bin/llms
```

2. Install single dependency:

```bash
pip install aiohttp
```

## Quick Start

### 1. Initialize Configuration

Create a default configuration file:

```bash
llms --init
```

This saves the latest [llms.json](llms.json) configuration to `~/.llms/llms.json`.

Modify `~/.llms/llms.json` to enable providers, add required API keys, add additional models, or register any custom
OpenAI-compatible providers.

### 2. Set API Keys

Set environment variables for the providers you want to use:

```bash
export OPENROUTER_API_KEY="..."
export GROQ_API_KEY="..."
export GOOGLE_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GROK_API_KEY="..."
export DASHSCOPE_API_KEY="..."
# ... etc
```

### 3. Enable Providers

Enable the providers you want to use:

```bash
# Enable providers with free models and free tiers
llms --enable openrouter_free google_free groq

# Enable paid providers
llms --enable openrouter anthropic google openai mistral grok qwen
```

### 4. Start Chatting

```bash
llms "What is the capital of France?"
```

## Configuration

The configuration file (`llms.json`) defines available providers, models, and default settings. Key sections:

### Defaults
- `headers`: Common HTTP headers for all requests
- `text`: Default chat completion request template for text prompts

### Providers

Each provider configuration includes:
- `enabled`: Whether the provider is active
- `type`: Provider class (OpenAiProvider, GoogleProvider, etc.)
- `api_key`: API key (supports environment variables with `$VAR_NAME`)
- `base_url`: API endpoint URL
- `models`: Model name mappings (local name → provider name)
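
As a minimal sketch (not the tool's actual code), the `$VAR_NAME` style `api_key` values can be thought of as being resolved against environment variables like this:

```python
# Illustrative only: resolve an api_key entry such as "$GROQ_API_KEY" against
# the environment; a literal key is returned unchanged.
import os

def resolve_api_key(value):
    if value and value.startswith("$"):
        return os.environ.get(value[1:])  # "$GROQ_API_KEY" -> value of GROQ_API_KEY (or None)
    return value

print(resolve_api_key("$GROQ_API_KEY"))
```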

## Command Line Usage

### Basic Chat

```bash
# Simple question
llms "Explain quantum computing"

# With specific model
llms -m gemini-2.5-pro "Write a Python function to sort a list"
llms -m grok-4 "Explain this code with humor"
llms -m qwen3-max "Translate this to Chinese"

# With system prompt
llms -s "You are a helpful coding assistant" "How do I reverse a string in Python?"

# With image (vision models)
llms --image image.jpg "What's in this image?"
llms --image https://example.com/photo.png "Describe this photo"

# Display full JSON Response
llms "Explain quantum computing" --raw
```

### Using a Chat Template

By default llms uses the `defaults/text` chat completion request defined in [llms.json](llms.json).

You can instead use a custom chat completion request with `--chat`, e.g.:

```bash
# Load chat completion request from JSON file
llms --chat request.json

# Override user message
llms --chat request.json "New user message"

# Override model
llms -m kimi-k2 --chat request.json
```

Example `request.json`:

```json
{
  "model": "kimi-k2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": ""}
  ],
  "temperature": 0.7,
  "max_tokens": 150
}
```

### Image Requests

Send images to vision-capable models using the `--image` option:

```bash
# Use defaults/image Chat Template (Describe the key features of the input image)
llms --image ./screenshot.png

# Local image file
llms --image ./screenshot.png "What's in this image?"

# Remote image URL
llms --image https://example.org/photo.jpg "Describe this photo"

# Data URI
llms --image "data:image/png;base64,$(base64 -w 0 image.png)" "Describe this image"

# With a specific vision model
llms -m gemini-2.5-flash --image chart.png "Analyze this chart"
llms -m qwen2.5vl --image document.jpg "Extract text from this document"

# Combined with system prompt
llms -s "You are a data analyst" --image graph.png "What trends do you see?"

# With custom chat template
llms --chat image-request.json --image photo.jpg
```

Example of `image-request.json`:

```json
{
    "model": "qwen2.5vl",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": ""
                    }
                },
                {
                    "type": "text",
                    "text": "Caption this image"
                }
            ]
        }
    ]
}
```

**Supported image formats**: PNG, WEBP, JPG, JPEG, GIF, BMP, TIFF, ICO

**Image sources**:
- **Local files**: Absolute paths (`/path/to/image.jpg`) or relative paths (`./image.png`, `../image.jpg`)
- **Remote URLs**: HTTP/HTTPS URLs are automatically downloaded
- **Data URIs**: Base64-encoded images (`data:image/png;base64,...`)

Images are automatically processed and converted to base64 data URIs before being sent to the model.
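
For illustration, here is a rough Python sketch of that conversion (the file path and MIME fallback are placeholder choices, not the tool's exact logic), producing a data URI that fits the `image_url` content part shown above:

```python
# Illustrative only: read a local image and wrap it as a base64 data URI.
import base64
import mimetypes

def image_to_data_uri(path):
    mime = mimetypes.guess_type(path)[0] or "image/png"  # fallback is an assumption
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Slots into the image_url part of a chat completion request:
part = {"type": "image_url", "image_url": {"url": image_to_data_uri("./screenshot.png")}}
```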

### Vision-Capable Models

Popular models that support image analysis:
- **OpenAI**: GPT-4o, GPT-4o-mini, GPT-4.1
- **Anthropic**: Claude Sonnet 4.0, Claude Opus 4.1
- **Google**: Gemini 2.5 Pro, Gemini Flash
- **Qwen**: Qwen2.5-VL, Qwen3-VL, QVQ-max
- **Ollama**: qwen2.5vl, llava

Images are automatically downloaded and converted to base64 data URIs.

### Audio Requests

Send audio files to audio-capable models using the `--audio` option:

```bash
# Use defaults/audio Chat Template (Transcribe the audio)
llms --audio ./recording.mp3

# Local audio file
llms --audio ./meeting.wav "Summarize this meeting recording"

# Remote audio URL
llms --audio https://example.org/podcast.mp3 "What are the key points discussed?"

# With a specific audio model
llms -m gpt-4o-audio-preview --audio interview.mp3 "Extract the main topics"
llms -m gemini-2.5-flash --audio interview.mp3 "Extract the main topics"

# Combined with system prompt
llms -s "You're a transcription specialist" --audio talk.mp3 "Provide a detailed transcript"

# With custom chat template
llms --chat audio-request.json --audio speech.wav
```

Example of `audio-request.json`:

```json
{
    "model": "gpt-4o-audio-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "",
                        "format": "mp3"
                    }
                },
                {
                    "type": "text",
                    "text": "Please transcribe this audio"
                }
            ]
        }
    ]
}
```

**Supported audio formats**: MP3, WAV

**Audio sources**:
- **Local files**: Absolute paths (`/path/to/audio.mp3`) or relative paths (`./audio.wav`, `../recording.m4a`)
- **Remote URLs**: HTTP/HTTPS URLs are automatically downloaded
- **Base64 Data**: Base64-encoded audio

Audio files are automatically processed and converted to base64 data before being sent to the model.
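
A similar illustrative sketch for audio (again, not the tool's internal code): the file is base64-encoded and paired with a `format` value derived from its extension, matching the `input_audio` part shown above:

```python
# Illustrative only: prepare a local recording as an "input_audio" content part.
import base64
import os

def audio_to_input_audio(path):
    fmt = os.path.splitext(path)[1].lstrip(".").lower() or "mp3"  # "mp3" or "wav"
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {"type": "input_audio", "input_audio": {"data": data, "format": fmt}}
```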

### Audio-Capable Models

Popular models that support audio processing:
- **OpenAI**: gpt-4o-audio-preview
- **Google**: gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite

Audio files are automatically downloaded and converted to base64 data URIs with appropriate format detection.

### File Requests

Send documents (e.g. PDFs) to file-capable models using the `--file` option:

```bash
# Use defaults/file Chat Template (Summarize the document)
llms --file ./docs/handbook.pdf

# Local PDF file
llms --file ./docs/policy.pdf "Summarize the key changes"

# Remote PDF URL
llms --file https://example.org/whitepaper.pdf "What are the main findings?"

# With specific file-capable models
llms -m gpt-5               --file ./policy.pdf   "Summarize the key changes"
llms -m gemini-flash-latest --file ./report.pdf   "Extract action items"
llms -m qwen2.5vl           --file ./manual.pdf   "List key sections and their purpose"

# Combined with system prompt
llms -s "You're a compliance analyst" --file ./policy.pdf "Identify compliance risks"

# With custom chat template
llms --chat file-request.json --file ./docs/handbook.pdf
```

Example of `file-request.json`:

```json
{
  "model": "gpt-5",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "file",
          "file": {
            "filename": "",
            "file_data": ""
          }
        },
        {
          "type": "text",
          "text": "Please summarize this document"
        }
      ]
    }
  ]
}
```

**Supported file formats**: PDF

Other document types may work depending on the model/provider.

**File sources**:
- **Local files**: Absolute paths (`/path/to/file.pdf`) or relative paths (`./file.pdf`, `../file.pdf`)
- **Remote URLs**: HTTP/HTTPS URLs are automatically downloaded
- **Base64/Data URIs**: Inline `data:application/pdf;base64,...` is supported

Files are automatically downloaded (for URLs) and converted to base64 data URIs before being sent to the model.

### File-Capable Models

Popular multi-modal models that support file (PDF) inputs:
- OpenAI: gpt-5, gpt-5-mini, gpt-4o, gpt-4o-mini
- Google: gemini-flash-latest, gemini-2.5-flash-lite
- Grok: grok-4-fast (OpenRouter)
- Qwen: qwen2.5vl, qwen3-max, qwen3-vl:235b, qwen3-coder, qwen3-coder-flash (OpenRouter)
- Others: kimi-k2, glm-4.5-air, deepseek-v3.1:671b, llama4:400b, llama3.3:70b, mai-ds-r1, nemotron-nano:9b

## Server Mode

Run as an OpenAI-compatible HTTP server:

```bash
# Start server on port 8000
llms --serve 8000
```

The server exposes a single endpoint:
- `POST /v1/chat/completions` - OpenAI-compatible chat completions

Example client usage:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
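
Because the endpoint is OpenAI-compatible, any OpenAI client library can be pointed at it. For example, a sketch using the official `openai` Python package (installed separately; the `api_key` value here is just a placeholder, adjust it if your setup requires a real key):

```python
# Illustrative client usage against a local `llms --serve 8000` instance.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="kimi-k2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```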

### Configuration Management

```bash
# List enabled providers and models
llms --list
llms ls

# List specific providers
llms ls ollama
llms ls google anthropic

# Enable providers
llms --enable openrouter
llms --enable anthropic google_free groq

# Disable providers
llms --disable ollama
llms --disable openai anthropic

# Set default model
llms --default grok-4
```

### Update

1. If installed from PyPI:

```bash
pip install llms-py --upgrade
```

2. If using the direct download:

```bash
# Update to latest version (Downloads latest llms.py)
llms --update
```

### Advanced Options

```bash
# Use custom config file
llms --config /path/to/config.json "Hello"

# Get raw JSON response
llms --raw "What is 2+2?"

# Enable verbose logging
llms --verbose "Tell me a joke"

# Custom log prefix
llms --verbose --logprefix "[DEBUG] " "Hello world"

# Set default model (updates config file)
llms --default grok-4

# Update llms.py to latest version
llms --update

# Pass custom parameters to chat request (URL-encoded)
llms --args "temperature=0.7&seed=111" "What is 2+2?"

# Multiple parameters with different types
llms --args "temperature=0.5&max_completion_tokens=50" "Tell me a joke"

# URL-encoded special characters (stop sequences)
llms --args "stop=Two,Words" "Count to 5"

# Combine with other options
llms --system "You are helpful" --args "temperature=0.3" --raw "Hello"
```

#### Custom Parameters with `--args`

The `--args` option allows you to pass URL-encoded parameters to customize the chat request sent to LLM providers:

**Parameter Types:**
- **Floats**: `temperature=0.7`, `frequency_penalty=0.2`
- **Integers**: `max_completion_tokens=100`
- **Booleans**: `store=true`, `verbose=false`, `logprobs=true`
- **Strings**: `stop=one`
- **Lists**: `stop=two,words`

**Common Parameters:**
- `temperature`: Controls randomness (0.0 to 2.0)
- `max_completion_tokens`: Maximum tokens in response
- `seed`: For reproducible outputs
- `top_p`: Nucleus sampling parameter
- `stop`: Stop sequences (URL-encode special chars)
- `store`: Whether or not to store the output
- `frequency_penalty`: Penalize new tokens based on frequency
- `presence_penalty`: Penalize new tokens based on presence
- `logprobs`: Include log probabilities in response
- `parallel_tool_calls`: Enable parallel tool calls
- `prompt_cache_key`: Cache key for prompt
- `reasoning_effort`: Reasoning effort (low, medium, high, *minimal, *none, *default)
- `safety_identifier`: A string that uniquely identifies each user
- `service_tier`: Service tier (free, standard, premium, *default)
- `top_logprobs`: Number of top logprobs to return
- `verbosity`: Verbosity level (0, 1, 2, 3, *default)
- `enable_thinking`: Enable thinking mode (Qwen)
- `stream`: Enable streaming responses
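
As an illustration of the type coercion described above (a sketch only, not the tool's actual parser), URL-encoded pairs can be split with the standard library and converted to typed JSON values before being merged into the chat request:

```python
# Illustrative only: parse --args style "key=value&key=value" strings into typed values.
from urllib.parse import parse_qsl

def coerce(value):
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    if "," in value:
        return value.split(",")              # e.g. stop=two,words -> ["two", "words"]
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value                             # plain string

args = dict(parse_qsl("temperature=0.7&seed=111&stop=Two,Words&store=true"))
print({k: coerce(v) for k, v in args.items()})
# {'temperature': 0.7, 'seed': 111, 'stop': ['Two', 'Words'], 'store': True}
```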

### Default Model Configuration

The `--default MODEL` option allows you to set the default model used for all chat completions. This updates the `defaults.text.model` field in your configuration file:

```bash
# Set default model to gpt-oss
llms --default gpt-oss:120b

# Set default model to Claude Sonnet
llms --default claude-sonnet-4-0

# The model must be available in your enabled providers
llms --default gemini-2.5-pro
```

When you set a default model:
- The configuration file (`~/.llms/llms.json`) is automatically updated
- The specified model becomes the default for all future chat requests
- The model must exist in your currently enabled providers
- You can still override the default using `-m MODEL` for individual requests

### Updating llms.py

The `--update` option downloads and installs the latest version of `llms.py` from the GitHub repository:

```bash
# Update to latest version
llms --update
```

This command:
- Downloads the latest `llms.py` from `https://raw.githubusercontent.com/ServiceStack/llms/refs/heads/main/llms.py`
- Overwrites your current `llms.py` file with the latest version
- Preserves your existing configuration file (`llms.json`)
- Requires an internet connection to download the update

### Beautiful rendered Markdown

Pipe Markdown output to [glow](https://github.com/charmbracelet/glow) to beautifully render it in the terminal:

```bash
llms "Explain quantum computing" | glow
```

## Supported Providers

Any OpenAI-compatible provider and its models can be added by configuring them in [llms.json](./llms.json). By default, only providers with free tiers are enabled, and they only become "available" once their API key is set.

You can list the available providers, their models and which are enabled or disabled with:

```bash
llms ls
```

They can be enabled/disabled in your `llms.json` file or with:

```bash
llms --enable <provider>
llms --disable <provider>
```

For a provider to be available, its API key must also be configured, either in your environment variables
or directly in your `llms.json`.

### Environment Variables

| Provider        | Variable                  | Description         | Example |
|-----------------|---------------------------|---------------------|---------|
| openrouter_free | `OPENROUTER_FREE_API_KEY` | OpenRouter FREE models API key | `sk-or-...` |
| groq            | `GROQ_API_KEY`            | Groq API key        | `gsk_...` |
| google_free     | `GOOGLE_FREE_API_KEY`     | Google FREE API key | `AIza...` |
| codestral       | `CODESTRAL_API_KEY`       | Codestral API key   | `...` |
| ollama          | N/A                       | No API key required | |
| openrouter      | `OPENROUTER_API_KEY`      | OpenRouter API key  | `sk-or-...` |
| google          | `GOOGLE_API_KEY`          | Google API key      | `AIza...` |
| anthropic       | `ANTHROPIC_API_KEY`       | Anthropic API key   | `sk-ant-...` |
| openai          | `OPENAI_API_KEY`          | OpenAI API key      | `sk-...` |
| grok            | `GROK_API_KEY`            | Grok (X.AI) API key | `xai-...` |
| qwen            | `DASHSCOPE_API_KEY`       | Qwen (Alibaba) API key | `sk-...` |
| z.ai            | `ZAI_API_KEY`             | Z.ai API key        | `sk-...` |
| mistral         | `MISTRAL_API_KEY`         | Mistral API key     | `...` |

### OpenAI
- **Type**: `OpenAiProvider`
- **Models**: GPT-5, GPT-5 Codex, GPT-4o, GPT-4o-mini, o3, etc.
- **Features**: Text, images, function calling

```bash
export OPENAI_API_KEY="your-key"
llms --enable openai
```

### Anthropic (Claude)
- **Type**: `OpenAiProvider`
- **Models**: Claude Opus 4.1, Sonnet 4.0, Haiku 3.5, etc.
- **Features**: Text, images, large context windows

```bash
export ANTHROPIC_API_KEY="your-key"
llms --enable anthropic
```

### Google Gemini
- **Type**: `GoogleProvider`
- **Models**: Gemini 2.5 Pro, Flash, Flash-Lite
- **Features**: Text, images, safety settings

```bash
export GOOGLE_API_KEY="your-key"
llms --enable google_free
```

### OpenRouter
- **Type**: `OpenAiProvider`
- **Models**: 100+ models from various providers
- **Features**: Access to latest models, free tier available

```bash
export OPENROUTER_API_KEY="your-key"
llms --enable openrouter
```

### Grok (X.AI)
- **Type**: `OpenAiProvider`
- **Models**: Grok-4, Grok-3, Grok-3-mini, Grok-code-fast-1, etc.
- **Features**: Real-time information, humor, uncensored responses

```bash
export GROK_API_KEY="your-key"
llms --enable grok
```

### Groq
- **Type**: `OpenAiProvider`
- **Models**: Llama 3.3, Gemma 2, Kimi K2, etc.
- **Features**: Fast inference, competitive pricing

```bash
export GROQ_API_KEY="your-key"
llms --enable groq
```

### Ollama (Local)
- **Type**: `OllamaProvider`
- **Models**: Auto-discovered from local Ollama installation
- **Features**: Local inference, privacy, no API costs

```bash
# Ollama must be running locally
llms --enable ollama
```

### Qwen (Alibaba Cloud)
- **Type**: `OpenAiProvider`
- **Models**: Qwen3-max, Qwen-max, Qwen-plus, Qwen2.5-VL, QwQ-plus, etc.
- **Features**: Multilingual, vision models, coding, reasoning, audio processing

```bash
export DASHSCOPE_API_KEY="your-key"
llms --enable qwen
```

### Z.ai
- **Type**: `OpenAiProvider`
- **Models**: GLM-4.6, GLM-4.5, GLM-4.5-air, GLM-4.5-x, GLM-4.5-airx, GLM-4.5-flash, GLM-4:32b
- **Features**: Advanced language models with strong reasoning capabilities

```bash
export ZAI_API_KEY="your-key"
llms --enable z.ai
```

### Mistral
- **Type**: `OpenAiProvider`
- **Models**: Mistral Large, Codestral, Pixtral, etc.
- **Features**: Code generation, multilingual

```bash
export MISTRAL_API_KEY="your-key"
llms --enable mistral
```

### Codestral
- **Type**: `OpenAiProvider`
- **Models**: Codestral
- **Features**: Code generation

```bash
export CODESTRAL_API_KEY="your-key"
llms --enable codestral
```

## Model Routing

The tool automatically routes requests to the first available provider that supports the requested model. If a provider fails, it tries the next available provider with that model.

Example: if `kimi-k2` is available from OpenRouter (free), Groq, and OpenRouter (paid), the request will first try OpenRouter (free), then fall back to Groq, and finally OpenRouter (paid) if the earlier requests fail.
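
Conceptually, the fallback loop looks something like the following sketch (the provider objects and `send()` call are hypothetical placeholders, not the tool's internal API):

```python
# Conceptual sketch of ordered provider routing with retry-on-failure.
async def route_request(providers, model, request):
    last_error = None
    for provider in providers:                     # providers in the order they are defined
        if model not in provider.models:
            continue
        try:
            return await provider.send(request)   # hypothetical provider call
        except Exception as error:                 # on failure, try the next provider
            last_error = error
    raise RuntimeError(f"No available provider could serve {model!r}") from last_error
```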

## Configuration Examples

### Minimal Configuration

```json
{
  "defaults": {
    "headers": {"Content-Type": "application/json"},
    "text": {
      "model": "kimi-k2",
      "messages": [{"role": "user", "content": ""}]
    }
  },
  "providers": {
    "groq": {
      "enabled": true,
      "type": "OpenAiProvider",
      "base_url": "https://api.groq.com/openai",
      "api_key": "$GROQ_API_KEY",
      "models": {
        "llama3.3:70b": "llama-3.3-70b-versatile",
        "llama4:109b": "meta-llama/llama-4-scout-17b-16e-instruct",
        "llama4:400b": "meta-llama/llama-4-maverick-17b-128e-instruct",
        "kimi-k2": "moonshotai/kimi-k2-instruct-0905",
        "gpt-oss:120b": "openai/gpt-oss-120b",
        "gpt-oss:20b": "openai/gpt-oss-20b",
        "qwen3:32b": "qwen/qwen3-32b"
      }
    }
  }
}
```

### Multi-Provider Setup

```json
{
  "providers": {
    "openrouter": {
      "enabled": false,
      "type": "OpenAiProvider",
      "base_url": "https://openrouter.ai/api",
      "api_key": "$OPENROUTER_API_KEY",
      "models": {
        "grok-4": "x-ai/grok-4",
        "glm-4.5-air": "z-ai/glm-4.5-air",
        "kimi-k2": "moonshotai/kimi-k2",
        "deepseek-v3.1:671b": "deepseek/deepseek-chat",
        "llama4:400b": "meta-llama/llama-4-maverick"
      }
    },
    "anthropic": {
      "enabled": false,
      "type": "OpenAiProvider",
      "base_url": "https://api.anthropic.com",
      "api_key": "$ANTHROPIC_API_KEY",
      "models": {
        "claude-sonnet-4-0": "claude-sonnet-4-0"
      }
    },
    "ollama": {
      "enabled": false,
      "type": "OllamaProvider",
      "base_url": "http://localhost:11434",
      "models": {},
      "all_models": true
    }
  }
}
```

## Usage

Run `llms` without arguments to see the help screen:

    usage: llms.py [-h] [--config FILE] [-m MODEL] [--chat REQUEST] [-s PROMPT] [--image IMAGE] [--audio AUDIO]
                  [--file FILE] [--raw] [--list] [--serve PORT] [--enable PROVIDER] [--disable PROVIDER]
                  [--default MODEL] [--init] [--logprefix PREFIX] [--verbose] [--update]

    llms

    options:
      -h, --help            show this help message and exit
      --config FILE         Path to config file
      -m MODEL, --model MODEL
                            Model to use
      --chat REQUEST        OpenAI Chat Completion Request to send
      -s PROMPT, --system PROMPT
                            System prompt to use for chat completion
      --image IMAGE         Image input to use in chat completion
      --audio AUDIO         Audio input to use in chat completion
      --file FILE           File input to use in chat completion
      --raw                 Return raw AI JSON response
      --list                Show list of enabled providers and their models (alias ls provider?)
      --serve PORT          Port to start an OpenAI Chat compatible server on
      --enable PROVIDER     Enable a provider
      --disable PROVIDER    Disable a provider
      --default MODEL       Configure the default model to use
      --init                Create a default llms.json
      --logprefix PREFIX    Prefix used in log messages
      --verbose             Verbose output
      --update              Update to latest version

## Troubleshooting

### Common Issues

**Config file not found**
```bash
# Initialize default config
llms --init

# Or specify custom path
llms --config ./my-config.json
```

**No providers enabled**

```bash
# Check status
llms --list

# Enable providers
llms --enable google anthropic
```

**API key issues**
```bash
# Check environment variables
echo $ANTHROPIC_API_KEY

# Enable verbose logging
llms --verbose "test"
```

**Model not found**

```bash
# List available models
llms --list

# Check provider configuration
llms ls openrouter
```

### Debug Mode

Enable verbose logging to see detailed request/response information:

```bash
llms --verbose --logprefix "[DEBUG] " "Hello"
```

This shows:
- Enabled providers
- Model routing decisions
- HTTP request details
- Error messages with stack traces

## Development

### Project Structure

- `llms.py` - Main script with CLI and server functionality
- `llms.json` - Default configuration file
- `requirements.txt` - Python dependencies

### Provider Classes

- `OpenAiProvider` - Generic OpenAI-compatible provider
- `OllamaProvider` - Ollama-specific provider with model auto-discovery
- `GoogleProvider` - Google Gemini with native API format
- `GoogleOpenAiProvider` - Google Gemini via OpenAI-compatible endpoint

### Adding New Providers

1. Create a provider class inheriting from `OpenAiProvider`
2. Implement provider-specific authentication and formatting
3. Add provider configuration to `llms.json`
4. Update initialization logic in `init_llms()`

## Contributing

Contributions are welcome! Please submit a PR to add support for any missing OpenAI-compatible providers.

            
