litechat

Name: litechat
Version: 0.0.69
Home page: https://github.com/santhosh/
Summary: automated huggingchat openai style fastapi inference
Upload time: 2025-02-02 05:04:22
Author: Kammari Santhosh
Requires Python: <4.0,>=3.10
License: MIT
Requirements: No requirements were recorded.

# LiteChat 🚀

LiteChat is a lightweight, OpenAI-compatible interface for running a local LLM inference server. It integrates with a range of open-source HuggingFace models behind an OpenAI-style API, so existing OpenAI and litellm clients work without modification.

## Features ✨

- 🔄 OpenAI API compatibility
- 🌐 Web search integration
- 💬 Conversation memory
- 🔄 Streaming responses
- 🛠️ Easy integration with HuggingFace models
- 📦 Compatible with both litellm and OpenAI clients
- 🎯 Type-safe model selection

## Installation 🛠️

```bash
pip install litechat playwright
playwright install
```

## Available Models 🤖

LiteChat supports the following models:

- `Qwen/Qwen2.5-Coder-32B-Instruct`: Specialized coding model
- `Qwen/Qwen2.5-72B-Instruct`: Large general-purpose model
- `meta-llama/Llama-3.3-70B-Instruct`: Llama 3.3 instruction-tuned model
- `CohereForAI/c4ai-command-r-plus-08-2024`: Cohere's command model
- `Qwen/QwQ-32B-Preview`: Preview version of QwQ
- `nvidia/Llama-3.1-Nemotron-70B-Instruct-HF`: NVIDIA's Nemotron model
- `meta-llama/Llama-3.2-11B-Vision-Instruct`: Vision-capable Llama model
- `NousResearch/Hermes-3-Llama-3.1-8B`: Lightweight Hermes model
- `mistralai/Mistral-Nemo-Instruct-2407`: Mistral's instruction model
- `microsoft/Phi-3.5-mini-instruct`: Microsoft's compact Phi model

## Model Selection Helpers 🎯

LiteChat provides helper functions for type-safe model selection:

```python
from litechat import litechat_model, litellm_model

# For use with LiteChat native client
model = litechat_model("Qwen/Qwen2.5-72B-Instruct")

# For use with LiteLLM
model = litellm_model("Qwen/Qwen2.5-72B-Instruct")  # Returns "openai/Qwen/Qwen2.5-72B-Instruct"
```

## Quick Start 🚀

### Starting the Server

You can start the LiteChat server in two ways:

1. Using the CLI:
```bash
litechat_server
```

2. Programmatically:

```python
from litechat import litechat_server

if __name__ == "__main__":
    litechat_server(host="0.0.0.0", port=11437)
```

### Using with OpenAI Client

```python
import os

from openai import OpenAI
from litechat import litechat_model

os.environ['OPENAI_BASE_URL'] = "http://localhost:11437/v1"
os.environ['OPENAI_API_KEY'] = "key123" # required, but not used

client = OpenAI()
response = client.chat.completions.create(
    model=litechat_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
```

### Using with LiteLLM

```python
import os

from litellm import completion
from litechat import OPENAI_COMPATIBLE_BASE_URL, litellm_model

os.environ["OPENAI_API_KEY"] = "key123"

response = completion(
    model=litellm_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base=OPENAI_COMPATIBLE_BASE_URL
)
print(response)
```

### Using LiteChat's Native Client

```python
from litechat import completion, genai, pp_completion
from litechat import litechat_model

# Basic completion
response = completion(
    prompt="What is quantum computing?",
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    web_search=True  # Enable web search
)

# Stream with pretty printing
pp_completion(
    prompt="Explain the theory of relativity",
    model="Qwen/Qwen2.5-72B-Instruct",
    conversation_id="physics_chat"  # Enable conversation memory
)

# Get direct response
result = genai(
    prompt="Write a poem about spring",
    model="meta-llama/Llama-3.3-70B-Instruct",
    system_prompt="You are a creative poet"
)
```

## Advanced Features 🔧

### Web Search Integration

Enable web search to get up-to-date information:

```python
response = completion(
    prompt="What are the latest developments in AI?",
    web_search=True
)
```

### Conversation Memory

Maintain context across multiple interactions:

```python
response = completion(
    prompt="Tell me more about that",
    conversation_id="unique_conversation_id"
)
```
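
As a sketch of what this enables (assuming the non-streaming `completion()` call returns an OpenAI-style response object, as the streaming example below suggests), two calls that share a `conversation_id` let the second prompt refer back to the first:

```python
from litechat import completion

# First turn: establish context under a shared conversation id
first = completion(
    prompt="Summarize the plot of Hamlet in two sentences.",
    conversation_id="hamlet_chat"
)
print(first.choices[0].message.content)

# Second turn: "that play" is resolved from the stored conversation history
second = completion(
    prompt="Who is the main antagonist in that play?",
    conversation_id="hamlet_chat"
)
print(second.choices[0].message.content)
```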

### Streaming Responses

Get token-by-token streaming:

```python
for chunk in completion(
    prompt="Write a long story",
    stream=True
):
    print(chunk.choices[0].delta.content, end="", flush=True)
```
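
Since the server is OpenAI-compatible, the same token stream can also be consumed through the standard OpenAI client pointed at the local endpoint; a minimal sketch, assuming the `/v1` route honours `stream=True` as the compatibility claim implies:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11437/v1", api_key="key123")

stream = client.chat.completions.create(
    model="NousResearch/Hermes-3-Llama-3.1-8B",
    messages=[{"role": "user", "content": "Write a long story"}],
    stream=True,
)
for chunk in stream:
    # the final chunk may carry no content, so guard against None
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```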

## API Reference 📚

### LiteAI Client

```python
from litechat import LiteAI, litechat_model

client = LiteAI(
    api_key="optional-key",  # Optional API key
    base_url="http://localhost:11437",  # Server URL
    system_prompt="You are a helpful assistant",  # Default system prompt
    web_search=False,  # Enable/disable web search by default
    model=litechat_model("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")  # Default model
)
```

### Completion Function Parameters

- `messages`: List of conversation messages or direct prompt string
- `model`: HuggingFace model identifier (use `litechat_model()` for type safety)
- `system_prompt`: System instruction for the model
- `temperature`: Control randomness (0.0 to 1.0)
- `stream`: Enable streaming responses
- `web_search`: Enable web search
- `conversation_id`: Enable conversation memory
- `max_tokens`: Maximum tokens in response
- `tools`: List of available tools/functions
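
As an illustrative sketch that combines several of these parameters in one call (the parameter names follow the list above; the specific values are arbitrary):

```python
from litechat import completion, litechat_model

response = completion(
    messages=[
        {"role": "user", "content": "Give me three tips for writing clean Python."}
    ],
    model=litechat_model("Qwen/Qwen2.5-72B-Instruct"),
    system_prompt="You are a concise technical mentor",
    temperature=0.3,                     # lower values give more deterministic output
    max_tokens=512,                      # cap the response length
    conversation_id="mentoring_session"  # keep context for follow-up questions
)
```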

## Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

## License 📄

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support 💬

For support, please open an issue on the GitHub repository or reach out to the maintainers.
            
