| Field | Value |
| --- | --- |
| Name | litechat |
| Version | 0.0.69 |
| home_page | https://github.com/santhosh/ |
| Summary | automated huggingchat openai style fastapi inference |
| upload_time | 2025-02-02 05:04:22 |
| maintainer | None |
| docs_url | None |
| author | Kammari Santhosh |
| requires_python | <4.0,>=3.10 |
| license | MIT |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# LiteChat 🚀
LiteChat is a lightweight, OpenAI-compatible interface for running local LLM inference servers. It provides seamless integration with various open-source models while maintaining OpenAI-style API compatibility.
## Features ✨
- 🔄 OpenAI API compatibility
- 🌐 Web search integration
- 💬 Conversation memory
- 🔄 Streaming responses
- 🛠️ Easy integration with HuggingFace models
- 📦 Compatible with both litellm and OpenAI clients
- 🎯 Type-safe model selection
## Installation 🛠️
```bash
pip install litechat playwright
playwright install
```
## Available Models 🤖
LiteChat supports the following models:
- `Qwen/Qwen2.5-Coder-32B-Instruct`: Specialized coding model
- `Qwen/Qwen2.5-72B-Instruct`: Large general-purpose model
- `meta-llama/Llama-3.3-70B-Instruct`: Latest Llama 3 model
- `CohereForAI/c4ai-command-r-plus-08-2024`: Cohere's command model
- `Qwen/QwQ-32B-Preview`: Preview version of QwQ
- `nvidia/Llama-3.1-Nemotron-70B-Instruct-HF`: NVIDIA's Nemotron model
- `meta-llama/Llama-3.2-11B-Vision-Instruct`: Vision-capable Llama model
- `NousResearch/Hermes-3-Llama-3.1-8B`: Lightweight Hermes model
- `mistralai/Mistral-Nemo-Instruct-2407`: Mistral's instruction model
- `microsoft/Phi-3.5-mini-instruct`: Microsoft's compact Phi model
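If the server also exposes the standard OpenAI-compatible `/v1/models` endpoint (an assumption; this README does not state it explicitly), the list above can be retrieved programmatically instead of hard-coding it:

```python
from openai import OpenAI

# Point the client at a locally running LiteChat server (default port used in this README).
client = OpenAI(base_url="http://localhost:11437/v1", api_key="key123")

# Assumes the server implements the standard /v1/models listing; otherwise fall back
# to the model identifiers documented above.
for model in client.models.list():
    print(model.id)
```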
## Model Selection Helpers 🎯
LiteChat provides helper functions for type-safe model selection:
```python
from litechat import litechat_model, litellm_model
# For use with LiteChat native client
model = litechat_model("Qwen/Qwen2.5-72B-Instruct")
# For use with LiteLLM
model = litellm_model("Qwen/Qwen2.5-72B-Instruct") # Returns "openai/Qwen/Qwen2.5-72B-Instruct"
```
## Quick Start 🚀
### Starting the Server
You can start the LiteChat server in two ways:
1. Using the CLI:
```bash
litechat_server
```
2. Programmatically:
```python
from litechat import litechat_server
if __name__ == "__main__":
    litechat_server(host="0.0.0.0", port=11437)
```
### Using with OpenAI Client
```python
import os
from openai import OpenAI
from litechat import litechat_model

os.environ['OPENAI_BASE_URL'] = "http://localhost:11437/v1"
os.environ['OPENAI_API_KEY'] = "key123"  # required by the client, but not used by the server

client = OpenAI()
response = client.chat.completions.create(
    model=litechat_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
```
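Streaming through the official OpenAI client should also work, since streaming responses are a listed feature; a minimal sketch continuing the snippet above (the chunk shape is assumed to follow the standard OpenAI streaming format):

```python
stream = client.chat.completions.create(
    model=litechat_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"role": "user", "content": "Give me one fun fact about Paris."}],
    stream=True,
)
for chunk in stream:
    # delta.content can be None on the final chunk, so guard against it
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```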
### Using with LiteLLM
```python
import os
from litellm import completion
from litechat import OPENAI_COMPATIBLE_BASE_URL, litellm_model
os.environ["OPENAI_API_KEY"] = "key123"
response = completion(
    model=litellm_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base=OPENAI_COMPATIBLE_BASE_URL
)
print(response)
```
### Using LiteChat's Native Client
```python
from litechat import completion, genai, pp_completion
from litechat import litechat_model
# Basic completion
response = completion(
    prompt="What is quantum computing?",
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    web_search=True  # Enable web search
)
# Stream with pretty printing
pp_completion(
    prompt="Explain the theory of relativity",
    model="Qwen/Qwen2.5-72B-Instruct",
    conversation_id="physics_chat"  # Enable conversation memory
)
# Get direct response
result = genai(
    prompt="Write a poem about spring",
    model="meta-llama/Llama-3.3-70B-Instruct",
    system_prompt="You are a creative poet"
)
```
## Advanced Features 🔧
### Web Search Integration
Enable web search to get up-to-date information:
```python
from litechat import completion

response = completion(
    prompt="What are the latest developments in AI?",
    web_search=True
)
```
### Conversation Memory
Maintain context across multiple interactions:
```python
from litechat import completion

response = completion(
    prompt="Tell me more about that",
    conversation_id="unique_conversation_id"
)
```
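Since memory is keyed by `conversation_id`, a follow-up call that reuses the same id can refer back to earlier turns. A small illustrative sketch (the return value of `completion` is not documented here, so it is simply printed):

```python
from litechat import completion

# First turn establishes context under a chosen conversation id.
completion(
    prompt="My favourite festival is Diwali.",
    conversation_id="prefs_demo"
)

# A second turn with the same id can rely on the earlier message.
follow_up = completion(
    prompt="Which festival did I say was my favourite?",
    conversation_id="prefs_demo"
)
print(follow_up)
```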
### Streaming Responses
Get token-by-token streaming:
```python
from litechat import completion

for chunk in completion(
    prompt="Write a long story",
    stream=True
):
    print(chunk.choices[0].delta.content, end="", flush=True)
```
## API Reference 📚
### LiteAI Client
```python
from litechat import LiteAI, litechat_model
client = LiteAI(
    api_key="optional-key",  # Optional API key
    base_url="http://localhost:11437",  # Server URL
    system_prompt="You are a helpful assistant",  # Default system prompt
    web_search=False,  # Enable/disable web search by default
    model=litechat_model("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")  # Default model
)
```
### Completion Function Parameters
- `messages`: List of conversation messages or direct prompt string
- `model`: HuggingFace model identifier (use `litechat_model()` for type safety)
- `system_prompt`: System instruction for the model
- `temperature`: Control randomness (0.0 to 1.0)
- `stream`: Enable streaming responses
- `web_search`: Enable web search
- `conversation_id`: Enable conversation memory
- `max_tokens`: Maximum tokens in response
- `tools`: List of available tools/functions
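A sketch that combines several of these parameters with the module-level `completion` function; the defaults and exact return shape are not specified in this README, so treat it as illustrative rather than canonical:

```python
from litechat import completion, litechat_model

response = completion(
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
    model=litechat_model("Qwen/Qwen2.5-72B-Instruct"),
    system_prompt="You are a concise literary assistant",
    temperature=0.3,   # lower values make output more deterministic
    max_tokens=200,    # cap the length of the reply
    web_search=False,  # no live search needed for this prompt
    stream=False
)
print(response)
```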
## Contributing 🤝
Contributions are welcome! Please feel free to submit a Pull Request.
## License 📄
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Support 💬
For support, please open an issue on the GitHub repository or reach out to the maintainers.
## Raw data

```json
{
    "_id": null,
    "home_page": "https://github.com/santhosh/",
    "name": "litechat",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Kammari Santhosh",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/a2/91/a738488211d658679e4d0b27b3110df6664bf017f5de12f86bf16c21d841/litechat-0.0.69.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "automated huggingchat openai style fastapi inference",
    "version": "0.0.69",
    "project_urls": {
        "Homepage": "https://github.com/santhosh/",
        "Repository": "https://github.com/santhosh/"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0de11b3235510dd20c52e663071219474369ec50aa7002912a00ba44cacb28c0",
                "md5": "4f488488497c944352efe5562a30ea92",
                "sha256": "c82c0000fa27438ce4e2e3b82fcebb8a03cd1bfd965bb303e2407ec08a03c3ed"
            },
            "downloads": -1,
            "filename": "litechat-0.0.69-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4f488488497c944352efe5562a30ea92",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 32303,
            "upload_time": "2025-02-02T05:04:20",
            "upload_time_iso_8601": "2025-02-02T05:04:20.655435Z",
            "url": "https://files.pythonhosted.org/packages/0d/e1/1b3235510dd20c52e663071219474369ec50aa7002912a00ba44cacb28c0/litechat-0.0.69-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a291a738488211d658679e4d0b27b3110df6664bf017f5de12f86bf16c21d841",
                "md5": "c482ffdcfbe0f68faa70e99841fc818e",
                "sha256": "e11c8d01f1f0701758534914cc8e8f5386e7ba18560943824d3734bfe9ef643c"
            },
            "downloads": -1,
            "filename": "litechat-0.0.69.tar.gz",
            "has_sig": false,
            "md5_digest": "c482ffdcfbe0f68faa70e99841fc818e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 66250,
            "upload_time": "2025-02-02T05:04:22",
            "upload_time_iso_8601": "2025-02-02T05:04:22.416682Z",
            "url": "https://files.pythonhosted.org/packages/a2/91/a738488211d658679e4d0b27b3110df6664bf017f5de12f86bf16c21d841/litechat-0.0.69.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-02 05:04:22",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "litechat"
}
```