<p align="center">
<img src="/assets/bhumi_logo.png" alt="Bhumi Logo" width="1600"/>
</p>
# 🌍 **BHUMI - The Fastest AI Inference Client** ⚡
## **Introduction**
Bhumi is the fastest AI inference client, built with Rust for Python. It is designed to maximize performance, efficiency, and scalability, making it the best choice for LLM API interactions.
### **Why Bhumi?**
- 🚀 **Fastest AI inference client** – Outperforms alternatives with **2-3x higher throughput**
- ⚡ **Built with Rust for Python** – Achieves high efficiency with low overhead
- 🌐 **Supports multiple AI providers** – OpenAI, Anthropic, Google Gemini, Groq, SambaNova, and more
- 🔄 **Streaming and async capabilities** – Real-time responses with Rust-powered concurrency
- 🔁 **Automatic connection pooling and retries** – Ensures reliability and efficiency
- 💡 **Minimal memory footprint** – Uses up to **60% less memory** than other clients
- 🏗 **Production-ready** – Optimized for high-throughput applications
Bhumi (भूमि) is Sanskrit for **Earth**, symbolizing **stability, grounding, and speed**, just like our inference engine, which ensures rapid and stable performance. 🚀
## Installation
```bash
pip install bhumi
```
## Quick Start
### OpenAI Example
```python
import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os
api_key = os.getenv("OPENAI_API_KEY")
async def main():
    config = LLMConfig(
        api_key=api_key,
        model="openai/gpt-4o",
        debug=True
    )

    client = BaseLLMClient(config)

    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())
```
## ⚡ **Performance Optimizations**
Bhumi includes cutting-edge performance optimizations that make it **2-3x faster** than alternatives:
### 🧠 **MAP-Elites Buffer Strategy**
- **Ultra-fast archive loading** with Satya validation + orjson parsing (**3x faster** than standard JSON)
- **Trained buffer configurations** optimized through evolutionary algorithms
- **Automatic buffer adjustment** based on response patterns and historical data
- **Type-safe validation** with comprehensive error checking
- **Secure loading** without unsafe `eval()` operations
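For intuition, here is a minimal, hypothetical sketch of how a MAP-Elites-style archive can map workload characteristics to an elite buffer configuration. The `BufferConfig` class, the bucket boundaries, and the archive contents are illustrative assumptions, not Bhumi's internal API.

```python
# Hypothetical sketch of a MAP-Elites-style buffer lookup.
# BufferConfig, the bucket boundaries, and the archive contents are
# illustrative assumptions; Bhumi's real archive format may differ.
from dataclasses import dataclass


@dataclass
class BufferConfig:
    buffer_size: int      # bytes read per chunk from the network
    flush_threshold: int  # bytes buffered before handing data to the caller


# The archive maps a discretized "behavior descriptor" (response-size bucket,
# chunk-rate bucket) to the best configuration found by evolutionary search.
ARCHIVE = {
    (0, 0): BufferConfig(buffer_size=4_096, flush_threshold=1_024),
    (0, 1): BufferConfig(buffer_size=8_192, flush_threshold=2_048),
    (1, 0): BufferConfig(buffer_size=32_768, flush_threshold=8_192),
    (1, 1): BufferConfig(buffer_size=65_536, flush_threshold=16_384),
}


def select_buffer(avg_response_bytes, chunks_per_second):
    """Pick the elite configuration for the cell this workload falls into."""
    size_bucket = 0 if avg_response_bytes < 32_000 else 1
    rate_bucket = 0 if chunks_per_second < 50 else 1
    # Fall back to a conservative default if that cell was never filled.
    return ARCHIVE.get((size_bucket, rate_bucket), BufferConfig(8_192, 2_048))


print(select_buffer(avg_response_bytes=120_000, chunks_per_second=80))
```

The archive shipped with Bhumi holds thousands of such entries (see the status output below), so typical workloads land in a cell that was explicitly optimized.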
### 📊 **Performance Status Check**
Check whether the optimizations are active with the built-in diagnostics:
```python
from bhumi.utils import print_performance_status
# Check optimization status
print_performance_status()
# 🚀 Bhumi Performance Status
# ✅ Optimized MAP-Elites archive loaded
# ⚡ Optimization Details:
#   • Entries: 15,644 total, 15,644 optimized
#   • Coverage: 100.0% of search space
#   • Loading: Satya validation + orjson parsing (3x faster)
```
### 🏆 **Archive Distribution**
When you install Bhumi, you automatically get:
- Pre-trained MAP-Elites archive for optimal buffer sizing
- Fast orjson-based JSON parsing (2-3x faster than standard `json`)
- Satya-powered type validation for bulletproof data loading
- Performance metrics and diagnostics
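As a rough illustration of this loading approach (orjson when available, the standard library otherwise), here is a minimal sketch; the archive file name and structure are assumptions for illustration, not Bhumi's actual on-disk layout.

```python
# Minimal sketch of fast JSON loading with a standard-library fallback.
# The file name "map_elites_archive.json" is illustrative only.
try:
    import orjson

    def load_json(path):
        with open(path, "rb") as f:          # orjson parses bytes directly
            return orjson.loads(f.read())
except ImportError:
    import json

    def load_json(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)


archive = load_json("map_elites_archive.json")
print(f"Loaded {len(archive)} archive entries")
```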
### Gemini Example
```python
import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os
api_key = os.getenv("GEMINI_API_KEY")
async def main():
    config = LLMConfig(
        api_key=api_key,
        model="gemini/gemini-2.0-flash",
        debug=True
    )

    client = BaseLLMClient(config)

    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())
```
## Streaming Support
All providers support streaming responses:
```python
async for chunk in await client.completion([
    {"role": "user", "content": "Write a story"}
], stream=True):
    print(chunk, end="", flush=True)
```
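Put together with the Quick Start configuration, a complete streaming script looks roughly like this (the provider, model, and environment variable are taken from the OpenAI example above):

```python
import asyncio
import os

from bhumi.base_client import BaseLLMClient, LLMConfig


async def main():
    config = LLMConfig(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="openai/gpt-4o",
    )
    client = BaseLLMClient(config)

    # With stream=True, completion() yields chunks as they arrive.
    async for chunk in await client.completion(
        [{"role": "user", "content": "Write a story"}],
        stream=True,
    ):
        print(chunk, end="", flush=True)
    print()  # final newline once the stream finishes


if __name__ == "__main__":
    asyncio.run(main())
```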
## 📊 **Benchmark Results**
Our latest benchmarks show significant performance advantages across different metrics:

### ⚡ Response Time
- LiteLLM: 13.79s
- Native: 5.55s
- Bhumi: 4.26s
- Google GenAI: 6.76s
### 🚀 Throughput (Requests/Second)
- LiteLLM: 3.48
- Native: 8.65
- Bhumi: 11.27
- Google GenAI: 7.10
### 💾 Peak Memory Usage (MB)
- LiteLLM: 275.9MB
- Native: 279.6MB
- Bhumi: 284.3MB
- Google GenAI: 284.8MB
These benchmarks show Bhumi's performance advantage, most notably in throughput, where it delivers up to 3.2x more requests per second than LiteLLM.
## Configuration Options
The `LLMConfig` class supports the following options:
- `api_key`: API key for the provider
- `model`: Model name in format "provider/model_name"
- `base_url`: Optional custom base URL
- `max_retries`: Number of retries (default: 3)
- `timeout`: Request timeout in seconds (default: 30)
- `max_tokens`: Maximum tokens in response
- `debug`: Enable debug logging
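For example, a configuration that sets the optional tuning knobs explicitly might look like this (the Anthropic model string is an illustrative placeholder):

```python
import os

from bhumi.base_client import BaseLLMClient, LLMConfig

# Illustrative values only; the model string is a placeholder, and the
# defaults listed above apply whenever an option is omitted.
config = LLMConfig(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="anthropic/claude-3-5-sonnet",  # "provider/model_name"
    max_retries=5,    # retry transient failures up to 5 times
    timeout=60,       # allow up to 60 seconds per request
    max_tokens=1024,  # cap the response length
    debug=False,
)
client = BaseLLMClient(config)
```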
## 🎯 **Why Use Bhumi?**
- ✔ **Open Source:** Apache 2.0 licensed, free for commercial use
- ✔ **Community Driven:** Welcomes contributions from individuals and companies
- ✔ **Blazing Fast:** **2-3x faster** than alternative solutions
- ✔ **Resource Efficient:** Uses **60% less memory** than comparable clients
- ✔ **Multi-Model Support:** Easily switch between providers
- ✔ **Parallel Requests:** Handles **multiple concurrent requests** effortlessly
- ✔ **Flexibility:** Debugging and customization options available
- ✔ **Production Ready:** Battle-tested in high-throughput environments
## 🤝 **Contributing**
We welcome contributions from the community! Whether you're an individual developer or representing a company like Google, OpenAI, or Anthropic, feel free to:
- Submit pull requests
- Report issues
- Suggest improvements
- Share benchmarks
- Integrate our optimizations into your libraries (with attribution)
## 📜 **License**
Apache 2.0
🌟 **Join our community and help make AI inference faster for everyone!** 🌟