# coeai `PyPI Package`
**Interact with high-capacity multimodal LLMs hosted on the COE AI GPU cluster from any Python environment.**
`coeai` is a lightweight Python wrapper around the *LLaMA-4 16x17B* model (128K context, vision-enabled) deployed on the Centre of Excellence for AI (COE AI) servers at UPES. The model is served behind a single `/generate` HTTP endpoint, making it trivial to run both text-only and image+text inference from notebooks, scripts, or backend services connected to the UPES Wi-Fi.
> **Text and image input** · **128,000-token context** · **Streaming or batch** · **Runs on the COE AI GPU node**
---
## Table of Contents
1. [Features](#features)
2. [Requirements](#requirements)
3. [Installation](#installation)
4. [Quick Start](#quick-start)
5. [API Usage](#api-usage)
6. [Model Parameters](#model-parameters)
7. [Authentication](#authentication)
8. [Troubleshooting](#troubleshooting)
9. [License](#license)
10. [Author](#author)
---
## Features
* **Ultra-long context**: up to **128K tokens** per request for long documents or multi-turn chats
* **Vision support**: send images along with text for multimodal reasoning
* **High performance**: queries are served by a dedicated GPU node inside the COE AI HPC cluster
* **Simple auth**: authenticate with a short-lived API key (valid 30 days) sent in the request header
* **Drop-in wrapper**: minimal Python API; no need to handle HTTP manually
---
## Requirements
* Python **3.8 or newer**
* Network access to `http://10.16.1.50:8000` from the UPES campus Wi-Fi
* A **valid API key** issued by the COE AI team
---
## Installation
```bash
pip install coeai
```
This pulls the latest release from PyPI.
---
## Quick Start
The wrapper exposes a single `LLMinfer` class. Initialize it with the API URL and your API key, then call `infer()`.
### Text-to-Text
```python
from coeai import LLMinfer

llm = LLMinfer(
    api_url="http://10.16.1.50:8000/generate",
    api_key="API_KEY"
)

response = llm.infer(
    mode="text-to-text",
    prompt_text="Summarize the key points of general relativity.",
    max_tokens=500,
    temperature=0.6,
    top_p=0.95,
    stream=False
)

print(response)
```
### Image + Text
```python
from coeai import LLMinfer

# Initialize the client
llm = LLMinfer(
    api_url="http://10.16.1.50:8000/generate",
    api_key="API_KEY"
)

# Run inference with image and prompt
response = llm.infer(
    mode="image-text-to-text",
    prompt_text="Describe what's happening in the image.",
    image_path="/home/konal.106904/sample.jpg",  # <-- update to a valid path
    max_tokens=512,
    temperature=0.7,
    top_p=1.0,
    stream=False
)

# Print the response
print(response)
```
---
## API Usage
### Using the Python Wrapper
The examples above show the recommended approach using the `LLMinfer` class.
### Direct API Access with cURL
You can also interact directly with the `/generate` endpoint using cURL.
#### Prerequisites
| Requirement | Purpose |
|-------------|---------|
| A running instance of the API | Default URL: `http://10.16.1.50:8000/generate` |
| Valid API key | Supply in the `X-API-Key` request header |
| cURL 7.68+ | Supports `--data @-` JSON piping |
#### Text-Only Request
```bash
curl -X POST http://10.16.1.50:8000/generate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY_HERE" \
  -d '{
    "model": "llama4",
    "messages": [
      {
        "role": "system",
        "content": "This is a chat between a user and an assistant. The assistant is helping the user with general questions."
      },
      {
        "role": "user",
        "content": "Explain what a black hole is."
      }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 1.0,
    "stream": false
  }'
```
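If you prefer Python without the wrapper, the same request can be issued with the `requests` library. This is a minimal sketch mirroring the cURL payload above; it assumes `requests` is installed in your environment:

```python
import requests

API_URL = "http://10.16.1.50:8000/generate"

payload = {
    "model": "llama4",
    "messages": [
        {"role": "system", "content": "This is a chat between a user and an assistant."},
        {"role": "user", "content": "Explain what a black hole is."},
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 1.0,
    "stream": False,
}

# json= serializes the payload and sets Content-Type: application/json for us
resp = requests.post(
    API_URL,
    headers={"X-API-Key": "YOUR_API_KEY_HERE"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()  # surfaces 401 Unauthorized for a missing/expired key
print(resp.json())
```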
#### Image + Text Request
For multimodal requests, include the image as a Base64-encoded Data URI in the `content` array:
> **Note:** Replace `YOUR_API_KEY_HERE` with your own API key and `PUT_BASE64_IMAGE_STRING_HERE` with the **Base64-encoded** contents of your image file.
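A representative request is sketched below. The `content` array follows the OpenAI-style multimodal message format described in the notes that follow, so treat the exact field names as assumptions if your server version differs:

```bash
curl -X POST http://10.16.1.50:8000/generate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY_HERE" \
  -d '{
    "model": "llama4",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,PUT_BASE64_IMAGE_STRING_HERE"
            }
          },
          {
            "type": "text",
            "text": "Describe what is happening in this image."
          }
        ]
      }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 1.0,
    "stream": false
  }'
```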
**How it Works:**
1. **Inline Image**: The `image_url` object embeds the entire image as a Data URI so no separate file upload is required
2. **Multi-Modal Prompt**: The `content` field is an array containing both the image and the accompanying text question, preserving ordering
3. **Response**: The server returns a JSON object containing the assistant's interpretation of the supplied image
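To produce the Base64 string referenced in step 1, Python's standard `base64` module is enough. In this sketch, `sample.jpg` and the `image/jpeg` MIME type are placeholders; match them to your actual file:

```python
import base64

# Read the image bytes and encode them as Base64 text
with open("sample.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

# Wrap the encoded bytes in a Data URI for the image_url field
data_uri = f"data:image/jpeg;base64,{b64}"
print(data_uri[:80] + "...")  # paste the full string into the request body
```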
---
## Model Parameters
### Default Parameters
| Field | Description | Default |
|----------------|----------------------------------------------|---------|
| `model` | Model name (currently fixed to `llama4`) | `llama4` |
| `stream` | Return tokens incrementally | `false` |
| `max_tokens` | Maximum new tokens to generate | `1024` |
| `temperature` | Sampling temperature (creativity) | `0.7` |
| `top_p` | Nucleus sampling | `1.0` |
| `stop` | List of stop sequences | `null` |
### Parameter Details
| Parameter | Description |
|----------------|-------------|
| `model` | The model identifier exposed by your server (here `llama4`) |
| `messages` | Conversation history, each entry containing a `role` and `content` |
| `max_tokens` | Upper bound on tokens in the assistant reply |
| `temperature` | Controls randomness; lower values yield more deterministic output |
| `top_p` | Nucleus sampling; keep at `1.0` for default behavior |
| `stream` | When `true`, the API will send incremental responses via Server-Sent Events (SSE) |
> **Note:** The server enforces a total context limit of 128K tokens (prompt + generated output). Size `max_tokens` accordingly.
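When `stream` is `true`, responses arrive as SSE `data:` lines. The exact chunk schema is server-specific, so the sketch below only demonstrates the transport, assuming OpenAI-style `data: {...}` lines terminated by `data: [DONE]`:

```python
import json
import requests

payload = {
    "model": "llama4",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "max_tokens": 128,
    "stream": True,
}

with requests.post(
    "http://10.16.1.50:8000/generate",
    headers={"X-API-Key": "YOUR_API_KEY_HERE"},
    json=payload,
    stream=True,   # keep the connection open and read the body incrementally
    timeout=120,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip keep-alives and blank separator lines
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            break
        print(json.loads(chunk))  # chunk structure depends on the server
```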
---
## Authentication
All requests must include an **API key** issued by the COE AI team. Pass the key when constructing `LLMinfer` and the wrapper attaches it to the request headers for you; when calling the endpoint directly, supply it in the `X-API-Key` header as shown in the cURL examples above.
### Requesting an API Key
1. **Send an email** to `hpc-access@ddn.upes.ac.in` *from your official UPES account* using this template:
```
Subject: API Key Request for COE AI LLM Access

Dear COE AI Team,

I am requesting access to the LLM API for my project work.

Project Details:
- Project Name: <Your Project Name>
- Project Description: <Brief description>
- Expected Usage: <How you plan to use the LLM>
- Duration: <Timeline>

Reason for API Access:
<Research objectives or academic requirements>

Additional Information:
- Name: <Your Name>
- Email: <Your Email>
- Department/Affiliation: <Dept/Organisation>
- Student/Faculty ID: <If applicable>

Thank you for considering my request.

Best regards,
<Your Name>
```
2. Allow **2-3 business days** for processing. The team will reply with your API key.
### Key Renewal
Keys expire **after 30 days**. Email the same address with the subject:
```
Subject: API Key Renewal Request for COE AI LLM Access
```
Include your previous key and a brief usage summary.
---
## Troubleshooting
| Symptom | Possible Cause | Fix |
|---------|----------------|-----|
| `ConnectionError` | Not on UPES network | Connect to campus Wi-Fi or VPN |
| `401 Unauthorized` | Missing/expired API key | Request or renew your key |
| Long latency | Very large prompts or high `max_tokens` | Reduce prompt size or output length |
---
## License
`coeai` is released under the **MIT License**.
---
## Author
**Konal Puri**
Centre of Excellence for AI (COE AI), HPC Project, UPES
PyPI: <https://pypi.org/project/coeai>