# Jina-Serve
<a href="https://pypi.org/project/jina/"><img alt="PyPI" src="https://img.shields.io/pypi/v/jina?label=Release&style=flat-square"></a>
<a href="https://discord.jina.ai"><img src="https://img.shields.io/discord/1106542220112302130?logo=discord&logoColor=white&style=flat-square"></a>
<a href="https://pypistats.org/packages/jina"><img alt="PyPI - Downloads from official pypistats" src="https://img.shields.io/pypi/dm/jina?style=flat-square"></a>
<a href="https://github.com/jina-ai/jina/actions/workflows/cd.yml"><img alt="Github CD status" src="https://github.com/jina-ai/jina/actions/workflows/cd.yml/badge.svg"></a>
Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.
## Key Features
- Native support for all major ML frameworks and data types
- High-performance service design with scaling, streaming, and dynamic batching
- LLM serving with streaming output
- Built-in Docker integration and Executor Hub
- One-click deployment to Jina AI Cloud
- Enterprise-ready with Kubernetes and Docker Compose support
<details>
<summary><strong>Comparison with FastAPI</strong></summary>
Key advantages over FastAPI:
- DocArray-based data handling with native gRPC support
- Built-in containerization and service orchestration
- Seamless scaling of microservices
- One-command cloud deployment
</details>
## Install
```bash
pip install jina
```
See guides for [Apple Silicon](https://jina.ai/serve/get-started/install/apple-silicon-m1-m2/) and [Windows](https://jina.ai/serve/get-started/install/windows/).
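To verify the install, you can print the installed version from the CLI:

```bash
jina -v
```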
## Core Concepts
Three main layers (see the minimal sketch after this list):
- **Data**: BaseDoc and DocList for input/output
- **Serving**: Executors process Documents, Gateway connects services
- **Orchestration**: Deployments serve Executors, Flows create pipelines
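A minimal sketch of how the three layers fit together (the `TextDoc` and `EchoExecutor` names here are illustrative, not part of the library):

```python
from docarray import BaseDoc, DocList
from jina import Deployment, Executor, requests


# Data layer: a schema for inputs and outputs
class TextDoc(BaseDoc):
    text: str = ''


# Serving layer: an Executor processes Documents at an endpoint
class EchoExecutor(Executor):
    @requests  # bound to the default endpoint '/'
    def echo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = f'echo: {doc.text}'
        return docs


# Orchestration layer: a Deployment serves the Executor
with Deployment(uses=EchoExecutor, port=12345) as dep:
    dep.block()
```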
## Build AI Services
Let's create a gRPC-based AI service using StableLM:
```python
from jina import Executor, requests
from docarray import DocList, BaseDoc
from transformers import pipeline


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
        # the pipeline returns a list of candidate dicts per prompt;
        # keep the generated text of the first candidate for each
        for prompt, output in zip(prompts, llm_outputs):
            generations.append(
                Generation(prompt=prompt, text=output[0]['generated_text'])
            )
        return generations
```
Deploy with Python or YAML:
```python
from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()
```
```yaml
jtype: Deployment
with:
  uses: StableLM
  py_modules:
    - executor.py
  timeout_ready: -1
  port: 12345
```
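Assuming the YAML is saved as `deployment.yml` (the filename is our choice) and a recent jina version, it can also be served from the CLI:

```bash
jina deployment --uses deployment.yml
```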
Use the client:
```python
from jina import Client
from docarray import DocList
from executor import Prompt, Generation

prompt = Prompt(text='suggest an interesting image generation prompt')
client = Client(port=12345)
response = client.post('/', inputs=[prompt], return_type=DocList[Generation])
print(response[0].text)  # response is a DocList[Generation]
```
## Build Pipelines
Chain services into a Flow (here `TextToImage` is a second Executor, such as the one containerized below):

```python
from jina import Flow

flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)

with flow:
    flow.block()
```
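The same pipeline can be declared in YAML, mirroring the Deployment YAML above:

```yaml
jtype: Flow
with:
  port: 12345
executors:
  - uses: StableLM
  - uses: TextToImage
```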
## Scaling and Deployment
### Local Scaling
Boost throughput with built-in features:
- Replicas for parallel processing
- Shards for data partitioning
- Dynamic batching for efficient model inference
Example scaling a Stable Diffusion deployment:
```yaml
jtype: Deployment
with:
  uses: TextToImage
  timeout_ready: -1
  py_modules:
    - text_to_image.py
  env:
    CUDA_VISIBLE_DEVICES: RR  # assign visible GPUs to replicas round-robin
  replicas: 2
  uses_dynamic_batching:
    /default:
      preferred_batch_size: 10
      timeout: 200
```
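The same options are available as keyword arguments in Python; a minimal sketch with replicas only, assuming `TextToImage` lives in `text_to_image.py` as in the YAML above:

```python
from jina import Deployment
from text_to_image import TextToImage

# two replicas process requests in parallel
dep = Deployment(uses=TextToImage, timeout_ready=-1, replicas=2)

with dep:
    dep.block()
```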
### Cloud Deployment
#### Containerize Services
1. Structure your Executor:
```
TextToImage/
├── executor.py
├── config.yml
└── requirements.txt
```
2. Configure:
```yaml
# config.yml
jtype: TextToImage
py_modules:
  - executor.py
metas:
  name: TextToImage
  description: Text to Image generation Executor
```
3. Push to Hub:
```bash
jina hub push TextToImage
```
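Once pushed, the Executor can be referenced from any Deployment or Flow. A sketch, with `<your-username>` a placeholder for your Hub namespace:

```python
from jina import Deployment

# 'jinaai+docker://' runs the Hub Executor inside its Docker image
dep = Deployment(
    uses='jinaai+docker://<your-username>/TextToImage', timeout_ready=-1
)
```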
#### Deploy to Kubernetes
```bash
jina export kubernetes flow.yml ./my-k8s
kubectl apply -R -f my-k8s
```
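The export expects a Flow YAML whose Executors run from Docker images; a sketch of such a `flow.yml`, with `<your-username>` again a placeholder:

```yaml
jtype: Flow
executors:
  - uses: jinaai+docker://<your-username>/TextToImage
```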
#### Use Docker Compose
```bash
jina export docker-compose flow.yml docker-compose.yml
docker-compose up
```
#### JCloud Deployment
Deploy with a single command:
```bash
jina cloud deploy jcloud-flow.yml
```
## LLM Streaming
Enable token-by-token streaming for responsive LLM applications:
1. Define schemas:
```python
from docarray import BaseDoc


class PromptDocument(BaseDoc):
    prompt: str
    max_tokens: int


class ModelOutputDocument(BaseDoc):
    token_id: int
    generated_text: str
```
2. Initialize service:
```python
from jina import Executor, requests
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

# the tokenizer is shared at module level; the streaming endpoint below uses it
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


class TokenStreamingExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')
```
3. Implement streaming:
```python
@requests(on='/stream')
async def task(self, doc: PromptDocument, **kwargs) -> ModelOutputDocument:
    input = tokenizer(doc.prompt, return_tensors='pt')
    input_len = input['input_ids'].shape[1]
    for _ in range(doc.max_tokens):
        output = self.model.generate(**input, max_new_tokens=1)
        if output[0][-1] == tokenizer.eos_token_id:
            break
        yield ModelOutputDocument(
            token_id=output[0][-1],
            generated_text=tokenizer.decode(
                output[0][input_len:], skip_special_tokens=True
            ),
        )
        # feed the extended sequence back in to generate the next token
        input = {
            'input_ids': output,
            'attention_mask': torch.ones(1, len(output[0])),
        }
```
4. Serve and use:
```python
import asyncio

from jina import Client, Deployment


# Server (run in one process)
with Deployment(uses=TokenStreamingExecutor, port=12345, protocol='grpc') as dep:
    dep.block()


# Client (run in another process)
async def main():
    client = Client(port=12345, protocol='grpc', asyncio=True)
    async for doc in client.stream_doc(
        on='/stream',
        inputs=PromptDocument(prompt='what is the capital of France?', max_tokens=10),
        return_type=ModelOutputDocument,
    ):
        print(doc.generated_text)


asyncio.run(main())
```
## Support
Jina-serve is backed by [Jina AI](https://jina.ai) and licensed under [Apache-2.0](./LICENSE).