# LLMOps-Xclient
Xclient is a user-friendly client for communicating with a Triton Inference Server without friction. It manages the technical details for you, so you can focus on your data and the results you want to achieve. Here is what it does:

* Fetches the model configuration: the client retrieves details about the model from the server, such as the shapes and names of its input and output tensors. This step is essential for preparing data correctly and interpreting responses. This functionality is encapsulated in the ModelClient class.
* Sends requests: using the model information, the client builds an inference request by mapping the arguments passed to the infer_sample or infer_batch method onto the model inputs. It sends your data to the Triton server and asks the model to run inference. Arguments can be passed positionally or as keyword arguments (mixing the two is not allowed); the client handles the rest.
* Returns the response: the model's response is then handed back to you. The client decodes the outputs into numpy arrays and maps the model outputs onto the dictionary entries returned by infer_sample or infer_batch. If the client added a batch dimension, it also strips that dimension again.

Because of the extra step of fetching the model configuration, this process can introduce some latency. You can minimize it by reusing the same Xclient client across multiple requests, or by constructing the client with a preloaded model configuration, if one is available.
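The batch-dimension handling described in the response step can be sketched with plain numpy (an illustration only, not the client's actual implementation):

```python
import numpy as np

# A single sample, as the user passes it to infer_sample: shape (1,)
sample = np.array(["hello".encode("utf-8")], dtype=np.object_)

# The client prepends a batch axis before sending: shape (1, 1)
batched = sample[np.newaxis, ...]

# On the way back, the same axis is stripped again: shape (1,)
unbatched = np.squeeze(batched, axis=0)
```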
Xclient includes five specialized base clients to cover different needs:

* ModelClient: a straightforward synchronous client for simple request-response operations.
* FuturesModelClient: a multithreaded client that handles multiple requests in parallel to speed things up.
* DecoupledModelClient: a synchronous client designed for decoupled models, which allow flexible interaction patterns with the Triton server.
* AsyncioModelClient: an asynchronous client that works well with Python's asyncio for efficient concurrent operation.
* AsyncioDecoupledModelClient: an asyncio-compatible client dedicated to working with decoupled models asynchronously.

There are also three high-level clients dedicated to MetaLM services:

* ChatMetaLM: dedicated to LLM chat, with support for streaming responses.
* MetaLMEmbeddings: dedicated to embeddings, supporting both dense and sparse embeddings.
* MetaLMRerank: dedicated to reranking.

Xclient is built on the tritonclient package from Triton, the Python client library for Triton Inference Server. tritonclient provides a low-level API for communicating with the server over HTTP or gRPC; Xclient provides a high-level API on top of it. Not every tritonclient feature is available through Xclient, so if you need finer control over the communication with the server, use tritonclient directly.
## The three high-level clients
### ChatMetaLM
Synchronous mode
```
from Xclient.llms import ChatMetaLM
llm = ChatMetaLM(server_url="10.88.36.58:8201", model_name="Qwen2-0.5B-Instruct")
result = llm.invoke("介绍一下你自己")
print(result)
```
Streaming mode
```
from Xclient.llms import ChatMetaLM
llm = ChatMetaLM(server_url="10.88.36.58:8201", model_name="Qwen2-0.5B-Instruct")
for token in llm.stream("介绍一下你自己"):
    print(token)
```
Early termination
```
from Xclient.llms import ChatMetaLM
llm = ChatMetaLM(server_url="10.88.36.58:8201", model_name="Qwen2-0.5B-Instruct", stop=['。'])
for token in llm.stream("介绍一下你自己"):
    print(token)
```
### MetaLMEmbeddings
```
from Xclient import MetaLMEmbeddings
xlembed = MetaLMEmbeddings(model="bge-m3-v", base_url="http://10.88.36.58:8200")
text = ['asdasda', 'asdwrfa']

res = xlembed.embed_query('asdasda')
print(res)

res = xlembed.embed_documents(text)
print(res)

# With sparse vectors
res = xlembed.embed_documents_sparse(text)
print(res)

res = xlembed.embed_query_sparse('asdasda')
print(res)
```
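A typical next step with the dense vectors from embed_query and embed_documents is cosine similarity. A minimal sketch with short stand-in vectors (real bge-m3-v embeddings come from the server and are much longer):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two dense vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in 4-dim vectors in place of server-produced embeddings.
query_vec = [0.1, 0.3, 0.5, 0.2]
doc_vecs = [[0.1, 0.3, 0.5, 0.2], [0.9, 0.0, 0.1, 0.0]]

scores = [cosine_sim(query_vec, d) for d in doc_vecs]
best = int(np.argmax(scores))  # index of the most similar document
```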
### MetaLMRerank
```
from Xclient import MetaLMRerank
from typing import List
from langchain_core.documents import Document

class CharacterTextSplitter:
    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size

    def create_documents(self, text: str) -> List[Document]:
        words = text.split(',')
        chunks = []
        for i in range(0, len(words), self.chunk_size):
            chunk = " ".join(words[i : i + self.chunk_size])
            chunks.append(Document(page_content=chunk))
        return chunks

splitter = CharacterTextSplitter(1)
documents = splitter.create_documents("测试程序1,测试程序2,测试程序3,测试程序4,测试程序5,测试程序6,测试程序7")
query = '测试程序5'

rerank = MetaLMRerank(model="bge-reranker-v2-m3", base_url="http://10.88.36.58:8200")
result_docs = rerank.compress_documents(documents=documents, query=query)
print(result_docs)
```
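Conceptually, compress_documents scores each document against the query and returns the documents ordered by relevance. A stand-in sketch with hypothetical scores (the real scores come from the reranker model):

```python
# Hypothetical (document, score) pairs standing in for the reranker's output.
scored = [("测试程序1", 0.12), ("测试程序5", 0.98), ("测试程序3", 0.07)]

# Order documents by descending relevance score.
ranked = [doc for doc, score in sorted(scored, key=lambda p: p[1], reverse=True)]
```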
## The four basic modes
```
import numpy as np
from Xclient.client import ModelClient, FuturesModelClient, DecoupledModelClient, AsyncioModelClient

sample = np.array(
    ['你从哪里来,要到哪里去'.encode("utf-8")], dtype=np.object_
)
sample2 = np.array([
    ['你从哪里来,要到哪里去'.encode("utf-8")],
    ['你从哪里来,要到哪里去.........'.encode("utf-8")]
], dtype=np.object_)
```
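sample has shape (1,) (a single input, no batch axis) for infer_sample, while sample2 has shape (2, 1) (a batch of two such inputs) for infer_batch. A small hypothetical helper for building such batches from plain strings (not part of Xclient):

```python
import numpy as np

def make_batch(texts):
    """Build a (batch, 1) object array of UTF-8 bytes, matching the shape of sample2."""
    return np.array([[t.encode("utf-8")] for t in texts], dtype=np.object_)

batch = make_batch(["第一句", "第二句", "第三句"])
```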
### Synchronous mode
```
with ModelClient("grpc://10.88.36.58:8201", "bge_large_zh", "2") as client:
    print(client.model_config)
    res = client.infer_sample(sample)
    print(res)
    res = client.infer_batch(sample2)
    print(res)
```
### Concurrent mode, without waiting
```
with FuturesModelClient("grpc://10.88.36.58:8201", "bge_large_zh", "2") as client:
    res = client.infer_sample(sample)
print(res.result())
```
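infer_sample here returns a concurrent.futures.Future, so many requests can be in flight before any .result() call. The fan-out pattern, sketched with a ThreadPoolExecutor and a stand-in inference function so it runs without a server:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_infer(text):
    # Stand-in for client.infer_sample(...): returns a dict like the client does.
    return {"last_hidden_state": len(text)}

with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit all requests first; they complete in parallel.
    futures = [pool.submit(fake_infer, t) for t in ["a", "bb", "ccc"]]
    # Collect results in submission order.
    results = [f.result() for f in futures]
```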
### Asynchronous mode
The plain async mode can be skipped for now.
### Decoupled mode, streaming
```
import numpy as np
from Xclient import AsyncioDecoupledModelClient

async def main():
    client = AsyncioDecoupledModelClient("grpc://10.88.36.58:8201", "Qwen2-0.5B-Instruct")
    async for answer in client.infer_sample(np.array(["I'm Pickle Rick".encode('utf-8')])):
        print(answer)
    await client.close()

# Run the code as a coroutine using asyncio.run()
import asyncio
asyncio.run(main())
```
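The async-for consumption pattern above can be tried without a server by swapping in a stand-in async generator for the decoupled client's infer_sample:

```python
import asyncio

async def fake_stream():
    # Stand-in for client.infer_sample(...) on a decoupled model:
    # partial responses are yielded as they arrive.
    for token in ["Hello", ", ", "world"]:
        yield token

async def main():
    pieces = []
    async for answer in fake_stream():
        pieces.append(answer)
    return "".join(pieces)

text = asyncio.run(main())
```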