xinference


Namexinference JSON
Version 0.11.1 PyPI version JSON
download
home_pagehttps://github.com/xorbitsai/inference
SummaryModel Serving Made Easy
upload_time2024-05-17 07:21:06
maintainerQin Xuye
docs_urlNone
authorQin Xuye
requires_pythonNone
licenseApache License 2.0
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            <div align="center">
<img src="./assets/xorbits-logo.png" width="180px" alt="xorbits" />

# Xorbits Inference: Model Serving Made Easy πŸ€–

[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
[![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
[![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
[![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)
[![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=x&style=for-the-badge)](https://twitter.com/xorbitsio)

English | [中文介绍](README_zh_CN.md) | [ζ—₯本θͺž](README_ja_JP.md)
</div>
<br />


Xorbits Inference(Xinference) is a powerful and versatile library designed to serve language, 
speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy 
and serve your or state-of-the-art built-in models using just a single command. Whether you are a 
researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full 
potential of cutting-edge AI models.

<div align="center">
<i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">πŸ‘‰ Join our Slack community!</a></i>
</div>

## πŸ”₯ Hot Topics
### Framework Enhancements
- Support specifying worker and GPU indexes for launching models: [#1195](https://github.com/xorbitsai/inference/pull/1195)
- Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161)
- Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080)
- Support speech recognition model: [#929](https://github.com/xorbitsai/inference/pull/929)
- Metrics support: [#906](https://github.com/xorbitsai/inference/pull/906)
- Docker image: [#855](https://github.com/xorbitsai/inference/pull/855)
- Support multimodal: [#829](https://github.com/xorbitsai/inference/pull/829)
### New Models
- Built-in support for [Llama 3](https://github.com/meta-llama/llama3): [#1332](https://github.com/xorbitsai/inference/pull/1332)
- Built-in support for [Qwen1.5 110B](https://huggingface.co/Qwen/Qwen1.5-110B-Chat): [#1388](https://github.com/xorbitsai/inference/pull/1388)
- Built-in support for [Mixtral-8x22B-instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1): [#1340](https://github.com/xorbitsai/inference/pull/1340)
- Built-in support for [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-v01): [#1310](https://github.com/xorbitsai/inference/pull/1310)
- Built-in support for [Qwen1.5 MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B-Chat): [#1263](https://github.com/xorbitsai/inference/pull/1263)
- Built-in support for [Qwen1.5 32B](https://huggingface.co/Qwen/Qwen1.5-32B-Chat): [#1249](https://github.com/xorbitsai/inference/pull/1249)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization.
- [Chatbox](https://chatboxai.app/): a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.
- [RAGFlow](https://github.com/infiniflow/ragflow): is an open-source RAG engine based on deep document understanding.


## Key Features
🌟 **Model Serving Made Easy**: Simplify the process of serving large language, speech 
recognition, and multimodal models. You can set up and deploy your models
for experimentation and production with a single command.

⚑️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single 
command. Inference provides access to state-of-the-art open-source models!

πŸ–₯ **Heterogeneous Hardware Utilization**: Make the most of your hardware resources with
[ggml](https://github.com/ggerganov/ggml). Xorbits Inference intelligently utilizes heterogeneous
hardware, including GPUs and CPUs, to accelerate your model inference tasks.

βš™οΈ **Flexible API and Interfaces**: Offer multiple interfaces for interacting
with your models, supporting OpenAI compatible RESTful API (including Function Calling API), RPC, CLI 
and WebUI for seamless model management and interaction.

🌐 **Distributed Deployment**: Excel in distributed deployment scenarios, 
allowing the seamless distribution of model inference across multiple devices or machines.

πŸ”Œ **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
with popular third-party libraries including [LangChain](https://python.langchain.com/docs/integrations/providers/xinference), [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window), [Dify](https://docs.dify.ai/advanced/model-configuration/xinference), and [Chatbox](https://chatboxai.app/).

## Why Xinference
| Feature                                        | Xinference | FastChat | OpenLLM | RayLLM |
|------------------------------------------------|------------|----------|---------|--------|
| OpenAI-Compatible RESTful API                  | βœ… | βœ… | βœ… | βœ… |
| vLLM Integrations                              | βœ… | βœ… | βœ… | βœ… |
| More Inference Engines (GGML, TensorRT)        | βœ… | ❌ | βœ… | βœ… |
| More Platforms (CPU, Metal)                    | βœ… | βœ… | ❌ | ❌ |
| Multi-node Cluster Deployment                  | βœ… | ❌ | ❌ | βœ… |
| Image Models (Text-to-Image)                   | βœ… | βœ… | ❌ | ❌ |
| Text Embedding Models                          | βœ… | ❌ | ❌ | ❌ |
| Multimodal Models                              | βœ… | ❌ | ❌ | ❌ |
| Audio Models                                   | βœ… | ❌ | ❌ | ❌ |
| More OpenAI Functionalities (Function Calling) | βœ… | ❌ | ❌ | ❌ |

## Getting Started

**Please give us a star before you begin, and you'll receive instant notifications for every new release on GitHub!**

* [Docs](https://inference.readthedocs.io/en/latest/index.html)
* [Built-in Models](https://inference.readthedocs.io/en/latest/models/builtin/index.html)
* [Custom Models](https://inference.readthedocs.io/en/latest/models/custom.html)
* [Deployment Docs](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html)
* [Examples and Tutorials](https://inference.readthedocs.io/en/latest/examples/index.html)

### Jupyter Notebook

The lightest way to experience Xinference is to try our [Juypter Notebook on Google Colab](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb).

### Docker 

Nvidia GPU users can start Xinference server using [Xinference Docker Image](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html). Prior to executing the installation command, ensure that both [Docker](https://docs.docker.com/get-docker/) and [CUDA](https://developer.nvidia.com/cuda-downloads) are set up on your system.

```bash
docker run --name xinference -d -p 9997:9997 -e XINFERENCE_HOME=/data -v </on/your/host>:/data --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0
```

### Quick Start

Install Xinference by using pip as follows. (For more options, see [Installation page](https://inference.readthedocs.io/en/latest/getting_started/installation.html).)

```bash
pip install "xinference[all]"
```

To start a local instance of Xinference, run the following command:

```bash
$ xinference-local
```

Once Xinference is running, there are multiple ways you can try it: via the web UI, via cURL,
 via the command line, or via the Xinference’s python client. Check out our [docs]( https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html#run-xinference-locally) for the guide.

![web UI](assets/screenshot.png)

## Getting involved

| Platform                                                                                      | Purpose                                            |
|-----------------------------------------------------------------------------------------------|----------------------------------------------------|
| [Github Issues](https://github.com/xorbitsai/inference/issues)                                | Reporting bugs and filing feature requests.        |
| [Slack](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg) | Collaborating with other Xorbits users.            |
| [Twitter](https://twitter.com/xorbitsio)                                                      | Staying up-to-date on new features.                |

## Contributors

<a href="https://github.com/xorbitsai/inference/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=xorbitsai/inference" />
</a>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/xorbitsai/inference",
    "name": "xinference",
    "maintainer": "Qin Xuye",
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": "qinxuye@xprobe.io",
    "keywords": null,
    "author": "Qin Xuye",
    "author_email": "qinxuye@xprobe.io",
    "download_url": "https://files.pythonhosted.org/packages/d3/19/376547f27bdc9cc0982081019fedfa50a4d0b0c81fc349d42ea50f0eb596/xinference-0.11.1.tar.gz",
    "platform": null,
    "description": "<div align=\"center\">\n<img src=\"./assets/xorbits-logo.png\" width=\"180px\" alt=\"xorbits\" />\n\n# Xorbits Inference: Model Serving Made Easy \ud83e\udd16\n\n[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)\n[![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)\n[![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)\n[![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)\n[![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=x&style=for-the-badge)](https://twitter.com/xorbitsio)\n\nEnglish | [\u4e2d\u6587\u4ecb\u7ecd](README_zh_CN.md) | [\u65e5\u672c\u8a9e](README_ja_JP.md)\n</div>\n<br />\n\n\nXorbits Inference(Xinference) is a powerful and versatile library designed to serve language, \nspeech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy \nand serve your or state-of-the-art built-in models using just a single command. Whether you are a \nresearcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full \npotential of cutting-edge AI models.\n\n<div align=\"center\">\n<i><a href=\"https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA\">\ud83d\udc49 Join our Slack community!</a></i>\n</div>\n\n## \ud83d\udd25 Hot Topics\n### Framework Enhancements\n- Support specifying worker and GPU indexes for launching models: [#1195](https://github.com/xorbitsai/inference/pull/1195)\n- Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161)\n- Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080)\n- Support speech recognition model: [#929](https://github.com/xorbitsai/inference/pull/929)\n- Metrics support: [#906](https://github.com/xorbitsai/inference/pull/906)\n- Docker image: [#855](https://github.com/xorbitsai/inference/pull/855)\n- Support multimodal: [#829](https://github.com/xorbitsai/inference/pull/829)\n### New Models\n- Built-in support for [Llama 3](https://github.com/meta-llama/llama3): [#1332](https://github.com/xorbitsai/inference/pull/1332)\n- Built-in support for [Qwen1.5 110B](https://huggingface.co/Qwen/Qwen1.5-110B-Chat): [#1388](https://github.com/xorbitsai/inference/pull/1388)\n- Built-in support for [Mixtral-8x22B-instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1): [#1340](https://github.com/xorbitsai/inference/pull/1340)\n- Built-in support for [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-v01): [#1310](https://github.com/xorbitsai/inference/pull/1310)\n- Built-in support for [Qwen1.5 MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B-Chat): [#1263](https://github.com/xorbitsai/inference/pull/1263)\n- Built-in support for [Qwen1.5 32B](https://huggingface.co/Qwen/Qwen1.5-32B-Chat): [#1249](https://github.com/xorbitsai/inference/pull/1249)\n### Integrations\n- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.\n- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization.\n- [Chatbox](https://chatboxai.app/): a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.\n- [RAGFlow](https://github.com/infiniflow/ragflow): is an open-source RAG engine based on deep document understanding.\n\n\n## Key Features\n\ud83c\udf1f **Model Serving Made Easy**: Simplify the process of serving large language, speech \nrecognition, and multimodal models. You can set up and deploy your models\nfor experimentation and production with a single command.\n\n\u26a1\ufe0f **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single \ncommand. Inference provides access to state-of-the-art open-source models!\n\n\ud83d\udda5 **Heterogeneous Hardware Utilization**: Make the most of your hardware resources with\n[ggml](https://github.com/ggerganov/ggml). Xorbits Inference intelligently utilizes heterogeneous\nhardware, including GPUs and CPUs, to accelerate your model inference tasks.\n\n\u2699\ufe0f **Flexible API and Interfaces**: Offer multiple interfaces for interacting\nwith your models, supporting OpenAI compatible RESTful API (including Function Calling API), RPC, CLI \nand WebUI for seamless model management and interaction.\n\n\ud83c\udf10 **Distributed Deployment**: Excel in distributed deployment scenarios, \nallowing the seamless distribution of model inference across multiple devices or machines.\n\n\ud83d\udd0c **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates\nwith popular third-party libraries including [LangChain](https://python.langchain.com/docs/integrations/providers/xinference), [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window), [Dify](https://docs.dify.ai/advanced/model-configuration/xinference), and [Chatbox](https://chatboxai.app/).\n\n## Why Xinference\n| Feature                                        | Xinference | FastChat | OpenLLM | RayLLM |\n|------------------------------------------------|------------|----------|---------|--------|\n| OpenAI-Compatible RESTful API                  | \u2705 | \u2705 | \u2705 | \u2705 |\n| vLLM Integrations                              | \u2705 | \u2705 | \u2705 | \u2705 |\n| More Inference Engines (GGML, TensorRT)        | \u2705 | \u274c | \u2705 | \u2705 |\n| More Platforms (CPU, Metal)                    | \u2705 | \u2705 | \u274c | \u274c |\n| Multi-node Cluster Deployment                  | \u2705 | \u274c | \u274c | \u2705 |\n| Image Models (Text-to-Image)                   | \u2705 | \u2705 | \u274c | \u274c |\n| Text Embedding Models                          | \u2705 | \u274c | \u274c | \u274c |\n| Multimodal Models                              | \u2705 | \u274c | \u274c | \u274c |\n| Audio Models                                   | \u2705 | \u274c | \u274c | \u274c |\n| More OpenAI Functionalities (Function Calling) | \u2705 | \u274c | \u274c | \u274c |\n\n## Getting Started\n\n**Please give us a star before you begin, and you'll receive instant notifications for every new release on GitHub!**\n\n* [Docs](https://inference.readthedocs.io/en/latest/index.html)\n* [Built-in Models](https://inference.readthedocs.io/en/latest/models/builtin/index.html)\n* [Custom Models](https://inference.readthedocs.io/en/latest/models/custom.html)\n* [Deployment Docs](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html)\n* [Examples and Tutorials](https://inference.readthedocs.io/en/latest/examples/index.html)\n\n### Jupyter Notebook\n\nThe lightest way to experience Xinference is to try our [Juypter Notebook on Google Colab](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb).\n\n### Docker \n\nNvidia GPU users can start Xinference server using [Xinference Docker Image](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html). Prior to executing the installation command, ensure that both [Docker](https://docs.docker.com/get-docker/) and [CUDA](https://developer.nvidia.com/cuda-downloads) are set up on your system.\n\n```bash\ndocker run --name xinference -d -p 9997:9997 -e XINFERENCE_HOME=/data -v </on/your/host>:/data --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0\n```\n\n### Quick Start\n\nInstall Xinference by using pip as follows. (For more options, see [Installation page](https://inference.readthedocs.io/en/latest/getting_started/installation.html).)\n\n```bash\npip install \"xinference[all]\"\n```\n\nTo start a local instance of Xinference, run the following command:\n\n```bash\n$ xinference-local\n```\n\nOnce Xinference is running, there are multiple ways you can try it: via the web UI, via cURL,\n via the command line, or via the Xinference\u2019s python client. Check out our [docs]( https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html#run-xinference-locally) for the guide.\n\n![web UI](assets/screenshot.png)\n\n## Getting involved\n\n| Platform                                                                                      | Purpose                                            |\n|-----------------------------------------------------------------------------------------------|----------------------------------------------------|\n| [Github Issues](https://github.com/xorbitsai/inference/issues)                                | Reporting bugs and filing feature requests.        |\n| [Slack](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg) | Collaborating with other Xorbits users.            |\n| [Twitter](https://twitter.com/xorbitsio)                                                      | Staying up-to-date on new features.                |\n\n## Contributors\n\n<a href=\"https://github.com/xorbitsai/inference/graphs/contributors\">\n  <img src=\"https://contrib.rocks/image?repo=xorbitsai/inference\" />\n</a>\n",
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "Model Serving Made Easy",
    "version": "0.11.1",
    "project_urls": {
        "Homepage": "https://github.com/xorbitsai/inference"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3e7f820ed226acf2f5c4e3f8d7ff162bca0803027d86896da38043634c14cb34",
                "md5": "026e3c6e0e7b71b0d66fe2b7ba970429",
                "sha256": "682ecaea2b5232e1590fb2dfa712210480150df090452371521b06526e66cfa8"
            },
            "downloads": -1,
            "filename": "xinference-0.11.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "026e3c6e0e7b71b0d66fe2b7ba970429",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 21857521,
            "upload_time": "2024-05-17T07:21:02",
            "upload_time_iso_8601": "2024-05-17T07:21:02.611610Z",
            "url": "https://files.pythonhosted.org/packages/3e/7f/820ed226acf2f5c4e3f8d7ff162bca0803027d86896da38043634c14cb34/xinference-0.11.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d319376547f27bdc9cc0982081019fedfa50a4d0b0c81fc349d42ea50f0eb596",
                "md5": "761bb8b0c4b8c623c2ca8a9b76b23f58",
                "sha256": "d815679b6edb0c6ab02201f686d6a747b06127133c27a9b784375ed310eac2a5"
            },
            "downloads": -1,
            "filename": "xinference-0.11.1.tar.gz",
            "has_sig": false,
            "md5_digest": "761bb8b0c4b8c623c2ca8a9b76b23f58",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 12176781,
            "upload_time": "2024-05-17T07:21:06",
            "upload_time_iso_8601": "2024-05-17T07:21:06.979063Z",
            "url": "https://files.pythonhosted.org/packages/d3/19/376547f27bdc9cc0982081019fedfa50a4d0b0c81fc349d42ea50f0eb596/xinference-0.11.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-17 07:21:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "xorbitsai",
    "github_project": "inference",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "xinference"
}
        
Elapsed time: 0.27417s