<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div style="display: flex; flex-direction: column; align-items: center;">
<h1>
<img alt="tool icon" src="https://raw.githubusercontent.com/neuralmagic/deepsparse/main/docs/source/icon-deepsparse.png" />
DeepSparse
</h1>
<h4> An inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application</h4>
<div align="center">
<a href="https://docs.neuralmagic.com/deepsparse/">
<img alt="Documentation" src="https://img.shields.io/badge/documentation-darkred?&style=for-the-badge&logo=read-the-docs" height="20" />
</a>
<a href="https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ/">
<img alt="Slack" src="https://img.shields.io/badge/slack-purple?style=for-the-badge&logo=slack" height="20" />
</a>
<a href="https://github.com/neuralmagic/deepsparse/issues/">
<img alt="Support" src="https://img.shields.io/badge/support%20forums-navy?style=for-the-badge&logo=github" height="20" />
</a>
<a href="https://github.com/neuralmagic/deepsparse/actions/workflows/quality-check.yaml">
<img alt="Main" src="https://img.shields.io/github/workflow/status/neuralmagic/deepsparse/Quality%20Checks/main?label=build&style=for-the-badge" height="20" />
</a>
<a href="https://github.com/neuralmagic/deepsparse/releases">
<img alt="GitHub release" src="https://img.shields.io/github/release/neuralmagic/deepsparse.svg?style=for-the-badge" height="20" />
</a>
<a href="https://github.com/neuralmagic/deepsparse/blob/main/CODE_OF_CONDUCT.md">
<img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.1%20adopted-ff69b4.svg?color=yellow&style=for-the-badge" height="20" />
</a>
<a href="https://www.youtube.com/channel/UCo8dO_WMGYbWCRnj_Dxr4EA">
<img alt="YouTube" src="https://img.shields.io/badge/-YouTube-red?&style=for-the-badge&logo=youtube&logoColor=white" height="20" />
</a>
<a href="https://medium.com/limitlessai">
<img alt="Medium" src="https://img.shields.io/badge/medium-%2312100E.svg?&style=for-the-badge&logo=medium&logoColor=white" height="20" />
</a>
<a href="https://twitter.com/neuralmagic">
<img alt="Twitter" src="https://img.shields.io/twitter/follow/neuralmagic?color=darkgreen&label=Follow&style=social" height="20" />
</a>
</div>
</div>
A CPU runtime that takes advantage of sparsity within neural networks to reduce compute. Read [more about sparsification](https://docs.neuralmagic.com/user-guides/sparsification).

Neural Magic's DeepSparse integrates with popular deep learning libraries (e.g., Hugging Face, Ultralytics), letting you load and deploy sparse models with ONNX. ONNX gives you the flexibility to serve your model in a framework-agnostic environment. Support includes [PyTorch](https://pytorch.org/docs/stable/onnx.html), [TensorFlow](https://github.com/onnx/tensorflow-onnx), [Keras](https://github.com/onnx/keras-onnx), and [many other frameworks](https://github.com/onnx/onnxmltools).
## Installation
Install DeepSparse Community as follows:
```bash
pip install deepsparse
```
DeepSparse is available in two editions:
1. [**DeepSparse Community**](#installation) is open-source and free for evaluation, research, and non-production use with our [DeepSparse Community License](https://neuralmagic.com/legal/engine-license-agreement/).
2. [**DeepSparse Enterprise**](https://docs.neuralmagic.com/products/deepsparse-ent) requires a Trial License or [can be fully licensed](https://neuralmagic.com/legal/master-software-license-and-service-agreement/) for production, commercial applications.
## 🧰 Hardware Support and System Requirements
To check that your CPU is compatible with DeepSparse, review the [Supported Hardware for DeepSparse](https://docs.neuralmagic.com/user-guides/deepsparse-engine/hardware-support) documentation.

For the best performance, use DeepSparse as tested: Python 3.7-3.10, ONNX 1.5.0-1.12.0 with opset version 11 or higher, and manylinux-compliant systems. We highly recommend running DeepSparse inside a [virtual environment](https://docs.python.org/3/library/venv.html). Note that DeepSparse is supported natively only on Linux; on Mac or Windows, run Linux in a Docker container or virtual machine to use DeepSparse.
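As a concrete example of the recommended setup, create and activate a virtual environment before installing:

```bash
# Create and activate a virtual environment, then install DeepSparse Community
python3 -m venv deepsparse-env
source deepsparse-env/bin/activate
pip install deepsparse
```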
## Features
- 👩💻 Pipelines for [NLP](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/transformers), [CV Classification](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/image_classification), [CV Detection](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/yolo), [CV Segmentation](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/yolact) and more!
- 🔌 [DeepSparse Server](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server)
- 📜 [DeepSparse Benchmark](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark)
- ☁️ [Cloud Deployments and Demos](https://github.com/neuralmagic/deepsparse/tree/main/examples)
### 👩💻 Pipelines
Pipelines are a high-level Python interface for running inference with DeepSparse across select tasks in NLP and CV:
| NLP | CV |
|-----------------------|---------------------------|
| Text Classification `"text_classification"` | Image Classification `"image_classification"` |
| Token Classification `"token_classification"` | Object Detection `"yolo"` |
| Sentiment Analysis `"sentiment_analysis"` | Instance Segmentation `"yolact"` |
| Question Answering `"question_answering"` | Keypoint Detection `"open_pif_paf"` |
| MultiLabel Text Classification `"text_classification"` | |
| Document Classification `"text_classification"` | |
| Zero-Shot Text Classification `"zero_shot_text_classification"` | |
**NLP Example** | Question Answering
```python
from deepsparse import Pipeline
qa_pipeline = Pipeline.create(
task="question-answering",
model_path="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni",
)
inference = qa_pipeline(question="What's my name?", context="My name is Snorlax")
```
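The pipeline returns a task-specific output object rather than raw tensors. A minimal sketch of reading the result, assuming the question-answering output exposes `answer` and `score` fields (check the schema of your installed version):

```python
# Field names below are assumed from the question-answering output schema
# and may differ across DeepSparse versions.
print(inference.answer, inference.score)
```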
**CV Example** | Image Classification
```python
from deepsparse import Pipeline
cv_pipeline = Pipeline.create(
task='image_classification',
model_path='zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none',
)
input_image = "my_image.png"
inference = cv_pipeline(images=input_image)
```
### 🔌 DeepSparse Server
DeepSparse Server is a tool that enables you to serve your models and pipelines directly from your terminal.
The server is built on top of the FastAPI web framework and the Uvicorn web server, a combination that gives DeepSparse Server strong performance and reliability. Install it with:
```bash
pip install deepsparse[server]
```
#### Single Model
Once installed, the following example CLI command is available for running inference with a single BERT model:
```bash
deepsparse.server \
task question_answering \
--model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"
```
To look up arguments run: `deepsparse.server --help`.
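Once the server is up, you can send requests from any HTTP client. A minimal sketch using Python's `requests`, assuming the default host and port (`localhost:5543`), the default `/predict` route, and that the task's input schema mirrors the Pipeline inputs shown above:

```python
import requests

# Default server address is assumed; adjust if you passed --host/--port.
url = "http://localhost:5543/predict"
payload = {"question": "What's my name?", "context": "My name is Snorlax"}

response = requests.post(url, json=payload)
print(response.json())
```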
#### Multiple Models
To deploy multiple models, create a `config.yaml` file. The example below configures two BERT models for the question-answering task:
```yaml
num_workers: 1
endpoints:
- task: question_answering
route: /predict/question_answering/base
model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none
batch_size: 1
- task: question_answering
route: /predict/question_answering/pruned_quant
model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni
batch_size: 1
```
After the `config.yaml` file has been created, the server can be started by passing the file path as an argument:
```bash
deepsparse.server config config.yaml
```
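Each endpoint is then served at the route given in the config. A sketch of querying both endpoints, again assuming the default port:

```python
import requests

base = "http://localhost:5543"  # default port assumed; override with --port
payload = {"question": "Who is Mark?", "context": "Mark is batman."}

# Routes come from the config.yaml above.
for route in ("/predict/question_answering/base",
              "/predict/question_answering/pruned_quant"):
    print(route, requests.post(base + route, json=payload).json())
```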
Read the [DeepSparse Server](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server) README for further details.
### 📜 DeepSparse Benchmark
DeepSparse Benchmark is a command-line interface (CLI) tool for evaluating the DeepSparse Engine's performance with ONNX models. It parses the supplied arguments, downloads and compiles the network into the engine, creates input tensors, and runs the model under the selected scenario.
Run `deepsparse.benchmark -h` to look up arguments:
```shell
deepsparse.benchmark [-h] [-b BATCH_SIZE] [-i INPUT_SHAPES] [-ncores NUM_CORES] [-s {async,sync,elastic}] [-t TIME]
[-w WARMUP_TIME] [-nstreams NUM_STREAMS] [-pin {none,core,numa}] [-e ENGINE] [-q] [-x EXPORT_PATH]
model_path
```
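For example, to benchmark the pruned-quantized BERT model from the examples above at batch size 1 in the synchronous (latency) scenario, using the `-b` and `-s` flags from the synopsis:

```bash
deepsparse.benchmark \
  "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni" \
  -b 1 -s sync
```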
Refer to the [Benchmark](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark) README for examples of specific inference scenarios.
### 🦉 Custom ONNX Model Support
DeepSparse accepts ONNX models from two sources:
**SparseZoo ONNX**: This is an open-source repository of sparse models available for download. [SparseZoo](https://github.com/neuralmagic/sparsezoo) offers inference-optimized models, which are trained using repeatable sparsification recipes and state-of-the-art techniques from [SparseML](https://github.com/neuralmagic/sparseml).
**Custom ONNX**: Users can provide their own ONNX models, whether dense or sparse. By plugging in a custom model, users can compare its performance with other solutions.
```bash
> wget https://github.com/onnx/models/raw/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx
Saving to: ‘mobilenetv2-7.onnx’
```
Custom ONNX Benchmark example:
```python
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs
onnx_filepath = "mobilenetv2-7.onnx"
batch_size = 16
# Generate random sample input
inputs = generate_random_inputs(onnx_filepath, batch_size)
# Compile and run
engine = compile_model(onnx_filepath, batch_size)
outputs = engine.run(inputs)
```
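For a rough sanity check of throughput, you can time `engine.run` with nothing beyond the standard library; this sketch reuses the `engine`, `inputs`, and `batch_size` defined above:

```python
import time

# Rough throughput estimate; for rigorous numbers use deepsparse.benchmark.
iterations = 10
start = time.perf_counter()
for _ in range(iterations):
    outputs = engine.run(inputs)
elapsed = time.perf_counter() - start
print(f"~{iterations * batch_size / elapsed:.1f} items/sec")
```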
The [DeepSparse GitHub repository](https://github.com/neuralmagic/deepsparse) contains package APIs and examples that help users quickly begin benchmarking and running inference on sparse models.
### Scheduling Single-Stream, Multi-Stream, and Elastic Inference
DeepSparse offers different inference scenarios based on your use case. Read more details here: [Inference Types](https://github.com/neuralmagic/deepsparse/blob/main/docs/source/scheduler.md).
⚡ **Single-stream** scheduling: the latency/synchronous scenario, requests execute serially. [`default`]
<img src="https://raw.githubusercontent.com/neuralmagic/deepsparse/main/docs/source/single-stream.png" alt="single stream diagram" />
This scenario is optimized for minimum per-request latency, devoting all of the resources provided to the engine to each request.
⚡ **Multi-stream** scheduling: the throughput/asynchronous scenario, requests execute in parallel.
<img src="https://raw.githubusercontent.com/neuralmagic/deepsparse/main/docs/source/multi-stream.png" alt="multi stream diagram" />
The multi-stream scheduler fits best when parallelism is low relative to core count and requests arrive asynchronously, with no time to batch them.
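The scheduler is normally chosen when the model is compiled. A minimal sketch, assuming `compile_model` accepts a `scheduler` argument with names matching the scenarios above (see the Inference Types doc linked above for the exact API):

```python
from deepsparse import compile_model

# "multi_stream" is assumed to be a valid scheduler name; consult the
# scheduler documentation linked above for the supported values.
engine = compile_model("mobilenetv2-7.onnx", batch_size=1, scheduler="multi_stream")
```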
## Resources
#### Libraries
- [DeepSparse](https://docs.neuralmagic.com/deepsparse/)
- [SparseML](https://docs.neuralmagic.com/sparseml/)
- [SparseZoo](https://docs.neuralmagic.com/sparsezoo/)
- [Sparsify](https://docs.neuralmagic.com/sparsify/)
#### Versions
- [DeepSparse](https://pypi.org/project/deepsparse) | stable
- [DeepSparse-Nightly](https://pypi.org/project/deepsparse-nightly/) | nightly (dev)
- [GitHub](https://github.com/neuralmagic/deepsparse/releases) | releases
#### Info
- [Blog](https://www.neuralmagic.com/blog/)
- [Resources](https://www.neuralmagic.com/resources/)
## Community
### Be Part of the Future... And the Future is Sparse!
Contribute with code, examples, integrations, and documentation as well as bug reports and feature requests! [Learn how here.](https://github.com/neuralmagic/deepsparse/blob/main/CONTRIBUTING.md)
For user help or questions about DeepSparse, sign up or log in to our **[Deep Sparse Community Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)**. We are growing the community member by member and happy to see you there. Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/deepsparse/issues) You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by [subscribing](https://neuralmagic.com/subscribe/) to the Neural Magic community.
For more general questions about Neural Magic, complete this [form.](http://neuralmagic.com/contact/)
### License
[DeepSparse Community](https://docs.neuralmagic.com/products/deepsparse) is licensed under the [Neural Magic DeepSparse Community License.](https://github.com/neuralmagic/deepsparse/blob/main/LICENSE-NEURALMAGIC)
Some source code, example files, and scripts included in the deepsparse GitHub repository or directory are licensed under the [Apache License Version 2.0](https://github.com/neuralmagic/deepsparse/blob/main/LICENSE) as noted.
[DeepSparse Enterprise](https://docs.neuralmagic.com/products/deepsparse-ent) requires a Trial License or [can be fully licensed](https://neuralmagic.com/legal/master-software-license-and-service-agreement/) for production, commercial applications.
### Cite
Find this project useful in your research or other communications? Please consider citing:
```bibtex
@InProceedings{
pmlr-v119-kurtz20a,
title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks},
author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan},
booktitle = {Proceedings of the 37th International Conference on Machine Learning},
pages = {5533--5543},
year = {2020},
editor = {Hal Daumé III and Aarti Singh},
volume = {119},
series = {Proceedings of Machine Learning Research},
address = {Virtual},
month = {13--18 Jul},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v119/kurtz20a/kurtz20a.pdf},
url = {http://proceedings.mlr.press/v119/kurtz20a.html}
}
@article{DBLP:journals/corr/abs-2111-13445,
author = {Eugenia Iofinova and
Alexandra Peste and
Mark Kurtz and
Dan Alistarh},
title = {How Well Do Sparse Imagenet Models Transfer?},
journal = {CoRR},
volume = {abs/2111.13445},
year = {2021},
url = {https://arxiv.org/abs/2111.13445},
eprinttype = {arXiv},
eprint = {2111.13445},
timestamp = {Wed, 01 Dec 2021 15:16:43 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2111-13445.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```