<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div style="display: flex; flex-direction: column; align-items: center;">
<h1>
<img alt="tool icon" src="https://raw.githubusercontent.com/neuralmagic/deepsparse/main/docs/source/icon-deepsparse.png" />
DeepSparse
</h1>
<h4> An inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application</h4>
<div align="center">
<a href="https://docs.neuralmagic.com/deepsparse/">
<img alt="Documentation" src="https://img.shields.io/badge/documentation-darkred?&style=for-the-badge&logo=read-the-docs" height="20" />
</a>
<a href="https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ/">
<img alt="Slack" src="https://img.shields.io/badge/slack-purple?style=for-the-badge&logo=slack" height="20" />
</a>
<a href="https://github.com/neuralmagic/deepsparse/issues/">
<img alt="Support" src="https://img.shields.io/badge/support%20forums-navy?style=for-the-badge&logo=github" height="20" />
</a>
<a href="https://github.com/neuralmagic/deepsparse/actions/workflows/quality-check.yaml">
<img alt="Main" src="https://img.shields.io/github/workflow/status/neuralmagic/deepsparse/Quality%20Checks/main?label=build&style=for-the-badge" height="20" />
</a>
<a href="https://github.com/neuralmagic/deepsparse/releases">
<img alt="GitHub release" src="https://img.shields.io/github/release/neuralmagic/deepsparse.svg?style=for-the-badge" height="20" />
</a>
<a href="https://github.com/neuralmagic/deepsparse/blob/main/CODE_OF_CONDUCT.md">
<img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.1%20adopted-ff69b4.svg?color=yellow&style=for-the-badge" height="20" />
</a>
<a href="https://www.youtube.com/channel/UCo8dO_WMGYbWCRnj_Dxr4EA">
<img alt="YouTube" src="https://img.shields.io/badge/-YouTube-red?&style=for-the-badge&logo=youtube&logoColor=white" height="20" />
</a>
<a href="https://medium.com/limitlessai">
<img alt="Medium" src="https://img.shields.io/badge/medium-%2312100E.svg?&style=for-the-badge&logo=medium&logoColor=white" height="20" />
</a>
<a href="https://twitter.com/neuralmagic">
<img alt="Twitter" src="https://img.shields.io/twitter/follow/neuralmagic?color=darkgreen&label=Follow&style=social" height="20" />
</a>
</div>
</div>
A CPU runtime that takes advantage of sparsity within neural networks to reduce compute. Read [more about sparsification](https://docs.neuralmagic.com/user-guides/sparsification).

Neural Magic's DeepSparse integrates with popular deep learning libraries (e.g., Hugging Face, Ultralytics), letting you load and deploy sparse models with ONNX. ONNX gives you the flexibility to serve your model in a framework-agnostic environment. Support includes [PyTorch](https://pytorch.org/docs/stable/onnx.html), [TensorFlow](https://github.com/onnx/tensorflow-onnx), [Keras](https://github.com/onnx/keras-onnx), and [many other frameworks](https://github.com/onnx/onnxmltools).
## Installation
Install DeepSparse Community as follows:
```bash
pip install deepsparse
```
DeepSparse is available in two editions:
1. [**DeepSparse Community**](#installation) is open-source and free for evaluation, research, and non-production use with our [DeepSparse Community License](https://neuralmagic.com/legal/engine-license-agreement/).
2. [**DeepSparse Enterprise**](https://docs.neuralmagic.com/products/deepsparse-ent) requires a Trial License or [can be fully licensed](https://neuralmagic.com/legal/master-software-license-and-service-agreement/) for production, commercial applications.
## 🧰 Hardware Support and System Requirements
To check that your CPU is compatible with DeepSparse, review the [Supported Hardware for DeepSparse](https://docs.neuralmagic.com/user-guides/deepsparse-engine/hardware-support) documentation.

For the best performance, use DeepSparse as tested: Python 3.7-3.10, ONNX 1.5.0-1.12.0 with opset version 11 or higher, and manylinux-compliant systems. We highly recommend running DeepSparse inside a [virtual environment](https://docs.python.org/3/library/venv.html). Note that DeepSparse is supported natively only on Linux; on Mac or Windows, run Linux in a Docker container or virtual machine to use DeepSparse.
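As a concrete example of the recommended setup, create and activate a virtual environment before installing:

```bash
# Create and activate a virtual environment, then install DeepSparse Community
python3 -m venv deepsparse-env
source deepsparse-env/bin/activate
pip install deepsparse
```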
## Features
- 👩💻 Pipelines for [NLP](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/transformers), [CV Classification](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/image_classification), [CV Detection](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/yolo), [CV Segmentation](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/yolact) and more!
- 🔌 [DeepSparse Server](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server)
- 📜 [DeepSparse Benchmark](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark)
- ☁️ [Cloud Deployments and Demos](https://github.com/neuralmagic/deepsparse/tree/main/examples)
### 👩💻 Pipelines
Pipelines are a high-level Python interface for running inference with DeepSparse across select tasks in NLP and CV:
| NLP | CV |
|-----------------------|---------------------------|
| Text Classification `"text_classification"` | Image Classification `"image_classification"` |
| Token Classification `"token_classification"` | Object Detection `"yolo"` |
| Sentiment Analysis `"sentiment_analysis"` | Instance Segmentation `"yolact"` |
| Question Answering `"question_answering"` | Keypoint Detection `"open_pif_paf"` |
| MultiLabel Text Classification `"text_classification"` | |
| Document Classification `"text_classification"` | |
| Zero-Shot Text Classification `"zero_shot_text_classification"` | |
**NLP Example** | Question Answering
```python
from deepsparse import Pipeline
qa_pipeline = Pipeline.create(
task="question-answering",
model_path="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni",
)
inference = qa_pipeline(question="What's my name?", context="My name is Snorlax")
```
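The pipeline returns a task-specific output object rather than raw tensors. A minimal sketch of reading the result, assuming the question-answering output exposes `answer` and `score` fields (check the schema of your installed version):

```python
# Field names below are assumed from the question-answering output schema
# and may differ across DeepSparse versions.
print(inference.answer, inference.score)
```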
**CV Example** | Image Classification
```python
from deepsparse import Pipeline
cv_pipeline = Pipeline.create(
task='image_classification',
model_path='zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none',
)
input_image = "my_image.png"
inference = cv_pipeline(images=input_image)
```
### 🔌 DeepSparse Server
DeepSparse Server is a tool that enables you to serve your models and pipelines directly from your terminal.
The server is built on top of the FastAPI web framework and the Uvicorn web server, a combination that gives DeepSparse Server strong performance and reliability. Install it with:
```bash
pip install deepsparse[server]
```
#### Single Model
Once installed, the following example CLI command is available for running inference with a single BERT model:
```bash
deepsparse.server \
task question_answering \
--model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"
```
To look up arguments run: `deepsparse.server --help`.
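Once the server is up, you can send requests from any HTTP client. A minimal sketch using Python's `requests`, assuming the default host and port (`localhost:5543`), the default `/predict` route, and that the task's input schema mirrors the Pipeline inputs shown above:

```python
import requests

# Default server address is assumed; adjust if you passed --host/--port.
url = "http://localhost:5543/predict"
payload = {"question": "What's my name?", "context": "My name is Snorlax"}

response = requests.post(url, json=payload)
print(response.json())
```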
#### Multiple Models
To deploy multiple models, create a `config.yaml` file. The example below configures two BERT models for the question-answering task:
```yaml
num_workers: 1
endpoints:
- task: question_answering
route: /predict/question_answering/base
model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none
batch_size: 1
- task: question_answering
route: /predict/question_answering/pruned_quant
model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni
batch_size: 1
```
After the `config.yaml` file has been created, the server can be started by passing the file path as an argument:
```bash
deepsparse.server config config.yaml
```
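Each endpoint is then served at the route given in the config. A sketch of querying both endpoints, again assuming the default port:

```python
import requests

base = "http://localhost:5543"  # default port assumed; override with --port
payload = {"question": "Who is Mark?", "context": "Mark is batman."}

# Routes come from the config.yaml above.
for route in ("/predict/question_answering/base",
              "/predict/question_answering/pruned_quant"):
    print(route, requests.post(base + route, json=payload).json())
```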
Read the [DeepSparse Server](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server) README for further details.
### 📜 DeepSparse Benchmark
DeepSparse Benchmark is a command-line interface (CLI) tool for evaluating the DeepSparse Engine's performance with ONNX models. It parses the supplied arguments, downloads and compiles the network into the engine, creates input tensors, and runs the model under the selected scenario.
Run `deepsparse.benchmark -h` to look up arguments:
```shell
deepsparse.benchmark [-h] [-b BATCH_SIZE] [-i INPUT_SHAPES] [-ncores NUM_CORES] [-s {async,sync,elastic}] [-t TIME]
[-w WARMUP_TIME] [-nstreams NUM_STREAMS] [-pin {none,core,numa}] [-e ENGINE] [-q] [-x EXPORT_PATH]
model_path
```
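For example, to benchmark the pruned-quantized BERT model from the examples above at batch size 1 in the synchronous (latency) scenario, using the `-b` and `-s` flags from the synopsis:

```bash
deepsparse.benchmark \
  "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni" \
  -b 1 -s sync
```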
Refer to the [Benchmark](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark) README for examples of specific inference scenarios.
### 🦉 Custom ONNX Model Support
DeepSparse accepts ONNX models from two sources:
**SparseZoo ONNX**: This is an open-source repository of sparse models available for download. [SparseZoo](https://github.com/neuralmagic/sparsezoo) offers inference-optimized models, which are trained using repeatable sparsification recipes and state-of-the-art techniques from [SparseML](https://github.com/neuralmagic/sparseml).
**Custom ONNX**: Users can provide their own ONNX models, whether dense or sparse. By plugging in a custom model, users can compare its performance with other solutions.
```bash
> wget https://github.com/onnx/models/raw/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx
Saving to: ‘mobilenetv2-7.onnx’
```
Custom ONNX Benchmark example:
```python
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs
onnx_filepath = "mobilenetv2-7.onnx"
batch_size = 16
# Generate random sample input
inputs = generate_random_inputs(onnx_filepath, batch_size)
# Compile and run
engine = compile_model(onnx_filepath, batch_size)
outputs = engine.run(inputs)
```
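For a rough sanity check of throughput, you can time `engine.run` with nothing beyond the standard library; this sketch reuses the `engine`, `inputs`, and `batch_size` defined above:

```python
import time

# Rough throughput estimate; for rigorous numbers use deepsparse.benchmark.
iterations = 10
start = time.perf_counter()
for _ in range(iterations):
    outputs = engine.run(inputs)
elapsed = time.perf_counter() - start
print(f"~{iterations * batch_size / elapsed:.1f} items/sec")
```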
The [DeepSparse GitHub repository](https://github.com/neuralmagic/deepsparse) contains package APIs and examples that help users quickly begin benchmarking and running inference on sparse models.
### Scheduling Single-Stream, Multi-Stream, and Elastic Inference
DeepSparse offers different inference scenarios based on your use case. Read more details here: [Inference Types](https://github.com/neuralmagic/deepsparse/blob/main/docs/source/scheduler.md).
⚡ **Single-stream** scheduling: the latency/synchronous scenario, requests execute serially. [`default`]
<img src="https://raw.githubusercontent.com/neuralmagic/deepsparse/main/docs/source/single-stream.png" alt="single stream diagram" />
This scenario is optimized for minimum per-request latency, devoting all of the resources provided to the engine to each request.
⚡ **Multi-stream** scheduling: the throughput/asynchronous scenario, requests execute in parallel.
<img src="https://raw.githubusercontent.com/neuralmagic/deepsparse/main/docs/source/multi-stream.png" alt="multi stream diagram" />
The multi-stream scheduler fits best when parallelism is low relative to core count and requests arrive asynchronously, with no time to batch them.
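The scheduler is normally chosen when the model is compiled. A minimal sketch, assuming `compile_model` accepts a `scheduler` argument with names matching the scenarios above (see the Inference Types doc linked above for the exact API):

```python
from deepsparse import compile_model

# "multi_stream" is assumed to be a valid scheduler name; consult the
# scheduler documentation linked above for the supported values.
engine = compile_model("mobilenetv2-7.onnx", batch_size=1, scheduler="multi_stream")
```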
## Resources
#### Libraries
- [DeepSparse](https://docs.neuralmagic.com/deepsparse/)
- [SparseML](https://docs.neuralmagic.com/sparseml/)
- [SparseZoo](https://docs.neuralmagic.com/sparsezoo/)
- [Sparsify](https://docs.neuralmagic.com/sparsify/)
#### Versions
- [DeepSparse](https://pypi.org/project/deepsparse) | stable
- [DeepSparse-Nightly](https://pypi.org/project/deepsparse-nightly/) | nightly (dev)
- [GitHub](https://github.com/neuralmagic/deepsparse/releases) | releases
#### Info
- [Blog](https://www.neuralmagic.com/blog/)
- [Resources](https://www.neuralmagic.com/resources/)
## Community
### Be Part of the Future... And the Future is Sparse!
Contribute with code, examples, integrations, and documentation as well as bug reports and feature requests! [Learn how here.](https://github.com/neuralmagic/deepsparse/blob/main/CONTRIBUTING.md)
For user help or questions about DeepSparse, sign up or log in to our **[Deep Sparse Community Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)**. We are growing the community member by member and happy to see you there. Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/deepsparse/issues) You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by [subscribing](https://neuralmagic.com/subscribe/) to the Neural Magic community.
For more general questions about Neural Magic, complete this [form.](http://neuralmagic.com/contact/)
### License
[DeepSparse Community](https://docs.neuralmagic.com/products/deepsparse) is licensed under the [Neural Magic DeepSparse Community License.](https://github.com/neuralmagic/deepsparse/blob/main/LICENSE-NEURALMAGIC)
Some source code, example files, and scripts included in the deepsparse GitHub repository or directory are licensed under the [Apache License Version 2.0](https://github.com/neuralmagic/deepsparse/blob/main/LICENSE) as noted.
[DeepSparse Enterprise](https://docs.neuralmagic.com/products/deepsparse-ent) requires a Trial License or [can be fully licensed](https://neuralmagic.com/legal/master-software-license-and-service-agreement/) for production, commercial applications.
### Cite
Find this project useful in your research or other communications? Please consider citing:
```bibtex
@InProceedings{
pmlr-v119-kurtz20a,
title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks},
author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan},
booktitle = {Proceedings of the 37th International Conference on Machine Learning},
pages = {5533--5543},
year = {2020},
editor = {Hal Daumé III and Aarti Singh},
volume = {119},
series = {Proceedings of Machine Learning Research},
address = {Virtual},
month = {13--18 Jul},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v119/kurtz20a/kurtz20a.pdf},
url = {http://proceedings.mlr.press/v119/kurtz20a.html}
}
@article{DBLP:journals/corr/abs-2111-13445,
author = {Eugenia Iofinova and
Alexandra Peste and
Mark Kurtz and
Dan Alistarh},
title = {How Well Do Sparse Imagenet Models Transfer?},
journal = {CoRR},
volume = {abs/2111.13445},
year = {2021},
url = {https://arxiv.org/abs/2111.13445},
eprinttype = {arXiv},
eprint = {2111.13445},
timestamp = {Wed, 01 Dec 2021 15:16:43 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2111-13445.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```