| field | value |
|:--|:--|
| Name | bentomlx |
| Version | 0.0.2 |
| Summary | |
| home_page | |
| upload_time | 2024-03-10 15:22:25 |
| maintainer | |
| docs_url | None |
| author | |
| requires_python | >=3.10 |
| license | Apache-2.0 |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# bentoml-extensions [](https://pdm-project.org) 
#### `todo`: plan for 2024
[[Project] bentoml-extensions alpha release](https://github.com/users/KimSoungRyoul/projects/2)
* FeatureStore Runner [ODM]
* Optimized CPU inference [ipex, ovms]
## QuickStart
~~~shell
pip install bentoml bentomlx
~~~
todo ...
## FeatureStore
* `pip install bentomlx[featurestore-redis]`
* `pip install bentomlx[featurestore-aerospike]`
~~~python
import logging
from typing import Dict, TypedDict

import bentoml
import numpy as np
from bentoml.io import JSON

import bentomlx
from bentomlx.feature_repo import DBSettings


class IrisFeature(TypedDict, total=False):
    pk: str
    sepal_len: float | int
    sepal_width: float
    petal_len: float | int
    petal_width: float


# db_settings = DBSettings(namespace="test", hosts=["127.0.0.1:3000"], use_shared_connection=True)
# or configure through environment variables:
#   export BENTOML_REPO__NAMESPACE=test BENTOML_REPO__HOSTS=localhost:3000 BENTOML_REPO__USE_SHARED_CONNECTION=true
db_settings = DBSettings()

repo_runner = bentomlx.feature_repo.aerospike_fs(db_settings).to_repo_runner(entity_name="iris_features", embedded=True)

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

svc = bentoml.Service("iris_classifier_svc", runners=[repo_runner, iris_clf_runner])

logger = logging.getLogger("bentoml")


@svc.api(
    input=JSON.from_sample(["pk1", "pk2", "pk3"]),
    output=JSON(),
)
async def classify(feature_keys: list[str]) -> Dict[str, list[int]]:
    # features: list[list[float]] = await repo_runner.get_many.async_run(pks=feature_keys, _nokey=True)  # [[4.9, 3.0, 1.4, 0.2], [5.1, 3.5, 1.4, 0.3], [5.5, 2.5, 4.0, 1.3]]
    # features: list[IrisFeature] = repo_runner.get_many.run(pks=feature_keys)  # [{"pk": "pk1", "sepal_len": 4.9, "sepal_width": 3.0, "petal_len": 1.4, "petal_width": 0.2}, ...]
    features: np.ndarray = repo_runner.get_many.run(pks=feature_keys, _numpy=True)  # np.array([[4.9, 3.0, 1.4, 0.2], [5.1, 3.5, 1.4, 0.3], [5.5, 2.5, 4.0, 1.3]])
    result: np.ndarray = await iris_clf_runner.predict.async_run(features)
    return {"result": result.tolist()}
~~~
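
Once the service is running (e.g. `bentoml serve`), the endpoint can be exercised from Python. A minimal usage sketch with BentoML's HTTP client; the URL is an assumption (the default local serving address), and the `classify` method is generated from the `@svc.api` function name per standard BentoML 1.x behavior:

~~~python
from bentoml.client import Client

# Assumes `bentoml serve` is running the service above on the default port.
client = Client.from_url("http://localhost:3000")

# Each @svc.api endpoint is exposed as a client method named after the function.
print(client.classify(["pk1", "pk2", "pk3"]))  # -> {"result": [...]}
~~~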
## CPU Optimized Runner
* `bentomlx[ipex]`
* `bentomlx[ovms]` (analogous to `bentoml[triton]`)
~~~python
import bentoml
import bentomlx
import torch

# iris_clf_runner = bentoml.ipex.get("iris_clf:latest").to_runner()
# becomes:
iris_clf_runner = bentomlx.pytorch.get("iris_clf:latest").to_runner(intel_optimize=True)
xxx_runner = bentomlx.transformers.get("xxx:latest").to_runner(intel_optimize=True)
xxx_tf_runner = bentomlx.tensorflow.get("xxx:latest").to_runner(intel_optimize=True)

# supported only in bentoml-extensions,
# for model types such as ipex, tensorflow, onnx
xxx_ov_runner = bentomlx.openvino.get("xxx:latest").to_runner(intel_optimize=True)
# or
xxx_ov_runner = bentomlx.pytorch.get("xxx:latest").to_runner(openvino=True, post_quant=True)

# Intel BERT op:
# https://www.intel.com/content/www/us/en/developer/articles/guide/bert-ai-inference-amx-4th-gen-xeon-scalable.html
# https://github.com/intel/light-model-transformer/tree/main/BERT
# ?? needs discussion: this may be outside the responsibility of an ML serving framework
xxx_ov_runner = bentomlx.experimental.light_model_transformer.bert.get("xxx:latest").to_runner(post_quant=True, quant_dtype=torch.float32)
~~~
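
For reference, a rough sketch of what `intel_optimize=True` presumably wraps, using the public `intel_extension_for_pytorch` API directly. The model path is hypothetical and nothing below is bentomlx API:

~~~python
import torch
import intel_extension_for_pytorch as ipex  # pip install intel-extension-for-pytorch

model = torch.load("iris_clf.pt")  # hypothetical path to a saved PyTorch model
model.eval()

# ipex.optimize applies Intel CPU kernel/graph optimizations for inference.
model = ipex.optimize(model)

with torch.no_grad():
    pred = model(torch.randn(1, 4))  # iris input: 4 features
~~~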
## Post(Runtime) Model Compression (oneapi nncl)
* post-training quantization ("post quant")?
* ...
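
This section is still a stub. As a point of reference for the kind of "post quant" step it could wrap, here is stock PyTorch post-training dynamic quantization; this is not bentomlx API, just a sketch of the technique:

~~~python
import torch
import torch.nn as nn

# Stand-in model; any module with nn.Linear layers works.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3)).eval()

# Post-training dynamic quantization: weights are stored as int8 ahead of
# time, activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
~~~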

The timings below appear to be a Fibonacci micro-benchmark (`fibo_main.py`) compared across Python runtimes: a mypyc-compiled build, CPython 3.11.7, Cinder with JIT, and PyPy.
~~~
❯ python fibo_main.py # mypyc
0.17745399475097656
0.1755237579345703
0.17790436744689941
0.18230915069580078
❯ python fibo_main.py # 3.11.7
1.2891952991485596
1.2943885326385498
1.2915637493133545
1.305750846862793
❯ pyenv global cinder-3.10-dev
❯ PYTHONJIT=1 python fibo_main.py
2.9099485874176025
2.918196678161621
2.929981231689453
2.9137821197509766
❯ pyenv global pypy3.10-7.3.15
❯ PYTHONJIT=1 python fibo_main.py
0.8286490440368652
0.8387455940246582
0.8492231369018555
0.84218430519104
~~~
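
`fibo_main.py` itself is not shown in the README; a plausible reconstruction (naive recursive Fibonacci, timed over several runs) that would print one elapsed time per line as above. The function and benchmark argument are assumptions:

~~~python
import time


def fibo(n: int) -> int:
    # Deliberately naive recursion; the usual target for runtime/JIT benchmarks.
    return n if n < 2 else fibo(n - 1) + fibo(n - 2)


for _ in range(4):
    start = time.time()
    fibo(30)  # the actual argument used in the README's runs is unknown
    print(time.time() - start)
~~~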
### BertOperator (bert-large-uncased)
#### Batch-Size 1
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-large-uncased | False | False | False | False | 1 | 128 | 0.878 | 1139.490 ms |
~~~
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 -q
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 -q
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-large-uncased | False | True | True | False | 1 | 128 | 6.124 | 163.285 ms |
~~~
#### Batch-Size 10
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20 --batch-size 10
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20 --batch-size 10
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-large-uncased | False | False | False | False | 10 | 128 | 1.588 | 6296.495 ms |
~~~
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 --batch-size 10 --quant
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 --batch-size 10 --quant
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-large-uncased | False | True | True | False | 10 | 128 | 5.959 | 1678.104 ms |
~~~
### bert-base-uncased
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
config.json: 100%|██████████| 570/570 [00:00<00:00, 170kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:12<00:00, 36.2MB/s]
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | False | False | False | 1 | 128 | 3.832 | 260.979 ms |
~~~
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
config.json: 100%|██████████| 570/570 [00:00<00:00, 168kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:11<00:00, 37.0MB/s]
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | True | True | False | 1 | 128 | 16.622 | 60.160 ms |
~~~
### bert-base-uncased (batch-size 10)
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant --batch-size 10
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant --batch-size 10
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
config.json: 100%|██████████| 570/570 [00:00<00:00, 172kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:12<00:00, 35.2MB/s]
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | True | True | False | 10 | 128 | 23.923 | 418.015 ms |
~~~
### diff (original BERT, IPEX BERT, BERT operator)
~~~
---------------- BatchSize 1 ------------
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | False | False | False | 1 | 128 | 4.89 | 204.520 ms |
| 0 | bert-base-uncased | True | False | False | False | 1 | 128 | 5.243 | 190.739 ms |
| 0 | bert-base-uncased | False | True | False | False | 1 | 128 | 5.88 | 170.077 ms |
| 0 | bert-base-uncased | False | True | True | False | 1 | 128 | 15.444 | 64.752 ms |
~~~
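
The latency column in these tables is per-batch and consistent with latency ≈ batch_size / throughput; a quick check against rows shown above:

~~~python
# Sanity check: latency [ms] = batch_size / throughput [samples/s] * 1000
print(1 / 15.444 * 1000)   # ≈ 64.75  -> matches "64.752 ms" (BERT op + quant, batch 1)
print(10 / 5.959 * 1000)   # ≈ 1678.1 -> matches "1678.104 ms" (bert-large, batch 10)
~~~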
~~~
---------------- BatchSize 10 ------------
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-large-uncased | False | False | False | False | 10 | 128 | 1.588 | 6296.495 ms |
| 0 | bert-large-uncased | False | True | True | False | 10 | 128 | 5.959 | 1678.104 ms |
~~~
~~~
-------------- BatchSize 20 -------------
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | False | True | False | 20 | 128 | 5.441 | 3675.675 ms |
| 0 | bert-base-uncased | False | True | True | False | 20 | 128 | 16.334 | 1224.473 ms |
~~~
~~~
---------------- BatchSize 10 ------------
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | False | True | False | 10 | 128 | 5.727 | 1746.223 ms |
| 0 | bert-base-uncased | False | True | True | False | 10 | 128 | 17.384 | 575.239 ms |
~~~
~~~
---------------- BatchSize 100 -----------
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | False | True | False | 100 | 128 | 5.15 | 19417.498 ms |
| 0 | bert-base-uncased | False | True | True | False | 100 | 128 | 18.638 | 5365.511 ms |
~~~
#### original BERT
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
config.json: 100%|██████████| 570/570 [00:00<00:00, 169kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:12<00:00, 36.2MB/s]
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | False | False | False | 1 | 128 | 4.89 | 204.520 ms |
~~~
#### IPEX-optimized BERT
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --ipex
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --ipex
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
config.json: 100%|██████████| 570/570 [00:00<00:00, 175kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:11<00:00, 38.7MB/s][W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
/usr/local/lib/python3.8/dist-packages/intel_extension_for_pytorch/frontend.py:396: UserWarning: Conv BatchNorm folding failed during the optimize process.
warnings.warn("Conv BatchNorm folding failed during the optimize process.")
/usr/local/lib/python3.8/dist-packages/intel_extension_for_pytorch/frontend.py:401: UserWarning: Linear BatchNorm folding failed during the optimize process.
warnings.warn("Linear BatchNorm folding failed during the optimize process.")
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | True | False | False | False | 1 | 128 | 5.243 | 190.739 ms |
~~~
#### BERT operator
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
config.json: 100%|██████████| 570/570 [00:00<00:00, 187kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:12<00:00, 35.4MB/s]
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | True | False | False | 1 | 128 | 5.88 | 170.077 ms |
~~~
#### BERT operator (with quant)
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
config.json: 100%|██████████| 570/570 [00:00<00:00, 159kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:11<00:00, 39.1MB/s]
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | True | True | False | 1 | 128 | 15.444 | 64.752 ms |
~~~
#### original BERT vs BERT operator (with quant, batch-size 20)
* batch-size=1: the performance gap is only about 10~20%,
* batch-size=20: but the gap grows to roughly 3x.
* `DNNL_CPU_RUNTIME=TBB|OMP`: no significant difference observed; in theory TBB's distinguishing trait is that performance does not degrade as the thread count grows.
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo-oneapi-tbb-onednn-v34pc numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --quant --batch-size 20
config.json: 100%|██████████| 570/570 [00:00<00:00, 5.15MB/s]
model.safetensors: 100%|██████████| 440M/440M [00:07<00:00, 61.7MB/s]
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | False | True | False | 20 | 128 | 5.441 | 3675.675 ms |
❯ sudo docker run --rm --privileged bert-op-pytorch-demo-oneapi-tbb-onednn-v34pc numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant --batch-size 20
config.json: 100%|██████████| 570/570 [00:00<00:00, 5.25MB/s]
model.safetensors: 100%|██████████| 440M/440M [00:06<00:00, 63.2MB/s]
| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
| 0 | bert-base-uncased | False | True | True | False | 20 | 128 | 16.334 | 1224.473 ms |
~~~
~~~shell
pip install --index-url https://pypi.anaconda.org/intel/simple --extra-index-url https://pypi.org/simple
pip install dpnp numba-dpex dpctl intel-optimization-for-horovod==0.28.1.1 torch==2.0.1 torchvision==0.15.2 --extra-index-url=https://download.pytorch.org/whl/cpu intel_extension_for_pytorch==2.0.100 oneccl-bind-pt==2.0.0 --extra-index-url=https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
~~~
## Raw data
~~~json
{
    "_id": null,
    "home_page": "",
    "name": "bentomlx",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "",
    "author": "",
    "author_email": "kimsoungryoul <kimsoungryoul@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/f7/46/3f3c69030291462641375ed2ee8b4afd371e7837df7264e548e2f613e0d3/bentomlx-0.0.2.tar.gz",
    "platform": null,
"description": "# bentoml-extensions [](https://pdm-project.org) \n\n#### `todo`: plan for 2024\n[[Project]bentoml-extensions alpha release ](https://github.com/users/KimSoungRyoul/projects/2)\n* FeatureStore Runner [ODM],\n* optimize cpu inference [ipex, ovms]\n\n\n## QuickStart\n\n~~~shell\npip install bentoml bentomlx\n~~~\n\ntodo ...\n\n\n\n## FeatureStore\n* `pip install bentomlx[featurestore-redis]`\n* `pip install bentomlx[featurestore-aerospike]`\n\n~~~Python\nimport logging\nfrom typing import Dict, TypedDict\n\nimport bentoml\nimport numpy as np\nfrom bentoml.io import JSON\n\n\nimport bentomlx\nfrom bentomlx.feature_repo import DBSettings\n\n\nclass IrisFeature(TypedDict, total=False):\n pk: str\n sepal_len: float | int\n sepal_width: float\n petal_len: float | int\n petal_width: float\n\n\n# db_settings = DBSettings(namespace=\"test\", hosts=[\"127.0.0.1:3000\"], use_shared_connection=True)\ndb_settings = DBSettings() # EXPORT ENV BENTOML_REPO_NAMESPACE=test; BENTOML_REPO__HOSTS=localhost:3000; BENTOML_REPO__USE_SHARED_CONNECTION=true\n\nrepo_runner = bentomlx.feature_repo.aerospike_fs(db_settings).to_repo_runner(entity_name=\"iris_features\", embedded=True)\n\niris_clf_runner = bentoml.sklearn.get(\"iris_clf:latest\").to_runner()\n\nsvc = bentoml.Service(\"iris_classifier_svc\", runners=[repo_runner, iris_clf_runner])\n\nlogger = logging.getLogger(\"bentoml\")\n\n\n@svc.api(\n input=JSON.from_sample([\"pk1\", \"pk2\", \"pk3\"]),\n output=JSON(),\n)\nasync def classify(feature_keys: list[str]) -> Dict[str, list[int]]:\n # features: list[list[float]] = await repository.get_many.async_run(pks=feature_keys, _nokey=True) # [[4.9, 3.0, 1.4, 0.2], [5.1 3.5 1.4 0.3], [5.5 2.5 4. 1.3]]\n # features: list[IrisFeature] = repo_runner.get_many.run(pks=feature_keys) # input_arr = [{\"pk\": \"pk1\": \"sepal_len\":4.9, \"sepal_width\":3. \"petal_len\":1.4, \"petal_width\": 0.2], ... ]\n features: np.array = repo_runner.get_many.run(pks=feature_keys, _numpy=True) # input_arr = np.array([[4.9, 3.0, 1.4, 0.2], [5.1 3.5 1.4 0.3], [5.5 2.5 4. 1.3]])\n result: np.ndarray = await iris_clf_runner.predict.async_run(features)\n return {\"result\": result.tolist()}\n\n~~~\n\n\n\n## CPU Optimized Runner\n * `bentomlx[ipex]`\n * `bentomlx[ovms]` `like a bentoml[triton]`\n\n~~~Python\nimport bentoml\nimport bentomlx\n\n\n#iris_clf_runner = bentoml.ipex.get(\"iris_clf:latest\").to_runner()\n# change like this\niris_clf_runner = bentomlx.pytorch.get(\"iris_clf:latest\").to_runner(intel_optimize=True)\nxxx_runner = bentomlx.transformers.get(\"xxx:latest\").to_runner(intel_optimize=True)\nxxx_tf_runner = bentomlx.tensorflow.get(\"xxx:latest\").to_runner(intel_optimize=True)\n\n\n# support only in bentoml-extension\n# model type such as ipex, tensorflow, onnx\nxxx_ov_runner = bentomlx.openvino.get(\"xxx:latest\").to_runner(intel_optimize=True)\n# or\nxxx_ov_runner = bentomlx.pytorch.get(\"xxx:latest\").to_runner(openvino=True, post_quant=True)\n\n# intel bert op\n# https://www.intel.com/content/www/us/en/developer/articles/guide/bert-ai-inference-amx-4th-gen-xeon-scalable.html\n# ?? 
need discussion about Out of ML serving framework responsibility\n#https://github.com/intel/light-model-transformer/tree/main/BERT\nxxx_ov_runner = bentomlx.experimental.light_model_transformer.bert.get(\"xxx:latest\").to_runner(post_quant=True,quant_dtype=torch.float32)\n\n~~~\n\n\n## Post(Runtime) Model Compression (oneapi nncl)\n * post quant ?\n * ...\n\n\n\n\n\n\n\n~~~\n\u276f python fibo_main.py # mypyc\n0.17745399475097656\n0.1755237579345703\n0.17790436744689941\n0.18230915069580078\n\n\u276f python fibo_main.py # 3.11.7\n1.2891952991485596\n1.2943885326385498\n1.2915637493133545\n1.305750846862793\n\n\u276f pyenv global cinder-3.10-dev\n\u276f PYTHONJIT=1 python fibo_main.py\n2.9099485874176025\n2.918196678161621\n2.929981231689453\n2.9137821197509766\n\n\u276f pyenv global pypy3.10-7.3.15\n\u276f PYTHONJIT=1 python fibo_main.py\n0.8286490440368652\n0.8387455940246582\n0.8492231369018555\n0.84218430519104\n\n\n~~~\n\n\n\n\n\n\n#### BertOperator\n\n#### Batch-Size 1\n\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-large-uncased | False | False | False | False | 1 | 128 | 0.878 | 1139.490 ms |\n~~~\n\n\n\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 -q\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 -q\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-large-uncased | False | True | True | False | 1 | 128 | 6.124 | 163.285 ms |\n~~~\n\n\n#### Batch-Size 10\n\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20 --batch-size 10\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20 --batch-size 10\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] 
|\n|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-large-uncased | False | False | False | False | 10 | 128 | 1.588 | 6296.495 ms |\n\n~~~\n\n\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 --batch-size 10 --quant\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 --batch-size 10 --quant\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-large-uncased | False | True | True | False | 10 | 128 | 5.959 | 1678.104 ms |\n\n~~~\n\n### bert-base-uncased\n\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \nconfig.json: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 570/570 [00:00<00:00, 170kB/s]\nmodel.safetensors: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 440M/440M [00:12<00:00, 36.2MB/s]\n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | False | False | False | 1 | 128 | 3.832 | 260.979 ms |\n\n~~~\n\n\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \nconfig.json: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 570/570 [00:00<00:00, 168kB/s]\nmodel.safetensors: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 440M/440M [00:11<00:00, 37.0MB/s] \n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | True | True | False | 1 | 128 | 16.622 | 60.160 ms |\n\n~~~\n\n\n### bert-base-uncased (batch-size 10)\n\n~~~\n\u276f sudo docker run --rm --privileged 
bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant --batch-size 10\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant --batch-size 10\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \nconfig.json: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 570/570 [00:00<00:00, 172kB/s]\nmodel.safetensors: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 440M/440M [00:12<00:00, 35.2MB/s]\n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | True | True | False | 10 | 128 | 23.923 | 418.015 ms |\n\n~~~\n\n\n### diff ( origin bert, ipex bert, bert operator)\n\n~~~\n---------------- BatchSize 1 ------------\n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | False | False | False | 1 | 128 | 4.89 | 204.520 ms |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | True | False | False | False | 1 | 128 | 5.243 | 190.739 ms |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | True | False | False | 1 | 128 | 5.88 | 170.077 ms |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | True | True | False | 1 | 128 | 15.444 | 64.752 ms |\n~~~\n\n\n~~~\n---------------- BatchSize 10 ------------\n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-large-uncased | False | False | False | False | 10 | 128 | 1.588 | 6296.495 ms |\n|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-large-uncased | False | True | True | False | 10 | 128 | 5.959 | 1678.104 ms |\n~~~\n\n\n~~~\n-------------- BatchSize 20 -------------\n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | False | True | False | 20 | 128 | 5.441 | 3675.675 ms |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | 
False | True | True | False | 20 | 128 | 16.334 | 1224.473 ms |\n~~~\n\n\n~~~\n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | False | True | False | 10 | 128 | 5.727 | 1746.223 ms |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | True | True | False | 10 | 128 | 17.384 | 575.239 ms |\n\n~~~\n\n\n\n~~~\n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | False | True | False | 100 | 128 | 5.15 | 19417.498 ms |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | True | True | False | 100 | 128 | 18.638 | 5365.511 ms |\n~~~\n\n\n\n#### origin bert\n\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \nconfig.json: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 570/570 [00:00<00:00, 169kB/s]\nmodel.safetensors: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 440M/440M [00:12<00:00, 36.2MB/s] \n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | False | False | False | 1 | 128 | 4.89 | 204.520 ms |\n~~~\n\n#### ipex optimized bert\n\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --ipex\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --ipex\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \nconfig.json: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 570/570 [00:00<00:00, 175kB/s]\nmodel.safetensors: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 440M/440M [00:11<00:00, 38.7MB/s][W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. 
Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())\n\n/usr/local/lib/python3.8/dist-packages/intel_extension_for_pytorch/frontend.py:396: UserWarning: Conv BatchNorm folding failed during the optimize process.\n warnings.warn(\"Conv BatchNorm folding failed during the optimize process.\")\n/usr/local/lib/python3.8/dist-packages/intel_extension_for_pytorch/frontend.py:401: UserWarning: Linear BatchNorm folding failed during the optimize process.\n warnings.warn(\"Linear BatchNorm folding failed during the optimize process.\")\n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | True | False | False | False | 1 | 128 | 5.243 | 190.739 ms |\n\n~~~\n\n\n#### bert operator\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \nconfig.json: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 570/570 [00:00<00:00, 187kB/s]\nmodel.safetensors: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 440M/440M [00:12<00:00, 35.4MB/s] \n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | True | False | False | 1 | 128 | 5.88 | 170.077 ms |\n~~~\n\n\n#### bert operator (with quant)\n\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant\n \n:: initializing oneAPI environment ...\n entrypoint.sh: BASH_VERSION = 5.0.17(1)-release\n args: Using \"$@\" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant\n:: compiler -- latest\n:: debugger -- latest\n:: dev-utilities -- latest\n:: mkl -- latest\n:: tbb -- latest\n:: oneAPI environment initialized ::\n \nconfig.json: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 570/570 [00:00<00:00, 159kB/s]\nmodel.safetensors: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 440M/440M [00:11<00:00, 39.1MB/s] \n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | True | True | False | 1 | 128 | 15.444 | 64.752 ms |\n\n~~~\n\n\n\n\n#### origin bert VS bert operator (with quant, batch-size 20)\n\n* batch-size=1 : \uc131\ub2a5\ucc28\uc774\uac00 10~20% \ucc28\uc774\uc774\uc9c0\ub9cc\n* batch-size=20 : 3\ubc30 \uc815\ub3c4 
\uc131\ub2a5 \ucc28\uc774\ub09c\ub2e4. \n* `DNNL_CPU_RUNTIME=TBB|OMP` \ub294 \ud070 \ucc28\uc774\ub97c \ud655\uc778 \ubabb\ud568 \uc774\ub860\uc0c1 tbb\ub294 thread num\uc774 \ub298\uc5b4\ub098\ub3c4 \uc131\ub2a5\uc800\ud558 \uc5c6\ub294\uac8c \ud2b9\uc9d5 \n\n~~~\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo-oneapi-tbb-onednn-v34pc numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --quant --batch-size 20\n config.json: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 570/570 [00:00<00:00, 5.15MB/s]\nmodel.safetensors: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 440M/440M [00:07<00:00, 61.7MB/s] \n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | False | True | False | 20 | 128 | 5.441 | 3675.675 ms |\n\n\u276f sudo docker run --rm --privileged bert-op-pytorch-demo-oneapi-tbb-onednn-v34pc numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant --batch-size 20\nconfig.json: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 570/570 [00:00<00:00, 5.25MB/s]\nmodel.safetensors: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 440M/440M [00:06<00:00, 63.2MB/s] \n| | Model | IPEX | BERT Op | Quantization | BFloat16 | Batch Size | Seq Len | Throughput [samples/s] | Latency [ms] |\n|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|\n| 0 | bert-base-uncased | False | True | True | False | 20 | 128 | 16.334 | 1224.473 ms |\n\n~~~\n\npip install --index-url https://pypi.anaconda.org/intel/simple --extra-index-url https://pypi.org/simple\n\npip install dpnp numba-dpex dpctl intel-optimization-for-horovod==0.28.1.1 torch==2.0.1 torchvision==0.15.2 --extra-index-url=https://download.pytorch.org/whl/cpu intel_extension_for_pytorch==2.0.100 oneccl-bind-pt==2.0.0 --extra-index-url=https://pytorch-extension.intel.com/release-whl/stable/cpu/us/\n",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "",
"version": "0.0.2",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "44304f47241e9a0e1d46dc512175b11aafe5edc16754bf4f53ad9e2e31d815d5",
"md5": "b20b654248194e899931472466bed667",
"sha256": "440355fef8f041fd24b4a67a6bb62b6a13f1e8455fc3fead584be875a75e711e"
},
"downloads": -1,
"filename": "bentomlx-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b20b654248194e899931472466bed667",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 656364,
"upload_time": "2024-03-10T15:22:22",
"upload_time_iso_8601": "2024-03-10T15:22:22.678361Z",
"url": "https://files.pythonhosted.org/packages/44/30/4f47241e9a0e1d46dc512175b11aafe5edc16754bf4f53ad9e2e31d815d5/bentomlx-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f7463f3c69030291462641375ed2ee8b4afd371e7837df7264e548e2f613e0d3",
"md5": "b52a1f4517d2763c841d7c5e60a10311",
"sha256": "c86a8074533874b72266dad895a9434275d3fae5d9af299af0be33cba9a4aaa7"
},
"downloads": -1,
"filename": "bentomlx-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "b52a1f4517d2763c841d7c5e60a10311",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 647269,
"upload_time": "2024-03-10T15:22:25",
"upload_time_iso_8601": "2024-03-10T15:22:25.541374Z",
"url": "https://files.pythonhosted.org/packages/f7/46/3f3c69030291462641375ed2ee8b4afd371e7837df7264e548e2f613e0d3/bentomlx-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-10 15:22:25",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "bentomlx"
}