bentomlx

- Name: bentomlx
- Version: 0.0.2
- Upload time: 2024-03-10 15:22:25
- Requires Python: >=3.10
- License: Apache-2.0
            # bentoml-extensions [![pdm-managed](https://img.shields.io/badge/pdm-managed-blueviolet)](https://pdm-project.org) ![python](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue)

#### `todo`: plan for 2024
[[Project] bentoml-extensions alpha release](https://github.com/users/KimSoungRyoul/projects/2)
* FeatureStore Runner [ODM]
* Optimized CPU inference [ipex, ovms]


## QuickStart

~~~shell
pip install bentoml bentomlx
~~~

todo ...



## FeatureStore
* `pip install bentomlx[featurestore-redis]`
* `pip install bentomlx[featurestore-aerospike]`

~~~python
import logging
from typing import Dict, TypedDict

import bentoml
import numpy as np
from bentoml.io import JSON

import bentomlx
from bentomlx.feature_repo import DBSettings


class IrisFeature(TypedDict, total=False):
    pk: str
    sepal_len: float | int
    sepal_width: float
    petal_len: float | int
    petal_width: float


# db_settings = DBSettings(namespace="test", hosts=["127.0.0.1:3000"], use_shared_connection=True)
# The same settings can come from the environment:
#   export BENTOML_REPO_NAMESPACE=test; BENTOML_REPO__HOSTS=localhost:3000; BENTOML_REPO__USE_SHARED_CONNECTION=true
db_settings = DBSettings()

repo_runner = bentomlx.feature_repo.aerospike_fs(db_settings).to_repo_runner(entity_name="iris_features", embedded=True)

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

svc = bentoml.Service("iris_classifier_svc", runners=[repo_runner, iris_clf_runner])

logger = logging.getLogger("bentoml")


@svc.api(
    input=JSON.from_sample(["pk1", "pk2", "pk3"]),
    output=JSON(),
)
async def classify(feature_keys: list[str]) -> Dict[str, list[int]]:
    # As plain lists, without keys:
    # features: list[list[float]] = await repo_runner.get_many.async_run(pks=feature_keys, _nokey=True)
    #   -> [[4.9, 3.0, 1.4, 0.2], [5.1, 3.5, 1.4, 0.3], [5.5, 2.5, 4.0, 1.3]]
    # As typed dicts, one per entity:
    # features: list[IrisFeature] = repo_runner.get_many.run(pks=feature_keys)
    #   -> [{"pk": "pk1", "sepal_len": 4.9, "sepal_width": 3.0, "petal_len": 1.4, "petal_width": 0.2}, ...]
    # As a ready-to-use numpy array:
    features: np.ndarray = repo_runner.get_many.run(pks=feature_keys, _numpy=True)
    #   -> np.array([[4.9, 3.0, 1.4, 0.2], [5.1, 3.5, 1.4, 0.3], [5.5, 2.5, 4.0, 1.3]])
    result: np.ndarray = await iris_clf_runner.predict.async_run(features)
    return {"result": result.tolist()}

~~~
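
The environment-variable form of the settings above, for running the same service without touching code (variable names are taken verbatim from the comment in the example; `service:svc` assumes the snippet is saved as `service.py`):

~~~shell
export BENTOML_REPO_NAMESPACE=test
export BENTOML_REPO__HOSTS=localhost:3000
export BENTOML_REPO__USE_SHARED_CONNECTION=true
bentoml serve service:svc
~~~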



## CPU Optimized Runner
  * `bentomlx[ipex]`
  * `bentomlx[ovms]` (analogous to `bentoml[triton]`)

~~~python
import bentoml
import torch

import bentomlx


# iris_clf_runner = bentoml.ipex.get("iris_clf:latest").to_runner()
# becomes:
iris_clf_runner = bentomlx.pytorch.get("iris_clf:latest").to_runner(intel_optimize=True)
xxx_runner = bentomlx.transformers.get("xxx:latest").to_runner(intel_optimize=True)
xxx_tf_runner = bentomlx.tensorflow.get("xxx:latest").to_runner(intel_optimize=True)


# Supported only in bentoml-extensions,
# for model types such as ipex, tensorflow, onnx:
xxx_ov_runner = bentomlx.openvino.get("xxx:latest").to_runner(intel_optimize=True)
# or
xxx_ov_runner = bentomlx.pytorch.get("xxx:latest").to_runner(openvino=True, post_quant=True)

# Intel BERT op:
# https://www.intel.com/content/www/us/en/developer/articles/guide/bert-ai-inference-amx-4th-gen-xeon-scalable.html
# https://github.com/intel/light-model-transformer/tree/main/BERT
# Open question: this may fall outside the responsibility of an ML serving framework.
xxx_bert_runner = bentomlx.experimental.light_model_transformer.bert.get("xxx:latest").to_runner(post_quant=True, quant_dtype=torch.float32)

~~~
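
An optimized runner is still an ordinary BentoML runner, so it plugs into a service the same way as in the FeatureStore example above; a minimal sketch (service name and IO types are placeholders):

~~~python
import bentoml
from bentoml.io import NumpyNdarray

import bentomlx

# Hypothetical wiring: the intel-optimized runner exposes the usual Runner interface.
iris_clf_runner = bentomlx.pytorch.get("iris_clf:latest").to_runner(intel_optimize=True)

svc = bentoml.Service("iris_classifier_ipex_svc", runners=[iris_clf_runner])


@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def classify(input_arr):
    return await iris_clf_runner.predict.async_run(input_arr)
~~~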


## Post (Runtime) Model Compression (oneapi nncl)
  * post-training quantization?
  * ...
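
For reference, post-training dynamic quantization in plain PyTorch looks roughly like the sketch below (the toy model and dtype are assumptions; whether bentomlx wraps this under `post_quant=True` is exactly the open question above):

~~~python
import torch
import torch.nn as nn

# Toy float32 model standing in for a real one.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3)).eval()

# Rewrites the Linear layers to compute with int8 weights at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(quantized(torch.randn(1, 4)))
~~~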


![Screenshot 2023-11-27 3:18 PM](https://github.com/KimSoungRyoul/bentoml-extensions/assets/24240623/8b922a8f-99e6-4d69-a713-a03f3f7b0d27)




~~~
❯ python fibo_main.py # mypyc
0.17745399475097656
0.1755237579345703
0.17790436744689941
0.18230915069580078

❯ python fibo_main.py # 3.11.7
1.2891952991485596
1.2943885326385498
1.2915637493133545
1.305750846862793

❯ pyenv global cinder-3.10-dev
❯ PYTHONJIT=1 python fibo_main.py
2.9099485874176025
2.918196678161621
2.929981231689453
2.9137821197509766

❯ pyenv global pypy3.10-7.3.15
❯ PYTHONJIT=1 python fibo_main.py
0.8286490440368652
0.8387455940246582
0.8492231369018555
0.84218430519104


~~~
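
For context, a minimal sketch of what a `fibo_main.py` like the one timed above could contain (the script itself is not part of this package; the workload size and timing loop are assumptions):

~~~python
# fibo_main.py -- hypothetical microbenchmark
import time


def fib(n: int) -> int:
    # Deliberately naive recursion: it stresses interpreter call overhead,
    # which is what the mypyc/PyPy/Cinder numbers above mostly reflect.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


for _ in range(4):
    start = time.time()
    fib(30)  # assumed workload
    print(time.time() - start)
~~~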






### BertOperator (bert-large-uncased)

#### Batch-Size 1

~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
|    | Model              | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-large-uncased | False  | False     | False          | False      |            1 |       128 |                    0.878 | 1139.490 ms    |
~~~



~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 -q
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 -q
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
|    | Model              | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-large-uncased | False  | True      | True           | False      |            1 |       128 |                    6.124 | 163.285 ms     |
~~~


#### Batch-Size 10

~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20 --batch-size 10
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --warmup-time 5 --run-time 20 --batch-size 10
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
|    | Model              | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-large-uncased | False  | False     | False          | False      |           10 |       128 |                    1.588 | 6296.495 ms    |

~~~


~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-large-uncased --bert-op  --warmup-time 5 --run-time 20 --batch-size 10 --quant
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-large-uncased --bert-op --warmup-time 5 --run-time 20 --batch-size 10 --quant
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
|    | Model              | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-large-uncased | False  | True      | True           | False      |           10 |       128 |                    5.959 | 1678.104 ms    |

~~~

### bert-base-uncased

~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
config.json: 100%|██████████| 570/570 [00:00<00:00, 170kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:12<00:00, 36.2MB/s]
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | False     | False          | False      |            1 |       128 |                    3.832 | 260.979 ms     |

~~~


~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
config.json: 100%|██████████| 570/570 [00:00<00:00, 168kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:11<00:00, 37.0MB/s] 
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | True      | True           | False      |            1 |       128 |                   16.622 | 60.160 ms      |

~~~


### bert-base-uncased (batch-size 10)

~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant --batch-size 10
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant --batch-size 10
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
config.json: 100%|██████████| 570/570 [00:00<00:00, 172kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:12<00:00, 35.2MB/s]
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | True      | True           | False      |           10 |       128 |                   23.923 | 418.015 ms     |

~~~


### Diff (original BERT, IPEX BERT, BERT operator)

~~~
---------------- BatchSize 1 ------------
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | False     | False          | False      |            1 |       128 |                     4.89 | 204.520 ms     |
|  0 | bert-base-uncased | True   | False     | False          | False      |            1 |       128 |                    5.243 | 190.739 ms     |
|  0 | bert-base-uncased | False  | True      | False          | False      |            1 |       128 |                     5.88 | 170.077 ms     |
|  0 | bert-base-uncased | False  | True      | True           | False      |            1 |       128 |                   15.444 | 64.752 ms      |
~~~


~~~
---------------- BatchSize 10 ------------
|    | Model              | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:-------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-large-uncased | False  | False     | False          | False      |           10 |       128 |                    1.588 | 6296.495 ms    |
|  0 | bert-large-uncased | False  | True      | True           | False      |           10 |       128 |                    5.959 | 1678.104 ms    |
~~~


~~~
-------------- BatchSize 20 -------------
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | False     | True           | False      |           20 |       128 |                    5.441 | 3675.675 ms    |
|  0 | bert-base-uncased | False  | True      | True           | False      |           20 |       128 |                   16.334 | 1224.473 ms    |
~~~


~~~
-------------- BatchSize 10 (with quant) -------------
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | False     | True           | False      |           10 |       128 |                    5.727 | 1746.223 ms    |
|  0 | bert-base-uncased | False  | True      | True           | False      |           10 |       128 |                   17.384 | 575.239 ms     |

~~~



~~~
-------------- BatchSize 100 (with quant) -------------
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | False     | True           | False      |          100 |       128 |                     5.15 | 19417.498 ms   |
|  0 | bert-base-uncased | False  | True      | True           | False      |          100 |       128 |                   18.638 | 5365.511 ms    |
~~~



#### original BERT

~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
config.json: 100%|██████████| 570/570 [00:00<00:00, 169kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:12<00:00, 36.2MB/s] 
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | False     | False          | False      |            1 |       128 |                     4.89 | 204.520 ms     |
~~~

#### IPEX-optimized BERT

~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --ipex
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --ipex
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
config.json: 100%|██████████| 570/570 [00:00<00:00, 175kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:11<00:00, 38.7MB/s][W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())

/usr/local/lib/python3.8/dist-packages/intel_extension_for_pytorch/frontend.py:396: UserWarning: Conv BatchNorm folding failed during the optimize process.
  warnings.warn("Conv BatchNorm folding failed during the optimize process.")
/usr/local/lib/python3.8/dist-packages/intel_extension_for_pytorch/frontend.py:401: UserWarning: Linear BatchNorm folding failed during the optimize process.
  warnings.warn("Linear BatchNorm folding failed during the optimize process.")
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | True   | False     | False          | False      |            1 |       128 |                    5.243 | 190.739 ms     |

~~~


#### BERT operator
~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
config.json: 100%|██████████| 570/570 [00:00<00:00, 187kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:12<00:00, 35.4MB/s] 
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | True      | False          | False      |            1 |       128 |                     5.88 | 170.077 ms     |
~~~


#### BERT operator (with quant)

~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant
 
:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.0.17(1)-release
   args: Using "$@" for setvars.sh arguments: numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
config.json: 100%|██████████| 570/570 [00:00<00:00, 159kB/s]
model.safetensors: 100%|██████████| 440M/440M [00:11<00:00, 39.1MB/s] 
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | True      | True           | False      |            1 |       128 |                   15.444 | 64.752 ms      |

~~~




#### original BERT vs. BERT operator (with quant, batch-size 20)

* batch-size=1: the performance gap is only about 10-20%,
* batch-size=20: but it grows to roughly 3x.
* `DNNL_CPU_RUNTIME=TBB|OMP` made no noticeable difference; in theory TBB's advantage is that performance does not degrade as the thread count grows.

~~~
❯ sudo docker run --rm --privileged bert-op-pytorch-demo-oneapi-tbb-onednn-v34pc numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --quant --batch-size 20
   config.json: 100%|██████████| 570/570 [00:00<00:00, 5.15MB/s]
model.safetensors: 100%|██████████| 440M/440M [00:07<00:00, 61.7MB/s] 
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | False     | True           | False      |           20 |       128 |                    5.441 | 3675.675 ms    |

❯ sudo docker run --rm --privileged bert-op-pytorch-demo-oneapi-tbb-onednn-v34pc numactl --all -- -m bert-base-uncased --warmup-time 5 --run-time 20 --bert-op --quant --batch-size 20
config.json: 100%|██████████| 570/570 [00:00<00:00, 5.25MB/s]
model.safetensors: 100%|██████████| 440M/440M [00:06<00:00, 63.2MB/s] 
|    | Model             | IPEX   | BERT Op   | Quantization   | BFloat16   |   Batch Size |   Seq Len |   Throughput [samples/s] | Latency [ms]   |
|---:|:------------------|:-------|:----------|:---------------|:-----------|-------------:|----------:|-------------------------:|:---------------|
|  0 | bert-base-uncased | False  | True      | True           | False      |           20 |       128 |                   16.334 | 1224.473 ms    |

~~~

~~~shell
pip install --index-url https://pypi.anaconda.org/intel/simple --extra-index-url https://pypi.org/simple

pip install dpnp numba-dpex dpctl intel-optimization-for-horovod==0.28.1.1 torch==2.0.1 torchvision==0.15.2 --extra-index-url=https://download.pytorch.org/whl/cpu intel_extension_for_pytorch==2.0.100 oneccl-bind-pt==2.0.0 --extra-index-url=https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
~~~

            
