[![Unit Tests](https://github.com/google/JetStream/actions/workflows/unit_tests.yaml/badge.svg?branch=main)](https://github.com/google/JetStream/actions/workflows/unit_tests.yaml?query=branch:main)
[![PyPI version](https://badge.fury.io/py/google-jetstream.svg)](https://badge.fury.io/py/google-jetstream)
[![PyPi downloads](https://img.shields.io/pypi/dm/google-jetstream?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/google-jetstream/)
[![Contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
# JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices.
## About
JetStream is a throughput- and memory-optimized engine for large language model (LLM) inference on XLA devices, starting with TPUs. GPU support is planned for the future, and PRs are welcome.
## JetStream Engine Implementation
Two reference engine implementations are currently available: one for JAX models and one for PyTorch models.
### JAX
- Git: https://github.com/google/maxtext
- README: https://github.com/google/JetStream/blob/main/docs/online-inference-with-maxtext-engine.md
### PyTorch
- Git: https://github.com/google/jetstream-pytorch
- README: https://github.com/google/jetstream-pytorch/blob/main/README.md
## Documentation
- [Online Inference with MaxText on v5e Cloud TPU VM](https://cloud.google.com/tpu/docs/tutorials/LLM/jetstream) [[README](https://github.com/google/JetStream/blob/main/docs/online-inference-with-maxtext-engine.md)]
- [Online Inference with PyTorch on v5e Cloud TPU VM](https://cloud.google.com/tpu/docs/tutorials/LLM/jetstream-pytorch) [[README](https://github.com/google/jetstream-pytorch/tree/main?tab=readme-ov-file#jetstream-pytorch)]
- [Serve Gemma using TPUs on GKE with JetStream](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-tpu-jetstream)
- [Observability in JetStream Server](https://github.com/google/JetStream/blob/main/docs/observability-prometheus-metrics-in-jetstream-server.md)
- [Profiling in JetStream Server](https://github.com/google/JetStream/blob/main/docs/profiling-with-jax-profiler-and-tensorboard.md)
- [JetStream Standalone Local Setup](#jetstream-standalone-local-setup)
# JetStream Standalone Local Setup
## Getting Started
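### Setup
From the root of a clone of the JetStream repository (https://github.com/google/JetStream), install the dependencies: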
```
pip install -r requirements.txt
```
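The package metadata further down this page pins exact dependency versions (for example `jax==0.4.23`, `jaxlib==0.4.23`, and `numpy==1.23.1`). If you install into an environment that already has some of these packages, a quick check like the following can flag version drift. This is a minimal sketch, not part of JetStream: `EXPECTED_PINS` and `find_pin_mismatches` are hypothetical names, and only a handful of the pins are copied in.

```python
from importlib.metadata import PackageNotFoundError, version

# A few of the exact ("==") pins listed in the google-jetstream 0.2.2 metadata.
EXPECTED_PINS = {
    "jax": "0.4.23",
    "jaxlib": "0.4.23",
    "numpy": "1.23.1",
    "grpcio": "1.60.1",
    "protobuf": "3.20.3",
}


def find_pin_mismatches(pins):
    """Return {name: (expected, installed_or_None)} for every unmet pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # The package is not installed at all.
        # Plain string comparison, which is enough for exact "==" pins.
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

For example, `print(find_pin_mismatches(EXPECTED_PINS))` prints an empty dict when every listed pin matches. The check compares version strings exactly rather than evaluating full PEP 440 specifiers, which is only appropriate because these pins are all exact.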
### Run a local server and test it
Use the following commands to start a local mock server, send it a test request, and load test it:
```
# Start the mock server (it keeps running; run the commands below in a second terminal)
python -m jetstream.core.implementations.mock.server
# Test local mock server
python -m jetstream.tools.requester
# Load test local mock server
python -m jetstream.tools.load_tester
```
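Since the server process presumably keeps running until it is stopped, the commands above are normally run in two terminals. To script the same flow, a sketch like the following starts the mock server, runs one client command against it, and then shuts the server down. It assumes a fixed startup delay is long enough for the server to begin listening (it does not probe the port), and `run_smoke_test` is a hypothetical helper, not a JetStream API.

```python
import subprocess
import sys
import time

# The README's commands, invoked with the current Python interpreter.
SERVER_CMD = [sys.executable, "-m", "jetstream.core.implementations.mock.server"]
REQUESTER_CMD = [sys.executable, "-m", "jetstream.tools.requester"]
LOAD_TESTER_CMD = [sys.executable, "-m", "jetstream.tools.load_tester"]


def run_smoke_test(server_cmd, client_cmd, startup_delay=5.0, client_timeout=60.0):
    """Start a server, run one client command, stop the server; return the client's exit code.

    startup_delay is an assumed upper bound on server startup time.
    """
    server = subprocess.Popen(server_cmd)
    try:
        time.sleep(startup_delay)
        if server.poll() is not None:
            raise RuntimeError(f"server exited early with code {server.returncode}")
        client = subprocess.run(client_cmd, timeout=client_timeout)
        return client.returncode
    finally:
        # Stop the server if it is still running; escalate to kill if it hangs.
        if server.poll() is None:
            server.terminate()
            try:
                server.wait(timeout=10)
            except subprocess.TimeoutExpired:
                server.kill()
                server.wait()
```

For example, `run_smoke_test(SERVER_CMD, REQUESTER_CMD)` returns `0` if the requester exits cleanly, and `run_smoke_test(SERVER_CMD, LOAD_TESTER_CMD)` does the same for the load tester. A readiness probe on the server's port would be more robust than a fixed sleep, at the cost of knowing which port the server uses.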
### Test core modules
```
# Test JetStream core orchestrator
python -m unittest -v jetstream.tests.core.test_orchestrator
# Test JetStream core server library
python -m unittest -v jetstream.tests.core.test_server
# Test mock JetStream engine implementation
python -m unittest -v jetstream.tests.engine.test_mock_engine
# Test JetStream token utils and engine utils
python -m unittest -v jetstream.tests.engine.test_token_utils
python -m unittest -v jetstream.tests.engine.test_utils
```
Raw data
{
"_id": null,
"home_page": "https://github.com/google/JetStream",
"name": "google-jetstream",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Google LLC",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/e9/10/88c13224cdcabdd7e6a39352f9f345ccb0381aff252bee6341fd71dcc745/google_jetstream-0.2.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).",
"version": "0.2.2",
"project_urls": {
"Homepage": "https://github.com/google/JetStream"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7550b7d5ccf7cb3863718dfebe6641973ffd7720a8a4ed22ae4db45b7a2c2954",
"md5": "bbb1ee9717cb79e538c40cc848d4fa68",
"sha256": "d4372f6efbc9cfb7d88127d0c42f6efa89bdb754e2a943df2638fe077900606c"
},
"downloads": -1,
"filename": "google_jetstream-0.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "bbb1ee9717cb79e538c40cc848d4fa68",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 72293,
"upload_time": "2024-05-31T18:45:02",
"upload_time_iso_8601": "2024-05-31T18:45:02.980437Z",
"url": "https://files.pythonhosted.org/packages/75/50/b7d5ccf7cb3863718dfebe6641973ffd7720a8a4ed22ae4db45b7a2c2954/google_jetstream-0.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e91088c13224cdcabdd7e6a39352f9f345ccb0381aff252bee6341fd71dcc745",
"md5": "d66ddc697be003bab7f825a1bdb422b2",
"sha256": "9ea3d238cbb2515cd21e2d2753453fdf505e2dc635b81cc159c08161fdad95ef"
},
"downloads": -1,
"filename": "google_jetstream-0.2.2.tar.gz",
"has_sig": false,
"md5_digest": "d66ddc697be003bab7f825a1bdb422b2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 51450,
"upload_time": "2024-05-31T18:45:04",
"upload_time_iso_8601": "2024-05-31T18:45:04.748909Z",
"url": "https://files.pythonhosted.org/packages/e9/10/88c13224cdcabdd7e6a39352f9f345ccb0381aff252bee6341fd71dcc745/google_jetstream-0.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-31 18:45:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "google",
"github_project": "JetStream",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "absl-py",
"specs": [
[
"==",
"1.4.0"
]
]
},
{
"name": "array-record",
"specs": [
[
"==",
"0.5.0"
]
]
},
{
"name": "astunparse",
"specs": [
[
"==",
"1.6.3"
]
]
},
{
"name": "blobfile",
"specs": [
[
"==",
"2.1.1"
]
]
},
{
"name": "cachetools",
"specs": [
[
"==",
"5.3.2"
]
]
},
{
"name": "certifi",
"specs": [
[
"==",
"2024.2.2"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
"==",
"3.3.2"
]
]
},
{
"name": "chex",
"specs": [
[
"==",
"0.1.7"
]
]
},
{
"name": "click",
"specs": [
[
"==",
"8.1.7"
]
]
},
{
"name": "clu",
"specs": [
[
"==",
"0.0.10"
]
]
},
{
"name": "contextlib2",
"specs": [
[
"==",
"21.6.0"
]
]
},
{
"name": "coverage",
"specs": [
[
"==",
"7.4.4"
]
]
},
{
"name": "dm-tree",
"specs": [
[
"==",
"0.1.8"
]
]
},
{
"name": "docstring-parser",
"specs": [
[
"==",
"0.15"
]
]
},
{
"name": "editdistance",
"specs": [
[
"==",
"0.6.2"
]
]
},
{
"name": "etils",
"specs": [
[
"==",
"1.6.0"
]
]
},
{
"name": "exceptiongroup",
"specs": [
[
"==",
"1.2.0"
]
]
},
{
"name": "filelock",
"specs": [
[
"==",
"3.14.0"
]
]
},
{
"name": "flatbuffers",
"specs": [
[
"==",
"23.5.26"
]
]
},
{
"name": "flax",
"specs": [
[
"==",
"0.8.0"
]
]
},
{
"name": "fsspec",
"specs": [
[
"==",
"2023.12.2"
]
]
},
{
"name": "gast",
"specs": [
[
"==",
"0.4.0"
]
]
},
{
"name": "google-auth",
"specs": [
[
"==",
"2.27.0"
]
]
},
{
"name": "google-auth-oauthlib",
"specs": [
[
"==",
"1.0.0"
]
]
},
{
"name": "google-pasta",
"specs": [
[
"==",
"0.2.0"
]
]
},
{
"name": "googleapis-common-protos",
"specs": [
[
"==",
"1.62.0"
]
]
},
{
"name": "grpcio",
"specs": [
[
"==",
"1.60.1"
]
]
},
{
"name": "h5py",
"specs": [
[
"==",
"3.10.0"
]
]
},
{
"name": "idna",
"specs": [
[
"==",
"3.7"
]
]
},
{
"name": "importlib-resources",
"specs": [
[
"==",
"6.1.1"
]
]
},
{
"name": "iniconfig",
"specs": [
[
"==",
"2.0.0"
]
]
},
{
"name": "jax",
"specs": [
[
"==",
"0.4.23"
]
]
},
{
"name": "jaxlib",
"specs": [
[
"==",
"0.4.23"
]
]
},
{
"name": "keras",
"specs": [
[
"==",
"2.13.1"
]
]
},
{
"name": "libclang",
"specs": [
[
"==",
"16.0.6"
]
]
},
{
"name": "lxml",
"specs": [
[
"==",
"4.9.4"
]
]
},
{
"name": "markdown",
"specs": [
[
"==",
"3.5.2"
]
]
},
{
"name": "markdown-it-py",
"specs": [
[
"==",
"3.0.0"
]
]
},
{
"name": "markupsafe",
"specs": [
[
"==",
"2.1.5"
]
]
},
{
"name": "mdurl",
"specs": [
[
"==",
"0.1.2"
]
]
},
{
"name": "ml-collections",
"specs": [
[
"==",
"0.1.1"
]
]
},
{
"name": "ml-dtypes",
"specs": [
[
"==",
"0.3.2"
]
]
},
{
"name": "msgpack",
"specs": [
[
"==",
"1.0.7"
]
]
},
{
"name": "nest-asyncio",
"specs": [
[
"==",
"1.6.0"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.23.1"
]
]
},
{
"name": "oauthlib",
"specs": [
[
"==",
"3.2.2"
]
]
},
{
"name": "opt-einsum",
"specs": [
[
"==",
"3.3.0"
]
]
},
{
"name": "optax",
"specs": [
[
"==",
"0.1.8"
]
]
},
{
"name": "orbax-checkpoint",
"specs": [
[
"==",
"0.5.2"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"23.2"
]
]
},
{
"name": "parameterized",
"specs": [
[
"==",
"0.9.0"
]
]
},
{
"name": "pluggy",
"specs": [
[
"==",
"1.4.0"
]
]
},
{
"name": "portpicker",
"specs": [
[
"==",
"1.6.0"
]
]
},
{
"name": "prometheus-client",
"specs": [
[
"==",
"0.20.0"
]
]
},
{
"name": "promise",
"specs": [
[
"==",
"2.3"
]
]
},
{
"name": "protobuf",
"specs": [
[
"==",
"3.20.3"
]
]
},
{
"name": "psutil",
"specs": [
[
"==",
"5.9.8"
]
]
},
{
"name": "pyasn1",
"specs": [
[
"==",
"0.5.1"
]
]
},
{
"name": "pyasn1-modules",
"specs": [
[
"==",
"0.3.0"
]
]
},
{
"name": "pycryptodomex",
"specs": [
[
"==",
"3.20.0"
]
]
},
{
"name": "pyglove",
"specs": [
[
"==",
"0.4.4"
]
]
},
{
"name": "pygments",
"specs": [
[
"==",
"2.17.2"
]
]
},
{
"name": "pytest",
"specs": [
[
"==",
"8.1.1"
]
]
},
{
"name": "pyyaml",
"specs": [
[
"==",
"6.0.1"
]
]
},
{
"name": "regex",
"specs": [
[
"==",
"2024.4.28"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.32.0"
]
]
},
{
"name": "requests-oauthlib",
"specs": [
[
"==",
"1.3.1"
]
]
},
{
"name": "rich",
"specs": [
[
"==",
"13.7.0"
]
]
},
{
"name": "rsa",
"specs": [
[
"==",
"4.9"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.12.0"
]
]
},
{
"name": "sentencepiece",
"specs": [
[
"==",
"0.1.99"
]
]
},
{
"name": "seqio",
"specs": [
[
"==",
"0.0.19"
]
]
},
{
"name": "shortuuid",
"specs": [
[
"==",
"1.0.13"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "tensorboard",
"specs": [
[
"==",
"2.13.0"
]
]
},
{
"name": "tensorboard-data-server",
"specs": [
[
"==",
"0.7.2"
]
]
},
{
"name": "tensorflow",
"specs": [
[
"==",
"2.13.1"
]
]
},
{
"name": "tensorflow-estimator",
"specs": [
[
"==",
"2.13.0"
]
]
},
{
"name": "tensorflow-hub",
"specs": [
[
"==",
"0.16.1"
]
]
},
{
"name": "tensorflow-io-gcs-filesystem",
"specs": [
[
"==",
"0.35.0"
]
]
},
{
"name": "tensorflow-metadata",
"specs": [
[
"==",
"1.14.0"
]
]
},
{
"name": "tensorflow-text",
"specs": [
[
"==",
"2.13.0"
]
]
},
{
"name": "tensorstore",
"specs": [
[
"==",
"0.1.52"
]
]
},
{
"name": "termcolor",
"specs": [
[
"==",
"2.4.0"
]
]
},
{
"name": "tf-keras",
"specs": [
[
"==",
"2.15.0"
]
]
},
{
"name": "tfds-nightly",
"specs": [
[
"==",
"4.9.2.dev202308090034"
]
]
},
{
"name": "tiktoken",
"specs": [
[
"==",
"0.6.0"
]
]
},
{
"name": "toml",
"specs": [
[
"==",
"0.10.2"
]
]
},
{
"name": "tomli",
"specs": [
[
"==",
"2.0.1"
]
]
},
{
"name": "toolz",
"specs": [
[
"==",
"0.12.1"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.66.3"
]
]
},
{
"name": "typing-extensions",
"specs": [
[
"==",
"4.5.0"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"2.2.0"
]
]
},
{
"name": "werkzeug",
"specs": [
[
"==",
"3.0.1"
]
]
},
{
"name": "wheel",
"specs": [
[
"==",
"0.42.0"
]
]
},
{
"name": "wrapt",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "zipp",
"specs": [
[
"==",
"3.17.0"
]
]
}
],
"lcname": "google-jetstream"
}