# ramalama-stack
[PyPI - Version](https://pypi.org/project/ramalama-stack/)
[PyPI - Python Version](https://pypi.org/project/ramalama-stack/)
[License](https://github.com/containers/ramalama-stack/blob/main/LICENSE)
An external provider for [Llama Stack](https://github.com/meta-llama/llama-stack) that enables the use of [RamaLama](https://ramalama.ai/) for inference.
## Installing
You can install `ramalama-stack` from PyPI via `pip install ramalama-stack`.
This also installs Llama Stack and RamaLama if they are not already installed.
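For example, in a fresh virtual environment (the `.venv` path is just a convention):
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install ramalama-stack
```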
## Usage
> [!WARNING]
> The following workaround is currently needed to run this provider; see https://github.com/containers/ramalama-stack/issues/53 for more details.
> ```bash
> curl --create-dirs --output ~/.llama/providers.d/remote/inference/ramalama.yaml https://raw.githubusercontent.com/containers/ramalama-stack/refs/tags/v0.2.5/src/ramalama_stack/providers.d/remote/inference/ramalama.yaml
> curl --create-dirs --output ~/.llama/distributions/ramalama/ramalama-run.yaml https://raw.githubusercontent.com/containers/ramalama-stack/refs/tags/v0.2.5/src/ramalama_stack/ramalama-run.yaml
> ```
1. First, you will need a RamaLama server running; see [the RamaLama project](https://github.com/containers/ramalama) docs for more information.
2. Ensure you set your `INFERENCE_MODEL` environment variable to the name of the model you have running via RamaLama.
3. You can then run the RamaLama external provider via `llama stack run ~/.llama/distributions/ramalama/ramalama-run.yaml` (a combined sketch of these steps follows below).
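Putting the steps together, a minimal sketch (the model name `llama3.2:3b` is only an example; use whatever model you serve with RamaLama):
```bash
# Terminal 1: serve a model with RamaLama (listens on port 8080 by default)
ramalama serve llama3.2:3b

# Terminal 2: point the provider at that model and start Llama Stack
export INFERENCE_MODEL=llama3.2:3b
llama stack run ~/.llama/distributions/ramalama/ramalama-run.yaml
```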
> [!NOTE]
> You can also run the RamaLama external provider inside a container via [Podman](https://podman.io/):
> ```bash
> podman run \
> --net=host \
> --env RAMALAMA_URL=http://0.0.0.0:8080 \
> --env INFERENCE_MODEL=$INFERENCE_MODEL \
> quay.io/ramalama/llama-stack
> ```
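> With `--net=host`, the containerized provider can reach the RamaLama server on the host at `RAMALAMA_URL` (port 8080 in this example).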
This will start a Llama Stack server listening on port 8321 by default. You can verify it works by configuring the Llama Stack Client to point at this server and sending a test request.
- If your client is running on the same machine as the server, you can run `llama-stack-client configure --endpoint http://0.0.0.0:8321 --api-key none`
- If your client is running on a different machine, you can run `llama-stack-client configure --endpoint http://<hostname>:8321 --api-key none`
- The client should give you a message similar to `Done! You can now use the Llama Stack Client CLI with endpoint <endpoint>`
- You can then test the server by running `llama-stack-client inference chat-completion --message "tell me a joke"` which should return something like
```python
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content='A man walked into a library and asked the librarian, "Do you have any books on Pavlov\'s dogs and Schrödinger\'s cat?" The librarian replied, "It rings a bell, but I\'m not sure if it\'s here or not."',
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None,
    metrics=[
        Metric(metric='prompt_tokens', value=14.0, unit=None),
        Metric(metric='completion_tokens', value=63.0, unit=None),
        Metric(metric='total_tokens', value=77.0, unit=None)
    ]
)
```
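You can also exercise the server from Python. A minimal sketch, assuming the `llama-stack-client` 0.2.x Python API and the model name you exported in `INFERENCE_MODEL`:
```python
import os

from llama_stack_client import LlamaStackClient

# Point the client at the local Llama Stack server (port 8321 by default)
client = LlamaStackClient(base_url="http://localhost:8321")

# Request a chat completion from the RamaLama-backed inference provider
response = client.inference.chat_completion(
    model_id=os.environ["INFERENCE_MODEL"],
    messages=[{"role": "user", "content": "tell me a joke"}],
)
print(response.completion_message.content)
```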
## Llama Stack User Interface
Llama Stack includes an experimental user interface; check it out
[here](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distribution/ui).
To deploy the UI, run this:
```bash
podman run -d --rm --network=container:ramalama --name=streamlit quay.io/redhat-et/streamlit_client:0.1.0
```
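Once the container is running, the UI should be reachable in a browser at http://localhost:8501 (Streamlit's default port), provided the port is published or otherwise reachable as described in the note below.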
> [!NOTE]
> If running on macOS (not Linux), `--network=host` doesn't work. You'll need to publish the additional ports `8321:8321` and `8501:8501` with the ramalama serve command,
> then run with `--network=container:ramalama`.
>
> If running on Linux, use `--network=host` or `-p 8501:8501` instead. The Streamlit container will be able to reach the ramalama endpoint with either.