aana

Name	aana
Version	0.2.4
home_page	None
Summary	Multimodal SDK
upload_time	2025-02-17 12:42:35
maintainer	None
docs_url	None
author	Mobius Labs GmbH
requires_python	<4.0,>=3.10
license	Apache-2.0
keywords	multimodal ray serving video images audio llm vlm asr
requirements	No requirements were recorded.

[![Build Status](https://github.com/mobiusml/aana_sdk/actions/workflows/tests.yml/badge.svg)](https://github.com/mobiusml/aana_sdk/actions/workflows/tests.yml)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](/LICENSE)
[![Website](https://img.shields.io/badge/website-online-brightgreen.svg)](http://www.mobiuslabs.com)
[![Documentation](https://img.shields.io/website?label=documentation&up_message=online&url=https://mobiusml.github.io/aana_sdk/)](https://mobiusml.github.io/aana_sdk/) 
[![PyPI version](https://img.shields.io/pypi/v/aana.svg)](https://pypi.org/project/aana/)
[![GitHub release](https://img.shields.io/github/v/release/mobiusml/aana_sdk.svg)](https://github.com/mobiusml/aana_sdk/releases)

<p align="center">
  <picture>
    <source srcset="https://raw.githubusercontent.com/mobiusml/aana_sdk/main/docs/images/AanaSDK_logo_dark_theme.png" media="(prefers-color-scheme: dark)">
    <img src="https://raw.githubusercontent.com/mobiusml/aana_sdk/main/docs/images/AanaSDK_logo_light_theme.png" alt="Aana Logo">
  </picture>
</p>

# Aana

Aana SDK is a powerful framework for building multimodal applications. It facilitates the large-scale deployment of machine learning models, including those for vision, audio, and language, and supports Retrieval-Augmented Generation (RAG) systems. This enables the development of advanced applications such as search engines, recommendation systems, and data insights platforms.

The SDK is designed according to the following principles:

- **Reliability**: Aana is built to be fault-tolerant and to handle failures gracefully.
- **Scalability**: Aana is built on top of Ray, a distributed computing framework, and can be scaled across multiple servers.
- **Efficiency**: Aana is built to be fast, to run work in parallel, and to use resources efficiently.
- **Ease of Use**: Aana is designed to be easy for developers to use. It is modular, with a lot of automation and abstraction.

The SDK is still in development, and not all features are fully implemented. We are constantly working on improving the SDK, and we welcome any feedback or suggestions.

Check out the [documentation](https://mobiusml.github.io/aana_sdk/) for more information.

## Why use Aana SDK?

Nowadays, it is getting easier to experiment with machine learning models and build prototypes. However, deploying these models at scale and integrating them into real-world applications is still a challenge. 

Aana SDK simplifies this process by providing a framework that lets you:
- Deploy and scale machine learning models on a single machine or a cluster.
- Build multimodal applications that combine multiple different machine learning models.

### Key Features

- **Model Deployment**:
  - Deploy models on a single machine or scale them across a cluster.

- **API Generation**:
  - Automatically generate an API for your application based on the endpoints you define.
  - Annotate the input and output types of the endpoint functions.
  - Inputs and outputs of the endpoints are then validated automatically against those annotations.
- **Predefined Types**:
  - Comes with a set of predefined types for various data such as images, videos, etc.

- **Documentation Generation**:
  - Automatically generate documentation for your application based on the defined endpoints.

- **Streaming Support**:
  - Stream the output of the endpoint to the client as it is generated.
  - Ideal for real-time applications and Large Language Models (LLMs).

- **Task Queue Support**:
  - Run every endpoint you define as a task in the background without any changes to your code.

- **Integrations**:
  - Aana SDK has integrations with various machine learning models and libraries: Whisper, vLLM, Hugging Face Transformers, Deepset Haystack, and more to come (for more information see [Integrations](https://mobiusml.github.io/aana_sdk/pages/integrations/)).
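
The API Generation feature above rests on a general Python technique: reading a function's type annotations at runtime and checking inputs against them. The sketch below illustrates that idea with the standard library only. It is not Aana's implementation, and `describe_signature`, `validate_kwargs`, and `transcribe` are hypothetical names introduced for this example.

```python
import inspect
from typing import Any, get_origin, get_type_hints


def describe_signature(func) -> dict[str, Any]:
    """Return {parameter name: annotated type}, leaving out the return annotation."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    return hints


def validate_kwargs(func, kwargs: dict[str, Any]) -> None:
    """Raise TypeError if a field is missing, unexpected, or of the wrong type."""
    hints = describe_signature(func)
    params = inspect.signature(func).parameters
    for name, param in params.items():
        if name not in kwargs and param.default is inspect.Parameter.empty:
            raise TypeError(f"missing required field: {name}")
    for name, value in kwargs.items():
        if name not in params:
            raise TypeError(f"unexpected field: {name}")
        expected = hints.get(name)
        # Only plain classes are checked; generics such as list[str] are skipped in this sketch.
        is_plain_class = isinstance(expected, type) and get_origin(expected) is None
        if is_plain_class and not isinstance(value, expected):
            raise TypeError(f"{name} must be {expected.__name__}, got {type(value).__name__}")


def transcribe(url: str, language: str = "en") -> dict:
    """Hypothetical endpoint function; only its annotations matter here."""
    return {"url": url, "language": language}


validate_kwargs(transcribe, {"url": "https://example.com/video.mp4"})  # passes silently
print(describe_signature(transcribe))
```

Passing `{"url": 123}` or omitting `url` raises `TypeError` in this sketch, which is the kind of early rejection that automatic validation is meant to provide.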

## Installation

### Installing via PyPI

To install Aana SDK via PyPI, you can use the following command:

```bash
pip install aana
```

By default, `aana` installs only the core dependencies; deployment-specific dependencies are left out. You have two options:
- Install the dependencies manually. You will be prompted to install the dependencies when you try to use a deployment that requires them.
- Use extras to install the dependencies for the deployments you need. Here are the available extras:
  - `all`: Install dependencies for all deployments.
  - `vllm`: Install dependencies for the vLLM deployment.
  - `asr`: Install dependencies for the Automatic Speech Recognition (Whisper) deployment and other ASR models (diarization, voice activity detection, etc.).
  - `transformers`: Install dependencies for the Hugging Face Transformers deployment. There are multiple deployments that use Transformers.
  - `hqq`: Install dependencies for Half-Quadratic Quantization (HQQ) deployment.

For example, to install all dependencies, you can use the following command:

```bash
pip install "aana[all]"
```
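
To see which optional dependencies are already present before choosing extras, a quick check like the following can help. It is a minimal sketch: the mapping from extra name to a probe module is an assumption made for illustration (it is not taken from Aana's packaging metadata), and `is_installed` and `missing_extras` are hypothetical helper names.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to one importable module it would provide.
# These module names are illustrative guesses, not Aana's official metadata.
EXTRA_PROBES = {
    "vllm": "vllm",
    "asr": "faster_whisper",
    "transformers": "transformers",
    "hqq": "hqq",
}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def missing_extras(probes: dict[str, str] = EXTRA_PROBES) -> list[str]:
    """List the extras whose probe module is not importable in this environment."""
    return [extra for extra, module in probes.items() if not is_installed(module)]


print("Extras not yet installed:", missing_extras())
```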

For optimal performance, install a [PyTorch](https://pytorch.org/get-started/locally/) version >=2.1 that is appropriate for your system. If you skip this step, installing `aana` pulls in a default PyTorch version that may not make optimal use of your system's resources, for example a GPU or even some SIMD operations. We therefore recommend choosing your PyTorch package carefully and installing it manually.

Some models use Flash Attention. Install the Flash Attention library for better performance. See the [Flash Attention installation instructions](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features) for more details and supported GPUs.
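
After installing, it can be worth confirming which PyTorch build is active and whether it sees a GPU. The snippet below is a small sketch using `torch.__version__` and `torch.cuda.is_available()`; `torch_summary` is a hypothetical helper name, and the check for the `flash_attn` module assumes that is the import name of the Flash Attention package.

```python
from importlib.util import find_spec


def torch_summary() -> dict:
    """Report whether PyTorch and Flash Attention are importable, plus CUDA availability."""
    summary = {
        "installed": False,
        "flash_attn_installed": find_spec("flash_attn") is not None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; return what is known so far.
        return summary
    summary.update(
        installed=True,
        version=torch.__version__,
        cuda_available=torch.cuda.is_available(),
    )
    return summary


print(torch_summary())
```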

### Installing from GitHub

1. Clone the repository.

```bash
git clone https://github.com/mobiusml/aana_sdk.git
```

2. Install additional libraries.

For optimal performance, install a [PyTorch](https://pytorch.org/get-started/locally/) version >=2.1 that is appropriate for your system. If you continue directly to the next step, Poetry installs a default PyTorch version that may not make optimal use of your system's resources, for example a GPU or even some SIMD operations. We therefore recommend choosing your PyTorch package carefully and installing it manually.

Some models use Flash Attention. Install the Flash Attention library for better performance. See the [Flash Attention installation instructions](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features) for more details and supported GPUs.

3. Install the package with Poetry.

The project is managed with [Poetry](https://python-poetry.org/docs/). See the [Poetry installation instructions](https://python-poetry.org/docs/#installation) on how to install it on your system. Use Poetry >= 2.0 for the best experience.

The following command installs the package in the virtual environment created by Poetry. The `--extras all` option installs all extra dependencies.

```bash
poetry install --extras all
```

For a development environment, we recommend installing all extras together with the `dev` and `tests` dependency groups. You can do this by running the following command:

```bash
poetry install --extras all --with dev,tests
```

## Getting Started

### Creating a New Application

You can quickly develop multimodal applications using Aana SDK's intuitive APIs and components.

If you want to start building a new application, you can use the following GitHub template: [Aana App Template](https://github.com/mobiusml/aana_app_template). It will help you get started with the Aana SDK and provide you with a basic structure for your application and its dependencies.

Let's create a simple application that transcribes a video. The application will download a video from YouTube, extract the audio, and transcribe it using an ASR model.

Aana SDK already provides a deployment for ASR (Automatic Speech Recognition) based on the Whisper model. We will use this [deployment](#deployments) in the example.

```python
from aana.api.api_generation import Endpoint
from aana.core.models.video import VideoInput
from aana.deployments.aana_deployment_handle import AanaDeploymentHandle
from aana.deployments.whisper_deployment import (
    WhisperComputeType,
    WhisperConfig,
    WhisperDeployment,
    WhisperModelSize,
    WhisperOutput,
)
from aana.integrations.external.yt_dlp import download_video
from aana.processors.remote import run_remote
from aana.processors.video import extract_audio
from aana.sdk import AanaSDK


# Define the model deployments.
asr_deployment = WhisperDeployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.25},  # Remove this line if you want to run Whisper on a CPU.
    user_config=WhisperConfig(
        model_size=WhisperModelSize.MEDIUM,
        compute_type=WhisperComputeType.FLOAT16,
    ).model_dump(mode="json"),
)
deployments = [{"name": "asr_deployment", "instance": asr_deployment}]


# Define the endpoint to transcribe the video.
class TranscribeVideoEndpoint(Endpoint):
    """Transcribe video endpoint."""

    async def initialize(self):
        """Initialize the endpoint."""
        self.asr_handle = await AanaDeploymentHandle.create("asr_deployment")
        await super().initialize()

    async def run(self, video: VideoInput) -> WhisperOutput:
        """Transcribe video."""
        video_obj = await run_remote(download_video)(video_input=video)
        audio = extract_audio(video=video_obj)
        transcription = await self.asr_handle.transcribe(audio=audio)
        return transcription

endpoints = [
    {
        "name": "transcribe_video",
        "path": "/video/transcribe",
        "summary": "Transcribe a video",
        "endpoint_cls": TranscribeVideoEndpoint,
    },
]

aana_app = AanaSDK(name="transcribe_video_app")

for deployment in deployments:
    aana_app.register_deployment(**deployment)

for endpoint in endpoints:
    aana_app.register_endpoint(**endpoint)

if __name__ == "__main__":
    aana_app.connect(host="127.0.0.1", port=8000, show_logs=False)  # Connects to the Ray cluster or starts a new one.
    aana_app.migrate()                                              # Runs the migrations to create the database tables.
    aana_app.deploy(blocking=True)                                  # Deploys the application.
```

You have a few options to run the application:
- Copy the code above and run it in a Jupyter notebook.
- Save the code to a Python file, for example `app.py`, and run it as a Python script: `python app.py`.
- Save the code to a Python file, for example `app.py`, and run it using the Aana CLI: `aana deploy app:aana_app --host 127.0.0.1 --port 8000 --hide-logs`.
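
The `app:aana_app` argument in the CLI option above follows the common `module:attribute` convention: the part before the colon names a Python module and the part after it names an object inside that module. The sketch below shows how such a string is typically resolved with `importlib`. It is an illustration of the convention, not the Aana CLI's actual code, and `load_target` is a hypothetical name.

```python
import importlib


def load_target(spec: str):
    """Resolve a "module:attribute" string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Walk dotted attribute paths such as "Class.method".
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# Demonstrated with a standard-library module so the snippet runs anywhere.
dumps = load_target("json:dumps")
print(dumps({"name": "transcribe_video_app"}))
```

With `app.py` in the current directory and on the import path, `load_target("app:aana_app")` would return the `aana_app` object defined in the example above.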

Once the application is running, you will see the message `Deployed successfully.` in the logs. You can now send a request to the application to transcribe a video.

To get an overview of the Ray cluster, you can use the Ray Dashboard. The Ray Dashboard is available at `http://127.0.0.1:8265` by default. You can see the status of the Ray cluster, the resources used, running applications and deployments, logs, and more. It is a useful tool for monitoring and debugging your applications. See [Ray Dashboard documentation](https://docs.ray.io/en/latest/ray-observability/getting-started.html) for more information.

Let's transcribe [Gordon Ramsay's perfect scrambled eggs tutorial](https://www.youtube.com/watch?v=VhJFyyukAzA) using the application.

```bash
curl -X POST http://127.0.0.1:8000/video/transcribe -Fbody='{"video":{"url":"https://www.youtube.com/watch?v=VhJFyyukAzA"}}'
```

The response contains the full transcription of the video, the transcription of each segment, and transcription info such as the identified language. You can also use the [Swagger UI](http://127.0.0.1:8000/docs) to send the request.
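
The same request can be built from Python with only the standard library. This is a minimal sketch that assumes the endpoint accepts URL-encoded form data, which is the encoding `requests` uses in the structured-data example in the API section; `build_transcribe_request` and `send` are hypothetical helper names. Building the request is separated from sending it so the first part runs without a live server.

```python
import json
import urllib.parse
import urllib.request


def build_transcribe_request(
    video_url: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /video/transcribe endpoint."""
    body = {"video": {"url": video_url}}
    # The JSON document travels as a string in a single form field named "body".
    form = urllib.parse.urlencode({"body": json.dumps(body)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/video/transcribe",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON response; requires the app to be running."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_transcribe_request("https://www.youtube.com/watch?v=VhJFyyukAzA")
print(request.get_method(), request.full_url)
```

Once the application is deployed, `send(request)` returns the decoded response. The generous timeout is an assumption to allow for downloading and transcribing the video.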

### Running Example Applications

We provide a few example applications that demonstrate the capabilities of Aana SDK.

- [Chat with Video](https://github.com/mobiusml/aana_chat_with_video): A multimodal chat application that allows users to upload a video and ask questions about the video content based on the visual and audio information. See [Chat with Video Demo notebook](https://github.com/mobiusml/aana_chat_with_video/blob/main/notebooks/chat_with_video_demo.ipynb) to see how to use the application.
- [Summarize Video](https://github.com/mobiusml/aana_summarize_video): An Aana application that summarizes a video by extracting transcription from the audio and generating a summary using a Language Model (LLM). This application is a part of the [tutorial](https://mobiusml.github.io/aana_sdk/pages/tutorial/) on how to build multimodal applications with Aana SDK.

See the README files of the applications for more information on how to install and run them.

The full list of example applications is available in the [Aana Examples](https://github.com/mobiusml/aana_examples) repository. You can use these examples as a starting point for building your own applications.

### Main Components

There are three main components in Aana SDK: deployments, endpoints, and AanaSDK.

#### Deployments

Deployments are the building blocks of Aana SDK. They represent the machine learning models that you want to deploy. Aana SDK comes with a set of predefined deployments that you can use, or you can define your own. See [Integrations](https://mobiusml.github.io/aana_sdk/pages/integrations/) for more information about predefined deployments.

Each deployment has a main class that defines it and a configuration class that allows you to specify the deployment parameters.

For example, we have a predefined deployment for the Whisper model that allows you to transcribe audio. You can define the deployment like this:

```python
from aana.deployments.whisper_deployment import WhisperDeployment, WhisperConfig, WhisperModelSize, WhisperComputeType

asr_deployment = WhisperDeployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.25},
    user_config=WhisperConfig(model_size=WhisperModelSize.MEDIUM, compute_type=WhisperComputeType.FLOAT16).model_dump(mode="json"),
)
```
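
The `.model_dump(mode="json")` call in the example turns the typed configuration object into a dictionary of JSON-compatible values before it is passed as `user_config`. The stand-in below reproduces that pattern with the standard library so that it runs without Aana installed. `ModelSize`, `ComputeType`, `ToyWhisperConfig`, and `to_json_dict` are hypothetical names, and the enum values are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class ModelSize(Enum):
    SMALL = "small"
    MEDIUM = "medium"


class ComputeType(Enum):
    FLOAT16 = "float16"
    FLOAT32 = "float32"


@dataclass
class ToyWhisperConfig:
    """Toy configuration class standing in for a real deployment config."""

    model_size: ModelSize = ModelSize.MEDIUM
    compute_type: ComputeType = ComputeType.FLOAT16


def to_json_dict(config: ToyWhisperConfig) -> dict:
    """Replace enum members with their plain values so the result is JSON-serializable."""
    return {
        key: value.value if isinstance(value, Enum) else value
        for key, value in asdict(config).items()
    }


user_config = to_json_dict(ToyWhisperConfig())
print(json.dumps(user_config))  # prints {"model_size": "medium", "compute_type": "float16"}
```

Calling `json.dumps(asdict(ToyWhisperConfig()))` directly would fail because enum members are not JSON-serializable, which is why the conversion step matters when a configuration has to cross process boundaries.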

See [Model Hub](https://mobiusml.github.io/aana_sdk/pages/model_hub/) for a collection of configurations for different models that can be used with the predefined deployments.

#### Endpoints

Endpoints define the functionality of your application. They allow you to connect multiple deployments (models) to each other and define the input and output of your application.

Each endpoint is defined as a class that inherits from the `Endpoint` class. The class has two main methods: `initialize` and `run`.

For example, you can define an endpoint that transcribes a video like this:

```python
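from aana.api.api_generation import Endpoint
from aana.core.models.video import VideoInput
from aana.deployments.aana_deployment_handle import AanaDeploymentHandle
from aana.deployments.whisper_deployment import WhisperOutput
from aana.integrations.external.yt_dlp import download_video
from aana.processors.remote import run_remote
from aana.processors.video import extract_audio


class TranscribeVideoEndpoint(Endpoint):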
    """Transcribe video endpoint."""

    async def initialize(self):
        """Initialize the endpoint."""
        await super().initialize()
        self.asr_handle = await AanaDeploymentHandle.create("asr_deployment")

    async def run(self, video: VideoInput) -> WhisperOutput:
        """Transcribe video."""
        video_obj = await run_remote(download_video)(video_input=video)
        audio = extract_audio(video=video_obj)
        transcription = await self.asr_handle.transcribe(audio=audio)
        return transcription
```
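
The split between `initialize` and `run` separates one-time setup from per-request work. The toy model below sketches that lifecycle with plain `asyncio`, assuming `initialize` is awaited once before any request is served. `ToyEndpoint`, `UppercaseEndpoint`, and `serve` are hypothetical stand-ins and do not use or mirror Aana's internals.

```python
import asyncio


class ToyEndpoint:
    """Stand-in base class: initialize() runs once, run() runs per request."""

    def __init__(self) -> None:
        self.initialized = False

    async def initialize(self) -> None:
        self.initialized = True

    async def run(self, **kwargs):
        raise NotImplementedError


class UppercaseEndpoint(ToyEndpoint):
    async def initialize(self) -> None:
        await super().initialize()
        # Expensive setup (such as acquiring a deployment handle) belongs here.
        self.calls = 0

    async def run(self, text: str) -> dict:
        self.calls += 1
        return {"text": text.upper(), "call": self.calls}


async def serve(endpoint: ToyEndpoint, requests: list[dict]) -> list[dict]:
    """Initialize the endpoint once, then handle each request in turn."""
    if not endpoint.initialized:
        await endpoint.initialize()
    return [await endpoint.run(**request) for request in requests]


results = asyncio.run(serve(UppercaseEndpoint(), [{"text": "hello"}, {"text": "aana"}]))
print(results)  # [{'text': 'HELLO', 'call': 1}, {'text': 'AANA', 'call': 2}]
```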

#### AanaSDK

AanaSDK is the main class that you use to build your application. It allows you to deploy the deployments and endpoints you defined and start the application.

For example, you can define an application that transcribes a video like this:

```python
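from aana.sdk import AanaSDK

# `asr_deployment` and `TranscribeVideoEndpoint` are defined in the sections above.
aana_app = AanaSDK(name="transcribe_video_app")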

aana_app.register_deployment(name="asr_deployment", instance=asr_deployment)
aana_app.register_endpoint(
    name="transcribe_video",
    path="/video/transcribe",
    summary="Transcribe a video",
    endpoint_cls=TranscribeVideoEndpoint,
)

aana_app.connect()  # Connects to the Ray cluster or starts a new one.
aana_app.migrate()  # Runs the migrations to create the database tables.
aana_app.deploy()   # Deploys the application.
```

All you need to do is define the deployments and endpoints you want to use in your application, and Aana SDK will take care of the rest.


## API

Aana SDK uses form data for API requests, which allows sending both binary data and structured fields in a single request. The request body is sent as a JSON string in the `body` field, and any binary data is sent as files.

### Making API Requests

You can send requests to the SDK endpoints with only structured data or a combination of structured data and binary data.

#### Only Structured Data
When your request includes only structured data, you can send it as a JSON string in the `body` field.

- **cURL Example:**
    ```bash
    curl http://127.0.0.1:8000/endpoint \
        -F body='{"input": "data", "param": "value"}'
    ```

- **Python Example:**
    ```python
    import json, requests

    url = "http://127.0.0.1:8000/endpoint"
    body = {
        "input": "data",
        "param": "value"
    }

    response = requests.post(
        url,
        data={"body": json.dumps(body)}
    )

    print(response.json())
    ```

#### With Binary Data
When your request includes binary files (images, audio, etc.), you can send them as files in the request and include the names of the files in the `body` field as a reference.

For example, if you want to send an image, you can use [`aana.core.models.image.ImageInput`](https://mobiusml.github.io/aana_sdk/reference/models/media/#aana.core.models.ImageInput) as the input type that supports binary data upload. The `content` field in the input type should be set to the name of the file you are sending. 

- **cURL Example:**
    ```bash
    curl http://127.0.0.1:8000/process_images \
        -H "Content-Type: multipart/form-data" \
        -F body='{"image": {"content": "file1"}}' \
        -F file1="@image.jpeg"
    ```

- **Python Example:**
    ```python
    import json, requests 

    url = "http://127.0.0.1:8000/process_images"
    body = {
        "image": {"content": "file1"}
    }
    with open("image.jpeg", "rb") as file:
        files = {"file1": file}

        response = requests.post(
            url,
            data={"body": json.dumps(body)},
            files=files
        )

        print(response.text)
    ```
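
If `requests` is not available, the multipart body can also be assembled by hand. The sketch below shows the layout of such a body: one `body` part carrying the JSON string and one part per file, where each part name matches the name referenced in `content`. `encode_multipart` is a hypothetical helper, and reusing the part name as the file name is a simplification made for this example.

```python
import json
import uuid


def encode_multipart(body: dict, files: dict[str, bytes]) -> tuple[bytes, str]:
    """Return (payload, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    lines: list[bytes] = []

    # First part: the structured data as a JSON string in the "body" field.
    lines.append(f"--{boundary}".encode())
    lines.append(b'Content-Disposition: form-data; name="body"')
    lines.append(b"")
    lines.append(json.dumps(body).encode("utf-8"))

    # One part per binary file, named as referenced from the JSON body.
    for name, content in files.items():
        lines.append(f"--{boundary}".encode())
        lines.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{name}"'.encode()
        )
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)

    # Closing boundary followed by a final CRLF.
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


payload, content_type = encode_multipart(
    {"image": {"content": "file1"}},
    {"file1": b"placeholder image bytes"},
)
print(content_type)
print(len(payload), "bytes")
```

The payload can then be sent with `urllib.request.Request(url, data=payload, headers={"Content-Type": content_type}, method="POST")`.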

## Serve Config Files

[Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html#serve-config-files) are the recommended way to deploy and update your applications in production. Aana SDK provides a way to build Serve Config Files for Aana applications. See the [Serve Config Files documentation](https://mobiusml.github.io/aana_sdk/pages/serve_config_files/) on how to build and deploy applications using Serve Config Files.


## Run with Docker

You can deploy example applications using Docker. See the [documentation on how to run Aana SDK with Docker](https://mobiusml.github.io/aana_sdk/pages/docker/).

## Documentation

For more information on how to use Aana SDK, see the [documentation](https://mobiusml.github.io/aana_sdk/).

## License

Aana SDK is licensed under the [Apache License 2.0](https://github.com/mobiusml/aana_sdk?tab=Apache-2.0-1-ov-file#readme). Commercial licensing options are also available.

## Contributing

We welcome contributions from the community to enhance Aana SDK's functionality and usability. Feel free to open issues for bug reports and feature requests, or to submit pull requests with code improvements.

Check out the [Development Documentation](https://mobiusml.github.io/aana_sdk/pages/code_overview/) for more information on how to contribute.

We have adopted the [Contributor Covenant](https://www.contributor-covenant.org/) as our code of conduct.

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "aana",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "multimodal, ray, serving, video, images, audio, llm, vlm, asr",
    "author": "Mobius Labs GmbH",
    "author_email": "dev@mobiuslabs.com",
    "download_url": "https://files.pythonhosted.org/packages/f2/3f/1e20f571b35b2b73dc79634ea4aa409234dd97cd37ce362e80da63c8d9eb/aana-0.2.4.tar.gz",
    "platform": null,
    "description": "[![Build Status](https://github.com/mobiusml/aana_sdk/actions/workflows/tests.yml/badge.svg)](https://github.com/mobiusml/aana_sdk/actions/workflows/tests.yml)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](/LICENSE)\n[![Website](https://img.shields.io/badge/website-online-brightgreen.svg)](http://www.mobiuslabs.com)\n[![Documentation](https://img.shields.io/website?label=documentation&up_message=online&url=https://mobiusml.github.io/aana_sdk/)](https://mobiusml.github.io/aana_sdk/) \n[![PyPI version](https://img.shields.io/pypi/v/aana.svg)](https://pypi.org/project/aana/)\n[![GitHub release](https://img.shields.io/github/v/release/mobiusml/aana_sdk.svg)](https://github.com/mobiusml/aana_sdk/releases)\n\n<p align=\"center\">\n  <picture>\n    <source srcset=\"https://raw.githubusercontent.com/mobiusml/aana_sdk/main/docs/images/AanaSDK_logo_dark_theme.png\" media=\"(prefers-color-scheme: dark)\">\n    <img src=\"https://raw.githubusercontent.com/mobiusml/aana_sdk/main/docs/images/AanaSDK_logo_light_theme.png\" alt=\"Aana Logo\">\n  </picture>\n</p>\n\n# Aana\n\nAana SDK is a powerful framework for building multimodal applications. It facilitates the large-scale deployment of machine learning models, including those for vision, audio, and language, and supports Retrieval-Augmented Generation (RAG) systems. This enables the development of advanced applications such as search engines, recommendation systems, and data insights platforms.\n\nThe SDK is designed according to the following principles:\n\n- **Reliability**: Aana is designed to be reliable and robust. It is built to be fault-tolerant and to handle failures gracefully.\n- **Scalability**: Aana is designed to be scalable. It is built on top of Ray, a distributed computing framework, and can be easily scaled to multiple servers.\n- **Efficiency**: Aana is designed to be efficient. 
It is built to be fast and parallel and to use resources efficiently.\n- **Easy to Use**: Aana is designed to be easy to use by developers. It is built to be modular, with a lot of automation and abstraction.\n\nThe SDK is still in development, and not all features are fully implemented. We are constantly working on improving the SDK, and we welcome any feedback or suggestions.\n\nCheck out the [documentation](https://mobiusml.github.io/aana_sdk/) for more information.\n\n## Why use Aana SDK?\n\nNowadays, it is getting easier to experiment with machine learning models and build prototypes. However, deploying these models at scale and integrating them into real-world applications is still a challenge. \n\nAana SDK simplifies this process by providing a framework that allows:\n- Deploy and scale machine learning models on a single machine or a cluster.\n- Build multimodal applications that combine multiple different machine learning models.\n\n### Key Features\n\n- **Model Deployment**:\n  - Deploy models on a single machine or scale them across a cluster.\n\n- **API Generation**:\n  - Automatically generate an API for your application based on the endpoints you define.\n  - Input and output of the endpoints will be automatically validated.\n  - Simply annotate the types of input and output of the endpoint functions.\n- **Predefined Types**:\n  - Comes with a set of predefined types for various data such as images, videos, etc.\n\n- **Documentation Generation**:\n  - Automatically generate documentation for your application based on the defined endpoints.\n\n- **Streaming Support**:\n  - Stream the output of the endpoint to the client as it is generated.\n  - Ideal for real-time applications and Large Language Models (LLMs).\n\n- **Task Queue Support**:\n  - Run every endpoint you define as a task in the background without any changes to your code.\n- **Integrations**:  \n   - Aana SDK has integrations with various machine learning models and libraries: Whisper, 
vLLM, Hugging Face Transformers, Deepset Haystack, and more to come (for more information see [Integrations](https://mobiusml.github.io/aana_sdk/pages/integrations/)).\n\n## Installation\n\n### Installing via PyPI\n\nTo install Aana SDK via PyPI, you can use the following command:\n\n```bash\npip install aana\n```\n\nBy default `aana` installs only the core dependencies. The deployment-specific dependencies are not installed by default. You have two options:\n- Install the dependencies manually. You will be prompted to install the dependencies when you try to use a deployment that requires them.\n- Use extras to install all dependencies. Here are the available extras:\n  - `all`: Install dependencies for all deployments.\n  - `vllm`: Install dependencies for the vLLM deployment.\n  - `asr`: Install dependencies for the Automatic Speech Recognition (Whisper) deployment and other ASR models (diarization, voice activity detection, etc.).\n  - `transformers`: Install dependencies for the Hugging Face Transformers deployment. There are multiple deployments that use Transformers.\n  - `hqq`: Install dependencies for Half-Quadratic Quantization (HQQ) deployment.\n\nFor example, to install all dependencies, you can use the following command:\n\n```bash \npip install aana[all]\n```\n\nFor optimal performance install [PyTorch](https://pytorch.org/get-started/locally/) version >=2.1 appropriate for your system. You can skip it, but it will install a default version that may not make optimal use of your system's resources, for example, a GPU or even some SIMD operations. Therefore we recommend choosing your PyTorch package carefully and installing it manually.\n\nSome models use Flash Attention. Install Flash Attention library for better performance. See [flash attention installation instructions](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features) for more details and supported GPUs.\n\n### Installing from GitHub\n\n1. 
Clone the repository.\n\n```bash\ngit clone https://github.com/mobiusml/aana_sdk.git\n```\n\n2. Install additional libraries.\n\nFor optimal performance install [PyTorch](https://pytorch.org/get-started/locally/) version >=2.1 appropriate for your system. You can continue directly to the next step, but it will install a default version that may not make optimal use of your system's resources, for example, a GPU or even some SIMD operations. Therefore we recommend choosing your PyTorch package carefully and installing it manually.\n\nSome models use Flash Attention. Install Flash Attention library for better performance. See [flash attention installation instructions](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features) for more details and supported GPUs.\n\n3. Install the package with poetry.\n\nThe project is managed with [Poetry](https://python-poetry.org/docs/). See the [Poetry installation instructions](https://python-poetry.org/docs/#installation) on how to install it on your system. Use poetry >= 2.0 for the best experience.\n\nIt will install the package in the virtual environment created by Poetry. Add `--extras all` to install all extra dependencies.\n\n```bash\npoetry install --extras all\n```\n\nFor the development environment, it is recommended to install all extras and tests and dev dependencies. You can do this by running the following command:\n\n```bash\npoetry install --extras all --with dev,tests\n```\n\n## Getting Started\n\n### Creating a New Application\n\nYou can quickly develop multimodal applications using Aana SDK's intuitive APIs and components.\n\nIf you want to start building a new application, you can use the following GitHub template: [Aana App Template](https://github.com/mobiusml/aana_app_template). It will help you get started with the Aana SDK and provide you with a basic structure for your application and its dependencies.\n\nLet's create a simple application that transcribes a video. 
The application will download a video from YouTube, extract the audio, and transcribe it using an ASR model.\n\nAana SDK already provides a deployment for ASR (Automatic Speech Recognition) based on the Whisper model. We will use this [deployment](#Deployments) in the example.\n\n```python\nfrom aana.api.api_generation import Endpoint\nfrom aana.core.models.video import VideoInput\nfrom aana.deployments.aana_deployment_handle import AanaDeploymentHandle\nfrom aana.deployments.whisper_deployment import (\n    WhisperComputeType,\n    WhisperConfig,\n    WhisperDeployment,\n    WhisperModelSize,\n    WhisperOutput,\n)\nfrom aana.integrations.external.yt_dlp import download_video\nfrom aana.processors.remote import run_remote\nfrom aana.processors.video import extract_audio\nfrom aana.sdk import AanaSDK\n\n\n# Define the model deployments.\nasr_deployment = WhisperDeployment.options(\n    num_replicas=1,\n    ray_actor_options={\"num_gpus\": 0.25}, # Remove this line if you want to run Whisper on a CPU.\n    user_config=WhisperConfig(\n        model_size=WhisperModelSize.MEDIUM,\n        compute_type=WhisperComputeType.FLOAT16,\n    ).model_dump(mode=\"json\"),\n)\ndeployments = [{\"name\": \"asr_deployment\", \"instance\": asr_deployment}]\n\n\n# Define the endpoint to transcribe the video.\nclass TranscribeVideoEndpoint(Endpoint):\n    \"\"\"Transcribe video endpoint.\"\"\"\n\n    async def initialize(self):\n        \"\"\"Initialize the endpoint.\"\"\"\n        self.asr_handle = await AanaDeploymentHandle.create(\"asr_deployment\")\n        await super().initialize()\n\n    async def run(self, video: VideoInput) -> WhisperOutput:\n        \"\"\"Transcribe video.\"\"\"\n        video_obj = await run_remote(download_video)(video_input=video)\n        audio = extract_audio(video=video_obj)\n        transcription = await self.asr_handle.transcribe(audio=audio)\n        return transcription\n\nendpoints = [\n    {\n        \"name\": \"transcribe_video\",\n        
\"path\": \"/video/transcribe\",\n        \"summary\": \"Transcribe a video\",\n        \"endpoint_cls\": TranscribeVideoEndpoint,\n    },\n]\n\naana_app = AanaSDK(name=\"transcribe_video_app\")\n\nfor deployment in deployments:\n    aana_app.register_deployment(**deployment)\n\nfor endpoint in endpoints:\n    aana_app.register_endpoint(**endpoint)\n\nif __name__ == \"__main__\":\n    aana_app.connect(host=\"127.0.0.1\", port=8000, show_logs=False)  # Connects to the Ray cluster or starts a new one.\n    aana_app.migrate()                                              # Runs the migrations to create the database tables.\n    aana_app.deploy(blocking=True)                                  # Deploys the application.\n```\n\nYou have a few options to run the application:\n- Copy the code above and run it in a Jupyter notebook.\n- Save the code to a Python file, for example `app.py`, and run it as a Python script: `python app.py`.\n- Save the code to a Python file, for example `app.py`, and run it using the Aana CLI: `aana deploy app:aana_app --host 127.0.0.1 --port 8000 --hide-logs`.\n\nOnce the application is running, you will see the message `Deployed successfully.` in the logs. You can now send a request to the application to transcribe a video.\n\nTo get an overview of the Ray cluster, you can use the Ray Dashboard. The Ray Dashboard is available at `http://127.0.0.1:8265` by default. You can see the status of the Ray cluster, the resources used, running applications and deployments, logs, and more. It is a useful tool for monitoring and debugging your applications. 
See the [Ray Dashboard documentation](https://docs.ray.io/en/latest/ray-observability/getting-started.html) for more information.\n\nLet's transcribe [Gordon Ramsay's perfect scrambled eggs tutorial](https://www.youtube.com/watch?v=VhJFyyukAzA) using the application.\n\n```bash\ncurl -X POST http://127.0.0.1:8000/video/transcribe -F body='{\"video\":{\"url\":\"https://www.youtube.com/watch?v=VhJFyyukAzA\"}}'\n```\n\nThis returns the full transcription of the video, the transcription of each segment, and transcription info such as the identified language. You can also use the [Swagger UI](http://127.0.0.1:8000/docs) to send the request.\n\n### Running Example Applications\n\nWe provide a few example applications that demonstrate the capabilities of Aana SDK.\n\n- [Chat with Video](https://github.com/mobiusml/aana_chat_with_video): A multimodal chat application that allows users to upload a video and ask questions about the video content based on the visual and audio information. See the [Chat with Video Demo notebook](https://github.com/mobiusml/aana_chat_with_video/blob/main/notebooks/chat_with_video_demo.ipynb) for how to use the application.\n- [Summarize Video](https://github.com/mobiusml/aana_summarize_video): An Aana application that summarizes a video by transcribing the audio and generating a summary using a Large Language Model (LLM). This application is part of the [tutorial](https://mobiusml.github.io/aana_sdk/pages/tutorial/) on how to build multimodal applications with Aana SDK.\n\nSee the README files of the applications for more information on how to install and run them.\n\nThe full list of example applications is available in the [Aana Examples](https://github.com/mobiusml/aana_examples) repository. 
You can use these examples as a starting point for building your own applications.\n\n### Main components\n\nThere are three main components in Aana SDK: deployments, endpoints, and AanaSDK.\n\n#### Deployments\n\nDeployments are the building blocks of Aana SDK. They represent the machine learning models that you want to deploy. Aana SDK comes with a set of predefined deployments that you can use or you can define your own deployments. See [Integrations](https://mobiusml.github.io/aana_sdk/pages/integrations/) for more information about predefined deployments.\n\nEach deployment has a main class that defines it and a configuration class that allows you to specify the deployment parameters.\n\nFor example, we have a predefined deployment for the Whisper model that allows you to transcribe audio. You can define the deployment like this:\n\n```python\nfrom aana.deployments.whisper_deployment import WhisperDeployment, WhisperConfig, WhisperModelSize, WhisperComputeType\n\nasr_deployment = WhisperDeployment.options(\n    num_replicas=1,\n    ray_actor_options={\"num_gpus\": 0.25},\n    user_config=WhisperConfig(model_size=WhisperModelSize.MEDIUM, compute_type=WhisperComputeType.FLOAT16).model_dump(mode=\"json\"),\n)\n```\n\nSee [Model Hub](https://mobiusml.github.io/aana_sdk/pages/model_hub/) for a collection of configurations for different models that can be used with the predefined deployments.\n\n#### Endpoints\n\nEndpoints define the functionality of your application. They allow you to connect multiple deployments (models) to each other and define the input and output of your application.\n\nEach endpoint is defined as a class that inherits from the `Endpoint` class. 
The class has two main methods: `initialize` and `run`.\n\nFor example, you can define an endpoint that transcribes a video like this:\n\n```python\nclass TranscribeVideoEndpoint(Endpoint):\n    \"\"\"Transcribe video endpoint.\"\"\"\n\n    async def initialize(self):\n        \"\"\"Initialize the endpoint.\"\"\"\n        await super().initialize()\n        self.asr_handle = await AanaDeploymentHandle.create(\"asr_deployment\")\n\n    async def run(self, video: VideoInput) -> WhisperOutput:\n        \"\"\"Transcribe video.\"\"\"\n        video_obj = await run_remote(download_video)(video_input=video)\n        audio = extract_audio(video=video_obj)\n        transcription = await self.asr_handle.transcribe(audio=audio)\n        return transcription\n```\n\n#### AanaSDK\n\nAanaSDK is the main class that you use to build your application. It allows you to deploy the deployments and endpoints you defined and start the application.\n\nFor example, you can define an application that transcribes a video like this:\n\n```python\naana_app = AanaSDK(name=\"transcribe_video_app\")\n\naana_app.register_deployment(name=\"asr_deployment\", instance=asr_deployment)\naana_app.register_endpoint(\n    name=\"transcribe_video\",\n    path=\"/video/transcribe\",\n    summary=\"Transcribe a video\",\n    endpoint_cls=TranscribeVideoEndpoint,\n)\n\naana_app.connect()  # Connects to the Ray cluster or starts a new one.\naana_app.migrate()  # Runs the migrations to create the database tables.\naana_app.deploy()   # Deploys the application.\n```\n\nAll you need to do is define the deployments and endpoints you want to use in your application, and Aana SDK will take care of the rest.\n\n\n## API\n\nAana SDK uses form data for API requests, which allows sending both binary data and structured fields in a single request. 
The request body is sent as a JSON string in the `body` field, and any binary data is sent as files.\n\n### Making API Requests\n\nYou can send requests to the SDK endpoints with only structured data or a combination of structured data and binary data.\n\n#### Only Structured Data\nWhen your request includes only structured data, you can send it as a JSON string in the `body` field.\n\n- **cURL Example:**\n    ```bash\n    curl http://127.0.0.1:8000/endpoint \\\n        -F body='{\"input\": \"data\", \"param\": \"value\"}'\n    ```\n\n- **Python Example:**\n    ```python\n    import json, requests\n\n    url = \"http://127.0.0.1:8000/endpoint\"\n    body = {\n        \"input\": \"data\",\n        \"param\": \"value\"\n    }\n\n    response = requests.post(\n        url,\n        data={\"body\": json.dumps(body)}\n    )\n\n    print(response.json())\n    ```\n\n#### With Binary Data\nWhen your request includes binary files (images, audio, etc.), you can send them as files in the request and include the names of the files in the `body` field as a reference.\n\nFor example, if you want to send an image, you can use [`aana.core.models.image.ImageInput`](https://mobiusml.github.io/aana_sdk/reference/models/media/#aana.core.models.ImageInput) as the input type that supports binary data upload. The `content` field in the input type should be set to the name of the file you are sending. 
\n\n- **cURL Example:**\n    ```bash\n    curl http://127.0.0.1:8000/process_images \\\n        -H \"Content-Type: multipart/form-data\" \\\n        -F body='{\"image\": {\"content\": \"file1\"}}' \\\n        -F file1=\"@image.jpeg\"\n    ```\n\n- **Python Example:**\n    ```python\n    import json, requests\n\n    url = \"http://127.0.0.1:8000/process_images\"\n    body = {\n        \"image\": {\"content\": \"file1\"}\n    }\n    with open(\"image.jpeg\", \"rb\") as file:\n        files = {\"file1\": file}\n\n        response = requests.post(\n            url,\n            data={\"body\": json.dumps(body)},\n            files=files\n        )\n\n        print(response.text)\n    ```\n\n## Serve Config Files\n\n[Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html#serve-config-files) are the recommended way to deploy and update your applications in production. Aana SDK provides a way to build Serve Config Files for Aana applications. See the [Serve Config Files documentation](https://mobiusml.github.io/aana_sdk/pages/serve_config_files/) for how to build and deploy applications using Serve Config Files.\n\n\n## Run with Docker\n\nYou can deploy example applications using Docker. See the [documentation on how to run Aana SDK with Docker](https://mobiusml.github.io/aana_sdk/pages/docker/).\n\n## Documentation\n\nFor more information on how to use Aana SDK, see the [documentation](https://mobiusml.github.io/aana_sdk/).\n\n## License\n\nAana SDK is licensed under the [Apache License 2.0](https://github.com/mobiusml/aana_sdk?tab=Apache-2.0-1-ov-file#readme). Commercial licensing options are also available.\n\n## Contributing\n\nWe welcome contributions from the community to enhance Aana SDK's functionality and usability. 
Feel free to open issues for bug reports or feature requests, or submit pull requests to contribute code improvements.\n\nCheck out the [Development Documentation](https://mobiusml.github.io/aana_sdk/pages/code_overview/) for more information on how to contribute.\n\nWe have adopted the [Contributor Covenant](https://www.contributor-covenant.org/) as our code of conduct.\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Multimodal SDK",
    "version": "0.2.4",
    "project_urls": null,
    "split_keywords": [
        "multimodal",
        " ray",
        " serving",
        " video",
        " images",
        " audio",
        " llm",
        " vlm",
        " asr"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cb2148978359a2ecfa4c3bd56b98262794b157ec59068bf3bafb5202753bb517",
                "md5": "6ecb109db39b40b84d69c0de6504be2a",
                "sha256": "5cbd65f3e3b218a3e813c80ee29a1992e89a24b08900976b816e1fa2de6b87e0"
            },
            "downloads": -1,
            "filename": "aana-0.2.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6ecb109db39b40b84d69c0de6504be2a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 19014377,
            "upload_time": "2025-02-17T12:42:31",
            "upload_time_iso_8601": "2025-02-17T12:42:31.347707Z",
            "url": "https://files.pythonhosted.org/packages/cb/21/48978359a2ecfa4c3bd56b98262794b157ec59068bf3bafb5202753bb517/aana-0.2.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f23f1e20f571b35b2b73dc79634ea4aa409234dd97cd37ce362e80da63c8d9eb",
                "md5": "0a55988ed196ce3a74a27dd80b5e2897",
                "sha256": "ce643681d6b18db339eab89afbca9f6dbee2703c6a47d9624f33405bfcbfd52f"
            },
            "downloads": -1,
            "filename": "aana-0.2.4.tar.gz",
            "has_sig": false,
            "md5_digest": "0a55988ed196ce3a74a27dd80b5e2897",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 18919261,
            "upload_time": "2025-02-17T12:42:35",
            "upload_time_iso_8601": "2025-02-17T12:42:35.063532Z",
            "url": "https://files.pythonhosted.org/packages/f2/3f/1e20f571b35b2b73dc79634ea4aa409234dd97cd37ce362e80da63c8d9eb/aana-0.2.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-17 12:42:35",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "aana"
}

Mobius Labs GmbH