databricks-genai-inference


| Field | Value |
| --- | --- |
| Name | databricks-genai-inference |
| Version | 0.2.3 |
| Home page | https://docs.databricks.com/en/machine-learning/foundation-models/query-foundation-model-apis.html |
| Summary | Interact with the Databricks Foundation Model API from python |
| Upload time | 2024-03-27 21:58:39 |
| Author | Databricks |
| Requires Python | >=3.9 |
| Maintainer | None |
| License | None |
| Requirements | No requirements were recorded. |

# Databricks Generative AI Inference SDK (Beta)

[![PyPI version](https://img.shields.io/pypi/v/databricks-genai-inference.svg)](https://pypi.org/project/databricks-genai-inference/)

The Databricks Generative AI Inference Python library provides a Python interface to the Databricks [Foundation Model API](https://docs.databricks.com/en/machine-learning/foundation-models/api-reference.html).

> [!NOTE]
> This SDK was designed primarily for pay-per-token endpoints (`databricks-*`). It keeps a list of known model names (e.g. `dbrx-instruct`) and automatically maps each one to the corresponding shared endpoint (`databricks-dbrx-instruct`).
> You can also use it with provisioned throughput endpoints, as long as their names do not match a known model name.
> If a name does overlap, you can use `DATABRICKS_MODEL_URL_ENV` to provide the endpoint URL directly.
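
The name mapping described in the note can be pictured with a small sketch. This is an illustration only, not the SDK's implementation: `KNOWN_MODELS` and `resolve_endpoint` are hypothetical names, and the set assumes that the other model names used in this README are also on the known list.

```python
# Hypothetical illustration of the mapping described above; not the SDK's code.
# Assumption: the model names used elsewhere in this README are "known" names.
KNOWN_MODELS = {"dbrx-instruct", "bge-large-en", "mpt-7b-instruct", "llama-2-70b-chat"}


def resolve_endpoint(model: str) -> str:
    """Map a known model name to its shared `databricks-*` endpoint name.

    Any other name is returned unchanged, as it would be for a provisioned
    throughput endpoint whose name does not overlap with a known model.
    """
    if model in KNOWN_MODELS:
        return f"databricks-{model}"
    return model


print(resolve_endpoint("dbrx-instruct"))   # databricks-dbrx-instruct
print(resolve_endpoint("my-pt-endpoint"))  # my-pt-endpoint
```

The sketch also shows why an overlapping name is a problem: a provisioned throughput endpoint named like a known model would be rewritten to the shared endpoint unless the URL is given directly.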

This library includes a predefined set of API classes (`Embedding`, `Completion`, `ChatCompletion`) with convenience functions for making API requests and parsing the contents of the raw JSON response.

We also offer a high-level `ChatSession` object for managing multi-round chat completions, which is especially useful when building a chatbot.

You can find more usage details in our [SDK onboarding doc](https://docs.databricks.com/en/machine-learning/foundation-models/query-foundation-model-apis.html).

> [!IMPORTANT]  
> We're preparing to release version 1.0 of the Databricks Generative AI Inference Python library.

## Installation

```sh
pip install databricks-genai-inference
```

## Usage

### Embedding

```python
from databricks_genai_inference import Embedding
```

#### Text embedding

```python
response = Embedding.create(
    model="bge-large-en", 
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')
```

> [!TIP]  
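> For large-scale workloads, you may want to reuse the HTTP connection to reduce request latency. For example:

```python
import requests

texts = ["first document", "second document"]  # placeholder inputs

# One Session reuses the underlying HTTP connection across requests.
with requests.Session() as client:
    for text in texts:
        response = Embedding.create(
            client=client,
            model="bge-large-en",
            input=text
        )
```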

#### Text embedding (async)

```python
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        response = await Embedding.acreate(
            client=client,
            model="bge-large-en",
            input="3D ActionSLAM: wearable person tracking in multi-floor environments")
        print(f'embeddings: {response.embeddings[0]}')

asyncio.run(main())  # in a notebook, use `await main()` instead
```
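
One reason to use the async interface is to keep several requests in flight at once. The following is a minimal sketch of that pattern using only `asyncio`. `fake_embed` is a hypothetical stand-in that makes no network call; to use the pattern with the SDK you would replace its body with an `Embedding.acreate` call like the one shown above, sharing one client. The `limit` default is an arbitrary choice, not a documented SDK limit.

```python
import asyncio


async def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an async embedding request (no network)."""
    await asyncio.sleep(0)  # yield control, as a real request would while waiting
    return [float(len(text))]


async def embed_all(texts: list[str], limit: int = 4) -> list[list[float]]:
    """Run one request per text, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def one(text: str) -> list[float]:
        async with semaphore:
            return await fake_embed(text)

    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(one(t) for t in texts))


results = asyncio.run(embed_all(["a", "bb", "ccc"]))
print(results)  # [[1.0], [2.0], [3.0]]
```

The semaphore bounds concurrency so that a long input list does not open an unbounded number of simultaneous requests.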

#### Text embedding with instruction

```python
response = Embedding.create(
    model="bge-large-en", 
    instruction="Represent this sentence for searching relevant passages:", 
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')
```

#### Text embedding (batching)

> [!IMPORTANT]  
> The maximum supported batch size is 150.

```python
response = Embedding.create(
    model="bge-large-en", 
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')
```
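
When there are more inputs than the batch limit allows, they have to be split across several requests. A minimal sketch of such a split, in plain Python; `batched` and `MAX_BATCH` are hypothetical helper names, not part of the SDK:

```python
from typing import Iterator, List

MAX_BATCH = 150  # embedding batch limit stated above


def batched(items: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = [f"document {i}" for i in range(320)]
sizes = [len(batch) for batch in batched(texts)]
print(sizes)  # [150, 150, 20]
```

Each yielded batch can then be passed as the `input` list of one `Embedding.create` call. The same helper with `size=16` would fit the batch limit given for `Completion`.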

#### Text embedding with instruction (batching)

> [!IMPORTANT]  
> Only one instruction is supported per batch.
> Batch size

```python
response = Embedding.create(
    model="bge-large-en", 
    instruction="Represent this sentence for searching relevant passages:",
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')
```

### Text completion

```python
from databricks_genai_inference import Completion
```

#### Text completion

```python
response = Completion.create(
    model="mpt-7b-instruct",
    prompt="Represent the Science title:")
print(f'response.text:{response.text}')
```

#### Text completion (async)

```python
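import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        response = await Completion.acreate(
            client=client,
            model="mpt-7b-instruct",
            prompt="Represent the Science title:")
        print(f'response.text:{response.text}')

asyncio.run(main())  # in a notebook, use `await main()` instead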
```

#### Text completion (streaming)

> [!IMPORTANT]  
> Streaming mode supports only a batch size of 1.

```python
response = Completion.create(
    model="mpt-7b-instruct", 
    prompt="Count from 1 to 100:",
    stream=True)
print('response.text:')
for chunk in response:
    print(f'{chunk.text}', end="")
```
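
If you need the complete text after streaming finishes, you can accumulate the chunks while printing them. A minimal sketch, assuming each chunk exposes its text as a string in `.text`, as the example above prints it. `FakeChunk` and `fake_stream` are hypothetical stand-ins so the snippet runs without a network call:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Stand-in for a streamed chunk; only the `.text` attribute is modeled."""
    text: str


def fake_stream() -> Iterator[FakeChunk]:
    """Hypothetical generator standing in for a streaming response."""
    for piece in ["1, ", "2, ", "3"]:
        yield FakeChunk(piece)


def collect_text(chunks: Iterable) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk.text, end="", flush=True)
        parts.append(chunk.text)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = collect_text(fake_stream())
```

With the SDK, the object returned by `Completion.create(..., stream=True)` would take the place of `fake_stream()`.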

#### Text completion (streaming + async)

```python
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        response = await Completion.acreate(
            client=client,
            model="mpt-7b-instruct",
            prompt="Count from 1 to 10:",
            stream=True)
        print('response.text:')
        async for chunk in response:
            print(f'{chunk.text}', end="")

asyncio.run(main())  # in a notebook, use `await main()` instead
```


#### Text completion (batching)

> [!IMPORTANT]  
> The maximum supported batch size is 16.

```python
response = Completion.create(
    model="mpt-7b-instruct", 
    prompt=[
        "Represent the Science title:", 
        "Represent the Science title:"])
print(f'response.text[0]:{response.text[0]}')
print(f'response.text[1]:{response.text[1]}')
```

### Chat completion

```python
from databricks_genai_inference import ChatCompletion
```

> [!IMPORTANT]  
> Batching is not supported for `ChatCompletion`

#### Chat completion

```python
response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[{"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": "Knock knock."}])
print(f'response.message:{response.message}')
```

#### Chat completion (async)

```python
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        response = await ChatCompletion.acreate(
            client=client,
            model="llama-2-70b-chat",
            messages=[{"role": "system", "content": "You are a helpful assistant."},
                      {"role": "user", "content": "Knock knock."}],
        )
        print(f'response.message:{response.message}')

asyncio.run(main())  # in a notebook, use `await main()` instead
```

#### Chat completion (streaming)

```python
response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[{"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": "Count from 1 to 30, add one emoji after each number"}],
    stream=True)
for chunk in response:
    print(f'{chunk.message}', end="")
```

#### Chat completion (streaming + async)

```python
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        response = await ChatCompletion.acreate(
            client=client,
            model="llama-2-70b-chat",
            messages=[{"role": "system", "content": "You are a helpful assistant."},
                      {"role": "user", "content": "Count from 1 to 30, add one emoji after each number"}],
            stream=True,
        )
        async for chunk in response:
            print(f'{chunk.message}', end="")

asyncio.run(main())  # in a notebook, use `await main()` instead
```

### Chat session

```python
from databricks_genai_inference import ChatSession
```

> [!IMPORTANT]  
> Streaming mode is not supported for `ChatSession`

```python
chat = ChatSession(model="llama-2-70b-chat")
chat.reply("Knock, knock!")
print(f'chat.last: {chat.last}')
chat.reply("Take a guess!")
print(f'chat.last: {chat.last}')

print(f'chat.history: {chat.history}')
print(f'chat.count: {chat.count}')
```
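
To make the idea of multi-round management concrete, here is a minimal sketch of the general pattern: keep a message list and append each turn before asking for the next reply. It is not the SDK's `ChatSession`. `MiniSession` and `echo_model` are hypothetical, the `assistant` role name is an assumption based on the common chat message format, and the model call is replaced by a local function so the snippet runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class MiniSession:
    """Hypothetical multi-round chat holder; not the SDK's ChatSession."""

    def __init__(self, complete: Callable[[List[Message]], str],
                 system: str = "You are a helpful assistant."):
        self._complete = complete
        self.history: List[Message] = [{"role": "system", "content": system}]

    def reply(self, user_text: str) -> str:
        """Record the user turn, get an answer, and record that answer too."""
        self.history.append({"role": "user", "content": user_text})
        answer = self._complete(self.history)
        # Assumption: replies are stored under the "assistant" role.
        self.history.append({"role": "assistant", "content": answer})
        return answer


def echo_model(messages: List[Message]) -> str:
    """Stand-in for a chat completion call; echoes the latest user message."""
    return f"You said: {messages[-1]['content']}"


session = MiniSession(echo_model)
session.reply("Knock, knock!")
session.reply("Take a guess!")
print(len(session.history))  # 5
```

Because the full history is passed on every turn, the model sees earlier rounds when it answers later ones, which is the bookkeeping a session object saves you from writing by hand.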

            

Raw data

            {
    "_id": null,
    "home_page": "https://docs.databricks.com/en/machine-learning/foundation-models/query-foundation-model-apis.html",
    "name": "databricks-genai-inference",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": "Databricks",
    "author_email": "eng-genai-inference@databricks.com",
    "download_url": "https://files.pythonhosted.org/packages/c8/b4/dc45abcd404144f257daae8c858ba5aac90958e17e8051867cdc7009c802/databricks-genai-inference-0.2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Interact with the Databricks Foundation Model API from python",
    "version": "0.2.3",
    "project_urls": {
        "Homepage": "https://docs.databricks.com/en/machine-learning/foundation-models/query-foundation-model-apis.html"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4205df2f827acfbe433be519d3b4ff9d98a8241a606139b6353cdcb2cd511952",
                "md5": "7110f7be748189768ef5caf65b3eca91",
                "sha256": "8fe8eceafa1e5f16cb04b6a683095bdc6d7ed86776ff03b9afe46594c83fae1a"
            },
            "downloads": -1,
            "filename": "databricks_genai_inference-0.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7110f7be748189768ef5caf65b3eca91",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 17975,
            "upload_time": "2024-03-27T21:58:37",
            "upload_time_iso_8601": "2024-03-27T21:58:37.739061Z",
            "url": "https://files.pythonhosted.org/packages/42/05/df2f827acfbe433be519d3b4ff9d98a8241a606139b6353cdcb2cd511952/databricks_genai_inference-0.2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c8b4dc45abcd404144f257daae8c858ba5aac90958e17e8051867cdc7009c802",
                "md5": "d7286c9e406d0e28c22766e2433de3e0",
                "sha256": "caf7861cc5be557bf1a580d155847556477405088e423fb5582d28e962f98167"
            },
            "downloads": -1,
            "filename": "databricks-genai-inference-0.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "d7286c9e406d0e28c22766e2433de3e0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 26867,
            "upload_time": "2024-03-27T21:58:39",
            "upload_time_iso_8601": "2024-03-27T21:58:39.799079Z",
            "url": "https://files.pythonhosted.org/packages/c8/b4/dc45abcd404144f257daae8c858ba5aac90958e17e8051867cdc7009c802/databricks-genai-inference-0.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-27 21:58:39",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "databricks-genai-inference"
}
        