assemblyai-haystack


* Name: assemblyai-haystack
* Version: 0.1.1
* Home page: https://github.com/AssemblyAI/assemblyai-haystack
* Summary: AssemblyAI Haystack Integration
* Upload time: 2024-01-11 18:11:58
* Author: AssemblyAI
* Requires Python: >=3.8
* License: Apache License 2.0

<img src="https://github.com/AssemblyAI/assemblyai-python-sdk/blob/master/assemblyai.png?raw=true" width="500"/>

---


[![CI Passing](https://github.com/AssemblyAI/assemblyai-python-sdk/actions/workflows/test.yml/badge.svg)](https://github.com/AssemblyAI/assemblyai-haystack/actions/workflows/test.yml)
[![GitHub License](https://img.shields.io/github/license/AssemblyAI/assemblyai-haystack)](https://github.com/AssemblyAI/assemblyai-haystack/blob/main/LICENSE)
[![PyPI version](https://badge.fury.io/py/assemblyai-haystack.svg)](https://badge.fury.io/py/assemblyai-haystack)
[![PyPI Python Versions](https://img.shields.io/pypi/pyversions/assemblyai-haystack)](https://pypi.python.org/pypi/assemblyai-haystack/)
![PyPI - Wheel](https://img.shields.io/pypi/wheel/assemblyai-haystack)
[![AssemblyAI Twitter](https://img.shields.io/twitter/follow/AssemblyAI?label=%40AssemblyAI&style=social)](https://twitter.com/AssemblyAI)
[![AssemblyAI YouTube](https://img.shields.io/youtube/channel/subscribers/UCtatfZMf-8EkIwASXM4ts0A)](https://www.youtube.com/@AssemblyAI)
[![Discord](https://img.shields.io/discord/875120158014853141?logo=discord&label=Discord&link=https%3A%2F%2Fdiscord.com%2Fchannels%2F875120158014853141&style=social)](https://assemblyai.com/discord)

# AssemblyAI Audio Transcript Loader

The AssemblyAI Audio Transcript Loader allows you to transcribe audio files with the AssemblyAI API and load the transcribed text into Haystack documents.

To use this package, set the `ASSEMBLYAI_API_KEY` environment variable to your API key. Alternatively, pass the API key as an argument when adding the component to a pipeline (see the usage code example below).
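
The two ways of supplying the key can be combined in a small lookup. The following is a minimal sketch, assuming a hypothetical helper named `resolve_api_key` (it is not part of this package): it prefers an explicitly passed key, falls back to `ASSEMBLYAI_API_KEY`, and fails early with a readable error if neither is available.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the AssemblyAI API key, preferring an explicit argument.

    Hypothetical helper for illustration; not part of assemblyai-haystack.
    """
    # An explicitly passed key wins; otherwise fall back to the environment.
    key = explicit_key or os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: set ASSEMBLYAI_API_KEY or pass the key explicitly."
        )
    return key
```

The returned value could then be passed as `api_key` when constructing the transcriber component, so a missing key surfaces before any audio is submitted.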

More info about AssemblyAI:

* [Website](https://www.assemblyai.com/)
* [Get a Free API key](https://www.assemblyai.com/dashboard/signup)
* [AssemblyAI API Docs](https://www.assemblyai.com/docs)

## Installation

First, install the `assemblyai-haystack` Python package.

```bash
pip install assemblyai-haystack
```

This package installs and uses the AssemblyAI Python SDK. You can find more info about the SDK at the [assemblyai-python-sdk GitHub repo](https://github.com/AssemblyAI/assemblyai-python-sdk).
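
To confirm that both distributions ended up in the current environment, the standard library can report installed versions. This is a small sketch using `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is hypothetical, and it assumes the SDK is distributed on PyPI under the name `assemblyai`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "assemblyai-haystack") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "assemblyai" is assumed to be the distribution name of the Python SDK.
    for name in ("assemblyai-haystack", "assemblyai"):
        print(name, "->", installed_version(name) or "not installed")
```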

## Usage

The `AssemblyAITranscriber` needs to be initialized with the AssemblyAI API key.
The `run` function requires at least the `file_path` argument. Audio files can be specified as a URL or a local file path.
You can also specify in the `run` function whether you want summarization and speaker diarization results.

```python
import os

from assemblyai_haystack.transcriber import AssemblyAITranscriber
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Pipeline
from haystack.components.writers import DocumentWriter

# Read the API key from the environment (None if the variable is unset)
ASSEMBLYAI_API_KEY = os.environ.get("ASSEMBLYAI_API_KEY")

# Use AssemblyAITranscriber in a pipeline
document_store = InMemoryDocumentStore()
file_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

indexing = Pipeline()
indexing.add_component("transcriber", AssemblyAITranscriber(api_key=ASSEMBLYAI_API_KEY))
indexing.add_component("writer", DocumentWriter(document_store))
# Route the transcription documents into the document store
indexing.connect("transcriber.transcription", "writer.documents")
indexing.run(
    {
        "transcriber": {
            "file_path": file_url,
            "summarization": None,
            "speaker_labels": None,
        }
    }
)

print("Indexed Document Count:", document_store.count_documents())
```

Note: Calling `indexing.run()` blocks until the transcription is finished.
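
The nested input dictionary used in the example above can be assembled by a small helper. The sketch below is illustrative only: `is_url` and `build_run_inputs` are hypothetical names, and it assumes the `summarization` and `speaker_labels` arguments accept `True`/`False` as well as `None`. It also checks that a local path exists before the pipeline is started, since `indexing.run()` blocks.

```python
from pathlib import Path
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def is_url(file_path: str) -> bool:
    """Treat http(s) locations as remote audio; anything else as a local path."""
    return urlparse(file_path).scheme in ("http", "https")


def build_run_inputs(
    file_path: str,
    summarization: Optional[bool] = None,
    speaker_labels: Optional[bool] = None,
    component_name: str = "transcriber",
) -> Dict[str, Dict[str, Any]]:
    """Build the nested dict passed to `Pipeline.run` for the transcriber.

    Hypothetical helper; the boolean flags are an assumption about the API.
    """
    # Fail before starting the pipeline if a local file is missing.
    if not is_url(file_path) and not Path(file_path).is_file():
        raise FileNotFoundError(f"Audio file not found: {file_path}")
    return {
        component_name: {
            "file_path": file_path,
            "summarization": summarization,
            "speaker_labels": speaker_labels,
        }
    }
```

With the pipeline from the example, the call would then read `indexing.run(build_run_inputs(file_url))`.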

The results of the transcription, summarization and speaker diarization are returned in separate document lists:
* transcription
* summarization
* speaker_labels
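
As a rough illustration of working with the three lists, the sketch below counts the documents in each one. It does not call the package: `StubDocument` is a stand-in for a Haystack document (assumed to carry text in `content` and a dict in `meta`), `count_results` is a hypothetical helper, and the example output is placeholder data shaped like the lists named above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StubDocument:
    """Stand-in for a Haystack document: text content plus a metadata dict."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def count_results(output: Dict[str, List[StubDocument]]) -> Dict[str, int]:
    """Count documents per result list, treating a missing or None list as empty."""
    keys = ("transcription", "summarization", "speaker_labels")
    return {key: len(output.get(key) or []) for key in keys}


# Placeholder output with the three list names described above.
example_output = {
    "transcription": [StubDocument("full transcript text")],
    "summarization": [],
    "speaker_labels": [StubDocument("utterance 1"), StubDocument("utterance 2")],
}
print(count_results(example_output))
# {'transcription': 1, 'summarization': 0, 'speaker_labels': 2}
```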

The metadata of the transcription document contains the transcription ID and the URL of the uploaded audio file.

```json
{
   "transcript_id":"73089e32-...-4ae9-97a4-eca7fe20a8b1",
   "audio_url":"https://storage.googleapis.com/aai-docs-samples/nbc.mp3"
}
```
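
Metadata in this shape makes it easy to look up which audio file a transcript came from. The following sketch assumes only the two keys shown above; `index_by_transcript_id` is a hypothetical helper and the sample values are placeholders, not real transcript IDs.

```python
from typing import Any, Dict, Iterable, Mapping


def index_by_transcript_id(metas: Iterable[Mapping[str, Any]]) -> Dict[str, str]:
    """Map each transcript ID to its audio URL, skipping incomplete entries."""
    index: Dict[str, str] = {}
    for meta in metas:
        transcript_id = meta.get("transcript_id")
        audio_url = meta.get("audio_url")
        if transcript_id and audio_url:
            index[transcript_id] = audio_url
    return index


# Placeholder metadata dicts; real values come from the transcription documents.
sample_metas = [
    {"transcript_id": "example-id-1", "audio_url": "https://example.com/a.mp3"},
    {"transcript_id": "example-id-2"},  # no audio_url, so it is skipped
]
print(index_by_transcript_id(sample_metas))
# {'example-id-1': 'https://example.com/a.mp3'}
```

Given a list of transcription documents `docs`, the same helper could be called as `index_by_transcript_id(doc.meta for doc in docs)`, assuming each document exposes its metadata as `meta`.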
  

            
