pvorca


- **Name:** pvorca
- **Version:** 1.0.0
- **Home page:** https://github.com/Picovoice/orca
- **Summary:** Orca Streaming Text-to-Speech Engine
- **Upload time:** 2024-08-21 20:38:06
- **Author:** Picovoice
- **Requires Python:** >=3.8
- **Keywords:** streaming text-to-speech, tts, speech synthesis, voice generation, speech engine

# Orca Binding for Python

## Orca Streaming Text-to-Speech Engine

Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)

Orca is an on-device streaming text-to-speech engine that is designed for use with LLMs, enabling zero-latency
voice assistants. Orca is:

- Private: all voice processing runs locally.
- Cross-Platform:
    - Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64)
    - Android and iOS
    - Chrome, Safari, Firefox, and Edge
    - Raspberry Pi (3, 4, 5)

## Compatibility

- Python 3.8+
- Runs on Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64), and Raspberry Pi (3, 4, 5).

## Installation

```console
pip3 install pvorca
```
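
To confirm that the installation succeeded, the installed version can be read from the package metadata. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of `pvorca`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "pvorca") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version is not None else "pvorca is not installed")
```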

## AccessKey

Orca requires a valid Picovoice `AccessKey` at initialization. The `AccessKey` acts as your credentials when using Orca
SDKs. You can get your `AccessKey` for free. Make sure to keep your `AccessKey` secret.
Sign up or log in to [Picovoice Console](https://console.picovoice.ai/) to get your `AccessKey`.
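
One way to keep the `AccessKey` out of source code is to read it from an environment variable. This is only a sketch: the variable name `PICOVOICE_ACCESS_KEY` and the helper `load_access_key` are assumptions made for illustration, not something the SDK defines.

```python
import os


def load_access_key(env_var: str = "PICOVOICE_ACCESS_KEY") -> str:
    """Read the AccessKey from an environment variable instead of hard-coding it.

    The variable name is a hypothetical convention chosen for this example.
    """
    access_key = os.environ.get(env_var, "").strip()
    if not access_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your AccessKey")
    return access_key


# Possible usage once pvorca is installed:
# orca = pvorca.create(access_key=load_access_key())
```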

## Usage

Orca supports two modes of operation: streaming and single synthesis.
In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel.
In the single synthesis mode, a complete text is synthesized in a single call to the Orca engine.

Create an instance of the Orca engine:

```python
import pvorca

orca = pvorca.create(access_key='${ACCESS_KEY}')
```

Replace `${ACCESS_KEY}` with your `AccessKey` obtained from [Picovoice Console](https://console.picovoice.ai/).

To synthesize a text stream, create an `Orca.OrcaStream` object and add text to it chunk by chunk:

```python
stream = orca.stream_open()

for text_chunk in text_generator():
    pcm = stream.synthesize(text_chunk)
    if pcm is not None:
        # handle pcm, e.g. append it to a playback buffer
        pass

pcm = stream.flush()
if pcm is not None:
    # handle the remaining pcm
    pass
```

The `text_generator()` function can be any stream generating text, for example an LLM response.
Orca produces audio chunks in parallel to the incoming text stream, and returns the raw PCM whenever enough context has
been added via `stream.synthesize()`.
To ensure smooth transitions between chunks, the `stream.synthesize()` function returns an audio chunk that only
includes the audio for a portion of the text that has been added.
To generate the audio for the remaining text, `stream.flush()` needs to be invoked.
When done with streaming text synthesis, the `Orca.OrcaStream` object needs to be closed:

```python
stream.close()
```
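
The streaming steps above (synthesize each chunk, flush, close) can be wrapped in one helper. The sketch below assumes only the three stream methods described in this section; `example_text_generator` and `synthesize_text_stream` are hypothetical names, and the generator merely imitates an LLM response arriving in pieces.

```python
from typing import Any, Iterable, Iterator, List


def example_text_generator() -> Iterator[str]:
    """Stand-in for an LLM response that arrives in small pieces (hypothetical)."""
    for piece in ["Hello, ", "this is ", "a streamed ", "sentence."]:
        yield piece


def synthesize_text_stream(stream: Any, text_chunks: Iterable[str]) -> List[int]:
    """Feed text chunks to an open stream and collect all returned PCM samples.

    `stream` is expected to provide synthesize(), flush(), and close().
    The stream is closed even if synthesis raises an exception.
    """
    pcm_all: List[int] = []
    try:
        for text_chunk in text_chunks:
            pcm = stream.synthesize(text_chunk)
            if pcm is not None:  # None: no audio was returned for this chunk yet
                pcm_all.extend(pcm)
        pcm = stream.flush()  # audio for the text that is still buffered
        if pcm is not None:
            pcm_all.extend(pcm)
    finally:
        stream.close()
    return pcm_all


# Possible usage with a real engine:
# pcm = synthesize_text_stream(orca.stream_open(), example_text_generator())
```

Collecting everything into one list gives up the latency benefit of streaming; in a voice assistant, each chunk would typically be handed to an audio output as soon as it is returned.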

If the complete text is known before synthesis, single synthesis mode can be used to generate speech in a single call to
Orca:

```python
# Return raw PCM
pcm, alignments = orca.synthesize(text='${TEXT}')

# Save the generated audio to a WAV file directly
alignments = orca.synthesize_to_file(text='${TEXT}', path='${OUTPUT_PATH}')
```

Replace `${TEXT}` with the text to be synthesized and `${OUTPUT_PATH}` with the path to save the generated audio as a
single-channel 16-bit PCM WAV file.
In single synthesis mode, Orca returns metadata of the synthesized audio in the form of a list of `Orca.WordAlignment`
objects.
You can print the metadata with:

```python
for token in alignments:
    print(f"word=\"{token.word}\", start_sec={token.start_sec:.2f}, end_sec={token.end_sec:.2f}")
    for phoneme in token.phonemes:
        print(f"\tphoneme=\"{phoneme.phoneme}\", start_sec={phoneme.start_sec:.2f}, end_sec={phoneme.end_sec:.2f}")
```
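
If the raw PCM needs post-processing before it is saved, it can be written to a WAV file with the standard library instead of calling `synthesize_to_file`. This sketch assumes that `pcm` is a sequence of signed 16-bit integer samples and that the sample rate is taken from `orca.sample_rate`; the helper name `write_wav` is hypothetical.

```python
import struct
import wave
from typing import Sequence


def write_wav(path: str, pcm: Sequence[int], sample_rate: int) -> None:
    """Write single-channel 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # single channel
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # Pack the samples as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Possible usage with a real engine:
# pcm, alignments = orca.synthesize(text='${TEXT}')
# write_wav('${OUTPUT_PATH}', pcm, orca.sample_rate)
```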

When done, make sure to explicitly release the resources using:

```python
orca.delete()
```
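
To make sure `delete()` also runs when an exception occurs, the engine can be wrapped in a context manager. This is a sketch of one possible pattern, not part of the SDK; `released_when_done` is a hypothetical helper that works with any object exposing `delete()`.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def released_when_done(create_engine: Callable[[], Any]) -> Iterator[Any]:
    """Create an engine and guarantee that delete() is called afterwards."""
    engine = create_engine()
    try:
        yield engine
    finally:
        engine.delete()


# Possible usage with a real engine:
# with released_when_done(lambda: pvorca.create(access_key='${ACCESS_KEY}')) as orca:
#     pcm, alignments = orca.synthesize(text='${TEXT}')
```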

### Text input

Orca accepts the 26 lowercase (a-z) and 26 uppercase (A-Z) letters of the English alphabet, numbers,
basic symbols, as well as common punctuation marks. You can get a list of all supported characters from the
`valid_characters` property of the `Orca` object.
Characters or words not covered by this list can be pronounced with
[custom pronunciations](#custom-pronunciations).
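
Before sending text to the engine, it can be checked against the supported characters. The sketch below is a plain per-character comparison; `find_unsupported_characters` is a hypothetical helper, and it does not know about the custom pronunciation syntax, so annotated text may need separate handling.

```python
from typing import Iterable, List


def find_unsupported_characters(text: str, valid_characters: Iterable[str]) -> List[str]:
    """Return the distinct characters of `text` that are not in `valid_characters`,
    in order of first appearance."""
    valid = set(valid_characters)
    unsupported: List[str] = []
    for char in text:
        if char not in valid and char not in unsupported:
            unsupported.append(char)
    return unsupported


# Possible usage with a real engine:
# bad = find_unsupported_characters('${TEXT}', orca.valid_characters)
# if bad:
#     print(f"unsupported characters: {bad}")
```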

### Custom pronunciations

Orca lets you embed custom pronunciations in the text via the syntax: `{word|pronunciation}`.\
The pronunciation is expressed in [ARPAbet](https://en.wikipedia.org/wiki/ARPABET) phonemes, for example:

- "This is a {custom|K AH S T AH M} pronunciation"
- "{read|R IY D} this as {read|R EH D}, please."
- "I {live|L IH V} in {Sevilla|S EH V IY Y AH}. We have great {live|L AY V} sports!"
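
When the same words recur across many texts, the annotations can be applied from a small dictionary. This is a sketch under stated assumptions: `with_pronunciation` and `apply_pronunciations` are hypothetical helpers, matching is case-sensitive, and the input is assumed to be plain text without existing `{word|pronunciation}` annotations.

```python
import re
from typing import Mapping


def with_pronunciation(word: str, phonemes: str) -> str:
    """Format one word in the {word|pronunciation} syntax."""
    return "{" + word + "|" + phonemes + "}"


def apply_pronunciations(text: str, pronunciations: Mapping[str, str]) -> str:
    """Annotate every whole-word occurrence of the dictionary keys in `text`."""
    if not pronunciations:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in pronunciations) + r")\b"
    )
    return pattern.sub(
        lambda match: with_pronunciation(match.group(1), pronunciations[match.group(1)]),
        text,
    )


print(apply_pronunciations("I live in Sevilla.", {"Sevilla": "S EH V IY Y AH"}))
```

A dictionary holds one pronunciation per spelling, so words such as "read" or "live" that are pronounced differently depending on context still need to be annotated by hand.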

### Voices

Orca can synthesize speech with various voices, each of which is characterized by a model file located
in [lib/common](https://github.com/Picovoice/orca/tree/main/lib/common).
To create an instance of the engine with a specific voice, use:

```python
orca = pvorca.create(access_key='${ACCESS_KEY}', model_path='${MODEL_PATH}')
```

and replace `${MODEL_PATH}` with the path to the model file with the desired voice.
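
If several model files have been downloaded to a local folder, they can be listed so that a voice is picked by name. The sketch below assumes that model files share a common suffix, here `.pv`; both the suffix default and the helper `find_voice_models` are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict


def find_voice_models(model_dir: str, suffix: str = ".pv") -> Dict[str, str]:
    """Map each model file name (without suffix) in `model_dir` to its full path."""
    return {
        path.stem: str(path)
        for path in sorted(Path(model_dir).glob(f"*{suffix}"))
        if path.is_file()
    }


# Possible usage with a real engine ('${MODEL_DIR}' and '${VOICE_NAME}' are placeholders):
# models = find_voice_models('${MODEL_DIR}')
# orca = pvorca.create(access_key='${ACCESS_KEY}', model_path=models['${VOICE_NAME}'])
```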

### Speech control

Orca allows for keyword arguments to control the synthesized speech. They can be provided to the `stream_open`
method or the single synthesis methods `synthesize` and `synthesize_to_file`:

- `speech_rate`: Controls the speed of the generated speech. Valid values are within [0.7, 1.3]. A higher (lower) value
  produces speech that is faster (slower). The default is `1.0`.
- `random_state`: Sets the random state for sampling during synthesis. This can be used to ensure that the synthesized
  speech is deterministic across different runs. Valid values are all non-negative integers. If not provided, a random
  seed will be chosen and the synthesis process will be non-deterministic.
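
The documented ranges can be checked before the arguments are passed on. The following is a minimal sketch; `speech_control_kwargs` is a hypothetical helper that returns only the arguments that were set, so the engine defaults apply to the rest.

```python
from typing import Any, Dict, Optional


def speech_control_kwargs(
    speech_rate: Optional[float] = None,
    random_state: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate the documented ranges and return only the arguments that were set."""
    kwargs: Dict[str, Any] = {}
    if speech_rate is not None:
        if not 0.7 <= speech_rate <= 1.3:
            raise ValueError("speech_rate must be within [0.7, 1.3]")
        kwargs["speech_rate"] = speech_rate
    if random_state is not None:
        is_int = isinstance(random_state, int) and not isinstance(random_state, bool)
        if not is_int or random_state < 0:
            raise ValueError("random_state must be a non-negative integer")
        kwargs["random_state"] = random_state
    return kwargs


# Possible usage with a real engine:
# pcm, alignments = orca.synthesize(
#     text='${TEXT}', **speech_control_kwargs(speech_rate=1.2, random_state=42))
```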

### Orca properties

To obtain the set of valid characters, use `orca.valid_characters`.\
To retrieve the maximum number of characters allowed, use `orca.max_character_limit`.\
The sample rate of the generated audio is `orca.sample_rate`.
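
The sample rate is what relates the number of PCM samples to playback time. As a small sketch, assuming `pcm` holds single-channel samples, the duration is the sample count divided by the sample rate; `audio_duration_sec` is a hypothetical helper.

```python
from typing import Sequence


def audio_duration_sec(pcm: Sequence[int], sample_rate: int) -> float:
    """Duration of single-channel PCM audio: number of samples / samples per second."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return len(pcm) / sample_rate


# Possible usage with a real engine:
# pcm, alignments = orca.synthesize(text='${TEXT}')
# print(f"{audio_duration_sec(pcm, orca.sample_rate):.2f} s of audio")
```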

### Alignment metadata

Along with the raw PCM or saved audio file, Orca returns metadata for the synthesized audio in single synthesis mode.
The `Orca.WordAlignment` object has the following properties:

- **Word (`word`):** String representation of the word.
- **Start Time (`start_sec`):** Indicates when the word started in the synthesized audio. Value is in seconds.
- **End Time (`end_sec`):** Indicates when the word ended in the synthesized audio. Value is in seconds.
- **Phonemes (`phonemes`):** A list of `Orca.PhonemeAlignment` objects.

The `Orca.PhonemeAlignment` object has the following properties:

- **Phoneme (`phoneme`):** String representation of the phoneme.
- **Start Time (`start_sec`):** Indicates when the phoneme started in the synthesized audio. Value is in seconds.
- **End Time (`end_sec`):** Indicates when the phoneme ended in the synthesized audio. Value is in seconds.
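
Alignment metadata is useful for tasks such as highlighting the word that is currently being spoken or exporting timings. The sketch below relies only on the attributes used in the printing example (`word`, `start_sec`, `end_sec`, `phonemes`, `phoneme`); the helpers `word_at` and `alignments_to_dicts` are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional


def word_at(alignments: Iterable[Any], time_sec: float) -> Optional[str]:
    """Return the word spoken at `time_sec`, or None if no word covers that time."""
    for token in alignments:
        if token.start_sec <= time_sec < token.end_sec:
            return token.word
    return None


def alignments_to_dicts(alignments: Iterable[Any]) -> List[Dict[str, Any]]:
    """Convert alignment objects to plain dictionaries, e.g. for JSON export."""
    return [
        {
            "word": token.word,
            "start_sec": token.start_sec,
            "end_sec": token.end_sec,
            "phonemes": [
                {"phoneme": p.phoneme, "start_sec": p.start_sec, "end_sec": p.end_sec}
                for p in token.phonemes
            ],
        }
        for token in alignments
    ]
```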

## Demos

[pvorcademo](https://pypi.org/project/pvorcademo/) provides command-line utilities for synthesizing audio using
Orca.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Picovoice/orca",
    "name": "pvorca",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "Streaming Text-to-Speech, TTS, Speech Synthesis, Voice Generation, Speech Engine",
    "author": "Picovoice",
    "author_email": "hello@picovoice.ai",
    "download_url": "https://files.pythonhosted.org/packages/ae/58/095a57bc6e41ad1259cfeb67ff3a3d4b39739db6cf95055f3d6e9bca0d0a/pvorca-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Orca Streaming Text-to-Speech Engine",
    "version": "1.0.0",
    "project_urls": {
        "Homepage": "https://github.com/Picovoice/orca"
    },
    "split_keywords": [
        "streaming text-to-speech",
        " tts",
        " speech synthesis",
        " voice generation",
        " speech engine"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ab1c27002e38af899e35958f9d33defb3c2535c7abb3646b64f0c20b6f8f8b3f",
                "md5": "e03c922ce454dce8244ec42590d17346",
                "sha256": "9fd2668f15f2134d76c9a82e158e88172b2ca1b123481ee80cd58be16499b27b"
            },
            "downloads": -1,
            "filename": "pvorca-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e03c922ce454dce8244ec42590d17346",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 8179301,
            "upload_time": "2024-08-21T20:38:03",
            "upload_time_iso_8601": "2024-08-21T20:38:03.837964Z",
            "url": "https://files.pythonhosted.org/packages/ab/1c/27002e38af899e35958f9d33defb3c2535c7abb3646b64f0c20b6f8f8b3f/pvorca-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ae58095a57bc6e41ad1259cfeb67ff3a3d4b39739db6cf95055f3d6e9bca0d0a",
                "md5": "a3b53c94fc4306f14f6949c7de6b0575",
                "sha256": "18efa8ee3b98c306acc5cd7804046e3b9fb55ea16cea9bdad1ff76604db76791"
            },
            "downloads": -1,
            "filename": "pvorca-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a3b53c94fc4306f14f6949c7de6b0575",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 8179124,
            "upload_time": "2024-08-21T20:38:06",
            "upload_time_iso_8601": "2024-08-21T20:38:06.748281Z",
            "url": "https://files.pythonhosted.org/packages/ae/58/095a57bc6e41ad1259cfeb67ff3a3d4b39739db6cf95055f3d6e9bca0d0a/pvorca-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-21 20:38:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Picovoice",
    "github_project": "orca",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pvorca"
}
        