zaphodvox

| Field | Value |
| --- | --- |
| Name | zaphodvox |
| Version | 1.2.0 |
| Summary | A command-line interface and python library for encoding text into synthetic speech using Google Cloud Text-To-Speech or ElevenLabs APIs. |
| upload_time | 2024-02-02 18:31:12 |
| requires_python | >=3.10 |
| keywords | audio elevenlabs google-cloud-texttospeech mp3 ogg text-to-speech tts wav |
| requirements | coverage elevenlabs google-cloud-texttospeech hatch pydantic pydub pydub-stubs pytest pytest-cov pytest-watch rich ruff tenacity Unidecode |

The `zaphodvox` Python package provides a command-line interface for encoding a text file into synthetic speech audio using either the [Google Text-to-Speech API](https://cloud.google.com/text-to-speech/docs) or the [ElevenLabs Speech Synthesis API](https://elevenlabs.io/docs).

# Installation

> "He was clearly a man of many qualities, even if they were mostly bad ones."

```console
$ pip install zaphodvox
...
Successfully installed zaphodvox...

$ zaphodvox --help
usage: zaphodvox --blah --blah --blah --beware --blah whatever
...
$ zaphodvox test.txt
Nothing to do... I'd give you advice, but you wouldn't listen. No one ever does.

$ pip uninstall zaphodvox
...
Successfully uninstalled zaphodvox...
```

`zaphodvox` also requires a working installation of [ffmpeg](https://ffmpeg.org/).

# Authorization

> "He didn't know why he had become President of the Galaxy, except that it seemed a fun thing to be."

Authorization credentials are required for both the Google Text-To-Speech APIs and the ElevenLabs Speech Synthesis APIs. These can be specified either by defining environment variables or using CLI arguments.

## Google

The Google encoder requires that you set up an account, project, and service account JSON key as described in the ["Before You Begin"](https://cloud.google.com/text-to-speech/docs/before-you-begin) documentation.

You can set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the downloaded service account file path or you can pass the file path directly to the CLI with the `--service-account` argument.

## ElevenLabs

The ElevenLabs encoder requires that you set up an account and generate an API key as described in the ["Authentication"](https://elevenlabs.io/docs/api-reference/text-to-speech#authentication) section of the [API Reference documentation](https://elevenlabs.io/docs/api-reference/text-to-speech).

You can set the `ELEVEN_API_KEY` environment variable to the API key copied from your profile page or you can pass the key directly to the CLI with the `--api-key` argument.
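
Before running an encode, it can be handy to check which credentials are already present in the environment. The following is a minimal sketch, not part of `zaphodvox`: `configured_encoders` is a hypothetical helper, and it only looks at the two environment variables named above (it knows nothing about the `--service-account` or `--api-key` arguments).

```python
import os

# Environment variable names taken from the sections above.
CREDENTIAL_VARS = {
    "google": "GOOGLE_APPLICATION_CREDENTIALS",
    "elevenlabs": "ELEVEN_API_KEY",
}


def configured_encoders(environ=None):
    """Return the encoders whose credential variable is set and non-empty."""
    env = os.environ if environ is None else environ
    return [name for name, var in CREDENTIAL_VARS.items() if env.get(var)]


if __name__ == "__main__":
    ready = configured_encoders()
    print("Encoders with credentials in the environment:", ready or "none")
```

An empty result simply means the credentials would have to be passed on the command line instead.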

# Usage

> "I refuse to answer that question on the grounds that I don't know the answer."

Detailed usage information can be printed by running:

```bash
$ zaphodvox --help
```

Some examples using the following text file (`gone-bananas.txt`):

```text
This is the first line of text.
This is the next line. By default, each line of text is sent to the API individually.
This is the last line. And this is a sentence that is split
between lines. Ideally, these parts of the sentence would be on the same line.
```

```bash
$ zaphodvox --encoder=google --voice-id=A --encode gone-bananas.txt
```

Running the above command will encode the text file using the [Google Text-to-Speech API](https://cloud.google.com/text-to-speech/docs) with the `en-US-Wavenet-A` voice. This will result in the following fragment audio files being created in the working directory (one file for each line of text):

```console
gone-bananas-00000.wav ["This is the first line..."]
gone-bananas-00001.wav ["This is the next line. By default..."]
gone-bananas-00002.wav ["This is the last line. And this is..."]
gone-bananas-00003.wav ["between lines. Ideally, these..."]
```

In addition to the audio files, a manifest JSON file (`gone-bananas-manifest.json`) will also be written to the working directory. This file contains information about the fragment audio files encoded, including the text, the relative file name, and the voice used. This manifest file can also be used as input to the command rather than a text file. See the [manifest documentation](#manifest) for more information.
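
The file names above follow a visible pattern: the input file's stem, a zero-padded five-digit index, and the audio extension. The sketch below reproduces that pattern as inferred from the examples in this README; `fragment_filenames` and `manifest_filename` are hypothetical helpers, not functions exported by `zaphodvox`.

```python
from pathlib import Path


def fragment_filenames(input_path, count, extension="wav"):
    """Build names like 'gone-bananas-00000.wav' for `count` fragments."""
    stem = Path(input_path).stem
    return [f"{stem}-{index:05d}.{extension}" for index in range(count)]


def manifest_filename(input_path):
    """Build the manifest name, e.g. 'gone-bananas-manifest.json'."""
    return f"{Path(input_path).stem}-manifest.json"
```

This can be useful in a script that needs to predict which files an encode run will leave behind.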

## Concatenation

To combine the individual fragment audio files into one, add the `--concat` argument:

```bash
$ zaphodvox --encoder=google --voice-id=A --encode --concat gone-bananas.txt
```

Now only a single audio file, `gone-bananas.wav`, will be created in the working directory.
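
Conceptually, concatenation appends each fragment's audio frames to a single output file. How `zaphodvox` does this internally is not shown here; the following is only an illustrative sketch using Python's standard-library `wave` module, assuming every fragment is a WAV file with the same channel count, sample width, and sample rate. `concat_wavs` is a hypothetical name.

```python
import wave


def concat_wavs(fragment_paths, output_path):
    """Append the audio frames of each WAV fragment, in order, to one file."""
    if not fragment_paths:
        raise ValueError("no fragments to concatenate")
    params = None
    with wave.open(str(output_path), "wb") as out:
        for path in fragment_paths:
            with wave.open(str(path), "rb") as fragment:
                # (channels, sample width in bytes, frame rate)
                current = tuple(fragment.getparams()[:3])
                if params is None:
                    params = current
                    out.setnchannels(current[0])
                    out.setsampwidth(current[1])
                    out.setframerate(current[2])
                elif current != params:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(fragment.readframes(fragment.getnframes()))
```

For example, `concat_wavs(["gone-bananas-00000.wav", "gone-bananas-00001.wav"], "combined.wav")` would join the first two fragments from the earlier run.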

## Cleaning

Note that there isn't much silence between individual lines of the text file. To add a delay between lines, simply add an extra newline between each line of text. The easiest way to do this is to use the `--clean` option:

```bash
$ zaphodvox --clean gone-bananas.txt
```

This command will convert the text file to plain text and attempt to add an extra newline between paragraphs as well as combine sentences that have been split between adjacent lines. It will output the newly "cleaned" text file as `gone-bananas-cleaned.txt`:

```text
This is the first line of text.

This is the next line. By default, each line of text is sent to the API individually.

This is the last line. And this is a sentence that is split between lines. Ideally, these parts of the sentence would be on the same line.
```

Encode this new file:

```bash
$ zaphodvox --encoder=google --voice-id=A --encode --concat gone-bananas-cleaned.txt
```

Notice that there's 500ms of silence (the default) generated for each empty newline and that the last sentence is no longer split between lines.

The `--clean` and `--encode` arguments can be combined in a single call:

```bash
$ zaphodvox --encoder=google --voice-id=A --clean --encode --concat gone-bananas.txt
```

The text file will be cleaned before being encoded and concatenated into `gone-bananas.wav`. The cleaned text file (i.e. `gone-bananas-cleaned.txt`) will still be created in the working directory.

If the `--max-chars` argument is provided, the cleaning process will ensure that every line is shorter than `max-chars` characters by splitting long lines at sentence boundaries.
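
To make the line-joining part of cleaning more concrete, here is a rough sketch of one possible heuristic: a line that does not end with sentence-closing punctuation is merged with the line that follows it, and the resulting paragraphs are separated by blank lines. This is an assumption for illustration only, not the algorithm `zaphodvox` actually uses, and `join_split_sentences` is a hypothetical name.

```python
# Line endings treated as "the sentence is complete" (an assumed heuristic).
SENTENCE_END = (".", "!", "?", '."', '!"', '?"')


def join_split_sentences(text):
    """Join lines that end mid-sentence; separate the results with blank lines."""
    paragraphs = []
    pending = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # existing blank lines are regenerated by the join below
        pending = f"{pending} {line}" if pending else line
        if pending.endswith(SENTENCE_END):
            paragraphs.append(pending)
            pending = ""
    if pending:  # trailing text without closing punctuation
        paragraphs.append(pending)
    return "\n\n".join(paragraphs) + "\n"
```

Applied to the four lines of `gone-bananas.txt`, this heuristic yields three paragraphs, with the split sentence rejoined on a single line.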

# Voice Configurations

> "I'm so great even I get tongue-tied talking to myself."

Multiple voice configurations can be defined in a JSON file and loaded via the `--voices-file` argument.

This is an example voice configuration file (`voices.json`):

```json
{
    "voices": {
        "Marvin": {
            "google": {
                "voice_id": "A",
                "language": "en",
                "region": "US",
                "type": "Neural2"
            },
            "elevenlabs": {
                "voice_id": "EXAVITQu4vr4xnSDxMaL"
            }
        },
        "Ford": {
            "google": {
                "voice_id": "D",
                "language": "en",
                "region": "GB",
                "type": "Wavenet"
            },
            "elevenlabs": {
                "voice_id": "TxGEqnHWrfWFTfGW9XjX"
            }
        },
        "Trillian": {
            "google": {
                "voice_id": "C",
                "language": "en",
                "region": "GB",
                "type": "Wavenet"
            },
            "elevenlabs": {
                "voice_id": "EeMfvkfxDAepqPVNPE8M"
            }
        }
    }
}
```

If a `--voices-file` is used, inline `ZVOX: [name]` tags in a text `inputfile` can specify the voice(s) to be used.

An example multi-voice text file (`heart-of-gold.txt`):

```text
ZVOX: Marvin
This text will be spoken by the Marvin voice defined in the voices JSON file. That is, it will use either the
specified Google voice (i.e. "en-US-Neural2-A") or ElevenLabs voice (i.e. voice ID "EXAVITQu4vr4xnSDxMaL")
depending on which encoder is used.

ZVOX: Ford
This line will be spoken by the Ford voice. That is, it will use either the specified Google voice
(i.e. "en-GB-Wavenet-D") or ElevenLabs voice (i.e. voice ID "TxGEqnHWrfWFTfGW9XjX") depending on which
encoder is used.

This paragraph will also be spoken by the Ford voice as it is still the "current" voice.

ZVOX: Trillian // This is a UK female voice
Finally, this will be read by the Trillian voice. Note that text following a space after the voice name will be ignored.
If a line contains the "ZVOX" tag, it will not be synthesized to speech.
```

```bash
$ zaphodvox --voices-file=voices.json --encoder=google --encode heart-of-gold.txt
```

The above command will result in the following fragment audio files being created in the working directory:

```console
heart-of-gold-00000.wav ["This text will be spoken by the..." using "Marvin" google voice]
heart-of-gold-00001.wav [500ms silence]
heart-of-gold-00002.wav ["This line will be spoken by the..." using "Ford" google voice]
heart-of-gold-00003.wav [500ms silence]
heart-of-gold-00004.wav ["This paragraph will also..." using "Ford" google voice]
heart-of-gold-00005.wav [500ms silence]
heart-of-gold-00006.wav ["Finally, this will be read by the..." using "Trillian" google voice]
heart-of-gold-00007.wav ["If a line contains the..." using "Trillian" google voice]
```
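
The tag handling described above can be pictured as a small state machine: a `ZVOX:` line switches the current voice, and every other line is attributed to whichever voice is current. The sketch below is an illustration under assumptions, not the parser `zaphodvox` ships: `parse_voice_tags` is a hypothetical name, and it only recognizes tags at the start of a line.

```python
def parse_voice_tags(text, default_voice=None):
    """Yield (voice_name, line_text) pairs; blank lines yield (voice_name, None)."""
    current = default_voice
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("ZVOX:"):
            # The first token after the tag is the voice name; the rest is ignored.
            parts = line[len("ZVOX:"):].split()
            if parts:
                current = parts[0]
            continue  # tag lines are never synthesized
        yield current, (line or None)
```

Blank lines are reported with `None` as their text so that a caller could turn them into silence fragments, as in the listing above.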

## Voice Configurations

> "He preferred people to be puzzled rather than contemptuous."

The JSON voice configurations for both Google and ElevenLabs follow the CLI "voice" arguments fairly closely. One difference is that CLI defaults don't necessarily apply in all cases.

### Google

See the Google documentation for [Supported Voices](https://cloud.google.com/text-to-speech/docs/voices), [Voice Selection Params](https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.VoiceSelectionParams), and [Audio Config](https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.AudioConfig) for more detailed information on the individual settings.

The required settings are `voice_id`, `language`, `region`, and `type`. All other settings will default to their underlying Google defaults.

Example:

```json
{
    "voice_id": "D",
    "language": "en",
    "region": "GB",
    "type": "Wavenet",
    "speaking_rate": null,
    "pitch": null,
    "volume_gain_db": null,
    "sample_rate_hertz": null,
    "effects_profile_id": null
}
```
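
Judging by the `en-US-Wavenet-A` example earlier in this README, the four required settings appear to combine into the full Google voice name as language, region, type, and voice ID joined by hyphens. The helper below sketches that composition; `google_voice_name` is a hypothetical function and the ordering is inferred from that single example, so check the result against Google's Supported Voices list.

```python
def google_voice_name(config):
    """Compose a name such as 'en-GB-Wavenet-D' from a voice configuration."""
    return "-".join(
        [config["language"], config["region"], config["type"], config["voice_id"]]
    )
```

With the example configuration above, this produces `en-GB-Wavenet-D`.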

### ElevenLabs

See the ElevenLabs documentation for [Voice Lab](https://elevenlabs.io/docs/voicelab/overview), [Models](https://elevenlabs.io/docs/speech-synthesis/models), and [Voice Settings](https://elevenlabs.io/docs/speech-synthesis/voice-settings) for more detailed information on the individual settings.

The only required setting is `voice_id`. All other settings will default to the underlying ElevenLabs defaults for the selected voice.

Example:

```json
{
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "model": null,
    "stability": null,
    "similarity_boost": null,
    "style": null,
    "use_speaker_boost": null
}
```
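
Since the two encoders have different required settings, a voices file can be sanity-checked before it is passed to `--voices-file`. The following is a small sketch of such a check, assuming the file layout shown in the `voices.json` example and treating a `null` value as missing; `missing_settings` is a hypothetical helper, not part of `zaphodvox`.

```python
# Required settings per encoder, as listed in the two subsections above.
REQUIRED_SETTINGS = {
    "google": ("voice_id", "language", "region", "type"),
    "elevenlabs": ("voice_id",),
}


def missing_settings(voices):
    """Map 'name/encoder' to the required settings absent from that entry."""
    problems = {}
    for name, encoders in voices.get("voices", {}).items():
        for encoder, settings in encoders.items():
            required = REQUIRED_SETTINGS.get(encoder, ())
            missing = [key for key in required if settings.get(key) is None]
            if missing:
                problems[f"{name}/{encoder}"] = missing
    return problems
```

An empty dictionary means every voice entry carries the settings its encoder requires.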

# Manifest

> “If I ever meet myself, I'll hit myself so hard I won't know what's hit me.”

A manifest JSON file is created during the encoding process and can be used as the inputfile instead of a text file. If a manifest file is used, the fragment audio files to be re-encoded can be specified with the `--manifest-indexes` argument.

For example, consider this simple text file (`towel.txt`):

```text
Don't forget your towel.
And always know where it is.
```

The file is encoded with this command:

```bash
$ zaphodvox --encoder=google --voice-id=A --encode --copy towel.txt
```

Three files will be created in the current working directory: the two fragment audio files (`towel-00000.wav` and `towel-00001.wav`) and the manifest file (`towel-manifest.json`).

Here are the contents of `towel-manifest.json`:

```json
{
    "fragments": [
        {
            "text": "Don't forget your towel.",
            "filename": "towel-00000.wav",
            "voice": {
                "voice_id": "A",
                "language": "en",
                "region": "US",
                "type": "Wavenet"
            },
            "silence_duration": null,
            "encoder": "google",
            "audio_format": "linear16"
        },
        {
            "text": "And always know where it is.",
            "filename": "towel-00001.wav",
            "voice": {
                "voice_id": "A",
                "language": "en",
                "region": "US",
                "type": "Wavenet"
            },
            "silence_duration": null,
            "encoder": "google",
            "audio_format": "linear16"
        }
    ]
}
```

Note that the fragment `filename`s are relative to the manifest file's location.

Changes can be made to fragment `text`, `filename`, or `voice` items and the modified manifest file can be used to re-encode only the changes by specifying which fragment indexes to encode.

For example, if the second fragment's `text` field is modified to remove the initial "And", this command will re-encode only the second fragment audio file in place (i.e. `towel-00001.wav`):

```bash
$ zaphodvox --encoder=google --encode --manifest-indexes=1 towel-manifest.json
```
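
The manual edit described above could also be scripted before re-running the command. The sketch below uses only the standard library and assumes the manifest file is a JSON object with a top-level `fragments` list whose entries have a `text` field; `drop_leading_word` is a hypothetical helper, not part of `zaphodvox`.

```python
import json
from pathlib import Path


def drop_leading_word(manifest_path, index, word):
    """Remove `word` from the start of one fragment's text and save the manifest."""
    path = Path(manifest_path)
    manifest = json.loads(path.read_text(encoding="utf-8"))
    fragment = manifest["fragments"][index]
    text = fragment["text"]
    if text.startswith(word + " "):
        rest = text[len(word) + 1:]
        fragment["text"] = rest[:1].upper() + rest[1:]  # re-capitalize the sentence
    path.write_text(json.dumps(manifest, indent=4) + "\n", encoding="utf-8")
    return fragment["text"]
```

For the towel example, `drop_leading_word("towel-manifest.json", 1, "And")` would change the second fragment's text to "Always know where it is." (fragment indexes start at 0).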

The `--concat` argument can also be added when using a manifest file as input. In this case, an attempt will be made to concatenate both the unmodified and newly encoded fragment audio files into one. If any of the files specified in the manifest is missing, the concatenation will fail.
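
Because a missing fragment makes concatenation fail, it may be worth checking the manifest against the file system first. Here is a minimal sketch, assuming the manifest is a JSON object with a top-level `fragments` list and that each `filename` is resolved relative to the manifest's directory, as noted above; `missing_fragment_files` is a hypothetical name.

```python
import json
from pathlib import Path


def missing_fragment_files(manifest_path):
    """List fragment filenames that do not exist next to the manifest."""
    path = Path(manifest_path)
    manifest = json.loads(path.read_text(encoding="utf-8"))
    base = path.parent  # filenames are relative to the manifest's location
    return [
        fragment["filename"]
        for fragment in manifest["fragments"]
        if not (base / fragment["filename"]).is_file()
    ]
```

If the returned list is empty, every file the manifest refers to is present.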

A manifest plan can be created from a text file using the `--plan` argument:

```bash
$ zaphodvox --encoder=google --voice-id=A --plan gone-bananas.txt
```

The above command will write the manifest plan to `gone-bananas-plan.json` without doing any encoding. It can be reviewed and edited before being used as input to the command with the `--encode` argument.

By default, each line of the text file is encoded into its own audio fragment. If the `--max-chars` argument is provided when generating a plan manifest or encoding a text file, the planning process will attempt to combine multiple lines of text per audio fragment, up to `max-chars` characters. Larger fragments often encode with better results.
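
One way to picture the combining step is as greedy packing: keep appending lines to the current fragment until the next line would push it past the limit, then start a new fragment. The sketch below illustrates that idea only; it is not the planner `zaphodvox` uses, `pack_lines` is a hypothetical name, and it assumes non-empty input lines joined by a single space (blank lines and voice changes are not handled).

```python
def pack_lines(lines, max_chars):
    """Greedily merge consecutive lines into fragments of at most max_chars."""
    fragments = []
    current = ""
    for line in lines:
        candidate = f"{current} {line}" if current else line
        if current and len(candidate) > max_chars:
            fragments.append(current)  # the next line would not fit
            current = line
        else:
            current = candidate
    if current:
        fragments.append(current)
    return fragments
```

Note that a single line longer than `max_chars` is kept whole in this sketch rather than being split.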

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "zaphodvox",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "audio,elevenlabs,google-cloud-texttospeech,mp3,ogg,text-to-speech,tts,wav",
    "author": "",
    "author_email": "Thomas Bohmbach Jr <thomasbohmbach@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/5b/cd/ca9cfb5058778e97cf0ce7141aebe27bfef61159a7d210c6c963084b516b/zaphodvox-1.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A command-line interface and python library for encoding text into synthetic speech using Google Cloud Text-To-Speech or ElevenLabs APIs.",
    "version": "1.2.0",
    "project_urls": {
        "Homepage": "https://github.com/gumptionthomas/zaphodvox",
        "Source": "https://github.com/gumptionthomas/zaphodvox",
        "Tracker": "https://github.com/gumptionthomas/zaphodvox/issues"
    },
    "split_keywords": [
        "audio",
        "elevenlabs",
        "google-cloud-texttospeech",
        "mp3",
        "ogg",
        "text-to-speech",
        "tts",
        "wav"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6877b19f3210fc6ea4f119c65d5228dca7450c1c6500016c01833c910d9d6bfc",
                "md5": "5579d83709de529dd24215a562a249fd",
                "sha256": "4190e1cbcf99a9368c368fdf15c93d3e74f12de911e9ff8d5d2f00df3597fd71"
            },
            "downloads": -1,
            "filename": "zaphodvox-1.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5579d83709de529dd24215a562a249fd",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 26073,
            "upload_time": "2024-02-02T18:31:10",
            "upload_time_iso_8601": "2024-02-02T18:31:10.230877Z",
            "url": "https://files.pythonhosted.org/packages/68/77/b19f3210fc6ea4f119c65d5228dca7450c1c6500016c01833c910d9d6bfc/zaphodvox-1.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5bcdca9cfb5058778e97cf0ce7141aebe27bfef61159a7d210c6c963084b516b",
                "md5": "bc77d12406c8d707547dc3179cb7106c",
                "sha256": "406d3c093b7501ebc1fc700124bcdea8fbfdda8b3a125e3d7b5c219d84709532"
            },
            "downloads": -1,
            "filename": "zaphodvox-1.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "bc77d12406c8d707547dc3179cb7106c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 20464,
            "upload_time": "2024-02-02T18:31:12",
            "upload_time_iso_8601": "2024-02-02T18:31:12.009134Z",
            "url": "https://files.pythonhosted.org/packages/5b/cd/ca9cfb5058778e97cf0ce7141aebe27bfef61159a7d210c6c963084b516b/zaphodvox-1.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-02 18:31:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "gumptionthomas",
    "github_project": "zaphodvox",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "coverage",
            "specs": [
                [
                    "==",
                    "7.4.0"
                ]
            ]
        },
        {
            "name": "elevenlabs",
            "specs": [
                [
                    "==",
                    "0.2.27"
                ]
            ]
        },
        {
            "name": "google-cloud-texttospeech",
            "specs": [
                [
                    "==",
                    "2.15.1"
                ]
            ]
        },
        {
            "name": "hatch",
            "specs": [
                [
                    "==",
                    "1.9.2"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    "==",
                    "2.5.3"
                ]
            ]
        },
        {
            "name": "pydub",
            "specs": [
                [
                    "==",
                    "0.25.1"
                ]
            ]
        },
        {
            "name": "pydub-stubs",
            "specs": [
                [
                    "==",
                    "0.25.1.1"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "7.4.4"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    "==",
                    "4.1.0"
                ]
            ]
        },
        {
            "name": "pytest-watch",
            "specs": [
                [
                    "==",
                    "4.2.0"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    "==",
                    "13.7.0"
                ]
            ]
        },
        {
            "name": "ruff",
            "specs": [
                [
                    "==",
                    "0.1.14"
                ]
            ]
        },
        {
            "name": "tenacity",
            "specs": [
                [
                    "==",
                    "8.2.3"
                ]
            ]
        },
        {
            "name": "Unidecode",
            "specs": [
                [
                    "==",
                    "1.3.8"
                ]
            ]
        }
    ],
    "lcname": "zaphodvox"
}