| Field | Value |
| ----- | ----- |
| Name | wyoming |
| Version | 1.7.2 |
| Summary | Peer-to-peer protocol for voice assistants |
| upload_time | 2025-08-04 20:50:56 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | MIT |
| keywords | voice, assistant, protocol |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Wyoming Protocol
A peer-to-peer protocol for voice assistants (basically [JSONL](https://jsonlines.org/) + PCM audio)
``` text
{ "type": "...", "data": { ... }, "data_length": ..., "payload_length": ... }\n
<data_length bytes (optional)>
<payload_length bytes (optional)>
```
Used in [Rhasspy](https://github.com/rhasspy/rhasspy3/) and [Home Assistant](https://www.home-assistant.io/integrations/wyoming) for communication with voice services.
[Open Home Foundation](https://www.openhomefoundation.org/)
## Wyoming Projects
* Voice satellites
* [Satellite](https://github.com/rhasspy/wyoming-satellite) for Home Assistant
* Audio input/output
* [mic-external](https://github.com/rhasspy/wyoming-mic-external)
* [snd-external](https://github.com/rhasspy/wyoming-snd-external)
* [SDL2](https://github.com/rhasspy/wyoming-sdl2)
* Wake word detection
* [openWakeWord](https://github.com/rhasspy/wyoming-openwakeword)
* [porcupine1](https://github.com/rhasspy/wyoming-porcupine1)
* [snowboy](https://github.com/rhasspy/wyoming-snowboy)
* [microWakeWord](https://github.com/rhasspy/wyoming-microwakeword)
* Speech-to-text
* [Faster Whisper](https://github.com/rhasspy/wyoming-faster-whisper)
* [Vosk](https://github.com/rhasspy/wyoming-vosk)
* [Whisper.cpp](https://github.com/rhasspy/wyoming-whisper-cpp)
* Text-to-speech
* [Piper](https://github.com/rhasspy/wyoming-piper)
* Intent handling
* [handle-external](https://github.com/rhasspy/wyoming-handle-external)
## Format
1. A JSON object header on a single line, terminated by `\n` (UTF-8, required)
* `type` - event type (string, required)
* `data` - event data (object, optional)
* `data_length` - bytes of additional data (int, optional)
* `payload_length` - bytes of binary payload (int, optional)
2. Additional data (UTF-8, optional)
* JSON object with additional event-specific data
* Merged on top of header `data`
* Exactly `data_length` bytes long
* Immediately follows header `\n`
3. Payload
* Typically PCM audio but can be any binary data
* Exactly `payload_length` bytes long
* Immediately follows additional data or header `\n` if no additional data
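This framing is simple to implement directly. Below is a minimal sketch of reading and writing events over a buffered binary stream; `write_event` and `read_event` are hypothetical helper names, not the `wyoming` package's public API:

``` python
import json
from typing import Any, BinaryIO, Dict, Optional, Tuple


def write_event(writer: BinaryIO, event_type: str,
                data: Optional[Dict[str, Any]] = None,
                payload: bytes = b"") -> None:
    """Write one event: JSON header line, then the optional binary payload."""
    header: Dict[str, Any] = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    writer.write(json.dumps(header).encode("utf-8") + b"\n")
    if payload:
        writer.write(payload)
    writer.flush()


def read_event(reader: BinaryIO) -> Tuple[str, Dict[str, Any], bytes]:
    """Read one event: header line, optional additional data, optional payload."""
    header = json.loads(reader.readline())
    data: Dict[str, Any] = header.get("data") or {}
    if header.get("data_length"):
        # Additional data is merged on top of the header's "data"
        data.update(json.loads(reader.read(header["data_length"])))
    payload = reader.read(header["payload_length"]) if header.get("payload_length") else b""
    return header["type"], data, payload
```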
## Event Types
The available events, with their `type` and fields.
### Audio
Send raw audio and indicate begin/end of audio streams.
* `audio-chunk` - chunk of raw PCM audio
* `rate` - sample rate in hertz (int, required)
* `width` - sample width in bytes (int, required)
* `channels` - number of channels (int, required)
* `timestamp` - timestamp of audio chunk in milliseconds (int, optional)
* Payload is raw PCM audio samples
* `audio-start` - start of an audio stream
* `rate` - sample rate in hertz (int, required)
* `width` - sample width in bytes (int, required)
* `channels` - number of channels (int, required)
* `timestamp` - timestamp in milliseconds (int, optional)
* `audio-stop` - end of an audio stream
* `timestamp` - timestamp in milliseconds (int, optional)
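As an illustration, the `write_event` sketch from the Format section can stream a WAV file as a complete audio stream (the chunk size below is arbitrary):

``` python
import wave

SAMPLES_PER_CHUNK = 1024  # arbitrary


def stream_wav(writer, wav_path: str) -> None:
    """Send a WAV file as audio-start, audio-chunk(s), audio-stop."""
    with wave.open(wav_path, "rb") as wav:
        fmt = {
            "rate": wav.getframerate(),
            "width": wav.getsampwidth(),
            "channels": wav.getnchannels(),
        }
        write_event(writer, "audio-start", fmt)
        while True:
            frames = wav.readframes(SAMPLES_PER_CHUNK)
            if not frames:
                break
            write_event(writer, "audio-chunk", fmt, payload=frames)
        write_event(writer, "audio-stop")
```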
### Info
Describe available services.
* `describe` - request for available voice services
* `info` - response describing available voice services
* `asr` - list speech recognition services (optional)
* `models` - list of available models (required)
* `name` - unique name (required)
* `languages` - supported languages by model (list of string, required)
* `attribution` (required)
* `name` - name of creator (required)
* `url` - URL of creator (required)
* `installed` - true if currently installed (bool, required)
* `description` - human-readable description (string, optional)
* `version` - version of the model (string, optional)
* `supports_transcript_streaming` - true if program can stream transcript chunks (bool, optional)
* `tts` - list text to speech services (optional)
* `models` - list of available models (required)
* `name` - unique name (required)
* `languages` - supported languages by model (list of string, required)
* `speakers` - list of speakers (optional)
* `name` - unique name of speaker (required)
* `attribution` (required)
* `name` - name of creator (required)
* `url` - URL of creator (required)
* `installed` - true if currently installed (bool, required)
* `description` - human-readable description (string, optional)
* `version` - version of the model (string, optional)
* `supports_synthesize_streaming` - true if program can stream text chunks (bool, optional)
* `wake` - list wake word detection services (optional)
* `models` - list of available models (required)
* `name` - unique name (required)
* `languages` - supported languages by model (list of string, required)
* `attribution` (required)
* `name` - name of creator (required)
* `url` - URL of creator (required)
* `installed` - true if currently installed (bool, required)
* `description` - human-readable description (string, optional)
* `version` - version of the model (string, optional)
* `handle` - list intent handling services (optional)
* `models` - list of available models (required)
* `name` - unique name (required)
* `languages` - supported languages by model (list of string, required)
* `attribution` (required)
* `name` - name of creator (required)
* `url` - URL of creator (required)
* `installed` - true if currently installed (bool, required)
* `description` - human-readable description (string, optional)
* `version` - version of the model (string, optional)
* `supports_handled_streaming` - true if program can stream response chunks (bool, optional)
* `intent` - list intent recognition services (optional)
* `models` - list of available models (required)
* `name` - unique name (required)
* `languages` - supported languages by model (list of string, required)
* `attribution` (required)
* `name` - name of creator (required)
* `url` - URL of creator (required)
* `installed` - true if currently installed (bool, required)
* `description` - human-readable description (string, optional)
* `version` - version of the model (string, optional)
* `satellite` - information about the voice satellite (optional)
* `area` - name of area where satellite is located (string, optional)
* `has_vad` - true if the end of voice commands will be detected locally (boolean, optional)
* `active_wake_words` - list of wake words that are actively being listened for (list of string, optional)
* `max_active_wake_words` - maximum number of local wake words that can be run simultaneously (number, optional)
* `supports_trigger` - true if satellite supports remotely-triggered pipelines (bool, optional)
* `mic` - list of audio input services (optional)
* `mic_format` - audio input format (required)
* `rate` - sample rate in hertz (int, required)
* `width` - sample width in bytes (int, required)
* `channels` - number of channels (int, required)
* `snd` - list of audio output services (optional)
* `snd_format` - audio output format (required)
* `rate` - sample rate in hertz (int, required)
* `width` - sample width in bytes (int, required)
* `channels` - number of channels (int, required)
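For concreteness, the `data` of an `info` event describing a single speech-to-text service might look like this sketch (all names and URLs are placeholders):

``` python
info_data = {
    "asr": [
        {
            "models": [
                {
                    "name": "example-model",  # placeholder
                    "languages": ["en"],
                    "attribution": {
                        "name": "Example Author",      # placeholder
                        "url": "https://example.org",  # placeholder
                    },
                    "installed": True,
                }
            ]
        }
    ]
}
```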
### Speech Recognition
Transcribe audio into text.
* `transcribe` - request to transcribe an audio stream
* `name` - name of model to use (string, optional)
* `language` - language of spoken audio (string, optional)
* `context` - context from previous interactions (object, optional)
* `transcript` - response with transcription
* `text` - text transcription of spoken audio (string, required)
* `language` - language of transcript (string, optional)
* `context` - context for next interaction (object, optional)
Streaming:
1. `transcript-start` - starts stream
* `language` - language of transcript (string, optional)
* `context` - context from previous interactions (object, optional)
2. `transcript-chunk`
* `text` - part of transcript (string, required)
3. Original `transcript` event must be sent for backwards compatibility
4. `transcript-stop` - end of stream
### Text to Speech
Synthesize audio from text.
* `synthesize` - request to generate audio from text
* `text` - text to speak (string, required)
* `voice` - use a specific voice (optional)
* `name` - name of voice (string, optional)
* `language` - language of voice (string, optional)
* `speaker` - speaker of voice (string, optional)
Streaming:
1. `synthesize-start` - starts stream
* `context` - context from previous interactions (object, optional)
* `voice` - use a specific voice (optional)
* `name` - name of voice (string, optional)
* `language` - language of voice (string, optional)
* `speaker` - speaker of voice (string, optional)
2. `synthesize-chunk`
* `text` - part of text to synthesize (string, required)
3. Original `synthesize` message must be sent for backwards compatibility
4. `synthesize-stop` - end of stream, final audio must be sent
5. `synthesize-stopped` - sent back to server after final audio
### Wake Word
Detect wake words in an audio stream.
* `detect` - request detection of specific wake word(s)
* `names` - wake word names to detect (list of string, optional)
* `detection` - response when detection occurs
* `name` - name of wake word that was detected (string, optional)
* `timestamp` - timestamp of audio chunk in milliseconds when detection occurred (int, optional)
* `not-detected` - response when audio stream ends without a detection
### Voice Activity Detection
Detect speech and silence in an audio stream.
* `voice-started` - user has started speaking
* `timestamp` - timestamp of audio chunk when speaking started in milliseconds (int, optional)
* `voice-stopped` - user has stopped speaking
* `timestamp` - timestamp of audio chunk when speaking stopped in milliseconds (int, optional)
### Intent Recognition
Recognize intents from text.
* `recognize` - request to recognize an intent from text
* `text` - text to recognize (string, required)
* `context` - context from previous interactions (object, optional)
* `intent` - response with recognized intent
* `name` - name of intent (string, required)
* `entities` - list of entities (optional)
* `name` - name of entity (string, required)
* `value` - value of entity (any, optional)
* `text` - response for user (string, optional)
* `context` - context for next interactions (object, optional)
* `not-recognized` - response indicating no intent was recognized
* `text` - response for user (string, optional)
* `context` - context for next interactions (object, optional)
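For example, following the field list above, a recognized "set timer" intent could carry `data` like this (intent and entity names are made up):

``` python
intent_data = {
    "name": "SetTimer",                    # made-up intent name
    "entities": [
        {"name": "minutes", "value": 10},  # made-up entity
    ],
    "text": "Timer set for ten minutes.",
}
```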
### Intent Handling
Handle structured intents or text directly.
* `handled` - response when intent was successfully handled
* `text` - response for user (string, optional)
* `context` - context for next interactions (object, optional)
* `not-handled` - response when intent was not handled
* `text` - response for user (string, optional)
* `context` - context for next interactions (object, optional)
Streaming:
1. `handled-start` - starts stream
* `context` - context from previous interactions (object, optional)
2. `handled-chunk`
* `text` - part of response (string, required)
3. Original `handled` message must be sent for backwards compatibility
4. `handled-stop` - end of stream
### Audio Output
Play audio stream.
* `played` - response when audio finishes playing
### Voice Satellite
Control of one or more remote voice satellites connected to a central server.
* `run-satellite` - informs satellite that server is ready to run pipelines
* `pause-satellite` - informs satellite that server is no longer ready to run pipelines
* `satellite-connected` - satellite has connected to the server
* `satellite-disconnected` - satellite has been disconnected from the server
* `streaming-started` - satellite has started streaming audio to the server
* `streaming-stopped` - satellite has stopped streaming audio to the server
Pipelines run on the server, but the server can also trigger them remotely on a satellite.
* `run-pipeline` - runs a pipeline on the server or asks the satellite to run it when possible
* `start_stage` - pipeline stage to start at (string, required)
* `end_stage` - pipeline stage to end at (string, required)
* `wake_word_name` - name of detected wake word that started this pipeline (string, optional)
* From client only
* `wake_word_names` - names of wake words to listen for (list of string, optional)
* From server only
* `start_stage` must be "wake"
* `announce_text` - text to speak on the satellite
* From server only
* `start_stage` must be "tts"
* `restart_on_end` - true if the server should re-run the pipeline after it ends (boolean, default is false)
* Only used for always-on streaming satellites
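For example, a server asking an always-on satellite to wait for a wake word and then run a full pipeline might send `run-pipeline` with `data` like this (the wake word name is hypothetical):

``` python
run_pipeline_data = {
    "start_stage": "wake",               # must be "wake" when sent from the server
    "end_stage": "tts",
    "wake_word_names": ["hey_example"],  # hypothetical wake word
    "restart_on_end": True,              # re-run the pipeline when it ends
}
```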
### Timers
* `timer-started` - a new timer has started
* `id` - unique id of timer (string, required)
* `total_seconds` - number of seconds the timer should run for (int, required)
* `name` - user-provided name for timer (string, optional)
* `start_hours` - hours the timer should run for as spoken by user (int, optional)
* `start_minutes` - minutes the timer should run for as spoken by user (int, optional)
* `start_seconds` - seconds the timer should run for as spoken by user (int, optional)
* `command` - optional command that the server will execute when the timer is finished
* `text` - text of command to execute (string, required)
* `language` - language of the command (string, optional)
* `timer-updated` - timer has been paused/resumed or time has been added/removed
* `id` - unique id of timer (string, required)
* `is_active` - true if timer is running, false if paused (bool, required)
* `total_seconds` - number of seconds that the timer should run for now (int, required)
* `timer-cancelled` - timer was cancelled
* `id` - unique id of timer (string, required)
* `timer-finished` - timer finished without being cancelled
* `id` - unique id of timer (string, required)
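As an example, a timer started by the spoken phrase "ten minutes" might produce a `timer-started` event with `data` like this (the id is made up):

``` python
timer_started_data = {
    "id": "timer-1234",   # made-up unique id
    "total_seconds": 600,
    "start_minutes": 10,  # as spoken by the user
}
```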
## Event Flow
* → is an event from client to server
* ← is an event from server to client
### Service Description
1. → `describe` (required)
2. ← `info` (required)
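A minimal client for this exchange, reusing the hypothetical `write_event`/`read_event` helpers from the Format section (the host and port are assumptions):

``` python
import socket


def get_info(host: str = "localhost", port: int = 10300) -> dict:
    """Send describe and return the data of the info response."""
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("rwb") as stream:
            write_event(stream, "describe")
            event_type, data, _ = read_event(stream)
            assert event_type == "info"
            return data
```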
### Speech to Text
1. → `transcribe` event with `name` of model to use or `language` (optional)
2. → `audio-start` (required)
3. → `audio-chunk` (required)
* Send audio chunks until silence is detected
4. → `audio-stop` (required)
5. ← `transcript` (required)
* Contains text transcription of spoken audio
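Putting these together, a non-streaming transcription client might look like this sketch (reusing the hypothetical `write_event`, `read_event`, and `stream_wav` helpers above):

``` python
def transcribe_wav(stream, wav_path: str, language: str = "en") -> str:
    """Send one WAV file and return the transcript text."""
    write_event(stream, "transcribe", {"language": language})
    stream_wav(stream, wav_path)  # audio-start, audio-chunk(s), audio-stop
    while True:
        event_type, data, _ = read_event(stream)
        if event_type == "transcript":
            return data["text"]
```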
Streaming:
1. → `transcribe` event (optional)
2. → `audio-start` (required)
3. → `audio-chunk` (required)
* Send audio chunks until silence is detected
4. ← `transcript-start` (required)
5. ← `transcript-chunk` (required)
* Send transcript chunks as they're produced
6. → `audio-stop` (required)
7. ← `transcript` (required)
* Sent for backwards compatibility
8. ← `transcript-stop` (required)
### Text to Speech
1. → `synthesize` event with `text` (required)
2. ← `audio-start`
3. ← `audio-chunk`
* One or more audio chunks
4. ← `audio-stop`
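A matching client-side sketch for this non-streaming flow, saving the synthesized audio to a WAV file (helper names as above):

``` python
import wave


def synthesize_to_wav(stream, text: str, out_path: str) -> None:
    """Request synthesis and save the returned audio stream as a WAV file."""
    write_event(stream, "synthesize", {"text": text})
    wav = None
    while True:
        event_type, data, payload = read_event(stream)
        if event_type == "audio-start":
            wav = wave.open(out_path, "wb")
            wav.setframerate(data["rate"])
            wav.setsampwidth(data["width"])
            wav.setnchannels(data["channels"])
        elif event_type == "audio-chunk" and wav is not None:
            wav.writeframes(payload)
        elif event_type == "audio-stop":
            if wav is not None:
                wav.close()
            return
```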
Streaming:
1. → `synthesize-start` event (required)
2. → `synthesize-chunk` event (required)
* Text chunks are sent as they're produced
3. ← `audio-start`, `audio-chunk` (one or more), `audio-stop`
* Audio chunks are sent as they're produced with start/stop
4. → `synthesize` event
* Sent for backwards compatibility
5. → `synthesize-stop` event
* End of text stream
6. ← Final audio must be sent
* `audio-start`, `audio-chunk` (one or more), `audio-stop`
7. ← `synthesize-stopped`
* Tells server that final audio has been sent
### Wake Word Detection
1. → `detect` event with `names` of wake words to detect (optional)
2. → `audio-start` (required)
3. → `audio-chunk` (required)
* Keep sending audio chunks until a `detection` is received
4. ← `detection`
* Sent for each wake word detection
5. → `audio-stop` (optional)
* Manually end audio stream
6. ← `not-detected`
* Sent after `audio-stop` if no detections occurred
### Voice Activity Detection
1. → `audio-chunk` (required)
* Send audio chunks until silence is detected
2. ← `voice-started`
* When speech starts
3. ← `voice-stopped`
* When speech stops
### Intent Recognition
1. → `recognize` (required)
2. ← `intent` if successful
3. ← `not-recognized` if not successful
### Intent Handling
For structured intents:
1. → `intent` (required)
2. ← `handled` if successful
3. ← `not-handled` if not successful
For text only:
1. → `transcript` with `text` to handle (required)
2. ← `handled` if successful
3. ← `not-handled` if not successful
Streaming text only (successful):
1. → `transcript` with `text` to handle (required)
2. ← `handled-start` (required)
3. ← `handled-chunk` (required)
* Chunk of response text
4. ← `handled` (required)
* Sent for backwards compatibility
5. ← `handled-stop` (required)
### Audio Output
1. → `audio-start` (required)
2. → `audio-chunk` (required)
* One or more audio chunks
3. → `audio-stop` (required)
4. ← `played`
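A client-side sketch of this flow, reusing the hypothetical `stream_wav` and `read_event` helpers above:

``` python
def play_wav(stream, wav_path: str) -> None:
    """Send audio to a playback service and wait until it has played."""
    stream_wav(stream, wav_path)  # audio-start, audio-chunk(s), audio-stop
    event_type, _, _ = read_event(stream)
    assert event_type == "played"
```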