wyoming

Name: wyoming
Version: 1.7.2
Summary: Peer-to-peer protocol for voice assistants
Author: Michael Hansen <mike@rhasspy.org>
Homepage: http://github.com/OHF-voice/wyoming
Upload time: 2025-08-04 20:50:56
Requires Python: >=3.8
License: MIT
Keywords: voice, assistant, protocol

# Wyoming Protocol

A peer-to-peer protocol for voice assistants (basically [JSONL](https://jsonlines.org/) + PCM audio)

``` text
{ "type": "...", "data": { ... }, "data_length": ..., "payload_length": ... }\n
<data_length bytes (optional)>
<payload_length bytes (optional)>
```

Used in [Rhasspy](https://github.com/rhasspy/rhasspy3/) and [Home Assistant](https://www.home-assistant.io/integrations/wyoming) for communication with voice services.

[![An open standard from the Open Home Foundation](https://www.openhomefoundation.org/badges/ohf-open-standard.png)](https://www.openhomefoundation.org/)

## Wyoming Projects

* Voice satellites
    * [Satellite](https://github.com/rhasspy/wyoming-satellite) for Home Assistant 
* Audio input/output
    * [mic-external](https://github.com/rhasspy/wyoming-mic-external)
    * [snd-external](https://github.com/rhasspy/wyoming-snd-external)
    * [SDL2](https://github.com/rhasspy/wyoming-sdl2)
* Wake word detection
    * [openWakeWord](https://github.com/rhasspy/wyoming-openwakeword)
    * [porcupine1](https://github.com/rhasspy/wyoming-porcupine1)
    * [snowboy](https://github.com/rhasspy/wyoming-snowboy)
    * [microWakeWord](https://github.com/rhasspy/wyoming-microwakeword)
* Speech-to-text
    * [Faster Whisper](https://github.com/rhasspy/wyoming-faster-whisper)
    * [Vosk](https://github.com/rhasspy/wyoming-vosk)
    * [Whisper.cpp](https://github.com/rhasspy/wyoming-whisper-cpp)
* Text-to-speech
    * [Piper](https://github.com/rhasspy/wyoming-piper)
* Intent handling
    * [handle-external](https://github.com/rhasspy/wyoming-handle-external)

## Format

1. A JSON object header on a single line ending with `\n` (UTF-8, required)
    * `type` - event type (string, required)
    * `data` - event data (object, optional)
    * `data_length` - bytes of additional data (int, optional)
    * `payload_length` - bytes of binary payload (int, optional)
2. Additional data (UTF-8, optional)
    * JSON object with additional event-specific data
    * Merged on top of header `data`
    * Exactly `data_length` bytes long
    * Immediately follows header `\n`
3. Payload
    * Typically PCM audio but can be any binary data
    * Exactly `payload_length` bytes long
    * Immediately follows additional data or header `\n` if no additional data
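
As a concrete illustration of this framing, here is a minimal Python sketch for reading and writing events over a buffered binary stream. The helper names are ours, not part of the protocol; the `wyoming` package provides its own event classes.

``` python
import json
from typing import Any, BinaryIO, Dict, Optional, Tuple


def write_event(
    stream: BinaryIO,
    event_type: str,
    data: Optional[Dict[str, Any]] = None,
    payload: bytes = b"",
) -> None:
    """Write one event: a JSON header line, then an optional binary payload."""
    header: Dict[str, Any] = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    stream.write(json.dumps(header).encode("utf-8") + b"\n")
    if payload:
        stream.write(payload)
    stream.flush()


def read_event(stream: BinaryIO) -> Tuple[Dict[str, Any], bytes]:
    """Read one event, merging additional data on top of the header's `data`."""
    header = json.loads(stream.readline().decode("utf-8"))
    data: Dict[str, Any] = header.get("data") or {}
    data_length = header.get("data_length") or 0
    if data_length:
        # Additional data immediately follows the header line.
        extra = json.loads(stream.read(data_length).decode("utf-8"))
        data = {**data, **extra}
    payload = stream.read(header.get("payload_length") or 0)
    return {"type": header["type"], "data": data}, payload
```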


## Event Types

Available events with `type` and fields.

### Audio

Send raw audio and indicate the begin/end of audio streams; a streaming sketch follows the event list below.

* `audio-chunk` - chunk of raw PCM audio
    * `rate` - sample rate in hertz (int, required)
    * `width` - sample width in bytes (int, required)
    * `channels` - number of channels (int, required)
    * `timestamp` - timestamp of audio chunk in milliseconds (int, optional)
    * Payload is raw PCM audio samples
* `audio-start` - start of an audio stream
    * `rate` - sample rate in hertz (int, required)
    * `width` - sample width in bytes (int, required)
    * `channels` - number of channels (int, required)
    * `timestamp` - timestamp in milliseconds (int, optional)
* `audio-stop` - end of an audio stream
    * `timestamp` - timestamp in milliseconds (int, optional)
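
For example, a client could frame one second of 16 kHz, 16-bit mono silence as an audio stream like this (a sketch reusing the `write_event` helper from the Format section; the address, port, and chunk size are illustrative, not part of the protocol):

``` python
import socket

AUDIO_FORMAT = {"rate": 16000, "width": 2, "channels": 1}  # 16 kHz, 16-bit mono
pcm = b"\x00\x00" * 16000  # one second of silence

sock = socket.create_connection(("localhost", 10300))  # illustrative address
stream = sock.makefile("rwb")

write_event(stream, "audio-start", AUDIO_FORMAT)
for start in range(0, len(pcm), 2048):  # 2048 bytes = 64 ms at this format
    write_event(stream, "audio-chunk", AUDIO_FORMAT, payload=pcm[start : start + 2048])
write_event(stream, "audio-stop")
```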
    
    
### Info

Describe available services; an example `info` response follows the field list below.

* `describe` - request for available voice services
* `info` - response describing available voice services
    * `asr` - list of speech recognition services (optional)
        * `models` - list of available models (required)
            * `name` - unique name (required)
            * `languages` - supported languages by model (list of string, required)
            * `attribution` (required)
                * `name` - name of creator (required)
                * `url` - URL of creator (required)
            * `installed` - true if currently installed (bool, required)
            * `description` - human-readable description (string, optional)
            * `version` - version of the model (string, optional)
        * `supports_transcript_streaming` - true if program can stream transcript chunks
    * `tts` - list of text-to-speech services (optional)
        * `models` - list of available models (required)
            * `name` - unique name (required)
            * `languages` - supported languages by model (list of string, required)
            * `speakers` - list of speakers (optional)
                * `name` - unique name of speaker (required)
            * `attribution` (required)
                * `name` - name of creator (required)
                * `url` - URL of creator (required)
            * `installed` - true if currently installed (bool, required)
            * `description` - human-readable description (string, optional)
            * `version` - version of the model (string, optional)
        * `supports_synthesize_streaming` - true if program can stream text chunks
    * `wake` - list of wake word detection services (optional)
        * `models` - list of available models (required)
            * `name` - unique name (required)
            * `languages` - supported languages by model (list of string, required)
            * `attribution` (required)
                * `name` - name of creator (required)
                * `url` - URL of creator (required)
            * `installed` - true if currently installed (bool, required)
            * `description` - human-readable description (string, optional)
            * `version` - version of the model (string, optional)
    * `handle` - list of intent handling services (optional)
        * `models` - list of available models (required)
            * `name` - unique name (required)
            * `languages` - supported languages by model (list of string, required)
            * `attribution` (required)
                * `name` - name of creator (required)
                * `url` - URL of creator (required)
            * `installed` - true if currently installed (bool, required)
            * `description` - human-readable description (string, optional)
            * `version` - version of the model (string, optional)
        * `supports_handled_streaming` - true if program can stream response chunks
    * `intent` - list of intent recognition services (optional)
        * `models` - list of available models (required)
            * `name` - unique name (required)
            * `languages` - supported languages by model (list of string, required)
            * `attribution` (required)
                * `name` - name of creator (required)
                * `url` - URL of creator (required)
            * `installed` - true if currently installed (bool, required)
            * `description` - human-readable description (string, optional)
            * `version` - version of the model (string, optional)
    * `satellite` - information about voice satellite (optional)
        * `area` - name of area where satellite is located (string, optional)
        * `has_vad` - true if the end of voice commands will be detected locally (boolean, optional)
        * `active_wake_words` - list of wake words that are actively being listened for (list of string, optional)
        * `max_active_wake_words` - maximum number of local wake words that can be run simultaneously (number, optional)
        * `supports_trigger` - true if satellite supports remotely-triggered pipelines
    * `mic` - list of audio input services (optional)
        * `mic_format` - audio input format (required)
            * `rate` - sample rate in hertz (int, required)
            * `width` - sample width in bytes (int, required)
            * `channels` - number of channels (int, required)
    * `snd` - list of audio output services (optional)
        * `snd_format` - audio output format (required)
            * `rate` - sample rate in hertz (int, required)
            * `width` - sample width in bytes (int, required)
            * `channels` - number of channels (int, required)
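
Putting these fields together, a speech-to-text service might answer `describe` with an `info` event like the following (a sketch; the model name, attribution, and version are made-up values):

``` python
info_data = {
    "asr": [
        {
            "models": [
                {
                    "name": "tiny-en",
                    "languages": ["en"],
                    "attribution": {
                        "name": "Example Author",
                        "url": "https://example.com",
                    },
                    "installed": True,
                    "description": "Small English model",
                    "version": "1.0",
                }
            ],
            "supports_transcript_streaming": False,
        }
    ]
}
# write_event(stream, "info", info_data)  # reply to a `describe` request
```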
    
### Speech Recognition

Transcribe audio into text.

* `transcribe` - request to transcribe an audio stream
    * `name` - name of model to use (string, optional)
    * `language` - language of spoken audio (string, optional)
    * `context` - context from previous interactions (object, optional)
* `transcript` - response with transcription
    * `text` - text transcription of spoken audio (string, required)
    * `language` - language of transcript (string, optional)
    * `context` - context for next interaction (object, optional)

Streaming:

1. `transcript-start` - starts stream
    * `language` - language of transcript (string, optional)
    * `context` - context from previous interactions (object, optional)
2. `transcript-chunk`
    * `text` - part of transcript (string, required)
3. Original `transcript` event must be sent for backwards compatibility
4. `transcript-stop` - end of stream

### Text to Speech

Synthesize audio from text.

* `synthesize` - request to generate audio from text
    * `text` - text to speak (string, required)
    * `voice` - use a specific voice (optional)
        * `name` - name of voice (string, optional)
        * `language` - language of voice (string, optional)
        * `speaker` - speaker of voice (string, optional)
        
Streaming:

1. `synthesize-start` - starts stream
    * `context` - context from previous interactions (object, optional)
    * `voice` - use a specific voice (optional)
        * `name` - name of voice (string, optional)
        * `language` - language of voice (string, optional)
        * `speaker` - speaker of voice (string, optional)
2. `synthesize-chunk`
    * `text` - part of text to synthesize (string, required)
3. Original `synthesize` message must be sent for backwards compatibility
4. `synthesize-stop` - end of stream, final audio must be sent
5. `synthesize-stopped` - sent back to server after final audio
    
### Wake Word

Detect wake words in an audio stream.

* `detect` - request detection of specific wake word(s)
    * `names` - wake word names to detect (list of string, optional)
* `detection` - response when detection occurs
    * `name` - name of wake word that was detected (string, optional)
    * `timestamp` - timestamp of audio chunk in milliseconds when detection occurred (int, optional)
* `not-detected` - response when audio stream ends without a detection

### Voice Activity Detection

Detects speech and silence in an audio stream.

* `voice-started` - user has started speaking
    * `timestamp` - timestamp of audio chunk when speaking started in milliseconds (int, optional)
* `voice-stopped` - user has stopped speaking
    * `timestamp` - timestamp of audio chunk when speaking stopped in milliseconds (int, optional)
    
### Intent Recognition

Recognizes intents from text.

* `recognize` - request to recognize an intent from text
    * `text` - text to recognize (string, required)
    * `context` - context from previous interactions (object, optional)
* `intent` - response with recognized intent
    * `name` - name of intent (string, required)
    * `entities` - list of entities (optional)
        * `name` - name of entity (string, required)
        * `value` - value of entity (any, optional)
    * `text` - response for user (string, optional)
    * `context` - context for next interactions (object, optional)
* `not-recognized` - response indicating no intent was recognized
    * `text` - response for user (string, optional)
    * `context` - context for next interactions (object, optional)

### Intent Handling

Handle structured intents or text directly.

* `handled` - response when intent was successfully handled
    * `text` - response for user (string, optional)
    * `context` - context for next interactions (object, optional)
* `not-handled` - response when intent was not handled
    * `text` - response for user (string, optional)
    * `context` - context for next interactions (object, optional)

Streaming:

1. `handled-start` - starts stream
    * `context` - context from previous interactions (object, optional)
2. `handled-chunk`
    * `text` - part of response (string, required)
3. Original `handled` message must be sent for backwards compatibility
4. `handled-stop` - end of stream

### Audio Output

Play audio stream.

* `played` - response when audio finishes playing

### Voice Satellite

Control of one or more remote voice satellites connected to a central server.

* `run-satellite` - informs satellite that server is ready to run pipelines
* `pause-satellite` - informs satellite that server is not ready anymore to run pipelines
* `satellite-connected` - satellite has connected to the server
* `satellite-disconnected` - satellite has been disconnected from the server
* `streaming-started` - satellite has started streaming audio to the server
* `streaming-stopped` - satellite has stopped streaming audio to the server

Pipelines run on the server, but the server can also trigger them remotely on a satellite; an example follows the list below.

* `run-pipeline` - runs a pipeline on the server or asks the satellite to run it when possible
    * `start_stage` - pipeline stage to start at (string, required)
    * `end_stage` - pipeline stage to end at (string, required)
    * `wake_word_name` - name of detected wake word that started this pipeline (string, optional)
        * From client only
    * `wake_word_names` - names of wake words to listen for (list of string, optional)
        * From server only
        * `start_stage` must be "wake"
    * `announce_text` - text to speak on the satellite (string, optional)
        * From server only
        * `start_stage` must be "tts"
    * `restart_on_end` - true if the server should re-run the pipeline after it ends (boolean, default is false)
        * Only used for always-on streaming satellites
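
For instance, a server could ask an always-on streaming satellite to run a full pipeline, starting at wake word detection, with an event like this (a sketch; the wake word name is illustrative):

``` python
run_pipeline_data = {
    "start_stage": "wake",           # must be "wake" when sending wake_word_names
    "end_stage": "tts",
    "wake_word_names": ["ok_nabu"],  # server-to-satellite only
    "restart_on_end": True,          # re-run the pipeline after it ends
}
# write_event(stream, "run-pipeline", run_pipeline_data)
```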

### Timers

* `timer-started` - a new timer has started
    * `id` - unique id of timer (string, required)
    * `total_seconds` - number of seconds the timer should run for (int, required)
    * `name` - user-provided name for timer (string, optional)
    * `start_hours` - hours the timer should run for as spoken by user (int, optional)
    * `start_minutes` - minutes the timer should run for as spoken by user (int, optional)
    * `start_seconds` - seconds the timer should run for as spoken by user (int, optional)
    * `command` - optional command that the server will execute when the timer is finished
        * `text` - text of command to execute (string, required)
        * `language` - language of the command (string, optional)
* `timer-updated` - timer has been paused/resumed or time has been added/removed
    * `id` - unique id of timer (string, required)
    * `is_active` - true if timer is running, false if paused (bool, required)
    * `total_seconds` - number of seconds that the timer should run for now (int, required)
* `timer-cancelled` - timer was cancelled
    * `id` - unique id of timer (string, required)
* `timer-finished` - timer finished without being cancelled
    * `id` - unique id of timer (string, required)
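
As an example, a five-minute "tea" timer could be announced with an event like this (values are illustrative):

``` python
timer_started_data = {
    "id": "timer-1",       # unique id; format is up to the server
    "total_seconds": 300,
    "name": "tea",
    "start_minutes": 5,    # as spoken by the user
}
# write_event(stream, "timer-started", timer_started_data)
```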

## Event Flow

* &rarr; is an event from client to server
* &larr; is an event from server to client


### Service Description

1. &rarr; `describe` (required) 
2. &larr; `info` (required)


### Speech to Text

1. &rarr; `transcribe` event with `name` of model to use or `language` (optional)
2. &rarr; `audio-start` (required)
3. &rarr; `audio-chunk` (required)
    * Send audio chunks until silence is detected
4. &rarr; `audio-stop` (required)
5. &larr; `transcript` (required)
    * Contains text transcription of spoken audio
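
Concretely, the non-streaming exchange might look like this (a sketch reusing the helpers and connection from the earlier sections; `iter_mic_chunks` is a hypothetical audio source):

``` python
write_event(stream, "transcribe", {"language": "en"})
write_event(stream, "audio-start", AUDIO_FORMAT)
for chunk in iter_mic_chunks():  # hypothetical: yields raw PCM chunks
    write_event(stream, "audio-chunk", AUDIO_FORMAT, payload=chunk)
write_event(stream, "audio-stop")

event, _payload = read_event(stream)
assert event["type"] == "transcript"
print(event["data"]["text"])
```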
    
Streaming:

1. &rarr; `transcribe` event (optional)
2. &rarr; `audio-start` (required)
3. &rarr; `audio-chunk` (required)
    * Send audio chunks until silence is detected
4. &larr; `transcript-start` (required)
5. &larr; `transcript-chunk` (required)
    * Send transcript chunks as they're produced
6. &rarr; `audio-stop` (required)
7. &larr; `transcript` (required)
    * Sent for backwards compatibility
8. &larr; `transcript-stop` (required)


### Text to Speech

1. &rarr; `synthesize` event with `text` (required)
2. &larr; `audio-start`
3. &larr; `audio-chunk`
    * One or more audio chunks
4. &larr; `audio-stop`
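
In code, this exchange might look like the following (a sketch reusing the earlier helpers; writing a WAV file is just one way to collect the audio):

``` python
import wave

write_event(stream, "synthesize", {"text": "Hello from Wyoming!"})

with wave.open("output.wav", "wb") as wav:
    while True:
        event, payload = read_event(stream)
        if event["type"] == "audio-start":
            wav.setnchannels(event["data"]["channels"])
            wav.setsampwidth(event["data"]["width"])
            wav.setframerate(event["data"]["rate"])
        elif event["type"] == "audio-chunk":
            wav.writeframes(payload)
        elif event["type"] == "audio-stop":
            break
```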

Streaming:

1. &rarr; `synthesize-start` event (required)
2. &rarr; `synthesize-chunk` event (required)
    * Text chunks are sent as they're produced
3. &larr; `audio-start`, `audio-chunk` (one or more), `audio-stop`
    * Audio chunks are sent as they're produced with start/stop
4. &rarr; `synthesize` event
    * Sent for backwards compatibility
5. &rarr; `synthesize-stop` event
    * End of text stream
6. &larr; Final audio must be sent
    * `audio-start`, `audio-chunk` (one or more), `audio-stop`
7. &larr; `synthesize-stopped`
    * Tells server that final audio has been sent

### Wake Word Detection

1. &rarr; `detect` event with `names` of wake words to detect (optional)
2. &rarr; `audio-start` (required)
3. &rarr; `audio-chunk` (required)
    * Keep sending audio chunks until a `detection` is received
4. &larr; `detection`
    * Sent for each wake word detection 
5. &rarr; `audio-stop` (optional)
    * Manually end audio stream
6. &larr; `not-detected`
    * Sent after `audio-stop` if no detections occurred
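
A batch version of this flow might look like the following sketch (it reuses the earlier helpers and the recorded `pcm` buffer; a live client would read `detection` events concurrently while still streaming):

``` python
write_event(stream, "detect", {"names": ["ok_nabu"]})  # wake word name is illustrative
write_event(stream, "audio-start", AUDIO_FORMAT)
for start in range(0, len(pcm), 2048):
    write_event(stream, "audio-chunk", AUDIO_FORMAT, payload=pcm[start : start + 2048])
write_event(stream, "audio-stop")  # manually end the audio stream

event, _ = read_event(stream)
if event["type"] == "detection":
    print("Detected:", event["data"].get("name"))
else:  # `not-detected` after `audio-stop`
    print("No wake word detected")
```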
    
### Voice Activity Detection

1. &rarr; `audio-chunk` (required)
    * Send audio chunks until silence is detected
2. &larr; `voice-started`
    * When speech starts
3. &larr; `voice-stopped`
    * When speech stops
    
### Intent Recognition

1. &rarr; `recognize` (required)
2. &larr; `intent` if successful
3. &larr; `not-recognized` if not successful
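
A minimal client for this flow (a sketch reusing the earlier helpers; the example utterance and fallback text are made up):

``` python
write_event(stream, "recognize", {"text": "turn on the kitchen light"})

event, _ = read_event(stream)
if event["type"] == "intent":
    print(event["data"]["name"], event["data"].get("entities", []))
else:  # `not-recognized`
    print(event["data"].get("text", "Sorry, I didn't understand."))
```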

### Intent Handling

For structured intents:

1. &rarr; `intent` (required)
2. &larr; `handled` if successful
3. &larr; `not-handled` if not successful

For text only:

1. &rarr; `transcript` with `text` to handle (required)
2. &larr; `handled` if successful
3. &larr; `not-handled` if not successful
    
Streaming text only (successful):

1. &rarr; `transcript` with `text` to handle (required)
2. &larr; `handled-start` (required)
3. &larr; `handled-chunk` (required)
    * Chunk of response text
4. &larr; `handled` (required)
    * Sent for backwards compatibility
5. &larr; `handled-stop` (required)
    
### Audio Output

1. &rarr; `audio-start` (required)
2. &rarr; `audio-chunk` (required)
    * One or more audio chunks
3. &rarr; `audio-stop` (required)
4. &larr; `played`
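
Continuing the streaming sketch from the Audio section, the client would finish by waiting for `played`:

``` python
event, _ = read_event(stream)
assert event["type"] == "played"  # audio has finished playing
```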

            
