# Cued Speech Processing Tools
A comprehensive Python package for processing cued speech videos, with both decoding and generation capabilities: it decodes cued speech videos into subtitled output and generates cued speech videos from text or video input.
## Features
### Decoder Features
- **Real-time Video Processing**: Process cued speech videos using MediaPipe for landmark extraction
- **Neural Network Inference**: Use trained CTC models for phoneme recognition
- **French Language Correction**: Apply KenLM language models and homophone correction
- **Subtitle Generation**: Generate subtitled videos with French sentences
### Generator Features
- **Text-to-Cued Speech**: Generate cued speech videos from French text input
- **Whisper Integration**: Automatic speech recognition for accurate alignment
- **MFA Alignment**: Montreal Forced Alignment for precise phoneme timing
- **Hand Gesture Overlay**: Realistic hand shape and position rendering
- **Automatic Synchronization**: Speech and visual cues are aligned automatically
### Data Management Features
- **Automatic Data Download**: Automatically download required model files and data
- **GitHub Release Integration**: Seamless download from GitHub releases
- **Smart Caching**: Avoid re-downloading existing files
- **Easy Cleanup**: Simple commands to manage downloaded data
### General Features
- **Command Line Interface**: Easy-to-use CLI for both decoding and generation
- **Organized Output Structure**: Separate folders for decoder and generator outputs
- **Extensible Architecture**: Modular design for future enhancements
- **PyPI Ready**: Packaged for publication on PyPI and easy installation
## Installation
### Prerequisites
- Python 3.11 or higher
- Pixi (to install Montreal Forced Aligner)
- Optional: ffmpeg in PATH for video/audio handling
### Install with Pixi (Recommended)
Use Pixi to install MFA, then install the `cued-speech` package via pip inside the Pixi environment.
#### 1) Install Pixi
- macOS/Linux:
```bash
curl -fsSL https://pixi.sh/install.sh | bash
```
- Windows (PowerShell):
```powershell
irm https://pixi.sh/install.ps1 | iex
```
More options: https://pixi.sh/installation/
#### 2) Create a clean Pixi environment and install MFA
```bash
mkdir cued-speech-env && cd cued-speech-env
pixi init
pixi add montreal-forced-aligner=3.3.4
pixi run mfa --version
```
#### 3) Install the cued_speech package (pip inside Pixi)
```bash
pixi run python -m pip install cued-speech
```
#### 4) Verify and see available options
```bash
pixi run cued-speech
```
### Troubleshooting NumPy/PyTorch Compatibility Issues
If you encounter NumPy/PyTorch compatibility errors:
#### **Recommended Solution: Use Pixi**
The easiest way to avoid these issues is to use Pixi, which ensures exact dependency versions:
```bash
# Install Pixi
curl -fsSL https://pixi.sh/install.sh | bash
# Clone and setup the project
git clone https://github.com/bsow/cued-speech.git
cd cued-speech
pixi install
pixi shell
pixi run pip install -e .
```
#### **Alternative Solutions:**
1. NumPy/PyTorch incompatibility: pin NumPy below 2.0 when using an older PyTorch build
   ```bash
   pip install 'numpy>=1.24,<2.0'
   ```
2. Create a fresh Pixi environment and reinstall
   ```bash
   mkdir fresh-env && cd fresh-env
   pixi init
   pixi add montreal-forced-aligner=3.3.4
   pixi run python -m pip install cued-speech
   ```
**Note:** For other installation issues, see the [Troubleshooting Guide](TROUBLESHOOTING.md).
## Data Setup
The package requires several model files and data for operation. These are automatically downloaded on first use, but you can also manage them manually.
### Manual Data Management
You can manage data files manually using the provided commands:
```bash
# Download all required data files
cued-speech download-data
# List available data files
cued-speech list-data
# Clean up downloaded data files
cued-speech cleanup-data --confirm
```
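Under the hood, "smart caching" simply means a file is fetched only if it is not already present in `./download/`. A minimal sketch of that pattern (illustrative only: the release URL and helper name are placeholders, not the package's actual internals):

```python
from pathlib import Path
import urllib.request

# Hypothetical release URL for illustration; the real assets live in the
# project's GitHub releases.
BASE_URL = "https://github.com/bsow/cued-speech/releases/download/v0.1.0"

def fetch_if_missing(filename: str, dest_dir: str = "download") -> Path:
    """Download a data file only if it is not already cached locally."""
    dest = Path(dest_dir) / filename
    if dest.exists():  # smart caching: skip files that are already downloaded
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(f"{BASE_URL}/{filename}", dest)
    return dest

model_path = fetch_if_missing("cuedspeech-model.pt")
```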
### Required Data Files
The following files are automatically downloaded to a `download/` folder in your current working directory:
- `cuedspeech-model.pt` - Pre-trained neural network model
- `phonelist.csv` - Phoneme vocabulary
- `lexicon.txt` - French lexicon
- `kenlm_fr.bin` - French language model
- `homophones_dico.jsonl` - Homophone dictionary
- `kenlm_ipa.binary` - IPA language model
- `ipa_to_french.csv` - IPA to French mapping
- `test_decode.mp4` - Sample video for testing
- `test_generate.mp4` - Sample video for generation
- `rotated_images/` - Directory containing hand shape images for generation
**Note:** Data files are stored in `./download/` relative to where you run the commands, making them easy to find and manage.
## Usage
### Command Line Interface
The package provides a comprehensive command-line interface for both decoding and generating cued speech videos:
#### Decoding (Cued Speech → Text)
Decode a cued speech video into a subtitled video.
Options:
- `--video_path PATH` (default: `download/test_decode.mp4`): Input cued-speech video
- `--right_speaker [True|False]` (default: `True`): Whether the speaker uses the right hand
- `--model_path PATH` (default: `download/cuedspeech-model.pt`): Pretrained model file
- `--output_path PATH` (default: `output/decoder/decoded_video.mp4`): Output subtitled video
- `--vocab_path PATH` (default: `download/phonelist.csv`): Vocabulary file
- `--lexicon_path PATH` (default: `download/lexicon.txt`): Lexicon file
- `--kenlm_fr PATH` (default: `download/kenlm_fr.bin`): KenLM model file
- `--homophones_path PATH` (default: `download/homophones_dico.jsonl`): Homophones dictionary
- `--kenlm_ipa PATH` (default: `download/kenlm_ipa.binary`): IPA language model
- `--auto_download [True|False]` (default: `True`): Auto-download missing data files
```bash
# Basic usage (uses default paths, automatically downloads data if needed)
cued-speech decode
# With custom video path
cued-speech decode --video_path /path/to/your/video.mp4
# Disable automatic data download
cued-speech decode --auto_download False
# Advanced usage with custom settings
cued-speech decode \
  --video_path /path/to/your/video.mp4 \
  --output_path output/decoder/my_decoded_video.mp4 \
  --model_path /path/to/custom_model.pt \
  --vocab_path /path/to/custom_vocab.csv \
  --lexicon_path /path/to/custom_lexicon.txt \
  --kenlm_fr /path/to/custom_kenlm.bin \
  --homophones_path /path/to/custom_homophones.jsonl \
  --kenlm_ipa /path/to/custom_lm.binary \
  --right_speaker True
```
#### Generation (Video → Cued Speech)
Generate a cued speech video from a video file. Text is extracted with Whisper unless `--skip-whisper` is used and `--text` is provided.
Arguments:
- `VIDEO_PATH` (positional): Path to input video file
Options:
- `--text TEXT` (default: None): Provide text manually (otherwise Whisper extracts it)
- `--output_path PATH` (default: `output/generator/generated_cued_speech.mp4`): Output video path
- `--audio_path PATH` (default: None): Optional audio file (extracted from video if not provided)
- `--language [french|...]` (default: `french`): Processing language
- `--skip-whisper` (flag): Skip Whisper download/transcription (requires `--text`)
- `--easing [linear|ease_in_out_cubic|ease_out_elastic|ease_in_out_back]` (default: `ease_in_out_cubic`): Gesture easing curve (formulas sketched after the examples below)
- `--morphing/--no-morphing` (default: `--morphing`): Hand shape morphing
- `--transparency/--no-transparency` (default: `--transparency`): Transparency effects during transitions
- `--curving/--no-curving` (default: `--curving`): Curved trajectories
```bash
# Basic usage (text extracted automatically from video)
cued-speech generate input_video.mp4
# With custom output path
cued-speech generate speaker_video.mp4 --output_path output/generator/my_generated_video.mp4
# With custom audio file
cued-speech generate speaker_video.mp4 --audio_path custom_audio.wav
# With different language
cued-speech generate speaker_video.mp4 --language english
# With manual text (optional)
cued-speech generate speaker_video.mp4 --text "Merci beaucoup pour votre attention"
# Skip Whisper if you have SSL issues
cued-speech generate speaker_video.mp4 --skip-whisper --text "Merci beaucoup pour votre attention"
```
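The `--easing` option selects an interpolation curve for hand transitions. For reference, here is a sketch of the standard formulas these names usually denote (textbook easing definitions, not necessarily the package's exact implementation):

```python
import math

def linear(t: float) -> float:
    return t

def ease_in_out_cubic(t: float) -> float:
    # Accelerate through the first half, decelerate through the second.
    return 4 * t**3 if t < 0.5 else 1 - (-2 * t + 2) ** 3 / 2

def ease_out_elastic(t: float) -> float:
    # Overshoot the target, then settle with a damped oscillation.
    if t in (0.0, 1.0):
        return t
    c4 = (2 * math.pi) / 3
    return 2 ** (-10 * t) * math.sin((t * 10 - 0.75) * c4) + 1

def ease_in_out_back(t: float) -> float:
    # Pull back slightly before the midpoint, overshoot after it.
    c2 = 1.70158 * 1.525
    if t < 0.5:
        return ((2 * t) ** 2 * ((c2 + 1) * 2 * t - c2)) / 2
    return ((2 * t - 2) ** 2 * ((c2 + 1) * (t * 2 - 2) + c2) + 2) / 2
```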
### Output Structure
The package organizes outputs in a structured way:
```
output/
├── decoder/                      # Decoded videos with subtitles
│   └── decoded_video.mp4
└── generator/                    # Generated cued speech videos
    ├── audio.wav                 # Extracted/processed audio
    ├── audio.TextGrid            # MFA alignment results
    ├── rendered_video.mp4        # Video with hand cues (no audio)
    ├── final_rendered_video.mp4  # Final output with audio
    └── mfa_input/                # MFA temporary files
```
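`audio.TextGrid` is a standard Praat TextGrid, so you can inspect the MFA alignment yourself. A minimal sketch using the third-party `textgrid` package (an assumption; any TextGrid parser will do):

```python
import textgrid  # pip install textgrid (one of several TextGrid parsers)

tg = textgrid.TextGrid.fromFile("output/generator/audio.TextGrid")
for tier in tg:  # MFA typically produces "words" and "phones" tiers
    print(tier.name)
    for interval in tier:
        if interval.mark:  # skip empty (silence) intervals
            print(f"  {interval.minTime:.2f}-{interval.maxTime:.2f}  {interval.mark}")
```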
### Python API
You can also use the package programmatically:
#### Decoder API
```python
from cued_speech import decode_video
# Decode a cued speech video
decode_video(
    video_path="input.mp4",
    right_speaker=True,
    model_path="/path/to/model.pt",
    output_path="output/decoder/decoded.mp4",
    vocab_path="/path/to/vocab.csv",
    lexicon_path="/path/to/lexicon.txt",
    kenlm_model_path="/path/to/kenlm.bin",
    homophones_path="/path/to/homophones.jsonl",
    lm_path="/path/to/lm.binary",
)
```
#### Generator API
```python
from cued_speech import generate_cue
# Generate a cued speech video (text extracted automatically)
result_path = generate_cue(
    text=None,  # Will be extracted from the video using Whisper
    video_path="speaker_video.mp4",
    output_path="output/generator/generated.mp4",
    audio_path=None,  # Will be extracted from the video
    config={
        "language": "french",
        "hand_scale_factor": 0.75,
        "video_codec": "libx264",
        "audio_codec": "aac",
    },
)
print(f"Generated video saved to: {result_path}")
# Or with manual text
result_path = generate_cue(
    text="Bonjour tout le monde",
    video_path="speaker_video.mp4",
    output_path="output/generator/generated.mp4",
)
```
## Architecture
### Core Components
#### Decoder Components
1. **MediaPipe Integration**: Extracts hand and lip landmarks from video frames (a minimal loop is sketched after this list)
2. **Feature Extraction**: Processes landmarks into hand shape, position, and lip features
3. **Neural Network**: Three-stream fusion encoder with CTC output
4. **Language Model**: KenLM-based beam search for French sentence correction
5. **Video Processing**: Generates subtitled output with synchronized audio
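As a rough illustration of the landmark-extraction step, here is a minimal MediaPipe loop that pulls hand and face landmarks from each frame (classic `mp.solutions` API; the package's actual feature extraction is more involved):

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1)

cap = cv2.VideoCapture("input.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    hand_res = hands.process(rgb)
    face_res = face_mesh.process(rgb)
    if hand_res.multi_hand_landmarks and face_res.multi_face_landmarks:
        hand = hand_res.multi_hand_landmarks[0].landmark  # 21 hand points
        face = face_res.multi_face_landmarks[0].landmark  # 468 face points
        # e.g. wrist (hand point 0) and a mouth-corner landmark (face point 61)
        print(hand[0].x, hand[0].y, face[61].x, face[61].y)
cap.release()
```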
#### Generator Components
1. **Whisper Integration**: Automatic speech recognition for transcription (a short sketch follows this list)
2. **MFA Alignment**: Montreal Forced Alignment for precise phoneme timing
3. **Cue Mapping**: Maps phonemes to hand shapes and positions using cued speech rules
4. **Hand Rendering**: Overlays realistic hand gestures onto video frames
5. **Synchronization**: Keeps speech and visual cues aligned in time
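For the transcription step, a minimal sketch with the standard `openai-whisper` API (the model size here is an arbitrary choice):

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("small")
result = model.transcribe("speaker_audio.wav", language="fr")
print(result["text"])
for segment in result["segments"]:  # coarse timing, later refined by MFA
    print(f'{segment["start"]:.2f}-{segment["end"]:.2f} {segment["text"]}')
```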
### Model Architecture
#### Decoder Architecture
The decoder uses a three-stream fusion encoder (a toy sketch follows the list):
- **Hand Shape Stream**: Processes hand landmark positions and geometric features
- **Hand Position Stream**: Analyzes hand movement and positioning
- **Lips Stream**: Extracts lip movement and facial features
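A toy PyTorch sketch of the three-stream idea (dimensions, layer types, and phoneme count are illustrative, not the trained model's actual architecture):

```python
import torch
import torch.nn as nn

class ThreeStreamCTC(nn.Module):
    """Illustrative three-stream fusion encoder with a CTC head."""

    def __init__(self, shape_dim=63, pos_dim=6, lips_dim=80,
                 hidden=128, n_phonemes=40):
        super().__init__()
        self.shape_enc = nn.GRU(shape_dim, hidden, batch_first=True)
        self.pos_enc = nn.GRU(pos_dim, hidden, batch_first=True)
        self.lips_enc = nn.GRU(lips_dim, hidden, batch_first=True)
        self.fusion = nn.Linear(3 * hidden, hidden)
        self.head = nn.Linear(hidden, n_phonemes + 1)  # +1 for the CTC blank

    def forward(self, shape, pos, lips):
        s, _ = self.shape_enc(shape)
        p, _ = self.pos_enc(pos)
        m, _ = self.lips_enc(lips)
        fused = torch.relu(self.fusion(torch.cat([s, p, m], dim=-1)))
        return self.head(fused).log_softmax(-1)  # (batch, time, classes)

model = ThreeStreamCTC()
T = 100  # frames
log_probs = model(torch.randn(1, T, 63), torch.randn(1, T, 6),
                  torch.randn(1, T, 80))
print(log_probs.shape)  # torch.Size([1, 100, 41])
```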
#### Generator Architecture
The generator follows a multi-stage pipeline:
- **Audio Processing**: Whisper-based transcription and feature extraction
- **Phoneme Alignment**: MFA-based precise timing alignment
- **Cue Generation**: Rule-based mapping from phonemes to hand configurations
- **Video Rendering**: Real-time hand overlay with facial landmark tracking
### Processing Pipeline
#### Decoding Pipeline
1. **Video Input**: Load and process video frames
2. **Landmark Extraction**: Use MediaPipe to extract hand and face landmarks
3. **Feature Computation**: Calculate geometric and temporal features
4. **Model Inference**: Run CTC model to predict phonemes
5. **Language Correction**: Apply beam search with language models (a KenLM rescoring sketch follows below)
6. **Subtitle Generation**: Create output video with French subtitles
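For the language-correction step, here is a minimal sketch of rescoring beam-search candidates with KenLM (standard `kenlm` Python bindings; the candidate strings and the plain `max` selection are simplifications):

```python
import kenlm  # Python bindings for KenLM

lm = kenlm.Model("download/kenlm_fr.bin")

# Hypothetical beam-search outputs for illustration.
candidates = ["merci beaucoup", "mer si beau coup", "merci bow cou"]
best = max(candidates, key=lambda s: lm.score(s, bos=True, eos=True))
print(best)  # the candidate the French LM finds most probable
```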
#### Generation Pipeline
1. **Text Input**: Process French text for cued speech generation
2. **Audio Extraction**: Extract or use provided audio track
3. **Speech Recognition**: Use Whisper for accurate transcription
4. **Phoneme Alignment**: Apply MFA for precise timing
5. **Cue Mapping**: Map phonemes to hand shapes and positions (a toy mapping table follows the list)
6. **Video Rendering**: Overlay hand cues in sync with the audio
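The cue-mapping step is essentially a lookup from phonemes to hand configurations. A toy illustration of the pattern (the shape and position assignments below are made up for illustration, not the package's actual cued-speech table):

```python
# Toy rule table: consonants select a handshape (1-8), vowels select a
# placement near the face. These assignments are illustrative only.
CONSONANT_SHAPE = {"p": 1, "d": 1, "k": 2, "v": 2, "b": 4, "n": 4}
VOWEL_POSITION = {"a": "cheekbone", "o": "side", "e": "throat", "i": "mouth"}

def map_syllable(consonant: str, vowel: str) -> tuple[int, str]:
    """Return a (handshape, position) pair for a consonant-vowel syllable."""
    return CONSONANT_SHAPE.get(consonant, 5), VOWEL_POSITION.get(vowel, "side")

print(map_syllable("b", "o"))  # (4, 'side')
```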
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- MediaPipe for landmark extraction
- PyTorch for deep learning framework
- KenLM for language modeling
- The cued speech research community
## Support
For questions and support:
- Contact: boubasow.pro@gmail.com