# Cued Speech Processing Tools

Python package for decoding and generating cued speech videos with MediaPipe and deep learning.

## Features

- **Decoder**: Convert cued speech videos to text with subtitles using neural networks and language models
- **Generator**: Create cued speech videos from text with automatic hand gesture overlay
- **Automatic Data Management**: Downloads required models and data automatically

## Installation

### Prerequisites
- Python 3.11.*
- Pixi (for Montreal Forced Aligner)

### Setup Steps

1. **Install Pixi**
```bash
# macOS/Linux
curl -fsSL https://pixi.sh/install.sh | bash

# Windows PowerShell
irm https://pixi.sh/install.ps1 | iex
```

2. **Create Pixi environment**
```bash
mkdir cued-speech-env && cd cued-speech-env
pixi init
pixi add "python==3.11"
pixi add montreal-forced-aligner=3.3.4
```

3. **Install package**
```bash
pixi run python -m pip install cued-speech
```

4. **Download data and setup MFA models**
```bash
pixi shell
cued-speech download-data
pixi run mfa models save acoustic download/french_mfa.zip --overwrite
pixi run mfa models save dictionary download/french_mfa.dict --overwrite
```

5. **Verify installation**
```bash
cued-speech --help
```

## Quick Start

### Decode Video (Cued Speech → Text)
```bash
# Basic usage with default parameters (uses the provided test video)
cued-speech decode

# Custom video
cued-speech decode --video_path /path/to/video.mp4
```

### Generate Video (Text → Cued Speech)
```bash
# Text extracted automatically from video audio
cued-speech generate input_video.mp4

# Skip Whisper transcription and supply the text manually
cued-speech generate video.mp4 --skip-whisper --text "Votre texte ici"
```

## Command Line Options

### Decoder
**Core Options:**
- `--video_path PATH` - Input video (default: `download/test_decode.mp4`)
- `--output_path PATH` - Output video (default: `output/decoder/decoded_video.mp4`)
- `--right_speaker [True|False]` - Set to `True` if the speaker cues with the right hand (default: `True`)
- `--auto_download [True|False]` - Auto-download data (default: `True`)

**Model Paths (optional):**
- `--model_path PATH` - Neural network model
- `--vocab_path PATH` - Phoneme vocabulary
- `--face_tflite PATH` - Face landmark model
- `--hand_tflite PATH` - Hand landmark model
- `--pose_tflite PATH` - Pose landmark model
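
These options can be combined; for example, to decode a left-handed speaker's video with a custom model (file paths here are illustrative):

```bash
cued-speech decode \
    --video_path recordings/session1.mp4 \
    --right_speaker False \
    --model_path models/my-model.pt \
    --output_path output/decoder/session1.mp4
```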

### Generator
**Options:**
- `VIDEO_PATH` (required) - Input video file
- `--text TEXT` - Manual text input (optional, otherwise extracted from audio)
- `--output_path PATH` - Output video (default: `output/generator/generated_cued_speech.mp4`)
- `--language LANG` - Language (default: `french`)
- `--skip-whisper` - Skip Whisper transcription (requires `--text`)
- `--easing TYPE` - Animation easing: `linear`, `ease_in_out_cubic`, `ease_out_elastic`, `ease_in_out_back`
- `--morphing/--no-morphing` - Hand shape morphing (default: enabled)
- `--transparency/--no-transparency` - Transparency effects (default: enabled)
- `--curving/--no-curving` - Curved trajectories (default: enabled)
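
Generator flags can likewise be combined; for instance (file names are illustrative):

```bash
cued-speech generate lecture.mp4 \
    --language french \
    --easing ease_in_out_cubic \
    --no-morphing \
    --output_path output/generator/lecture_cued.mp4
```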

## Python API

### Decoder
```python
from cued_speech import decode_video

decode_video(
    video_path="input.mp4",
    right_speaker=True,
    output_path="output/decoder/"
)
```
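
The function can also be called in a loop, e.g. to decode a folder of clips. The sketch below uses only the arguments shown above; the folder layout is hypothetical:

```python
from pathlib import Path

from cued_speech import decode_video

# Decode every .mp4 in a folder, one output directory per clip
for clip in sorted(Path("videos").glob("*.mp4")):
    decode_video(
        video_path=str(clip),
        right_speaker=True,
        output_path=f"output/decoder/{clip.stem}/",
    )
```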

### Generator
```python
from cued_speech import generate_cue
import whisper

# Automatic text extraction
model = whisper.load_model("medium", download_root="download")
result_path = generate_cue(
    text=None,  # Extracted from video
    video_path="video.mp4",
    output_path="output/generator/",
    config={
        "model": model,  # Optional preloaded Whisper model
        "language": "french",
        "easing_function": "ease_in_out_cubic",
        "enable_morphing": True,
        "enable_transparency": True,
        "enable_curving": True,
    }
)

# With manual text
result_path = generate_cue(
    text="Bonjour tout le monde",
    video_path="video.mp4",
    output_path="output/generator/",
    config={"skip_whisper": True}
)
```

## Data Management

```bash
# Download all required data
cued-speech download-data

# List available data
cued-speech list-data

# Clean up data
cued-speech cleanup-data --confirm
```

### Downloaded Files

Data is stored in `./download/`:

**Decoder:**
- `cuedspeech-model.pt` - Neural network model
- `phonelist.csv`, `lexicon.txt` - Vocabularies
- `kenlm_fr.bin`, `kenlm_ipa.binary` - Language models
- `homophones_dico.jsonl` - Homophone dictionary
- `face_landmarker.task` - Face landmarks (478 points, 3.6 MB)
- `hand_landmarker.task` - Hand landmarks (21 points/hand, 7.5 MB)
- `pose_landmarker_full.task` - Pose landmarks (33 points, 9.0 MB)

**Generator:**
- `rotated_images/` - Hand shape images
- `french_mfa.dict`, `french_mfa.zip` - MFA models

**Test Files:**
- `test_decode.mp4`, `test_generate.mp4`
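
A quick way to confirm the decoder assets are in place is to test for the files listed above (a convenience sketch, not part of the package API):

```python
from pathlib import Path

DECODER_FILES = [
    "cuedspeech-model.pt", "phonelist.csv", "lexicon.txt",
    "kenlm_fr.bin", "kenlm_ipa.binary", "homophones_dico.jsonl",
    "face_landmarker.task", "hand_landmarker.task", "pose_landmarker_full.task",
]

missing = [f for f in DECODER_FILES if not (Path("download") / f).exists()]
if missing:
    print("Missing decoder files:", ", ".join(missing))
    print("Run `cued-speech download-data` to fetch them.")
```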

## Architecture

### Decoder
- **MediaPipe Tasks API**: Float16 `.task` models for face, hand, and pose landmark detection
- **Neural Network**: Three-stream fusion encoder (hand shape, position, lips)
- **CTC Decoder**: Phoneme recognition with beam search
- **Language Model**: KenLM for French sentence correction
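
To make the decoder design concrete, here is a minimal PyTorch sketch of a three-stream fusion encoder with a CTC head. The dimensions, layer choices, and phoneme inventory are assumptions for illustration and will not match `cuedspeech-model.pt` exactly:

```python
import torch
import torch.nn as nn

class ThreeStreamEncoder(nn.Module):
    """Toy three-stream fusion encoder with a CTC head (dimensions assumed)."""

    def __init__(self, hand_dim=63, pos_dim=2, lips_dim=120,
                 hidden=128, n_phonemes=40):
        super().__init__()
        # One small per-frame encoder per stream
        self.hand = nn.Sequential(nn.Linear(hand_dim, hidden), nn.ReLU())
        self.pos = nn.Sequential(nn.Linear(pos_dim, hidden), nn.ReLU())
        self.lips = nn.Sequential(nn.Linear(lips_dim, hidden), nn.ReLU())
        # Fuse the three streams, then model temporal context
        self.rnn = nn.GRU(3 * hidden, hidden, batch_first=True,
                          bidirectional=True)
        # Per-frame log-probabilities over phonemes + CTC blank
        self.head = nn.Linear(2 * hidden, n_phonemes + 1)

    def forward(self, hand, pos, lips):  # each: (batch, time, dim)
        fused = torch.cat([self.hand(hand), self.pos(pos), self.lips(lips)],
                          dim=-1)
        out, _ = self.rnn(fused)
        return self.head(out).log_softmax(dim=-1)
```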

### Generator
- **Whisper**: Speech-to-text transcription
- **MFA**: Montreal Forced Aligner for phoneme-level timing
- **Dynamic Scaling**: Hand size automatically adapts to face width
- **Hand Rendering**: MediaPipe-based hand landmark detection for accurate positioning
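
The dynamic-scaling idea fits in a few lines: resize the hand-cue image so its width tracks the detected face width. This is an illustrative sketch; the 0.8 ratio is an assumed constant, not the package's actual value:

```python
import cv2

def scale_hand_cue(hand_img, face_width_px, hand_to_face_ratio=0.8):
    """Resize a hand-cue image so its width is a fixed fraction of face width."""
    target_w = max(1, int(face_width_px * hand_to_face_ratio))
    target_h = max(1, int(hand_img.shape[0] * target_w / hand_img.shape[1]))
    return cv2.resize(hand_img, (target_w, target_h),
                      interpolation=cv2.INTER_AREA)
```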

## Notes

- Models are designed for 30 FPS video
- Hand size automatically scales with the detected face width
- Landmark models load through the MediaPipe Tasks API (`.task` files), with a fallback to the TFLite Interpreter (`.tflite` files)
- If the TFLite models fail to load, the pipeline falls back to MediaPipe Holistic
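
For the hand model, the fallback order described above might look roughly like this; the package's actual loading code may differ:

```python
def load_hand_model(task_path="download/hand_landmarker.task"):
    """Try MediaPipe Tasks, then a raw TFLite interpreter, then legacy Holistic."""
    try:
        from mediapipe.tasks.python import BaseOptions, vision
        options = vision.HandLandmarkerOptions(
            base_options=BaseOptions(model_asset_path=task_path),
            num_hands=2,
        )
        return vision.HandLandmarker.create_from_options(options)
    except Exception:
        pass
    try:
        import tensorflow as tf  # fall back to a raw TFLite interpreter
        interpreter = tf.lite.Interpreter(
            model_path=task_path.replace(".task", ".tflite"))
        interpreter.allocate_tensors()
        return interpreter
    except Exception:
        import mediapipe as mp  # last resort: legacy Holistic solution
        return mp.solutions.holistic.Holistic()
```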

## License

MIT License - see LICENSE file

## Support

- Repository: https://github.com/boubacar-sow/cued-speech
- Issues: https://github.com/boubacar-sow/cued-speech/issues
- Contact: boubasow.pro@gmail.com

            
