cosyvoice2-eu


- **Name**: cosyvoice2-eu
- **Version**: 0.2.7
- **Summary**: Minimal CosyVoice2 European inference CLI (bundles runtime + Matcha)
- **Upload time**: 2025-09-03 10:44:21
- **Requires Python**: >=3.9
- **License**: Apache-2.0
- **Keywords**: tts, text-to-speech, cosyvoice, european, voice-cloning, french, german
- **Requirements**: none recorded

# CosyVoice2-EU

<div align="center">
  <img src="https://horstmann.tech/cosyvoice2-demo/cosyvoice2-logo-clear.png" alt="CosyVoice2-EU Logo" width="400"/>
</div>

Minimal, plug-and-play CosyVoice2 European inference CLI that downloads our model from Hugging Face and runs cross-lingual zero-shot voice cloning TTS. It bundles the required `cosyvoice` runtime and `matcha` module so you don't need the full upstream repo.

Currently supports Chinese, English, Japanese, Korean, Chinese dialects (Cantonese, Sichuanese, Shanghainese, Tianjinese, Wuhanese, etc.) from the original CosyVoice2, plus our newly added French and German support!

## ⚠️ Important Notes

- **Limited Training Data**: This model was fine-tuned on 1,500 hours of French and 1,500 hours of German data. Support and capabilities for these languages may still be limited compared to the original CosyVoice2 languages.
- **Prompt Support**: You can use prompts by putting your prompt text followed by `<|endofprompt|>` at the beginning of your text (e.g., `"Speak sadly. <|endofprompt|> Your actual text here"`). However, prompt support is currently limited and experimental.

## Quick Start

1. **Install the package:**
   ```bash
   pip install cosyvoice2-eu
   ```

2. **Run French voice cloning:**
   ```bash
   cosy2-eu \
     --text "Salut ! Je vous présente CosyVoice 2, un système de synthèse vocale très avancé. Cette technologie permet de reproduire des voix de manière impressionnante." \
     --prompt french_speaker.wav \
     --out output_french.wav
   ```

3. **Run German voice cloning:**
   ```bash
   cosy2-eu \
     --text "Hallo! Ich stelle Ihnen CosyVoice 2 vor, ein sehr fortschrittliches Sprachsynthese-System. Diese Technologie kann Stimmen auf beeindruckende Weise reproduzieren." \
     --prompt german_speaker.wav \
     --out output_german.wav
   ```

4. **Use prompts for style control (experimental):**
   ```bash
   cosy2-eu \
     --text "Speak cheerfully. <|endofprompt|> Hallo! Wie geht es Ihnen heute? Ich hoffe, Sie haben einen wunderbaren Tag!" \
     --prompt german_speaker.wav \
     --out output_cheerful_german.wav
   ```

That's it! The first run will automatically download the model from Hugging Face. The model stays in memory between calls for faster subsequent inference.


## 🎯 Features

- **Easy Installation**: Simple `pip install cosyvoice2-eu` command
- **Cross-lingual Voice Cloning**: Clone voices across different languages
- **Multi-language Support**: 
  - **Original CosyVoice2**: Chinese, English, Japanese, Korean, Chinese dialects (Cantonese, Sichuanese, Shanghainese, Tianjinese, Wuhanese, etc.)
  - **European Extension**: French and German (fine-tuned on 1,500h each)
- **Model Caching**: Model stays in memory between calls for faster inference
- **Audio Concatenation**: Multiple audio segments are automatically concatenated into a single output file
- **Experimental Prompt Support**: Style control using `<|endofprompt|>` syntax (limited)
- **Bundled Runtime**: No need to install the full upstream CosyVoice2 repository
- **Hugging Face Integration**: Automatic model downloading from [Hugging Face](https://huggingface.co/Luka512/CosyVoice2-0.5B-EU)
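- **Multiple LLM Backbones**: The code supports different language model backbones, but the additional models are still training (see Upcoming Features below)
- **Text Frontend Disabled**: Text normalization is disabled by default for better multilingual support (enable it with `--text-frontend`)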

## 🚀 Upcoming Features

**Multiple LLM Backbone Support** - Code is ready, models are currently training:
- **Qwen3 0.6B**: Lightweight model for efficient inference
- **EuroLLM 1.7B Instruct**: Specialized European language model
- **Mistral 7B v0.3**: Powerful multilingual capabilities

*Currently ships with the original CosyVoice2 "blankEN" backbone and our fine-tuned LM and flow models. New backbones will be available as separate model downloads once training is complete.*

## 📖 Model & Credits

This package uses our **CosyVoice2-0.5B-EU** model available at: 
🤗 [Luka512/CosyVoice2-0.5B-EU](https://huggingface.co/Luka512/CosyVoice2-0.5B-EU)

**Built on CosyVoice2**: This project is based on the excellent [CosyVoice2](https://github.com/FunAudioLLM/CosyVoice) by FunAudioLLM, adapted for European language support with cross-lingual voice cloning capabilities.

## 📜 License

This project is licensed under the Apache License 2.0. 

**Note**: This package includes code from:
- [CosyVoice2](https://github.com/FunAudioLLM/CosyVoice) (Apache 2.0) - Original TTS framework
- [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS) (MIT) - Neural vocoder components

All original licenses and attributions are preserved.

## Installation

### From PyPI (Recommended)

```bash
pip install cosyvoice2-eu
```

### For enhanced English phonemization (optional):
```bash
# Quote the requirement so shells such as zsh do not expand the brackets
pip install "cosyvoice2-eu[piper]"
```

**Note**: The `piper` optional dependency requires compilation tools and may fail in some environments (like Google Colab). The package will work without it, using the standard phonemizer as fallback.

On Linux with a GPU, install `torch`/`torchaudio` builds that match your CUDA version and make sure `onnxruntime-gpu` is available. On CPU-only machines, `onnxruntime` is sufficient.
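One way to act on the GPU/CPU note above is to let the shell pick the ONNX Runtime package name. This is only a heuristic sketch and not part of `cosyvoice2-eu`: the function name `pick_onnxruntime` is made up for illustration, and it assumes that a working `nvidia-smi` means a usable NVIDIA GPU and driver are present.

```bash
# Hypothetical helper (not shipped with cosyvoice2-eu): print the ONNX Runtime
# package name that fits this machine.
pick_onnxruntime() {
  # Assumption: `nvidia-smi -L` succeeding means a usable NVIDIA GPU is visible.
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "onnxruntime-gpu"
  else
    echo "onnxruntime"
  fi
}
```

For example, `pip install "$(pick_onnxruntime)"` would install the selected package. The `torch`/`torchaudio` build still has to be chosen by hand to match your CUDA version.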

### Development Installation

```bash
cd standalone_infer
pip install -e .
```

## Usage

**French Example:**
```bash
cosy2-eu \
  --text "Salut ! Je vous présente CosyVoice 2, un système de synthèse vocale très avancé. Cette technologie permet de reproduire des voix de manière impressionnante." \
  --prompt french_speaker.wav \
  --out output_french.wav
```

**German Example:**
```bash
cosy2-eu \
  --text "Hallo! Ich stelle Ihnen CosyVoice 2 vor, ein sehr fortschrittliches Sprachsynthese-System. Diese Technologie kann Stimmen auf beeindruckende Weise reproduzieren." \
  --prompt german_speaker.wav \
  --out output_german.wav
```

**Prompt-based Style Control (Experimental):**
```bash
cosy2-eu \
  --text "Speak cheerfully. <|endofprompt|> Hallo! Wie geht es Ihnen heute? Ich hoffe, Sie haben einen wunderbaren Tag!" \
  --prompt german_speaker.wav \
  --out output_cheerful_german.wav
```
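The style-control syntax above is plain string concatenation, so it can be wrapped in a small shell function. The sketch below is illustrative only: `style_text` is a hypothetical name, not something the package provides, and it simply joins the instruction and the text with the `<|endofprompt|>` separator.

```bash
# Hypothetical helper: build "<instruction> <|endofprompt|> <text>" for --text.
style_text() {
  local instruction="$1" text="$2"
  printf '%s <|endofprompt|> %s' "$instruction" "$text"
}

# Example use (requires cosy2-eu and a reference recording):
#   cosy2-eu \
#     --text "$(style_text 'Speak cheerfully.' 'Hallo! Wie geht es Ihnen heute?')" \
#     --prompt german_speaker.wav \
#     --out output_cheerful_german.wav
```

Keeping the separator in one place avoids typos in the token, which matters because prompt support is experimental and a malformed separator would presumably be read as ordinary text.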

**English/Chinese/Japanese/Korean (Original CosyVoice2 languages):**
```bash
cosy2-eu \
  --text "Hello! This is CosyVoice 2, demonstrating cross-lingual voice cloning capabilities." \
  --prompt any_speaker.wav \
  --out output_english.wav
```

The first run downloads the model assets to `~/.cache/cosyvoice2-eu` (configurable via `--model-dir`). The model stays in memory between calls for faster subsequent inference.
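As an illustration of the `--model-dir` option mentioned above, the sketch below wraps the CLI so the model assets live in a directory of your choice. The names `synth`, `COSY_MODEL_DIR`, and `COSY_BIN` are hypothetical, and the sketch assumes `--model-dir` takes a directory path as its value. `COSY_BIN` exists only so the wrapper can be dry-run with another command such as `echo`.

```bash
# Hypothetical wrapper around cosy2-eu that pins the model directory.
# Arguments: <text> <prompt wav> <output wav>
synth() {
  # Default mirrors the documented cache location; override with COSY_MODEL_DIR.
  local model_dir="${COSY_MODEL_DIR:-$HOME/.cache/cosyvoice2-eu}"
  mkdir -p "$model_dir"
  # Assumption: --model-dir expects a directory path.
  "${COSY_BIN:-cosy2-eu}" --text "$1" --prompt "$2" --out "$3" --model-dir "$model_dir"
}
```

With `COSY_MODEL_DIR` exported to a directory on a larger disk, a call such as `synth "Bonjour !" french_speaker.wav output_french.wav` would keep the downloaded assets out of the home directory.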

**Advanced options:** `--setting`, `--llm-run-id`, `--flow-run-id`, `--hifigan-run-id`, `--final`, `--stream`, `--speed`, `--text-frontend` (enable text normalization), `--backbone`, `--repo-id`, `--no-hf`, `--clear-cache` (reload model).





            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "cosyvoice2-eu",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "tts, text-to-speech, cosyvoice, european, voice-cloning, french, german",
    "author": null,
    "author_email": "Tim Luka Horstmann <lukahorstmann@gmx.de>",
    "download_url": "https://files.pythonhosted.org/packages/7a/55/e02b6111be8b23ae2fc0cff55ac793cf41209b5696c6aaa70cd960f2c6da/cosyvoice2_eu-0.2.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Minimal CosyVoice2 European inference CLI (bundles runtime + Matcha)",
    "version": "0.2.7",
    "project_urls": {
        "homepage": "https://huggingface.co/Luka512/CosyVoice2-0.5B-EU"
    },
    "split_keywords": [
        "tts",
        " text-to-speech",
        " cosyvoice",
        " european",
        " voice-cloning",
        " french",
        " german"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7e0e92d8f368cd1ea9191eadca274271a752617e05fb816013a2ee5fcfcf0cba",
                "md5": "9fd092be99784d71cf3797b9248149c0",
                "sha256": "a8179415db95647af4a97b8c7684fd025bf67729cb53f8772a296dbae3961c0c"
            },
            "downloads": -1,
            "filename": "cosyvoice2_eu-0.2.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9fd092be99784d71cf3797b9248149c0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 743207,
            "upload_time": "2025-09-03T10:44:19",
            "upload_time_iso_8601": "2025-09-03T10:44:19.580685Z",
            "url": "https://files.pythonhosted.org/packages/7e/0e/92d8f368cd1ea9191eadca274271a752617e05fb816013a2ee5fcfcf0cba/cosyvoice2_eu-0.2.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7a55e02b6111be8b23ae2fc0cff55ac793cf41209b5696c6aaa70cd960f2c6da",
                "md5": "4666dc9f94efd614ed44ce0b25f9bbce",
                "sha256": "779c325f6e30d5428393a98a1e923067fbb825bd37b9bd99145e5ec990e210db"
            },
            "downloads": -1,
            "filename": "cosyvoice2_eu-0.2.7.tar.gz",
            "has_sig": false,
            "md5_digest": "4666dc9f94efd614ed44ce0b25f9bbce",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 696186,
            "upload_time": "2025-09-03T10:44:21",
            "upload_time_iso_8601": "2025-09-03T10:44:21.055198Z",
            "url": "https://files.pythonhosted.org/packages/7a/55/e02b6111be8b23ae2fc0cff55ac793cf41209b5696c6aaa70cd960f2c6da/cosyvoice2_eu-0.2.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-03 10:44:21",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "cosyvoice2-eu"
}
        