# minyt
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
**minyt** (WIP) is a Python package that downloads YouTube audio and
transcribes it with Google’s Gemini models. It splits long audio files
at natural silence points and transcribes the resulting chunks in
parallel to reduce total processing time.
## Features
- **YouTube Audio Download**: Extract audio from any YouTube video using
`yt-dlp`
- **Smart Audio Splitting**: Automatically detect silence and split
audio at natural break points
- **AI-Powered Transcription**: Use Google’s Gemini 2.0 Flash for
accurate, context-aware transcriptions
- **Parallel Processing**: Process multiple audio chunks concurrently
for faster results
- **Customizable**: Configure chunk sizes, silence detection, and
transcription prompts
- **Clean Output**: Generate well-formatted transcripts ready for
analysis
## Quick Start
### Installation
``` bash
pip install minyt
```
### Prerequisites
1. **FFmpeg**: Required for audio processing
   ``` bash
   # macOS
   brew install ffmpeg
   # Ubuntu/Debian
   sudo apt update && sudo apt install ffmpeg
   # Windows
   # Download from https://ffmpeg.org/download.html
   ```
2. **Google Gemini API Key**: Get your API key from [Google AI
   Studio](https://makersuite.google.com/app/apikey)
   ``` bash
   export GEMINI_API_KEY="your-api-key-here"
   ```
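
Both prerequisites can be checked before starting a long download. The snippet below is a minimal sketch, not part of minyt: `check_prerequisites` is a hypothetical helper that looks for `ffmpeg` on the `PATH` and for the `GEMINI_API_KEY` variable described above.

```python
import os
import shutil


def check_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    # Allow a custom mapping to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Treat an empty value the same as a missing one.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

Running it before the pipeline gives a clear message instead of a failure halfway through processing.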
### Basic Usage
``` python
import asyncio
from pathlib import Path
from minyt.core import *
# Download audio from a YouTube video
video_id = "dQw4w9WgXcQ" # Replace with your video ID
audio_file = download_audio(video_id, Path("_audio"))
# Detect silence and find optimal split points
_, silence_data = detect_silence(audio_file)
silence_ends = parse_silence_ends(silence_data)
total_duration = get_audio_duration(audio_file)
split_points = find_split_points(silence_ends, total_duration, chunk_len=600)
# Split audio into manageable chunks
chunks = split_audio(audio_file, split_points, dest_dir="_audio_chunks")
# Transcribe all chunks using Gemini AI
async def main():
    transcript = await transcribe_audio(
        chunks_dir="_audio_chunks",
        dest_file="_transcripts/transcript.txt",
        prompt="Please transcribe this audio file verbatim, maintaining speaker clarity and context."
    )
    print("Transcript saved to: _transcripts/transcript.txt")

asyncio.run(main())
```
## Detailed Usage
### Step 1: Download YouTube Audio
``` python
from minyt.core import download_audio
from pathlib import Path
# Download audio from a YouTube video
video_id = "your-video-id-here"
audio_file = download_audio(video_id, Path("downloads"))
print(f"Audio downloaded to: {audio_file}")
```
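
`download_audio` is shown here with a bare video ID. If you start from a full URL, the ID has to be extracted first. The helper below is a hypothetical sketch (it is not part of minyt) that covers the common `watch?v=`, `youtu.be`, `/embed/`, and `/shorts/` URL shapes using only the standard library.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id):
    """Return the video ID from a common YouTube URL, or the input if it is a bare ID."""
    parsed = urlparse(url_or_id)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0]
    if host.endswith("youtube.com"):
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v", [])
            if ids:
                return ids[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0]
    if not parsed.scheme and not parsed.netloc:
        # No scheme or host: assume the caller already passed a bare ID.
        return url_or_id
    raise ValueError(f"Could not find a video ID in {url_or_id!r}")
```

The result can then be passed as `video_id` in the call above.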
### Step 2: Process Audio with Smart Splitting
``` python
from minyt.core import (detect_silence, parse_silence_ends, get_audio_duration,
                        find_split_points, split_audio)
# Detect silence in the audio file
_, silence_data = detect_silence(audio_file)
# Parse silence end points
silence_ends = parse_silence_ends(silence_data)
# Find optimal split points (aiming for 10-minute chunks)
total_duration = get_audio_duration(audio_file)
split_points = find_split_points(silence_ends, total_duration, chunk_len=600)
# Split audio into chunks
chunks = split_audio(audio_file, split_points, dest_dir="audio_chunks")
print(f"Created {len(chunks)} audio chunks")
```
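
To make the idea of splitting at silence concrete, here is one possible selection strategy. It is an illustrative sketch only and may differ from what `find_split_points` actually does: for each multiple of `chunk_len`, the hypothetical `pick_split_points` chooses the silence end closest to that target.

```python
def pick_split_points(silence_ends, total_duration, chunk_len=600):
    """Pick, for each multiple of chunk_len, the silence end closest to it."""
    # Only silence ends strictly inside the audio are usable split points.
    candidates = sorted(t for t in silence_ends if 0 < t < total_duration)
    points = []
    target = chunk_len
    while target < total_duration and candidates:
        nearest = min(candidates, key=lambda t: abs(t - target))
        # Skip a candidate that would not move past the previous split.
        if not points or nearest > points[-1]:
            points.append(nearest)
        target += chunk_len
    return points


if __name__ == "__main__":
    ends = [95.0, 290.5, 610.2, 1180.0, 1500.0]
    print(pick_split_points(ends, total_duration=1900.0, chunk_len=600))
```

With sparse silence, chunks can end up noticeably longer or shorter than `chunk_len`, since this sketch never cuts in the middle of speech.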
### Step 3: Transcribe with Gemini AI
``` python
import asyncio
from minyt.core import transcribe_audio
async def transcribe_video():
    transcript = await transcribe_audio(
        chunks_dir="audio_chunks",
        dest_file="transcripts/final_transcript.txt",
        model="gemini-2.0-flash-001",  # Default model
        max_concurrent=3,  # Process 3 chunks simultaneously
        prompt="Please transcribe this audio accurately, preserving speaker names and technical terms."
    )
    return transcript
# Run transcription
transcript = asyncio.run(transcribe_video())
print("Transcription completed!")
```
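
The `max_concurrent` setting caps how many chunks are in flight at once. A common way to get that behaviour in `asyncio` is a semaphore, sketched below. This is an assumption about the general pattern, not a copy of minyt's internals; `run_bounded` and `fake_transcribe` are hypothetical names, and the latter only stands in for a real API call.

```python
import asyncio


async def run_bounded(items, worker, max_concurrent=3):
    """Run worker(item) for every item, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_transcribe(chunk_name):
    """Stand-in for a real transcription call; returns placeholder text."""
    await asyncio.sleep(0.01)
    return f"[text of {chunk_name}]"


if __name__ == "__main__":
    chunk_names = [f"chunk_{i:03d}.mp3" for i in range(5)]
    print("\n".join(asyncio.run(run_bounded(chunk_names, fake_transcribe))))
```

Because results come back in input order, the per-chunk texts can be joined directly into one transcript. Raising the limit speeds things up only until the API's rate limits become the bottleneck.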
## Configuration
### Environment Variables
``` bash
# Required
export GEMINI_API_KEY="your-gemini-api-key"
# Optional: Configure logging level
export LOG_LEVEL="INFO"
```
### Customization Options
``` python
import asyncio
# Silence detection (this call uses the defaults: -30dB threshold, 0.5s duration)
_, silence_data = detect_silence(audio_file)
# Custom chunk size (in seconds)
split_points = find_split_points(silence_ends, total_duration, chunk_len=300)  # 5-minute chunks
# Custom transcription settings (outside a notebook, await must run inside a coroutine)
async def transcribe_custom():
    return await transcribe_audio(
        chunks_dir="chunks",
        dest_file="output.txt",
        model="gemini-2.0-flash-001",  # Gemini model ID to use
        max_concurrent=5,  # More chunks processed in parallel
        prompt="Custom transcription instructions here..."
    )
transcript = asyncio.run(transcribe_custom())
```
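
The `-30dB` threshold and `0.5s` duration mentioned above correspond to the `noise` and `d` options of FFmpeg's `silencedetect` filter. The sketch below shows how such a command could be built and its log parsed with the standard library. It assumes nothing about how `detect_silence` is implemented; `silencedetect_command` and `silence_ends_from_log` are hypothetical helpers.

```python
import re


def silencedetect_command(audio_path, noise_db=-30, min_duration=0.5):
    """Build an ffmpeg command that logs silence intervals to stderr."""
    audio_filter = f"silencedetect=noise={noise_db}dB:d={min_duration}"
    # "-f null -" decodes the audio without writing an output file.
    return ["ffmpeg", "-hide_banner", "-i", str(audio_path),
            "-af", audio_filter, "-f", "null", "-"]


# silencedetect logs lines such as:
#   [silencedetect @ 0x...] silence_end: 12.5 | silence_duration: 0.8
SILENCE_END = re.compile(r"silence_end:\s*([0-9.]+)")


def silence_ends_from_log(ffmpeg_stderr):
    """Extract every silence_end timestamp (in seconds) from ffmpeg's log text."""
    return [float(value) for value in SILENCE_END.findall(ffmpeg_stderr)]
```

To try other sensitivities, the command can be run with `subprocess.run(cmd, capture_output=True, text=True)` and the captured `stderr` passed to the parser. A lower (more negative) threshold or a longer minimum duration yields fewer, more pronounced pauses.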
## Development
### Install in Development Mode
``` bash
# Clone the repository
git clone https://github.com/franckalbinet/minyt.git
cd minyt
# Install in development mode
pip install -e .
# Make changes in the nbs/ directory
# ...
# Compile changes to apply to minyt package
nbdev_prepare
```
### Dependencies
- `fastcore`: Core utilities
- `google-genai`: Google Gemini AI client
- `yt-dlp`: YouTube video downloader
- `ffmpeg-python`: Audio processing
- `tqdm`: Progress bars
- `rich`: Enhanced console output
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
For major changes, please open an issue first to discuss what you would
like to change.
## License
This project is licensed under the Apache License 2.0 - see the
[LICENSE](LICENSE) file for details.
## Acknowledgments
- [yt-dlp](https://github.com/yt-dlp/yt-dlp) for YouTube video
downloading
- [Google Gemini](https://ai.google.dev/) for AI-powered transcription
- [FFmpeg](https://ffmpeg.org/) for audio processing capabilities
## Support
If you encounter any issues or have questions:
1. Check the [documentation](https://franckalbinet.github.io/minyt/)
2. Open an [issue](https://github.com/franckalbinet/minyt/issues)
3. Contact the maintainer: franckalbinet@gmail.com
------------------------------------------------------------------------
Raw data
{
"_id": null,
"home_page": "https://github.com/franckalbinet/minyt",
"name": "minyt",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "nbdev jupyter notebook python",
"author": "Franck Albinet",
"author_email": "franckalbinet@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/32/77/b62b16072ebcefbd61ff2485b9b833a72b8b58e5f95ddd5933b17d929402/minyt-0.0.3.tar.gz",
"platform": null,
"description": "# minyt\n\n\n<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->\n\n[](https://badge.fury.io/py/minyt)\n[](https://opensource.org/licenses/Apache-2.0)\n[](https://www.python.org/downloads/)\n\n**minyt** (WIP) is a Python package that simplifies the process of\ndownloading YouTube audio and generating high-quality transcripts using\nGoogle\u2019s Gemini AI. It intelligently splits long audio files at natural\nsilence points and processes chunks in parallel for optimal performance.\n\n## Features\n\n- **YouTube Audio Download**: Extract audio from any YouTube video using\n `yt-dlp`\n- **Smart Audio Splitting**: Automatically detect silence and split\n audio at natural break points\n- **AI-Powered Transcription**: Use Google\u2019s Gemini 2.0 Flash for\n accurate, context-aware transcriptions\n- **Parallel Processing**: Process multiple audio chunks concurrently\n for faster results\n- **Customizable**: Configure chunk sizes, silence detection, and\n transcription prompts\n- **Clean Output**: Generate well-formatted transcripts ready for\n analysis\n\n## Quick Start\n\n### Installation\n\n``` bash\npip install minyt\n```\n\n### Prerequisites\n\n1. **FFmpeg**: Required for audio processing\n\n ``` bash\n # macOS\n brew install ffmpeg\n\n # Ubuntu/Debian\n sudo apt update && sudo apt install ffmpeg\n\n # Windows\n # Download from https://ffmpeg.org/download.html\n ```\n\n2. 
**Google Gemini API Key**: Get your API key from [Google AI\n Studio](https://makersuite.google.com/app/apikey)\n\n ``` bash\n export GEMINI_API_KEY=\"your-api-key-here\"\n ```\n\n### Basic Usage\n\n``` python\nimport asyncio\nfrom pathlib import Path\nfrom minyt.core import *\n\n# Download audio from a YouTube video\nvideo_id = \"dQw4w9WgXcQ\" # Replace with your video ID\naudio_file = download_audio(video_id, Path(\"_audio\"))\n\n# Detect silence and find optimal split points\n_, silence_data = detect_silence(audio_file)\nsilence_ends = parse_silence_ends(silence_data)\ntotal_duration = get_audio_duration(audio_file)\nsplit_points = find_split_points(silence_ends, total_duration, chunk_len=600)\n\n# Split audio into manageable chunks\nchunks = split_audio(audio_file, split_points, dest_dir=\"_audio_chunks\")\n\n# Transcribe all chunks using Gemini AI\nasync def main():\n transcript = await transcribe_audio(\n chunks_dir=\"_audio_chunks\",\n dest_file=\"_transcripts/transcript.txt\",\n prompt=\"Please transcribe this audio file verbatim, maintaining speaker clarity and context.\"\n )\n print(f\"Transcript saved to: _transcripts/transcript.txt\")\n\nasyncio.run(main())\n```\n\n## Detailed Usage\n\n### Step 1: Download YouTube Audio\n\n``` python\nfrom minyt.core import download_audio\nfrom pathlib import Path\n\n# Download audio from a YouTube video\nvideo_id = \"your-video-id-here\"\naudio_file = download_audio(video_id, Path(\"downloads\"))\nprint(f\"Audio downloaded to: {audio_file}\")\n```\n\n### Step 2: Process Audio with Smart Splitting\n\n``` python\nfrom minyt.core import detect_silence, parse_silence_ends, find_split_points, split_audio\n\n# Detect silence in the audio file\n_, silence_data = detect_silence(audio_file)\n\n# Parse silence end points\nsilence_ends = parse_silence_ends(silence_data)\n\n# Find optimal split points (aiming for 10-minute chunks)\ntotal_duration = get_audio_duration(audio_file)\nsplit_points = find_split_points(silence_ends, 
total_duration, chunk_len=600)\n\n# Split audio into chunks\nchunks = split_audio(audio_file, split_points, dest_dir=\"audio_chunks\")\nprint(f\"Created {len(chunks)} audio chunks\")\n```\n\n### Step 3: Transcribe with Gemini AI\n\n``` python\nimport asyncio\nfrom minyt.core import transcribe_audio\n\nasync def transcribe_video():\n transcript = await transcribe_audio(\n chunks_dir=\"audio_chunks\",\n dest_file=\"transcripts/final_transcript.txt\",\n model=\"gemini-2.0-flash-001\", # Default model\n max_concurrent=3, # Process 3 chunks simultaneously\n prompt=\"Please transcribe this audio accurately, preserving speaker names and technical terms.\"\n )\n return transcript\n\n# Run transcription\ntranscript = asyncio.run(transcribe_video())\nprint(\"Transcription completed!\")\n```\n\n## Configuration\n\n### Environment Variables\n\n``` bash\n# Required\nexport GEMINI_API_KEY=\"your-gemini-api-key\"\n\n# Optional: Configure logging level\nexport LOG_LEVEL=\"INFO\"\n```\n\n### Customization Options\n\n``` python\n# Custom silence detection (adjust sensitivity)\n_, silence_data = detect_silence(audio_file) # Uses -30dB threshold, 0.5s duration\n\n# Custom chunk size (in seconds)\nsplit_points = find_split_points(silence_ends, total_duration, chunk_len=300) # 5-minute chunks\n\n# Custom transcription settings\ntranscript = await transcribe_audio(\n chunks_dir=\"chunks\",\n dest_file=\"output.txt\",\n model=\"gemini-2.0-flash-001\", # Different Gemini model\n max_concurrent=5, # More parallel processing\n prompt=\"Custom transcription instructions here...\"\n)\n```\n\n## Development\n\n### Install in Development Mode\n\n``` bash\n# Clone the repository\ngit clone https://github.com/franckalbinet/minyt.git\ncd minyt\n\n# Install in development mode\npip install -e .\n\n# Make changes in the nbs/ directory\n# ...\n\n# Compile changes to apply to minyt package\nnbdev_prepare\n```\n\n### Dependencies\n\n- `fastcore`: Core utilities\n- `google-genai`: Google Gemini AI 
client\n- `yt-dlp`: YouTube video downloader\n- `ffmpeg-python`: Audio processing\n- `tqdm`: Progress bars\n- `rich`: Enhanced console output\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\nFor major changes, please open an issue first to discuss what you would\nlike to change.\n\n## License\n\nThis project is licensed under the Apache License 2.0 - see the\n[LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\n- [yt-dlp](https://github.com/yt-dlp/yt-dlp) for YouTube video\n downloading\n- [Google Gemini](https://ai.google.dev/) for AI-powered transcription\n- [FFmpeg](https://ffmpeg.org/) for audio processing capabilities\n\n## Support\n\nIf you encounter any issues or have questions:\n\n1. Check the [documentation](https://franckalbinet.github.io/minyt/)\n2. Open an [issue](https://github.com/franckalbinet/minyt/issues)\n3. Contact the maintainer: franckalbinet@gmail.com\n\n------------------------------------------------------------------------\n",
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "Donwload audio from a youtube video and use Gemini LLM for cleaner and smarter transcibes",
"version": "0.0.3",
"project_urls": {
"Homepage": "https://github.com/franckalbinet/minyt"
},
"split_keywords": [
"nbdev",
"jupyter",
"notebook",
"python"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "dec86a02017b6a6e0e62d7d597bc6737ddaea753f3d6a739ab8242ca92dede38",
"md5": "8689f7f1d84df7e72e6e6509d3ccc61e",
"sha256": "4eedee94977035810dde23ecc35e9de4da3a7ff7aafa813848d036ffc59618a2"
},
"downloads": -1,
"filename": "minyt-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8689f7f1d84df7e72e6e6509d3ccc61e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 10959,
"upload_time": "2025-07-21T08:20:43",
"upload_time_iso_8601": "2025-07-21T08:20:43.050009Z",
"url": "https://files.pythonhosted.org/packages/de/c8/6a02017b6a6e0e62d7d597bc6737ddaea753f3d6a739ab8242ca92dede38/minyt-0.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "3277b62b16072ebcefbd61ff2485b9b833a72b8b58e5f95ddd5933b17d929402",
"md5": "364a2b45a87f10ea9cd819f7ea6ed3bc",
"sha256": "ab61c76e80aa8f0de887158eaec70b9797a07c5e1752d36a4b36b2d541db5b6f"
},
"downloads": -1,
"filename": "minyt-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "364a2b45a87f10ea9cd819f7ea6ed3bc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 12170,
"upload_time": "2025-07-21T08:20:43",
"upload_time_iso_8601": "2025-07-21T08:20:43.874505Z",
"url": "https://files.pythonhosted.org/packages/32/77/b62b16072ebcefbd61ff2485b9b833a72b8b58e5f95ddd5933b17d929402/minyt-0.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-21 08:20:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "franckalbinet",
"github_project": "minyt",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "minyt"
}