lecture-downloader


Name: lecture-downloader
Version: 1.0.11
Home page: https://github.com/dan-dev-ml/lecture-downloader
Summary: A comprehensive toolkit for downloading, merging, and transcribing lecture videos
Upload time: 2025-07-14 17:24:15
Maintainer: None
Docs URL: None
Author: dan-dev-ml
Requires Python: >=3.8
License: MIT
Keywords: education, video, transcription, canvas, brightspace, whisper, lecture
Requirements: click, rich, faster-whisper, google-cloud-speech, google-cloud-storage, ffmpeg-python, watchdog
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.

# Lecture Downloader

A Python toolkit for downloading, merging, and transcribing lecture videos hosted on platforms like Canvas and Brightspace, and for embedding the resulting subtitles in the video files.

Use at your own risk. This tool is designed for educational purposes and should not be used to violate any terms of service or copyright laws.

## Quick Start

### 30-Second Setup
```bash
# Install FFmpeg (required for video processing)
brew install ffmpeg  # macOS
# sudo apt install ffmpeg  # Ubuntu/Debian
# Windows: https://www.wikihow.com/Install-FFmpeg-on-Windows

# Install lecture downloader
pip install lecture-downloader
```
## Obtaining Video URLs from Canvas/Brightspace

The approach is based on [this Reddit post](https://www.reddit.com/r/VirginiaTech/comments/13l6983/how_to_download_videos_from_canvas/).

### Using Video DownloadHelper Extension

1. **Install Extension**: Download [Video DownloadHelper](https://www.downloadhelper.net/)
2. **Navigate to Video**: Go to your lecture video in Canvas/Brightspace
3. **Start Playback**: Click play to begin streaming
4. **Extract URL**: Click the extension icon (it should be colored, not grey)
5. **Copy URL**: Click the three dots → "Copy URL"

For example, visit the [Public Lecture sample](https://cdnapisec.kaltura.com/html5/html5lib/v2.82.1/mwEmbedFrame.php/p/2019031/uiconf_id/40436601?wid=1_la8dzpbb&iframeembed=true&playerId=kaltura_player_5b490d253ef1c&flashvars%5BplaylistAPI.kpl0Id%5D=1_9hckzp35&flashvars%5BplaylistAPI.autoContinue%5D=true&flashvars%5BplaylistAPI.autoInsert%5D=true&flashvars%5Bks%5D=&flashvars%5BlocalizationCode%5D=en&flashvars%5BimageDefaultDuration%5D=30&flashvars%5BleadWithHTML5%5D=true&flashvars%5BforceMobileHTML5%5D=true&flashvars%5BnextPrevBtn.plugin%5D=true&flashvars%5BsideBarContainer.plugin%5D=true&flashvars%5BsideBarContainer.position%5D=left&flashvars%5BsideBarContainer.clickToClose%5D=true&flashvars%5Bchapters.plugin%5D=true&flashvars%5Bchapters.layout%5D=vertical&flashvars%5Bchapters.thumbnailRotator%5D=false&flashvars%5BstreamSelector.plugin%5D=true&flashvars%5BEmbedPlayer.SpinnerTarget%5D=videoHolder&flashvars%5BdualScreen.plugin%5D=true), click play on a video, and copy the URL from the extension. To bulk download, paste each copied URL into a text file named `lecture_links.txt`, one URL per line.
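
If you collect the URLs in a script rather than by hand, the one-URL-per-line file can be produced along these lines. This is a minimal sketch and not part of lecture-downloader: `write_links_file` is a hypothetical helper, and rejecting anything that is not an http(s) URL is an assumption about what you would want in the file.

```python
from pathlib import Path
from urllib.parse import urlparse


def write_links_file(urls, path="lecture_links.txt"):
    """Write unique http(s) URLs to `path`, one per line, keeping first-seen order.

    Hypothetical helper for illustration; it is not provided by lecture-downloader.
    """
    kept = []
    for url in urls:
        url = url.strip()
        parsed = urlparse(url)
        # Reject anything that is clearly not a web URL (assumed policy).
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an http(s) URL: {url!r}")
        if url not in kept:
            kept.append(url)
    Path(path).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return kept
```

The returned list is what ended up in the file, so it can be passed straight to `links=` instead of the file path if that is more convenient.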

<img src="images/65843ae08547dc26daa123c8dd3096dace4a87ccd7643a805a57550fda5e5a14.png" width="500" alt="Lecture Downloader screenshot">
  
<img src="images/9596a4b8afd26ddc068e3160cdce0ec02a1dc22b5b512e5e1ffabf5f06d48749.png" width="300" alt="Lecture Downloader screenshot">

## Basic Usage

### One-Command Pipeline
```python
import lecture_downloader as ld

# Complete pipeline: download → merge → transcribe
pipeline_results = ld.process_pipeline(
    links="lecture_links.txt",  # Can also use: single URL string, ["url1", "url2"]
    titles="lecture_titles.json",  # Can also use: ["Title 1", "Title 2"], {"Module 1": ["Lecture 1"]}
    output_dir="lecture_processing",
    inject_subtitles=True,          # False to skip subtitle injection
    transcription_method="whisper", # "auto", "gcloud", "whisper"
    language="en",                  
)
```

### Step-by-Step Commands
```python
import lecture_downloader as ld

# Complete workflow in 3 commands
base_dir = "Lecture-Downloads"

# 1. Download lectures
results = ld.download_lectures(
    links="lecture_links.txt",  # Can also use: single URL string, ["url1", "url2"]
    titles="lecture_titles.json",  # Can also use: ["Title 1", "Title 2"], {"Module 1": ["Lecture 1"]}
    base_dir=base_dir,  # Creates Lecture-Downloads/lecture-downloads/
)

# 2. Merge videos by module with chapters
merged = ld.merge_videos(
    base_dir=base_dir,  # Auto-detects input from lecture-downloads/
)

# 3. Transcribe with Whisper (local, no setup required)
transcripts = ld.transcribe_videos(
    base_dir=base_dir,  # Auto-detects input from merged-lectures/
    method="whisper",  # "auto" detects best available method
    language="en",  # Language code for Whisper
    inject_subtitles=True,  # False to skip subtitle injection
)
```
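
To see how far a run has progressed, you can count the files under each stage directory. The sketch below assumes only the two subdirectory names mentioned in the comments above (`lecture-downloads/` and `merged-lectures/`); `stage_status` is a hypothetical helper, and any other directories the tool creates are not covered.

```python
from pathlib import Path

# Subdirectory names taken from the comments in the step-by-step example.
STAGE_DIRS = ("lecture-downloads", "merged-lectures")


def stage_status(base_dir):
    """Return {subdirectory name: number of files found beneath it}.

    Hypothetical helper for illustration; it is not provided by lecture-downloader.
    """
    base = Path(base_dir)
    status = {}
    for name in STAGE_DIRS:
        stage = base / name
        if stage.is_dir():
            # rglob("*") also walks nested module folders.
            status[name] = sum(1 for p in stage.rglob("*") if p.is_file())
        else:
            # A stage that has not run yet simply has no directory.
            status[name] = 0
    return status
```

Calling `stage_status("Lecture-Downloads")` after step 1 should show a non-zero count for `lecture-downloads` and zero for `merged-lectures`.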

## Installation

```bash
# Basic installation
pip install lecture-downloader
```

**Required Dependencies:**
- `ffmpeg` - Install via package manager (brew, apt, etc.)
- Python 3.8+

## Configuration Options

### Download Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `links` | str/list | Required | File path, single URL, or list of URLs |
| `titles` | str/list/dict | None | File path, list, or dict mapping |
| `base_dir` | str | "." | Base directory (creates subdirectories) |
| `max_workers` | int | 5 | Concurrent downloads (1-10) |
| `verbose` | bool | False | Detailed progress output |
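
As a rough picture of what `max_workers` controls, the sketch below runs a per-URL function on a bounded thread pool. It illustrates the general technique only and is not the library's implementation: `run_concurrently` and the `fetch` callback are hypothetical, and clamping to 1-10 simply mirrors the range documented in the table.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(fetch, urls, max_workers=5):
    """Call fetch(url) for every URL using at most `max_workers` threads.

    Hypothetical helper for illustration; it is not provided by lecture-downloader.
    """
    # Clamp to the documented 1-10 range.
    workers = max(1, min(10, max_workers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as `urls`.
        return list(pool.map(fetch, urls))
```

A higher worker count helps when downloads are limited by per-connection throughput; it does little once your own bandwidth is saturated.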

### Merge Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_dir` | str | "." | Base directory (auto-detects input) |
| `verbose` | bool | False | Detailed progress output |

### Transcribe Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_dir` | str | "." | Base directory (auto-detects input) |
| `method` | str | "auto" | "auto", "whisper", or "gcloud" |
| `language` | str | "en" | Language code for Whisper |
| `max_workers` | int | 3 | Concurrent transcriptions (1-5) |
| `inject_subtitles` | bool | True | Inject SRT into video files |
| `verbose` | bool | False | Detailed progress output |
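
For readers curious what "injecting" an SRT file can look like, one common approach is to mux it into an MP4 as a soft subtitle track with FFmpeg. The sketch below only builds such a command; whether lecture-downloader does it this way is an assumption, and `build_subtitle_command` and the `.subbed.mp4` output name are hypothetical.

```python
from pathlib import Path


def build_subtitle_command(video, srt, output=None):
    """Build an ffmpeg command that adds `srt` to `video` as a soft subtitle track.

    Hypothetical helper for illustration; it is not provided by lecture-downloader.
    """
    video, srt = Path(video), Path(srt)
    if output is None:
        # Assumed naming scheme: lecture.mp4 -> lecture.subbed.mp4
        output = video.with_name(video.stem + ".subbed.mp4")
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        "-i", str(srt),
        "-map", "0:v",    # video from the first input
        "-map", "0:a?",   # audio from the first input, if present
        "-map", "1:0",    # subtitle stream from the SRT file
        "-c", "copy",     # copy audio/video without re-encoding
        "-c:s", "mov_text",  # MP4 containers need mov_text subtitles
        str(output),
    ]
```

Pass the resulting list to `subprocess.run(cmd, check=True)` to execute it; because the audio and video streams are copied, this is fast even for long lectures.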

## Input Formats

### Links Input
```python
# File with URLs (one per line)
links = "lecture_links.txt"

# Single URL
links = "https://example.com/lecture.mp4"

# List of URLs
links = ["https://url1.mp4", "https://url2.mp4"]
```

**lecture_links.txt:**
```
https://example.com/lecture1.mp4
https://example.com/lecture2.mp4
```

### Titles Input
```python
# JSON file with module structure
titles = "lecture_titles.json"

# List of titles (matches link order)
titles = ["Lecture 1", "Lecture 2", "Lecture 3"]

# Dictionary mapping modules to lectures
titles = {
    "Module 1: Introduction": ["Lecture 1", "Lecture 2"],
    "Module 2: Advanced": ["Lecture 3", "Lecture 4"],
}
```
**lecture_titles.json:**
```json
{
  "Module 1: Introduction": ["Lecture 1: Overview", "Lecture 2: Fundamentals"],
  "Module 2: Advanced Topics": ["Lecture 3: Advanced Concepts"]
}
```
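
Since titles are matched to links by position, it can help to check the pairing before starting a long download. The sketch below assumes that a module dictionary is consumed in order, module by module, and that a count mismatch should be treated as an error; both are assumptions, and `pair_links_with_titles` is a hypothetical helper rather than part of the package.

```python
def pair_links_with_titles(links, titles):
    """Return (module, title, url) tuples; module is None for a flat title list.

    Hypothetical helper for illustration; it is not provided by lecture-downloader.
    """
    if isinstance(titles, dict):
        # Flatten {"Module": ["Lecture", ...]} in insertion order.
        flat = [(module, t) for module, lectures in titles.items() for t in lectures]
    else:
        flat = [(None, t) for t in titles]
    if len(flat) != len(links):
        raise ValueError(f"{len(links)} links but {len(flat)} titles")
    return [(module, title, url) for (module, title), url in zip(flat, links)]
```

Printing the result makes it easy to spot a URL that was pasted in the wrong order.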


## CLI Usage

### Quick Commands
```bash
# Complete workflow
BASE_DIR="Lecture-Downloads"
lecture-downloader download -l links.txt -t titles.json -b $BASE_DIR
lecture-downloader merge -b $BASE_DIR
lecture-downloader transcribe -b $BASE_DIR

# One-command pipeline
lecture-downloader pipeline -l links.txt -t titles.json -o output
```

### CLI Options
```bash
# Download with options
lecture-downloader download \
  -l links.txt \
  -t titles.json \
  -b Lecture-Downloads \
  --max-workers 8 \
  --verbose

# Transcribe with options
lecture-downloader transcribe \
  -b Lecture-Downloads \
  --method whisper \
  --language en \
  --max-workers 4 \
  --no-inject
```




## Troubleshooting

**FFmpeg not found:**
```bash
# Install FFmpeg
brew install ffmpeg  # macOS
sudo apt install ffmpeg  # Ubuntu/Debian
```
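
To confirm from Python that FFmpeg is visible on your `PATH` after installing it, a quick check such as the following can help. It is a minimal sketch using only the standard library; `ffmpeg_version` is a hypothetical helper and not part of lecture-downloader.

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH.

    Hypothetical helper for illustration; it is not provided by lecture-downloader.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

If `ffmpeg_version()` returns `None` in the same environment where you run lecture-downloader, the install location is probably missing from `PATH`.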

### Debug Mode
```python
# Enable verbose output for troubleshooting
import lecture_downloader as ld

ld.download_lectures(links="lecture_links.txt", titles="lecture_titles.json", base_dir="course", verbose=True)
ld.merge_videos(base_dir="course", verbose=True)
ld.transcribe_videos(base_dir="course", verbose=True)
```

## License

MIT License - see LICENSE file for details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/dan-dev-ml/lecture-downloader",
    "name": "lecture-downloader",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "education, video, transcription, canvas, brightspace, whisper, lecture",
    "author": "dan-dev-ml",
    "author_email": "dan-dev-ml <dan.dev.ml@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/7e/fe/2806b3126e4e983d96bc3f620f23871a2757531dca0b3aba62c47d3715bb/lecture_downloader-1.0.11.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A comprehensive toolkit for downloading, merging, and transcribing lecture videos",
    "version": "1.0.11",
    "project_urls": {
        "Bug Tracker": "https://github.com/dan-dev-ml/lecture-downloader/issues",
        "Documentation": "https://github.com/dan-dev-ml/lecture-downloader#readme",
        "Homepage": "https://github.com/dan-dev-ml/lecture-downloader",
        "PyPI": "https://pypi.org/project/lecture-downloader/",
        "Repository": "https://github.com/dan-dev-ml/lecture-downloader"
    },
    "split_keywords": [
        "education",
        " video",
        " transcription",
        " canvas",
        " brightspace",
        " whisper",
        " lecture"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1e1cc73aacfb8722eb26485b5b4cbdbe9011f9db566ed0d5f15e585cb45d9802",
                "md5": "151ea05de27f95502bba7273a8bf576a",
                "sha256": "0777ae80e5d68a7acdf14f409d9f70c15c68b14bf891301da62e50d2d6596e41"
            },
            "downloads": -1,
            "filename": "lecture_downloader-1.0.11-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "151ea05de27f95502bba7273a8bf576a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 43766,
            "upload_time": "2025-07-14T17:24:14",
            "upload_time_iso_8601": "2025-07-14T17:24:14.526060Z",
            "url": "https://files.pythonhosted.org/packages/1e/1c/c73aacfb8722eb26485b5b4cbdbe9011f9db566ed0d5f15e585cb45d9802/lecture_downloader-1.0.11-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7efe2806b3126e4e983d96bc3f620f23871a2757531dca0b3aba62c47d3715bb",
                "md5": "7883f4feaa05fbbccca105b5995d3fe3",
                "sha256": "71c1a30f4e9e3e48da3c1e8ded91a908d33379682bc9cf6e9614bb2af97602f2"
            },
            "downloads": -1,
            "filename": "lecture_downloader-1.0.11.tar.gz",
            "has_sig": false,
            "md5_digest": "7883f4feaa05fbbccca105b5995d3fe3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 765556,
            "upload_time": "2025-07-14T17:24:15",
            "upload_time_iso_8601": "2025-07-14T17:24:15.911390Z",
            "url": "https://files.pythonhosted.org/packages/7e/fe/2806b3126e4e983d96bc3f620f23871a2757531dca0b3aba62c47d3715bb/lecture_downloader-1.0.11.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-14 17:24:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "dan-dev-ml",
    "github_project": "lecture-downloader",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "click",
            "specs": [
                [
                    ">=",
                    "8.0.0"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    ">=",
                    "13.0.0"
                ]
            ]
        },
        {
            "name": "faster-whisper",
            "specs": [
                [
                    ">=",
                    "0.10.0"
                ]
            ]
        },
        {
            "name": "google-cloud-speech",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "google-cloud-storage",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "ffmpeg-python",
            "specs": [
                [
                    ">=",
                    "0.2.0"
                ]
            ]
        },
        {
            "name": "watchdog",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        }
    ],
    "lcname": "lecture-downloader"
}
        