# Crawl YouTube Channel
This Python package provides tools to crawl and extract data from YouTube channels.
## Features
* Crawls an entire YouTube channel for video information.
* Extracts metadata, comments, transcripts, audio, and video for each video.
* Provides a base class for implementing your own video processing and storage logic.
* Includes a `Sqlite3YouTubeVideoProcessor` for storing data in a local SQLite database.
* Provides data classes for easy access to crawled data.
## Prerequisites
* Python 3.10+
* Google Cloud YouTube API Key
## Installation
1. **Install the package:**

   ```bash
   pip install crawl-youtube-channel
   ```

2. **Set up your environment:**

   Create a `.env` file in your project root and add your Google Cloud YouTube API key:

   ```
   GOOGLE_CLOUD_YOUTUBE_API_KEY=your_api_key
   ```
## Usage
To use the crawler, you need to implement the `YouTubeVideoProcessorBase` abstract class. This class defines how to check for existing videos and how to process new ones.
Here is a basic skeleton for a custom processor:
```python
import asyncio

from crawl_youtube_channel import YouTubeVideoProcessorBase, YouTubeVideo


class MyVideoProcessor(YouTubeVideoProcessorBase):
    async def check_video(self, video_id: str) -> bool:
        # Implement logic to check if the video has already been processed.
        # Return True if it exists, False otherwise.
        ...

    async def process_video(self, v: YouTubeVideo) -> None:
        # Implement logic to save or process the video data.
        # For example, save it to a database, a file, or another service.
        ...


async def main():
    # Initialize your custom processor
    processor = MyVideoProcessor()

    # Start crawling the channel
    await processor.process_channel(channel_url='https://www.youtube.com/@YourFavoriteChannel/videos')


if __name__ == '__main__':
    asyncio.run(main())
```
For a concrete implementation example, see the `Sqlite3YouTubeVideoProcessor` class in the source code, which stores video data in a SQLite database.
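The persistence half of such a processor can be sketched with the stdlib `sqlite3` module. The table schema and class name below are hypothetical, chosen only to illustrate the dedup-then-store pattern that `check_video`/`process_video` imply:

```python
import sqlite3

class SqliteVideoStore:
    """Illustrative SQLite persistence layer (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS videos ("
            "video_id TEXT PRIMARY KEY, title TEXT)"
        )

    def has_video(self, video_id: str) -> bool:
        # Backs check_video(): True if this video was already stored.
        row = self.conn.execute(
            "SELECT 1 FROM videos WHERE video_id = ?", (video_id,)
        ).fetchone()
        return row is not None

    def save_video(self, video_id: str, title: str) -> None:
        # Backs process_video(): idempotent insert keyed on video_id.
        self.conn.execute(
            "INSERT OR IGNORE INTO videos (video_id, title) VALUES (?, ?)",
            (video_id, title),
        )
        self.conn.commit()
```

A real processor would wrap these calls inside the async `check_video`/`process_video` methods and store the full `YouTubeVideo` payload rather than just a title.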
## Data Models
The following data classes are used to structure the crawled data:
* `YouTubeVideo`: The main container for all video-related data.
* `YouTubeThumbnail`: Basic information about a video thumbnail.
* `YouTubeData`: Contains detailed information about a video, including:
  * `Meta`: Video metadata (title, description, tags, etc.).
  * `Comment`: A YouTube comment, including replies.
  * `Transcript`: The video's transcript.
  * `audio`: The audio file in M4A format (as bytes).
  * `video`: The video file in MP4 format (as bytes).
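These models can be pictured as ordinary dataclasses. The field names below are illustrative guesses to show the nesting, not the package's exact definitions; consult the source code for the real fields:

```python
from dataclasses import dataclass, field

# Hypothetical shapes only, to show how the models nest.
@dataclass
class Meta:
    title: str
    description: str
    tags: list[str] = field(default_factory=list)

@dataclass
class Comment:
    author: str
    text: str
    replies: list["Comment"] = field(default_factory=list)  # nested replies

@dataclass
class Transcript:
    text: str
```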
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.