# Crawl YouTube Channel
This Python package provides tools to crawl and extract data from YouTube channels.
## Features
* Crawls an entire YouTube channel for video information.
* Extracts metadata, comments, transcripts, audio, and video for each video.
* Provides a base class for implementing your own video processing and storage logic.
* Includes a `Sqlite3YouTubeVideoProcessor` for storing data in a local SQLite database.
* Provides data classes for easy access to crawled data.
## Prerequisites
* Python 3.10+
* Google Cloud YouTube API Key
## Installation
1. **Install the package:**

   ```bash
   pip install crawl-youtube-channel
   ```

2. **Set up your environment:**

   Create a `.env` file in your project root and add your Google Cloud YouTube API key:

   ```
   GOOGLE_CLOUD_YOUTUBE_API_KEY=your_api_key
   ```
## Usage
To use the crawler, you need to implement the `YouTubeVideoProcessorBase` abstract class. This class defines how to check for existing videos and how to process new ones.
Here is a basic skeleton for a custom processor:
```python
import asyncio

from crawl_youtube_channel import YouTubeVideoProcessorBase, YouTubeVideo


class MyVideoProcessor(YouTubeVideoProcessorBase):
    async def check_video(self, video_id: str) -> bool:
        # Implement logic to check if the video has already been processed.
        # Return True if it exists, False otherwise.
        ...

    async def process_video(self, v: YouTubeVideo) -> None:
        # Implement logic to save or process the video data.
        # For example, save it to a database, a file, or another service.
        ...


async def main():
    # Initialize your custom processor
    processor = MyVideoProcessor()

    # Start crawling the channel
    await processor.process_channel(channel_url='https://www.youtube.com/@YourFavoriteChannel/videos')


if __name__ == '__main__':
    asyncio.run(main())
```
For a concrete implementation example, see the `Sqlite3YouTubeVideoProcessor` class in the source code, which stores video data in a SQLite database.
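The persistence half of such a processor can be sketched with the stdlib `sqlite3` module. The table schema and class name below are hypothetical, chosen only to illustrate the dedup-then-store pattern that `check_video`/`process_video` imply:

```python
import sqlite3

class SqliteVideoStore:
    """Illustrative SQLite persistence layer (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS videos ("
            "video_id TEXT PRIMARY KEY, title TEXT)"
        )

    def has_video(self, video_id: str) -> bool:
        # Backs check_video(): True if this video was already stored.
        row = self.conn.execute(
            "SELECT 1 FROM videos WHERE video_id = ?", (video_id,)
        ).fetchone()
        return row is not None

    def save_video(self, video_id: str, title: str) -> None:
        # Backs process_video(): idempotent insert keyed on video_id.
        self.conn.execute(
            "INSERT OR IGNORE INTO videos (video_id, title) VALUES (?, ?)",
            (video_id, title),
        )
        self.conn.commit()
```

A real processor would wrap these calls inside the async `check_video`/`process_video` methods and store the full `YouTubeVideo` payload rather than just a title.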
## Data Models
The following data classes are used to structure the crawled data:
* `YouTubeVideo`: The main container for all video-related data.
* `YouTubeThumbnail`: Basic information about a video thumbnail.
* `YouTubeData`: Contains detailed information about a video, including:
  * `Meta`: Video metadata (title, description, tags, etc.).
  * `Comment`: A YouTube comment, including replies.
  * `Transcript`: The video's transcript.
  * `audio`: The audio file in M4A format (as bytes).
  * `video`: The video file in MP4 format (as bytes).
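These models can be pictured as ordinary dataclasses. The field names below are illustrative guesses to show the nesting, not the package's exact definitions; consult the source code for the real fields:

```python
from dataclasses import dataclass, field

# Hypothetical shapes only, to show how the models nest.
@dataclass
class Meta:
    title: str
    description: str
    tags: list[str] = field(default_factory=list)

@dataclass
class Comment:
    author: str
    text: str
    replies: list["Comment"] = field(default_factory=list)  # nested replies

@dataclass
class Transcript:
    text: str
```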
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.