youtube-simple-scraper

Name	youtube-simple-scraper
Version	0.0.8
home_page	https://github.com/Eitol/youtube_simple_scraper
Summary	A simple scraper for Youtube
upload_time	2024-06-06 20:47:31
maintainer	None
docs_url	None
author	Hector Oliveros
requires_python	>=3.10
license	MIT
keywords
VCS
bugtrack_url
requirements	pydantic requests dateparser protobuf betterproto
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

## YouTube Simple Scraper

This is a simple YouTube scraper that uses the YouTube API to get the metadata and comments of a channel's videos.

You don't need an API key to use this scraper, so there are no API quotas or associated costs.

Note that although there are no quotas, YouTube can block your IP address if it detects abusive use of the API.

### Features

Scrape the following information of a channel:

- Channel metadata
- Videos metadata and comments
- Shorts metadata and comments

### Installation

```bash
pip install youtube_simple_scraper
```

### Usage

```python
from youtube_simple_scraper.entities import GetChannelOptions
from youtube_simple_scraper.list_video_comments import ApiVideoCommentRepository, ApiShortVideoCommentRepository
from youtube_simple_scraper.list_videos import ApiChannelRepository
from youtube_simple_scraper.logger import build_default_logger
from youtube_simple_scraper.network import Requester
from youtube_simple_scraper.stop_conditions import ListCommentMaxPagesStopCondition, \
    ListVideoMaxPagesStopCondition

if __name__ == '__main__':
    ##############################
    # To avoid IP blocking, throttle the requests.
    # Set the request rate to 0.5 requests per second
    Requester.request_rate_per_second = 0.5

    # Sleep between 1 and 5 seconds on every request
    Requester.min_sleep_time_sec = 1
    Requester.max_sleep_time_sec = 5

    # Sleep 30 seconds after every 100 requests
    Requester.long_sleep_time_sec = 30
    Requester.long_sleep_after_requests = 100
    ##############################

    logger = build_default_logger()
    video_comment_repo = ApiVideoCommentRepository()
    short_comment_repo = ApiShortVideoCommentRepository()
    repo = ApiChannelRepository(
        video_comment_repo=video_comment_repo,
        shorts_comment_repo=short_comment_repo,
        logger=logger,
    )
    opts = GetChannelOptions(
        list_video_stop_conditions=[
            ListVideoMaxPagesStopCondition(2),  # Stop after 2 pages of videos
        ],
        list_video_comment_stop_conditions=[
            ListCommentMaxPagesStopCondition(3),  # Stop after 3 pages of comments
        ],
        list_short_stop_conditions=[
            ListVideoMaxPagesStopCondition(1),  # Stop after 1 page of shorts
        ],
        list_short_comment_stop_conditions=[
            ListCommentMaxPagesStopCondition(4),  # Stop after 4 pages of comments
        ],
    )
    channel_ = repo.get_channel("BancoFalabellaChile", opts)
    print(channel_.model_dump_json(indent=2))

```
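
The throttling settings at the top of the example can be pictured with the sketch below. It is not the library's `Requester` implementation, which is not shown here. `ThrottlePolicy` and `next_delay` are hypothetical names, and the sketch assumes that the long pause is added on top of the regular random pause. It only computes delays and does not sleep or send requests.

```python
import random
from typing import Optional


class ThrottlePolicy:
    """Hypothetical stand-in for the throttling settings set on Requester."""

    def __init__(
        self,
        min_sleep_time_sec: float = 1.0,
        max_sleep_time_sec: float = 5.0,
        long_sleep_time_sec: float = 30.0,
        long_sleep_after_requests: int = 100,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.min_sleep_time_sec = min_sleep_time_sec
        self.max_sleep_time_sec = max_sleep_time_sec
        self.long_sleep_time_sec = long_sleep_time_sec
        self.long_sleep_after_requests = long_sleep_after_requests
        self._rng = rng or random.Random()
        self._request_count = 0

    def next_delay(self) -> float:
        """Return the number of seconds to wait before the next request."""
        self._request_count += 1
        # Random pause on every request, so the traffic is not perfectly regular.
        delay = self._rng.uniform(self.min_sleep_time_sec, self.max_sleep_time_sec)
        # Assumption: every N-th request adds the long pause to the short one.
        if self._request_count % self.long_sleep_after_requests == 0:
            delay += self.long_sleep_time_sec
        return delay


# Use a small N so the long pause is visible within a few requests.
policy = ThrottlePolicy(long_sleep_after_requests=3, rng=random.Random(0))
delays = [policy.next_delay() for _ in range(6)]
```

A real client would pass each delay to `time.sleep` before sending the request.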

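Example of the output channel object serialized to JSON (this sample was captured for the IbaiLlanos channel, not the channel requested in the code above):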
```json5
{
  "id": "UCaY_-ksFSQtTGk0y1HA_3YQ",
  "name": "IbaiLlanos",
  "target_id": "668be16f-0000-20de-b6a2-582429cfbdec",
  "title": "Ibai",
  "description": "contenido premium ▶️\n",
  "subscriber_count": 11600000,
  "video_count": 1400,
  "videos": [
    {
      "id": "VFXu8gzcpNc",
      "title": "EL RESTAURANTE MÁS ÚNICO AL QUE HE IDO NUNCA",
      "description": "MI CANAL DE DIRECTOS: https://www.youtube.com/@Ibai_TV\nExtraído de mi canal de TWITCH: https://www.twitch.tv/ibai/\nMI PODCAST: \nhttps://www.youtube.com/channel/UC6jNDNkoOKQfB5djK2IBDoA\nTWITTER:...",
      "date": "2024-06-02T19:18:27.647137",
      "view_count": 1455817,
      "like_count": 0,
      "dislike_count": 0,
      "comment_count": 0,
      "thumbnail_url": "https://i.ytimg.com/vi/VFXu8gzcpNc/hqdefault.jpg?sqp=-oaymwEbCKgBEF5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLCEmoQtslruHk-droajdw0KJUI_KA",
      "comments": [
        {
          "id": "UgzV8lY8eJ4dyHjl9Bp4AaABAg",
          "text": "Todo muy rico pero....Y la cuenta?",
          "user": "@eliasabregu2813",
          "date": "2024-06-03T19:11:28.109467",
          "likes": 0
        },
        {
          "id": "UgwHtPZb8jprbCH-ysp4AaABAg",
          "text": "Que humilde Ibai, comiendo todo para generar ingresos a los nuevos negocios",
          "user": "@user-ui2sk7sr5i",
          "date": "2024-06-03T19:04:28.112228",
          "likes": 0
        }
      ]
    },
    // More videos ...
  ],
  "shorts": [
    // the shorts videos and comments
  ]
}

```
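
As a minimal sketch of how this output could be consumed, the snippet below loads a trimmed copy of the sample with the standard `json` module and counts the scraped comments per item. `summarize_channel` is a hypothetical helper, not part of the library, and the sketch assumes that entries under `shorts` have the same shape as entries under `videos`. The `//` placeholders in the sample above are elisions and must not appear in the input, because `json.loads` does not accept comments.

```python
import json

# Trimmed copy of the sample output above, keeping only a few documented fields.
sample = """
{
  "id": "UCaY_-ksFSQtTGk0y1HA_3YQ",
  "name": "IbaiLlanos",
  "title": "Ibai",
  "subscriber_count": 11600000,
  "videos": [
    {
      "id": "VFXu8gzcpNc",
      "view_count": 1455817,
      "comments": [
        {"id": "UgzV8lY8eJ4dyHjl9Bp4AaABAg", "user": "@eliasabregu2813", "likes": 0},
        {"id": "UgwHtPZb8jprbCH-ysp4AaABAg", "user": "@user-ui2sk7sr5i", "likes": 0}
      ]
    }
  ],
  "shorts": []
}
"""


def summarize_channel(channel: dict) -> list:
    """Build one summary row per scraped video or short (hypothetical helper)."""
    rows = []
    for kind in ("videos", "shorts"):
        for item in channel.get(kind, []):
            comments = item.get("comments", [])
            rows.append({
                "kind": kind,
                "id": item["id"],
                "view_count": item.get("view_count", 0),
                # Number of comments actually scraped, which depends on the stop conditions.
                "scraped_comments": len(comments),
                "commenters": sorted({c["user"] for c in comments}),
            })
    return rows


channel = json.loads(sample)
summary = summarize_channel(channel)
for row in summary:
    print(row["kind"], row["id"], row["scraped_comments"])
```

Note that `scraped_comments` counts only the comments that were fetched before a stop condition ended the pagination.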

### Stop conditions

The stop conditions control when the scraping process stops. The following stop conditions are available:

#### Videos stop conditions

- `ListVideoMaxPagesStopCondition`: Stops the scraping process when the number of pages scraped is greater than the
  specified value.
- `ListVideoNeverStopCondition`: Never stops early; the scraping process ends only when all the videos of the channel have been scraped.

#### Comments stop conditions

- `ListCommentMaxPagesStopCondition`: Stops the scraping process when the number of pages scraped is greater than the
  specified value.
- `ListCommentNeverStopCondition`: Never stops early; the scraping process ends only when all the comments of the video have been scraped.
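
The idea behind page-based stop conditions can be illustrated with a small, self-contained sketch. It does not reproduce the library's classes or interface, which are not documented here: `MaxPagesStop`, `NeverStop`, `should_stop` and `scrape_pages` are hypothetical names. The sketch assumes that a limit of N yields exactly N pages, in line with the "Stop after 2 pages" comments in the usage example.

```python
from typing import Iterable, List, Sequence


class MaxPagesStop:
    """Hypothetical condition: stop once max_pages pages have been collected."""

    def __init__(self, max_pages: int) -> None:
        self.max_pages = max_pages

    def should_stop(self, pages_scraped: int) -> bool:
        return pages_scraped >= self.max_pages


class NeverStop:
    """Hypothetical condition: never stop early, so every page is collected."""

    def should_stop(self, pages_scraped: int) -> bool:
        return False


def scrape_pages(pages: Iterable[List[str]], stop_conditions: Sequence) -> List[str]:
    """Collect items page by page until any stop condition fires."""
    items: List[str] = []
    pages_scraped = 0
    for page in pages:
        # Check the conditions before consuming the next page.
        if any(cond.should_stop(pages_scraped) for cond in stop_conditions):
            break
        items.extend(page)
        pages_scraped += 1
    return items


# Three fake pages of video IDs stand in for paginated API responses.
fake_pages = [["a", "b"], ["c"], ["d", "e"]]
limited = scrape_pages(fake_pages, [MaxPagesStop(2)])
everything = scrape_pages(fake_pages, [NeverStop()])
```

Passing several conditions in one list stops the loop as soon as any of them fires, which is one plausible reading of why `GetChannelOptions` accepts lists of conditions.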

Raw data

{
    "_id": null,
    "home_page": "https://github.com/Eitol/youtube_simple_scraper",
    "name": "youtube-simple-scraper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Hector Oliveros",
    "author_email": "hector.oliveros.leon@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/41/aa/3383d35f8bbfe5a08daac46140c9febf4b6a195d6303410a689db22e64df/youtube_simple_scraper-0.0.8.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A simple scraper for Youtube",
    "version": "0.0.8",
    "project_urls": {
        "Homepage": "https://github.com/Eitol/youtube_simple_scraper"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ab1efca64e3f3e2b876a43b6a0a6dab2e8b4c96f610f9011947a702656336d3c",
                "md5": "da185a3b2bf74f434c5d14c27bb7829b",
                "sha256": "0b1f78caca409c4176cdc5eaab5b9ccceb25b49af41d6066b11d8fafb71f0ea8"
            },
            "downloads": -1,
            "filename": "youtube_simple_scraper-0.0.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "da185a3b2bf74f434c5d14c27bb7829b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 18782,
            "upload_time": "2024-06-06T20:47:30",
            "upload_time_iso_8601": "2024-06-06T20:47:30.070915Z",
            "url": "https://files.pythonhosted.org/packages/ab/1e/fca64e3f3e2b876a43b6a0a6dab2e8b4c96f610f9011947a702656336d3c/youtube_simple_scraper-0.0.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "41aa3383d35f8bbfe5a08daac46140c9febf4b6a195d6303410a689db22e64df",
                "md5": "6fb6cca682021897a41587ee171b1352",
                "sha256": "84b8e530d9b8df2c4f2fcd142fdd0902ba29b22fb7d59560128b3df493327afe"
            },
            "downloads": -1,
            "filename": "youtube_simple_scraper-0.0.8.tar.gz",
            "has_sig": false,
            "md5_digest": "6fb6cca682021897a41587ee171b1352",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 18172,
            "upload_time": "2024-06-06T20:47:31",
            "upload_time_iso_8601": "2024-06-06T20:47:31.162225Z",
            "url": "https://files.pythonhosted.org/packages/41/aa/3383d35f8bbfe5a08daac46140c9febf4b6a195d6303410a689db22e64df/youtube_simple_scraper-0.0.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-06 20:47:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Eitol",
    "github_project": "youtube_simple_scraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pydantic",
            "specs": [
                [
                    "~=",
                    "2.7.2"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "~=",
                    "2.32.3"
                ]
            ]
        },
        {
            "name": "dateparser",
            "specs": [
                [
                    "~=",
                    "1.2.0"
                ]
            ]
        },
        {
            "name": "protobuf",
            "specs": []
        },
        {
            "name": "betterproto",
            "specs": []
        }
    ],
    "lcname": "youtube-simple-scraper"
}
