vidcrawler


Name: vidcrawler
Version: 1.0.39
Home page: https://github.com/zackees/vidcrawler
Summary: Video Crawler
Upload time: 2024-08-27 01:56:04
Maintainer: None
Docs URL: None
Author: Zach Vorhies
Requires Python: >=3.6.0
License: MIT
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
# vidcrawler

Crawls major video sites like YouTube/Rumble/Bitchute/Brighteon for video content and outputs a JSON feed of all the videos that were found.

## Platform Unit Tests

[![MacOS_Tests](https://github.com/zackees/vidcrawler/actions/workflows/test_macos.yml/badge.svg)](https://github.com/zackees/vidcrawler/actions/workflows/test_macos.yml)
[![Win_Tests](https://github.com/zackees/vidcrawler/actions/workflows/test_win.yml/badge.svg)](https://github.com/zackees/vidcrawler/actions/workflows/test_win.yml)
[![Ubuntu_Tests](https://github.com/zackees/vidcrawler/actions/workflows/test_ubuntu.yml/badge.svg)](https://github.com/zackees/vidcrawler/actions/workflows/test_ubuntu.yml)

## Scraper Tests

[![Scaper_Youtube](https://github.com/zackees/vidcrawler/actions/workflows/test_youtube.yml/badge.svg)](https://github.com/zackees/vidcrawler/actions/workflows/test_youtube.yml)
[![Actions Status](https://github.com/zackees/vidcrawler/workflows/Scaper_Rumble/badge.svg)](https://github.com/zackees/vidcrawler/actions/workflows/test_rumble.yml)
[![Scaper_Brighteon](https://github.com/zackees/vidcrawler/actions/workflows/test_brighteon.yml/badge.svg)](https://github.com/zackees/vidcrawler/actions/workflows/test_brighteon.yml)
[![Actions Status](https://github.com/zackees/vidcrawler/workflows/Scraper_Gabtv/badge.svg)](https://github.com/zackees/vidcrawler/actions/workflows/test_gabtv.yml)
[![Actions Status](https://github.com/zackees/vidcrawler/workflows/Scraper_Spotify/badge.svg)](https://github.com/zackees/vidcrawler/actions/workflows/test_spotify.yml)

Note that Bitchute rejects requests from the GitHub runner's IP, so this test fails with a 403 Forbidden.
[![Actions Status](https://github.com/zackees/vidcrawler/workflows/Scaper_Bitchute/badge.svg)](https://github.com/zackees/vidcrawler/actions/workflows/test_bitchute.yml)

## API

#### Command line

`vidcrawler --input_crawl_json "fetch_list.json" --output_json "out_list.json"`

#### Python

```python
import json
from vidcrawler import crawl_video_sites
crawl_list = [
    [
        "Computing Forever",  # Can be whatever you want.
        "bitchute",  # Must be "youtube", "rumble", "bitchute" (and others).
        "hybm74uihjkf"  # The channel id on the service.
    ]
]
output = crawl_video_sites(crawl_list)
print(json.dumps(output))
```

The second and third entries of each item ("source" and "channel_id") are used to generate the video-platform-specific URLs to fetch data from. The first entry, the "channel name",
is echoed back in the generated JSON feed but does not affect the fetching process in any way.
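
Because each crawl entry is a positional three-item list, a malformed entry is easy to miss. The following is a minimal sketch of a pre-flight check; `validate_crawl_list` is a hypothetical helper, not part of vidcrawler, and lowercasing the source name is an assumption based on the examples in this README.

```python
from typing import List, Sequence


def validate_crawl_list(crawl_list: Sequence[Sequence[str]]) -> List[List[str]]:
    """Return a cleaned copy of crawl_list, or raise ValueError on a malformed entry."""
    cleaned = []
    for index, entry in enumerate(crawl_list):
        # Each entry must be a three-item list, not a bare string.
        if isinstance(entry, str) or len(entry) != 3:
            raise ValueError(f"entry {index} must be [channel_name, source, channel_id]")
        if not all(isinstance(field, str) and field.strip() for field in entry):
            raise ValueError(f"entry {index} must contain three non-empty strings")
        channel_name, source, channel_id = (field.strip() for field in entry)
        # Assumption: source names are lowercase, as in the examples in this README.
        cleaned.append([channel_name, source.lower(), channel_id])
    return cleaned


crawl_list = validate_crawl_list([["Computing Forever", "Bitchute", "hybm74uihjkf"]])
print(crawl_list)
```

The cleaned list can then be passed to `crawl_video_sites(crawl_list)` as in the example above.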

## Testing

Install vidcrawler; the command `vidcrawler_test` then becomes available.

```bash
> pip install vidcrawler
> vidcrawler_test
```

# youtube-pull-channel

This command pulls down a channel and all of its files as mp3s. Great for transcribing and feeding into an LLM.


#### Example input `fetch_list.json` (for `vidcrawler --input_crawl_json`)

```json
[
    [
        "Health Ranger Report",
        "brighteon",
        "hrreport"
    ],
    [
        "Sydney Watson",
        "youtube",
        "UCSFy-1JrpZf0tFlRZfo-Rvw"
    ],
    [
        "Computing Forever",
        "bitchute",
        "hybm74uihjkf"
    ],
    [
        "ThePeteSantilliShow",
        "rumble",
        "ThePeteSantilliShow"
    ],
    [
        "Macroaggressions",
        "odysee",
        "Macroaggressions"
    ]
]
```
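
If the channel list is maintained in code, the input file can be generated rather than edited by hand. This is a rough sketch: `write_fetch_list` and `build_command` are hypothetical helper names, the temporary directory is only for illustration, and the command-line flags are the ones shown in the "Command line" section.

```python
import json
import os
import tempfile
from typing import List


def write_fetch_list(path: str, entries: List[List[str]]) -> None:
    """Write the crawl entries as a JSON array of [name, source, channel_id] lists."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(entries, handle, indent=4)


def build_command(input_path: str, output_path: str) -> List[str]:
    """Build the argument list for the vidcrawler command line."""
    return ["vidcrawler", "--input_crawl_json", input_path, "--output_json", output_path]


entries = [
    ["Health Ranger Report", "brighteon", "hrreport"],
    ["Macroaggressions", "odysee", "Macroaggressions"],
]
workdir = tempfile.mkdtemp()
fetch_path = os.path.join(workdir, "fetch_list.json")
write_fetch_list(fetch_path, entries)

command = build_command(fetch_path, os.path.join(workdir, "out_list.json"))
print(" ".join(command))
# To actually run the crawl (requires vidcrawler to be installed):
#   import subprocess; subprocess.run(command, check=True)
```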

#### Example Output:

```json
[
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "The damage this caused is now being totaled up",
    "date_published": "2022-05-17T05:00:11+00:00",
    "date_lastupdated": "2022-05-17T05:17:18.540084",
    "channel_url": "https://www.youtube.com/channel/UCXIJgqnII2ZOINSWNOGFThA",
    "source": "youtube.com",
    "url": "https://www.youtube.com/watch?v=bwqBudCzDrQ",
    "duration": 254,
    "description": "",
    "img_src": "https://i3.ytimg.com/vi/bwqBudCzDrQ/hqdefault.jpg",
    "iframe_src": "https://youtube.com/embed/bwqBudCzDrQ",
    "views": 1429
  },
  {
     "channel_name": "ThePeteSantilliShow",
     "title": "..."
  }
]
```
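
A feed like the one above can be post-processed with the standard library alone. The sketch below groups videos by `channel_name` and orders them newest first. The helper names are hypothetical, entries without a `date_published` field are skipped, and treating a timestamp without a UTC offset as UTC is an assumption. It uses `datetime.fromisoformat`, which needs Python 3.7 or later.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Optional


def published_at(video: dict) -> Optional[datetime]:
    """Parse date_published, or return None when the field is missing or empty."""
    raw = video.get("date_published")
    if not raw:
        return None
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def newest_first_by_channel(feed: List[dict]) -> Dict[str, List[dict]]:
    """Group dated videos by channel_name, newest first within each channel."""
    dated = [video for video in feed if published_at(video) is not None]
    dated.sort(key=published_at, reverse=True)
    grouped = defaultdict(list)
    for video in dated:
        grouped[video.get("channel_name", "")].append(video)
    return dict(grouped)


feed = json.loads("""
[
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "The damage this caused is now being totaled up",
    "date_published": "2022-05-17T05:00:11+00:00",
    "url": "https://www.youtube.com/watch?v=bwqBudCzDrQ"
  },
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "..."
  }
]
""")
grouped = newest_first_by_channel(feed)
for channel, videos in grouped.items():
    print(channel, [video["title"] for video in videos])
```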

# Releases
  * 1.0.39: More pinned deps problems fixed.
  * 1.0.38: One of the scrapers has a pinned dependency, install it with [full]
  * 1.0.37: Misc fixes.
  * 1.0.36: Fixed youtube, rumble and brighteon parsers. Bitchute is still broken and now has rate limits.
  * 1.0.35: Added `update_yt_dlp()` to allow the client to update the downloader.
  * 1.0.34: Upgraded `open-webdriver` to version `1.5.0` to avoid `yt-dlp` urllib incompatibility.
  * 1.0.28: youtube_pull now takes in --channel-name and --output, like the other pullers
  * 1.0.27: Fixed polluting path space from multiple added static-ffmpeg
  * 1.0.24: Added `rumble-pull-channel`
  * 1.0.21: Misc fixes
  * 1.0.16: Make the library downloading more robust.
  * 1.0.15: Improve cleaning filepaths for brighteon_bot
  * 1.0.13: New `brighteon-pull-channel` command
  * 1.0.11: Improves `youtube-pull-channel`
  * 1.0.10: Adds `youtube-pull-channel` which pulls all files down as mp3s for a channel.
  * 1.0.9: Fixes crawler for rumble and minor fixes + linting fixes.
  * 1.0.8: Readme correction.
  * 1.0.7: Fixes Odysee scraper by including image/webp thumbnail format.
  * 1.0.4: Fixes local_now() to be local timezone aware.
  * 1.0.3: Bump
  * 1.0.2: Updates testing
  * 1.0.1: improves command line
  * 1.0.0: Initial release

            
