yt-community-post-archiver


- Name: yt-community-post-archiver
- Version: 0.1.1
- Summary: Archives YouTube community posts.
- Upload time: 2024-11-10 07:11:33
- Requires Python: >=3.11
- Keywords: archiver, cli, community, posts, youtube

# yt-community-post-archiver

Archives YouTube community posts. It tries to grab each post's text content, its images at
the largest resolution available, any polls, and a few other bits of metadata.
It works on members-only posts too.

Note this was initially written _really_ quickly, and might not work every time
(my Python is also only good at a scripting level). It is also a bit fragile,
and YT updates might break it. Feel free to let me know if it's broken, and if I
have the bandwidth I'll try and fix it.

## Usage

### From PyPI

The script is available via [PyPI](https://pypi.org/project/yt-community-post-archiver/):

1. [Install Python](https://www.python.org/downloads/).
2. Install via `pip` (or alternatives like [`pipx`](https://github.com/pypa/pipx)):

    ```shell
    pip install yt-community-post-archiver
    ```

3. Run `yt-community-post-archiver`. For example:

   ```shell
   yt-community-post-archiver "https://www.youtube.com/@PomuRainpuff/community"
   ```

   This will spawn a headless Chrome instance (that is, you won't see a Chrome window) and download all posts
   it can find from the provided page, and save text metadata + images in an automatically created folder called
   `archive-output` in the same directory the program was called in. Note this will take a while!

   For info on the options you can use, run with `--help`:

   ```shell
   yt-community-post-archiver --help
   ```

### From the wheel

From [Releases](https://github.com/Pyreko/yt-community-post-archiver/releases), you can install a wheel for this using Python.

1. [Install Python](https://www.python.org/downloads/).

2. Download one of the `.whl` files from [Releases](https://github.com/Pyreko/yt-community-post-archiver/releases).

3. Install the wheel file. For example, if the file you downloaded is called `yt_community_post_archiver-0.1.0-py3-none-any.whl`:

    ```shell
    pip install yt_community_post_archiver-0.1.0-py3-none-any.whl
    ```

4. Run `yt-community-post-archiver`. For example:

   ```shell
   yt-community-post-archiver "https://www.youtube.com/@PomuRainpuff/community"
   ```

   This will spawn a headless Chrome instance (that is, you won't see a Chrome window) and download all posts
   it can find from the provided page, and save text metadata + images in an automatically created folder called
   `archive-output` in the same directory the program was called in. Note this will take a while!

   For info on the options you can use, run with `--help`:

   ```shell
   yt-community-post-archiver --help
   ```

### From the repo

1. Clone the repo.

2. [Install Python](https://www.python.org/downloads/).

3. (Optional) Create and source a venv:

   ```shell
   python3 -m venv venv
   source venv/bin/activate
   ```

4. (Optional) Install `hatch` if you do not already have it:

   ```shell
   pip3 install hatch
   ```

5. Make sure the computer you're running this on has Chrome or Firefox, as it uses a browser to grab posts.

6. Run the archiver using `hatch run yt-community-post-archiver`. For example:

   ```shell
   hatch run yt-community-post-archiver "https://www.youtube.com/@PomuRainpuff/community"
   ```

   This will spawn a headless Chrome instance (that is, you won't see a Chrome window) and download all posts
   it can find from the provided page, and save text metadata + images in an automatically created folder called
   `archive-output` in the same directory the program was called in. Note this will take a while!

   For info on the options you can use, run with `--help`:

   ```shell
   hatch run yt-community-post-archiver --help
   ```

### Example

For example, let's say I ran:

```shell
hatch run yt-community-post-archiver "https://www.youtube.com/@IRyS/community" -o "output/testing" -m 1
```

This runs the archiver, directed to `https://www.youtube.com/@IRyS/community`, saving to `output/testing`, and gets
a maximum of one post.

At the time of writing, this gives me two files that look like this - `post.json`:

```json
{
    "url": "https://www.youtube.com/post/Ugkxbg1AcEsx5spUWRjgtF8cvXDDgUIW1SFo",
    "text": "Carbonated Love Wallpaper for those who love the thumbnail :D Courtesy of kanauru!  Stream the song if you haven't yet!!\n\n⬇️FULL MV⬇️\nhttps://youtu.be/DjNNpw2x2dU?si=B0heA...",
    "images": [
        "https://yt3.ggpht.com/KfLmUOa22rydRozKY34zopeHP39EN0u_X5qLplQiKQd1i2rxxidrcG4RxH5s3ceGY9ql8VfIQgdA=s3840"
    ],
    "links": [
        "https://www.youtube.com/post/Ugkxbg1AcEsx5spUWRjgtF8cvXDDgUIW1SFo",
        "https://www.youtube.com/watch?v=DjNNpw2x2dU&t=0s"
    ],
    "is_members": false,
    "relative_date": "3 months ago",
    "approximate_num_comments": "111",
    "num_comments": "111",
    "num_thumbs_up": "7.3K",
    "poll": null,
    "when_archived": "2024-10-16 05:20:18.045639+00:00"
}
```

and an image file (`Ugkxbg1AcEsx5spUWRjgtF8cvXDDgUIW1SFo-0`). Some details may change between versions,
though this document should be updated to reflect that.
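
If you want to post-process the archived metadata, the sketch below reads one record shaped like the example above and turns the display strings into numbers. It is not part of the tool: `parse_count` and `summarize_post` are hypothetical helper names, and it assumes the fields are present and formatted as in the sample. Counts such as `"7.3K"` are already rounded by YouTube, so the result is only approximate.

```python
import json
from datetime import datetime

# Multipliers for abbreviated display counts such as "7.3K".
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert a display count such as "111" or "7.3K" to an approximate integer."""
    text = text.strip().replace(",", "")
    suffix = text[-1:].upper()
    if suffix in _SUFFIXES:
        return round(float(text[:-1]) * _SUFFIXES[suffix])
    return int(text)


def summarize_post(raw_json: str) -> dict:
    """Pick a few fields out of one archived post record (hypothetical helper)."""
    post = json.loads(raw_json)
    return {
        "url": post["url"],
        "image_count": len(post["images"]),
        "thumbs_up": parse_count(post["num_thumbs_up"]),
        "comments": parse_count(post["num_comments"]),
        # The sample timestamp is ISO 8601 with a space separator and a UTC offset.
        "archived_at": datetime.fromisoformat(post["when_archived"]),
    }
```

To use it, read the contents of an archived JSON file into a string and pass that string to `summarize_post`.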

### Set save location

If you want to set the save location, then use `-o`:

```shell
hatch run yt-community-post-archiver "https://www.youtube.com/@IRyS/community" -o "/home/me/my_save"
```

### Logging in

You may want to provide a logged-in instance to this tool as this is the only way to get membership posts or certain details like poll vote percentages.
The tool supports two methods:

#### Use browser profile

I've found this way works a bit better from personal experience. You can re-use an existing browser profile that is
logged into your YouTube account to grab membership posts with the `-p` flag, where the path is where your user
profiles are located (for example, in Chrome, you can find this with `chrome://version`). For example:

```shell
hatch run yt-community-post-archiver -o output/ -p ~/.config/chromium/ "https://www.youtube.com/@WatsonAmelia/membership"
```

By default this will use the default profile name; if you need to override this then use `-n` as well.

#### Use cookies file

Alternatively, if you have a Netscape-format cookies file, you can pass its path with `-c`:

```shell
hatch run yt-community-post-archiver "https://www.youtube.com/@WatsonAmelia/community" -c "/home/me/my_cookies_file.txt"
```

Note that I've personally found this method much flakier; it occasionally fails in certain situations. It should
work fine if you just want to get a few posts, though, and already have a cookie file for things like
`ytarchive`.
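
Before handing a cookies file to `-c`, it can help to confirm that it parses as Netscape format and actually contains YouTube cookies. Here is a minimal sketch using Python's standard `http.cookiejar.MozillaCookieJar`. `youtube_cookie_names` is a hypothetical helper, not something the archiver provides, and it says nothing about whether the cookies are still valid for logging in.

```python
from http.cookiejar import MozillaCookieJar


def youtube_cookie_names(cookies_path: str) -> list:
    """Return the names of cookies in a Netscape-format file that apply to YouTube."""
    jar = MozillaCookieJar()
    # Keep session and expired cookies so the file is reported as-is.
    # load() raises http.cookiejar.LoadError if the file is not in Netscape format.
    jar.load(cookies_path, ignore_discard=True, ignore_expires=True)

    names = []
    for cookie in jar:
        domain = cookie.domain.lstrip(".")
        if domain == "youtube.com" or domain.endswith(".youtube.com"):
            names.append(cookie.name)
    return sorted(names)
```

An empty result suggests the file was exported from a browser session that was not logged into YouTube.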

### Use Firefox instead of Chrome as the driver

The default driver is Chrome, but Firefox should work as well.

```shell
hatch run yt-community-post-archiver "https://www.youtube.com/@PomuRainpuff/community" -d "firefox"
```
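
The flags covered so far (`-o`, `-m`, `-p`, `-n`, `-c`, `-d`) can be combined. As a rough sketch, assuming the installed `yt-community-post-archiver` entry point and that options may follow the URL as in most of the examples above, a small wrapper could assemble the command for scripting. `build_archiver_command` and its parameter names are made up for illustration; check `--help` for the authoritative option list.

```python
from __future__ import annotations

import shlex


def build_archiver_command(
    url: str,
    output_dir: str | None = None,
    max_posts: int | None = None,
    driver: str | None = None,
    cookies_file: str | None = None,
    profile_dir: str | None = None,
    profile_name: str | None = None,
) -> list[str]:
    """Assemble an argv list using only the flags described in this README."""
    command = ["yt-community-post-archiver", url]
    # (flag, value) pairs; a flag is only added when its value was provided.
    options = [
        ("-o", output_dir),
        ("-m", max_posts),
        ("-d", driver),
        ("-c", cookies_file),
        ("-p", profile_dir),
        ("-n", profile_name),
    ]
    for flag, value in options:
        if value is not None:
            command += [flag, str(value)]
    return command


if __name__ == "__main__":
    argv = build_archiver_command(
        "https://www.youtube.com/@IRyS/community",
        output_dir="output/testing",
        max_posts=1,
    )
    # Print a copy-pasteable command line; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Building a list rather than a single string avoids shell-quoting problems with URLs and paths when the command is later run through `subprocess.run`.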

## Notes

- Poll vote percentages can only be captured if you are logged in, because vote results are only shown to users who have voted.
  - If you have not voted on the poll before, the tool will temporarily vote for you to grab the vote percentages and will then try
    to undo the vote to avoid messing with anything. This isn't perfect!

## Other

### How does this work?

This is just a typical Selenium/BeautifulSoup program; that's it. As such, it simulates a user manually
copying and formatting all the data via a browser window. This is very evident if you disable headless mode
and watch all the action.
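
To give a feel for the "copy data out of a rendered page" half of that, here is a small sketch of the parsing step only. It is not the archiver's code: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup so it runs without extra installs, the `LinkCollector` and `extract_links` names are hypothetical, and the HTML it expects is a generic fragment rather than YouTube's real markup. In the real tool, the HTML would come from the browser that Selenium drives.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from a chunk of rendered HTML."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative hrefs such as "/post/..." against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str = "https://www.youtube.com/") -> list[str]:
    """Return every link in the fragment, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A list like the `links` field in the example output could be produced this way, although how the tool actually gathers it may differ.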

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "yt-community-post-archiver",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "archiver, cli, community, posts, youtube",
    "author": null,
    "author_email": "Pyreko <25498386+Pyreko@users.noreply.github.com>",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Archives YouTube community posts.",
    "version": "0.1.1",
    "project_urls": {
        "Documentation": "https://github.com/Pyreko/yt-community-post-archiver#readme",
        "Issues": "https://github.com/Pyreko/yt-community-post-archiver/issues",
        "Source": "https://github.com/Pyreko/yt-community-post-archiver"
    },
    "split_keywords": [
        "archiver",
        " cli",
        " community",
        " posts",
        " youtube"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b5fe76b07a2f7d91e96f3e7c97b960e6e1b14a508f75308c9aa33dbde65c61b0",
                "md5": "e9f11addb245de9024709ac30b03c4d9",
                "sha256": "2a2d2502c8491ef249e3487cb414c17633c6e088935acf90d4a5ddd7af554965"
            },
            "downloads": -1,
            "filename": "yt_community_post_archiver-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e9f11addb245de9024709ac30b03c4d9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 22125,
            "upload_time": "2024-11-10T07:11:33",
            "upload_time_iso_8601": "2024-11-10T07:11:33.313371Z",
            "url": "https://files.pythonhosted.org/packages/b5/fe/76b07a2f7d91e96f3e7c97b960e6e1b14a508f75308c9aa33dbde65c61b0/yt_community_post_archiver-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-10 07:11:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Pyreko",
    "github_project": "yt-community-post-archiver#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "yt-community-post-archiver"
}
        