sedd


| Field | Value |
| --- | --- |
| Name | sedd |
| Version | 2.2.1 |
| Home page | None |
| Summary | Unofficial, community-made tool for downloading the Stack Exchange data dumps |
| Upload time | 2025-07-14 22:10:55 |
| Maintainer | None |
| Docs URL | None |
| Author | None |
| Requires Python | >=3.10 |
| License | None |
| Keywords | Stack Exchange, data dump, Stack Exchange data dump, downloader, archival, internet archiving, preservation |
| Requirements | selenium, undetected-geckodriver-lw, desktop-notifier, watchdog, requests, loguru |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# SE Data Dump Downloader


For more comprehensive information, please read the [main README](https://github.com/LunarWatcher/se-data-dump-transformer/tree/master) on GitHub. This README is an abridged version of the main README, aimed specifically at PyPI users.

For usage problems not listed in this readme, see the main README. If no information exists, please open an issue on GitHub - keeping the tool accessible to everyone is a priority.

---

The SE Data Dump Downloader (abbreviated `sedd`) is a Selenium-based command line utility for downloading the entire Stack Exchange data dump in their new [anti-community format](https://stackoverflow.com/help/data-dumps), since they decided not to bother providing an official "download all" button. It's one of two components in the project that operate on the data dump, the other being the (non-Python-based) SE data dump transformer, which converts the data dump from the not-so-useful official `.xml` format to some other formats. The PyPI package is exclusively for the downloader, and does not ship with a copy of the transformer. See the main README if you're looking for the transformer.


You can install the PyPI version with:
```shell
pip3 install sedd
```
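
The package metadata lists `requires_python >=3.10`, so it may be worth confirming the interpreter version before installing. A minimal sketch, assuming you only care about the major and minor version; the helper name `supports_sedd` is hypothetical and not part of `sedd`:

```python
import sys

# Taken from the package metadata: requires_python >=3.10
MINIMUM = (3, 10)


def supports_sedd(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the given (major, minor) version satisfies >=3.10."""
    return version >= MINIMUM


if __name__ == "__main__":
    print("Python version OK" if supports_sedd() else "Python 3.10 or newer is required")
```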

Note that there are some additional steps, detailed in this README, before you can start using it.

## Configuration

`sedd` requires a special `config.json` file in the current working directory. There's a template available [on GitHub](https://github.com/LunarWatcher/se-data-dump-transformer/blob/master/config.example.json).

The only two fields you _need_ to fill out in the template are the email and password fields, with credentials for a Stack Exchange account. You need to be logged in to download the data dumps, so the downloader needs the credentials to log in on your behalf. It doesn't matter if you're logged into SE elsewhere: Selenium automatically creates a blank profile every time it starts, so no SE cookies carry over and a fresh login is always required.
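
As a rough illustration of the requirement above, the following sketch checks that a `config.json` in the current working directory exists and has credentials filled in. The key names `email` and `password` are assumptions based on the field names mentioned above; check the template for the actual layout, which may name or nest these fields differently. The helper `missing_credentials` is hypothetical and not part of `sedd`.

```python
import json
from pathlib import Path

# Assumed key names; the real template may name or nest them differently.
REQUIRED_KEYS = ("email", "password")


def missing_credentials(config_path: Path) -> list[str]:
    """Return the assumed required keys that are absent or empty."""
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    return [key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]


if __name__ == "__main__":
    path = Path.cwd() / "config.json"
    if not path.is_file():
        print(f"No config.json found in {path.parent}")
    else:
        missing = missing_credentials(path)
        print("Credentials look filled in" if not missing else f"Missing: {missing}")
```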

> [!tip]
>
> The downloader can automatically create new accounts in the network for you, if you don't have all 180-whatever accounts on every site in the network already. You can also create these by hand if you prefer for some reason, but you are not required to have all 180+ accounts before using the downloader.

## System requirements and pitfalls

`sedd` is exclusively Firefox-based, due to Chromium completely gutting support for uBlock Origin and custom filters. You need Firefox installed on your system to use `sedd`.

> [!note]
> On Linux and Windows-based systems, geckodriver is [slightly modified](https://pypi.org/project/undetected-geckodriver-lw/). This is an anti-anti-bot measure meant to prevent Cloudflare loops. If you're on macOS and get sent into a captcha loop, it's recommended you switch to Windows or Linux. A Linux VM is also an option if you have no way out of Apple's closed-down ecosystem.

Note that Ubuntu users, or other people who (for whatever reason) choose to use the Snap version of Firefox, have to jump through some extra hoops. The native version of Firefox is strongly encouraged, but if you run into problems with the snap version of Firefox and can't or won't switch, you need to define `export SE_GECKODRIVER=/snap/bin/geckodriver`. Selenium can and will find the snap version of `geckodriver` on its own, but for reasons I simply don't understand, it will still fail with several arbitrary errors. 
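
The Snap workaround above can also be prepared from Python instead of a shell `export`, for example in a wrapper script. This is a sketch under assumptions: the helper `with_snap_geckodriver` is hypothetical and not part of `sedd`, and it treats the mere presence of `/snap/bin/geckodriver` as the signal to set the variable, which may not be what you want if you also have a native Firefox installed. Only the `SE_GECKODRIVER` variable and its value come from this README.

```python
import os
from pathlib import Path

SNAP_GECKODRIVER = Path("/snap/bin/geckodriver")


def with_snap_geckodriver(
    env: dict[str, str], driver: Path = SNAP_GECKODRIVER
) -> dict[str, str]:
    """Return a copy of env with SE_GECKODRIVER set if the Snap driver exists."""
    updated = dict(env)
    # Never override a value the user has already chosen.
    if driver.exists() and "SE_GECKODRIVER" not in updated:
        updated["SE_GECKODRIVER"] = str(driver)
    return updated


if __name__ == "__main__":
    env = with_snap_geckodriver(dict(os.environ))
    print("SE_GECKODRIVER =", env.get("SE_GECKODRIVER", "(not set)"))
```

The returned dictionary can then be passed as the `env` argument of `subprocess.run` when launching `sedd` from a script.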

### Cloudflare issues or download issues

Stack Exchange has configured Cloudflare to be _highly_ aggressive, especially towards certain countries. You will almost certainly run into captchas, and the downloader is designed to deal with this. After an initial attempt to solve the captcha on its own, it will notify you (provided you haven't disabled the notification provider in `config.json`) and ask you to solve it manually.

If, at this point, it appears to succeed, but you're redirected back to a full-screen Cloudflare captcha wall, you've likely run into a Cloudflare loop. See [the main README](https://github.com/LunarWatcher/se-data-dump-transformer/tree/master?tab=readme-ov-file#cloudflare-loops) for further help. If this doesn't help, please open an issue.

If the downloads start fine, but later suddenly fail for no good reason, you're likely running into general download instability. This especially applies to `stackoverflow.com.7z`, as its massive size simply increases the chance you wait for it long enough that it flakes out. See [the main README](https://github.com/LunarWatcher/se-data-dump-transformer/tree/master?tab=readme-ov-file#download-instability-particularly-of-stackoverflowcom7z) for further help.

The "Warnings" section in the main README may, in the future, contain additional information about other failure modes not listed here.

## Using the downloader

With `config.json` in the current working directory and Firefox installed, you can now run the downloader with:
```shell
sedd
```
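
Before launching a long download session, it can help to confirm the two prerequisites named above. A minimal sketch, not part of `sedd`: the function `preflight` is hypothetical, and looking for an executable literally named `firefox` on `PATH` is an assumption that will not hold on every platform (on Windows and macOS in particular, Firefox is often installed without being on `PATH`).

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def preflight(
    workdir: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if not (workdir / "config.json").is_file():
        problems.append(f"config.json not found in {workdir}")
    # Assumption: Firefox is discoverable as `firefox` on PATH.
    if which("firefox") is None:
        problems.append("firefox executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight(Path.cwd()):
        print("warning:", problem)
```

The `which` parameter exists only so the lookup can be swapped out, for example in tests or on systems where Firefox lives in a non-standard location.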

For command line flags, see `sedd --help`, or [the main readme](https://github.com/LunarWatcher/se-data-dump-transformer/tree/master?tab=readme-ov-file#cli-options).


            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "sedd",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "Stack Exchange, data dump, Stack Exchange data dump, downloader, archival, internet archiving, preservation",
    "author": null,
    "author_email": "LunarWatcher <oliviawolfie@pm.me>",
    "download_url": "https://files.pythonhosted.org/packages/ab/64/a393258afa2bed3985a90eda709830f02abbeff992538a451246ac214956/sedd-2.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Unofficial, community-made tool for downloading the Stack Exchange data dumps",
    "version": "2.2.1",
    "project_urls": {
        "Changelog": "https://github.com/LunarWatcher/se-data-dump-transformer/blob/master/CHANGELOG.md",
        "Documentation": "https://github.com/LunarWatcher/se-data-dump-transformer",
        "Homepage": "https://github.com/LunarWatcher/se-data-dump-transformer",
        "Issues": "https://github.com/LunarWatcher/se-data-dump-transformer/issues",
        "Repository": "https://github.com/LunarWatcher/se-data-dump-transformer.git"
    },
    "split_keywords": [
        "stack exchange",
        " data dump",
        " stack exchange data dump",
        " downloader",
        " archival",
        " internet archiving",
        " preservation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cc1d1285529b518696616cfe1e2ff5647d1b74c07e1447966f6554668391f71b",
                "md5": "04a8259f12832c05e2c398dfdee2a79b",
                "sha256": "e8d7f55c912f07a99e2e5d5f0621290c7f7a77dea021589d301f920bd719c27d"
            },
            "downloads": -1,
            "filename": "sedd-2.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "04a8259f12832c05e2c398dfdee2a79b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 20336,
            "upload_time": "2025-07-14T22:10:53",
            "upload_time_iso_8601": "2025-07-14T22:10:53.641819Z",
            "url": "https://files.pythonhosted.org/packages/cc/1d/1285529b518696616cfe1e2ff5647d1b74c07e1447966f6554668391f71b/sedd-2.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ab64a393258afa2bed3985a90eda709830f02abbeff992538a451246ac214956",
                "md5": "b0f62ca4e5c3877f345a27e112970937",
                "sha256": "adc6780c71a9ab077ac00d88a921352074687a7e021bc2918d56e7dfa4493f22"
            },
            "downloads": -1,
            "filename": "sedd-2.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "b0f62ca4e5c3877f345a27e112970937",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 29717,
            "upload_time": "2025-07-14T22:10:55",
            "upload_time_iso_8601": "2025-07-14T22:10:55.118001Z",
            "url": "https://files.pythonhosted.org/packages/ab/64/a393258afa2bed3985a90eda709830f02abbeff992538a451246ac214956/sedd-2.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-14 22:10:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "LunarWatcher",
    "github_project": "se-data-dump-transformer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "selenium",
            "specs": [
                [
                    "==",
                    "4.32.0"
                ]
            ]
        },
        {
            "name": "undetected-geckodriver-lw",
            "specs": [
                [
                    ">=",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "desktop-notifier",
            "specs": [
                [
                    "==",
                    "5.0.1"
                ]
            ]
        },
        {
            "name": "watchdog",
            "specs": [
                [
                    "==",
                    "4.0.2"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "loguru",
            "specs": [
                [
                    ">=",
                    "0.7.3"
                ]
            ]
        }
    ],
    "lcname": "sedd"
}
        