discord-data


Name: discord-data
Version: 0.2.2
Home page: https://github.com/seanbreckenridge/discord_data
Summary: Library to parse the Discord GDPR export
Upload time: 2024-04-29 02:41:48
Author: Sean Breckenridge
Requires Python: >=3.8
License: MIT
Keywords: discord, data

## discord_data

Library to parse information from the Discord data export; see [Discord's support article](https://support.discord.com/hc/en-us/articles/360004027692) for more info.

You have to request the export manually, and it can take a while for Discord to deliver it.

This supports both the old CSV and new JSON formats for messages.

### Install:

Requires `python3.8+`. To install with pip, run:

    pip install discord_data

## Single Export

`parse_messages` and `parse_activity` take the `messages` and `activity` directories as arguments, like:

```python
>>> from discord_data import parse_messages, parse_activity
>>> next(parse_messages("./discord/october_2020/messages"))
Message(mid='747951969171275807', dt=datetime.datetime(2020, 8, 25, 22, 54, 5, 726000, tzinfo=datetime.timezone.utc), channel=Channel(cid='464051583559139340', name='general', server_name='Dream World'), content='<:NotLikeThis:237729324885606403>', attachments='')
>>> next(parse_activity("./discord/october_2020/activity"))
Activity(event_id='AQICfXBljgG+pYXCTRrwzy6MqgAAAAA=', event_type='start_listening', region_info=RegionInfo(city='cityNameHere', country_code='US', region_code='CA', time_zone='America/Los_Angeles'), fingerprint=Fingerprint(os='Mac OS X', os_version='16.1.0', browser='Discord Client', ip='216.58.195.78', isp=None, device=None, distro=None), timestamp=datetime.datetime(2016, 11, 26, 7, 8, 47))
```

Each of these returns a `Generator`, so they only read from the (giant) JSON files as needed. If you want to process all the data, you can call `list` on it to consume the whole generator:

```python
from discord_data import parse_messages, parse_activity
msg = list(parse_messages("./discord/october_2020/messages"))
acts = list(parse_activity("./discord/october_2020/activity"))
```
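If you only need an aggregate, you don't have to build the full list first. The following is a minimal sketch of consuming the generator one item at a time. `count_by_channel` is a hypothetical helper, and the `Channel` and `Message` namedtuples are stand-ins that mirror the fields in the repr shown above so the example runs without an export on disk; they are not the library's own classes.

```python
from collections import Counter, namedtuple

# Stand-ins mirroring the fields in the Message repr above (assumption:
# only these attribute names matter for this sketch).
Channel = namedtuple("Channel", ["cid", "name", "server_name"])
Message = namedtuple("Message", ["mid", "dt", "channel", "content", "attachments"])


def count_by_channel(messages):
    """Count messages per (server_name, channel name), one item at a time."""
    counts = Counter()
    for msg in messages:
        if isinstance(msg, Exception):
            # parse errors may be yielded alongside messages; skip them here
            continue
        counts[(msg.channel.server_name, msg.channel.name)] += 1
    return counts


# An iterator stands in for the generator returned by parse_messages
sample = iter([
    Message("1", None, Channel("10", "general", "Dream World"), "hi", ""),
    Message("2", None, Channel("10", "general", "Dream World"), "yo", ""),
    ValueError("unparseable row"),
    Message("3", None, Channel("11", "bots", "Dream World"), "!help", ""),
])
counts = count_by_channel(sample)
print(counts.most_common(1))
```

With a real export you would pass `parse_messages("./discord/october_2020/messages")` in place of `sample`.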

The raw activity data includes lots of additional fields; the parsed output only includes items I thought would be useful. If you want to parse the JSON blobs yourself, you can do so with `from discord_data import parse_raw_activity`.

If you just want to quickly load the parsed data into a REPL:

```shell
python3 -m discord_data ./discord/october_2020
```

That drops you into a Python shell with access to the `activity` and `messages` variables, which hold the parsed data.

Or, to dump it to JSON:

```shell
python3 -m discord_data ./discord/october_2020 -o json > discord_data.json
```

## Merge Exports

Each export seems to be complete, but when a server or channel is deleted, all messages in that channel are permanently deleted and no longer show up in later exports, so I'd recommend doing an export periodically to make sure you don't lose anything.

I recommend you organize your exports like this:

```
discord
├── march_2021
│   ├── account
│   ├── activity
│   ├── messages
│   ├── programs
│   ├── README.txt
│   └── servers
└── october_2020
    ├── account
    ├── activity
    ├── messages
    ├── programs
    ├── README.txt
    └── servers
```
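Given a layout like the one above, the per-export subdirectories can be located with `pathlib`. This is only a sketch of the idea, not the library's implementation: `find_export_subdirs` is a hypothetical helper, and it assumes each export sits exactly one level below the top folder.

```python
from pathlib import Path
import tempfile


def find_export_subdirs(export_dir, name):
    """Return sorted paths of <export_dir>/<export>/<name> directories that exist."""
    root = Path(export_dir)
    return sorted(p for p in root.glob(f"*/{name}") if p.is_dir())


# Build a throwaway copy of the layout above to exercise the helper
with tempfile.TemporaryDirectory() as tmp:
    for export in ("march_2021", "october_2020"):
        for sub in ("activity", "messages"):
            (Path(tmp) / export / sub).mkdir(parents=True)
    found = [
        p.relative_to(tmp).as_posix()
        for p in find_export_subdirs(tmp, "messages")
    ]

print(found)
```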

The `discord` folder at the top is what you pass as the `export_dir` keyword argument to the `merge_activity` and `merge_messages` functions, which call the underlying parse functions.

You can supply either `export_dir` or `paths`:

```python
from discord_data import merge_messages

# locates the corresponding `messages` directories in the folder structure
list(merge_messages(export_dir="./discord"))
# supply a list of the message directories yourself
list(merge_messages(paths=["./discord/march_2021/messages", "./discord/october_2020/messages"]))
```
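Successive exports overlap, so the same message will usually appear in more than one `messages` directory. How `merge_messages` handles that internally isn't described here; as a rough sketch of one way to merge, assuming `mid` uniquely identifies a message, you can keep the first copy seen of each ID. `merge_unique` and the trimmed `Message` namedtuple are hypothetical and only for illustration.

```python
from collections import namedtuple
from itertools import chain

Message = namedtuple("Message", ["mid", "content"])  # trimmed stand-in


def merge_unique(*sources):
    """Yield each message once across all sources, keyed on `mid`."""
    seen = set()
    for msg in chain.from_iterable(sources):
        if isinstance(msg, Exception):
            yield msg  # pass errors through for the caller to handle
            continue
        if msg.mid in seen:
            continue  # already yielded from an earlier source
        seen.add(msg.mid)
        yield msg


october = [Message("1", "first"), Message("2", "second")]
march = [Message("2", "second"), Message("3", "third")]
merged = list(merge_unique(october, march))
print([m.mid for m in merged])
```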

If the format of the Discord export changes, the parse/merge functions will still work, but they might yield errors as part of their output. To log and skip those, you can do:

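```python
import logging

from discord_data import merge_messages

logger = logging.getLogger(__name__)

for msg in merge_messages(export_dir="./discord"):
    if isinstance(msg, Exception):
        # log the error and move on to the next item
        logger.warning(msg)
        continue
    # do something with msg
    print(msg.content)
```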
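If you'd rather inspect the errors afterwards than log them as you go, you can split the output into two lists. This is a small sketch; `split_errors` is a hypothetical helper, not part of the library, and the strings below stand in for parsed messages.

```python
def split_errors(items):
    """Separate parsed results from yielded exceptions, preserving order."""
    results, errors = [], []
    for item in items:
        (errors if isinstance(item, Exception) else results).append(item)
    return results, errors


results, errors = split_errors(["msg-1", ValueError("bad row"), "msg-2"])
print(len(results), len(errors))
```

Note that this consumes the whole generator and holds everything in memory, so it trades the laziness of the loop above for convenience.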

Created to be used as part of [`HPI`](https://github.com/seanbreckenridge/HPI).

            

        