sampleReddit


Name: sampleReddit
Version: 0.1.5
Summary: Take snowball samples of Reddit data
upload_time: 2024-03-07 17:28:50
docs_url: None
requires_python: >=3.10
keywords: data, praw, reddit, sample, snowball, social media
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

# sampleReddit 🫴

A streamlined interface for generating snowball samples of Reddit data. 

Snowball sampling is a data collection method that starts with a small set of seeds and iteratively collects data from their connections. This method is particularly useful for collecting data from social media platforms, where the connections between users and communities are often of primary interest. sampleReddit also outputs full documentation of each sampling process.
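To make the idea concrete, here is a minimal sketch of snowball sampling on a toy graph. It is not sampleReddit's implementation: the function name `snowball_sample`, the `waves` parameter, and the toy data are all assumptions made for illustration.

```python
from collections import deque


def snowball_sample(graph, seeds, waves):
    """Collect nodes reachable from `seeds` within `waves` rounds of expansion.

    `graph` maps each node to an iterable of connected nodes.
    Returns a dict mapping each sampled node to the wave in which it was
    found (seeds are wave 0).
    """
    found = {seed: 0 for seed in seeds}
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        wave = found[node]
        if wave == waves:
            continue  # do not expand nodes found in the final wave
        for neighbour in graph.get(node, ()):
            if neighbour not in found:
                found[neighbour] = wave + 1
                frontier.append(neighbour)
    return found


# Toy data (hypothetical): subreddits link to commenters,
# and commenters link to other subreddits they are active in.
toy_graph = {
    "r/news": ["alice", "bob"],
    "r/politics": ["bob", "carol"],
    "alice": ["r/science"],
    "carol": ["r/history"],
}

sample = snowball_sample(toy_graph, seeds=["r/news", "r/politics"], waves=1)
```

With `waves=1` the sample stops at the users connected to the seed subreddits. Raising `waves` follows those users out to further communities.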

## Installation

sampleReddit can be installed from PyPI using pip:

```bash
pip install sampleReddit
```

## Quick Start

An annotated example of how to go from a list of seed subreddits to a snowball sample of Reddit comments can be found in this [script](https://github.com/ReedMerrill/sampleReddit-example-files/blob/main/scripts/example-comment-sampling.py).

## Usage

The core functionality of sampleReddit resides in the `sample_reddit` function:

```python
import sampleReddit as sr

# `instance` is an authenticated Reddit API instance
# (see the note on `setup_access` below).
sampling_frame, users_df = sr.sample_reddit(
    api_instance=instance,
    seed_subreddits=["politics", "news"],
    post_filter="top",
    time_period="year",
    n_posts="3",
    log_file_path="path/to/log/file.log",
)
```

This call conducts a snowball sample of Reddit users: it collects the top 3 posts of the past year from the "politics" and "news" subreddits, then the usernames of everyone who commented on those posts. The function returns two things:

1. A Python dictionary object that documents the sampling frame. It maps subreddits to posts and posts to comments.
2. A `pandas` `DataFrame` with a single column called "users" that lists the users who were sampled.
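As a rough illustration of how a sampling frame of this kind could be inspected, the sketch below summarizes a hand-written dictionary. The nesting used here (subreddit, then post ID, then a list of comment IDs) is an assumption; the structure actually returned by `sample_reddit` may differ, and `summarize_frame` is a hypothetical helper, not part of the library.

```python
# Hypothetical sampling frame: each subreddit maps to {post_id: [comment_id, ...]}.
# The real structure returned by sample_reddit may be shaped differently.
example_frame = {
    "politics": {"post_a": ["c1", "c2"], "post_b": ["c3"]},
    "news": {"post_c": []},
}


def summarize_frame(frame):
    """Return {subreddit: (number of posts, number of comments)}."""
    summary = {}
    for subreddit, posts in frame.items():
        n_comments = sum(len(comments) for comments in posts.values())
        summary[subreddit] = (len(posts), n_comments)
    return summary


summary = summarize_frame(example_frame)
for subreddit, (n_posts, n_comments) in summary.items():
    print(f"{subreddit}: {n_posts} posts, {n_comments} comments")
```

A summary like this can serve as a quick sanity check that each seed subreddit contributed the expected number of posts.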

The library also provides lower-level functions that only sample posts from a subreddit, or comments from a list of post IDs. For a full list of functions, see the [documentation](https://github.com/ReedMerrill/sampleReddit/wiki).

**Note:** Any access to the Reddit API requires an application registered with Reddit via their developer portal. Once your app is registered, the `setup_access` function can be used to create an authenticated Reddit API instance. For instructions on how to set up a registered Reddit API application, refer to [this guide](https://github.com/reddit-archive/reddit/wiki/OAuth2-App-Types#script-app).[^1]

[^1]: You will need a regular Reddit user account to complete the app authentication setup.
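One common way to keep app credentials out of source code is to read them from environment variables. The sketch below is a suggestion only: the variable names and the `load_credentials` helper are hypothetical, and it does not call `setup_access`, whose signature is not shown here (see the wiki for that).

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_VARS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
}


def load_credentials(env=None):
    """Read Reddit app credentials from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in REQUIRED_VARS.items()}


# Demonstration with placeholder values instead of real secrets.
creds = load_credentials({
    "REDDIT_CLIENT_ID": "my-client-id",
    "REDDIT_CLIENT_SECRET": "my-client-secret",
    "REDDIT_USER_AGENT": "sampling-script/0.1 by u/your_username",
})
```

The keys `client_id`, `client_secret`, and `user_agent` match keyword arguments accepted by PRAW's `praw.Reddit` constructor. Whether `setup_access` takes the same names is an assumption to check against the documentation.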

Testing is performed on Python 3.10, and the package metadata requires Python 3.10 or later.

## Documentation

Full package documentation can be found in this repo's [wiki](https://github.com/ReedMerrill/sampleReddit/wiki).

## Acknowledgments

sampleReddit is built on top of the [PRAW](https://github.com/praw-dev/praw) (Python Reddit API Wrapper) library, which provides a comprehensive and flexible interface for the Reddit API.

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "sampleReddit",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "data,praw,reddit,sample,snowball,social media",
    "author": "",
    "author_email": "Reed Merrill <reedjmerrill@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/fc/10/ffbf8e7d7e21d7962cf633a466504daed6e373e19394694aea3c11fea8d1/samplereddit-0.1.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Take snowball samples of Reddit data",
    "version": "0.1.5",
    "project_urls": {
        "Homepage": "https://github.com/ReedMerrill/sampleReddit",
        "Issues": "https://github.com/ReedMerrill/sampleReddit/issues"
    },
    "split_keywords": [
        "data",
        "praw",
        "reddit",
        "sample",
        "snowball",
        "social media"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "349681f5c995c80e3a449726f47cae06cd18b4acfef5d7ddab2c62622bd4bc7a",
                "md5": "766b730a6d7d30108dce94f213469bd1",
                "sha256": "62379fd6c10b10e342e63fa1e68c2b3df59b77a786c2425b009061d87a4704a8"
            },
            "downloads": -1,
            "filename": "samplereddit-0.1.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "766b730a6d7d30108dce94f213469bd1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 9261,
            "upload_time": "2024-03-07T17:28:48",
            "upload_time_iso_8601": "2024-03-07T17:28:48.920009Z",
            "url": "https://files.pythonhosted.org/packages/34/96/81f5c995c80e3a449726f47cae06cd18b4acfef5d7ddab2c62622bd4bc7a/samplereddit-0.1.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fc10ffbf8e7d7e21d7962cf633a466504daed6e373e19394694aea3c11fea8d1",
                "md5": "78177436c163018dca4e2656485c1162",
                "sha256": "7a2a7f39ad2d131f5e1451d7840db95799e7c4c6dc5976a76ee745197f94e645"
            },
            "downloads": -1,
            "filename": "samplereddit-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "78177436c163018dca4e2656485c1162",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 8787,
            "upload_time": "2024-03-07T17:28:50",
            "upload_time_iso_8601": "2024-03-07T17:28:50.518689Z",
            "url": "https://files.pythonhosted.org/packages/fc/10/ffbf8e7d7e21d7962cf633a466504daed6e373e19394694aea3c11fea8d1/samplereddit-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-07 17:28:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ReedMerrill",
    "github_project": "sampleReddit",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "samplereddit"
}
        