basketball-reference-scraper


Name: basketball-reference-scraper
Version: 2.0.0
Home page: https://github.com/vishaalagartha/basketball_reference_scraper
Summary: A Python client for scraping stats and data from Basketball Reference
Upload time: 2024-03-16 02:28:51
Author: Vishaal Agartha
Requires Python: >=3.6
License: MIT
Keywords: nba, sports, data mining, basketball, basketball reference, basketball-reference.com
Requirements: beautifulsoup4, bs4, lxml, numpy, pandas, python-dateutil, pytz, requests, six, soupsieve, unidecode
Travis CI: none
Coveralls test coverage: none

# basketball_reference_scraper

[Basketball Reference](https://www.basketball-reference.com/) is a resource to aggregate statistics on NBA teams, seasons, players, and games. This package provides methods to acquire data for all these categories in pre-parsed and simplified formats.

## Installing
### Via `pip`
I wrote this library as an exercise in creating my first PyPI package. Hopefully, you find it easy to use.
Install using the following command:

```
pip install basketball-reference-scraper
```

### Via GitHub
Alternatively, you can clone this repo and import the modules you need directly.

### Selenium

This package can also capture dynamically rendered content that is added to the page via JavaScript rather than baked into the HTML. To achieve this, it uses [Python Selenium](https://selenium-python.readthedocs.io/). Please refer to their [installation instructions](https://selenium-python.readthedocs.io/installation.html) and ensure you have the [Chrome webdriver](https://selenium-python.readthedocs.io/installation.html#drivers) installed and available on your `PATH`.
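
As a quick sanity check before using the Selenium-backed features, the sketch below looks for the driver executable on your `PATH` using only the standard library. The helper name `find_driver` and the executable name `chromedriver` are assumptions for illustration; they are not part of this package.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable on PATH, or None.

    `name` is an assumed executable name; adjust it for your platform.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("chromedriver not found on PATH; Selenium-backed calls may fail")
    else:
        print(f"chromedriver found at {path}")
```

If the driver is reported as missing, revisit the Selenium installation instructions linked above.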

## Wait, don't scrapers like this already exist?

Yes, scrapers and APIs do exist. The most widely used API targets [stats.nba.com](https://stats.nba.com/), but that site blocks clients that make too many requests, which hinders anyone who wants to acquire a lot of data. Scrapers for [Basketball Reference](https://www.basketball-reference.com/) also exist, but none of them load dynamically rendered content. They can only acquire statically loaded content, which leaves out statistics in certain formats (for example, Player Advanced Stats Per Game).

Most of these scrapers also pull from `'https://widgets.sports-reference.com/'`, which is outdated: Basketball Reference no longer acquires its data from there. Additionally, [Sports Reference recently instituted a rate limiter](https://www.sports-reference.com/bot-traffic.html) that prevents users from making more than 20 requests per minute. This package abstracts the waiting logic to ensure you never hit this threshold.
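
To illustrate the kind of waiting logic involved, here is a minimal sliding-window limiter. It is a sketch under stated assumptions, not this package's actual implementation: the class name `RateLimiter`, its parameters, and the injectable `clock` and `sleep` hooks are all hypothetical.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls in any rolling window of `period` seconds.

    Hypothetical helper for illustration; assumes max_calls >= 1.
    """

    def __init__(self, max_calls=20, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the rolling window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
```

In use, you would create one limiter and call `limiter.wait()` immediately before each HTTP request, so that the 21st request within a minute pauses instead of being sent.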

### API
Currently, the package contains 7 modules: `teams`, `players`, `seasons`, `box_scores`, `pbp`, `shot_charts`, and `injury_report`.
The package will be expanding to include other content as well, but this is a start.
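
After installing, you can confirm that each module listed above is present without calling any scraping functions. The sketch below assumes the import name is `basketball_reference_scraper` (matching the repository name); the helper `available_modules` is hypothetical and not part of the package.

```python
import importlib.util

# Module names as listed in this README.
MODULES = ["teams", "players", "seasons", "box_scores",
           "pbp", "shot_charts", "injury_report"]


def available_modules(package, names):
    """Map each submodule name to True if it can be located, else False."""
    found = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(f"{package}.{name}")
        except ModuleNotFoundError:
            # Raised when the parent package itself is not installed.
            spec = None
        found[name] = spec is not None
    return found


if __name__ == "__main__":
    # Assumed import name; adjust if your installation differs.
    status = available_modules("basketball_reference_scraper", MODULES)
    for name, ok in status.items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Note that `find_spec` imports the parent package in order to locate its submodules, so the package's top-level `__init__` runs as a side effect.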

For full details on the API please refer to the [documentation](https://github.com/vishaalagartha/basketball_reference_scraper/blob/master/API.md).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/vishaalagartha/basketball_reference_scraper",
    "name": "basketball-reference-scraper",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "nba,sports,data mining,basketball,basketball reference,basketball-reference.com",
    "author": "Vishaal Agartha",
    "author_email": "vishaalagartha@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/19/55/e47c7b646366359413bcd798962a2aa6629cf9404c12e01cc69fc4cce4aa/basketball_reference_scraper-2.0.0.tar.gz",
    "platform": null,
    "description": "# basketball_reference_scraper\n\n[Basketball Reference](https://www.basketball-reference.com/) is a resource to aggregate statistics on NBA teams, seasons, players, and games. This package provides methods to acquire data for all these categories in pre-parsed and simplified formats.\n\n## Installing\n### Via `pip`\nI wrote this library as an exercise for creating my first PyPi package. Hopefully, you find it easy to use.\nInstall using the following command:\n\n```\npip install basketball-reference-scraper\n```\n\n### Via GitHub\nAlternatively, you can just clone this repo and import the libraries at your own discretion.\n\n### Selenium\n\nThis package can also capture dynamically rendered content that is being added to the page via JavaScript, rather than baked into the HTML. To achieve this, it uses [Python Selenium](https://selenium-python.readthedocs.io/). Please refer to their [installation instructions](https://selenium-python.readthedocs.io/installation.html) and ensure you have [Chrome webdriver](https://selenium-python.readthedocs.io/installation.html#drivers) installed in and in your `PATH` variable.\n\n## Wait, don't scrapers like this already exist?\n\nYes, scrapers and APIs do exist. The primary API used currently is for [stats.nba.com](https://stats.nba.com/), but the website blocks too many requests, hindering those who want to acquire a lot of data. Additionally, scrapers for [Basketball Reference](https://www.basketball-reference.com/) do exist, but none of them load dynamically rendered content. These scrapers can only acquire statically loaded content, preventing those who want statistics in certain formats (for example, Player Advanced Stats Per Game).\n\nMost of the scrapers use outdated methodologies of scraping from `'https://widgets.sports-reference.com/'`. This is outdated and Basketball Reference no longer acquires their data from there. 
Additionally, [Sports Reference recently instituted a rate limiter](https://www.sports-reference.com/bot-traffic.html) preventing users from making an excess of 20 requests/minute. This package abstracts the waiting logic to ensure you never hit this threshold.\n\n### API\nCurrently, the package contains 5 modules: `teams`, `players`, `seasons`, `box_scores`, `pbp`, `shot_charts`, and `injury_report`. \nThe package will be expanding to include other content as well, but this is a start.\n\nFor full details on the API please refer to the [documentation](https://github.com/vishaalagartha/basketball_reference_scraper/blob/master/API.md).\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python client for scraping stats and data from Basketball Reference",
    "version": "2.0.0",
    "project_urls": {
        "Homepage": "https://github.com/vishaalagartha/basketball_reference_scraper"
    },
    "split_keywords": [
        "nba",
        "sports",
        "data mining",
        "basketball",
        "basketball reference",
        "basketball-reference.com"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bd874ccef2cf6882d1e62ea4c901725250fc5c61400604053ab9cc72a511c509",
                "md5": "2846129948fa0a0e4f7c8ae4215a98ca",
                "sha256": "6567096ce632a645f36f8956230b93814d16dd1339710be74eb952428e0c2ab5"
            },
            "downloads": -1,
            "filename": "basketball_reference_scraper-2.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2846129948fa0a0e4f7c8ae4215a98ca",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 51915,
            "upload_time": "2024-03-16T02:28:49",
            "upload_time_iso_8601": "2024-03-16T02:28:49.201843Z",
            "url": "https://files.pythonhosted.org/packages/bd/87/4ccef2cf6882d1e62ea4c901725250fc5c61400604053ab9cc72a511c509/basketball_reference_scraper-2.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1955e47c7b646366359413bcd798962a2aa6629cf9404c12e01cc69fc4cce4aa",
                "md5": "edb13cadc80685ba526b42cc405207f5",
                "sha256": "64465aae00f77c73ca2940d810e9311a091b6335fac63b467417c55a33b5674a"
            },
            "downloads": -1,
            "filename": "basketball_reference_scraper-2.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "edb13cadc80685ba526b42cc405207f5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 46685,
            "upload_time": "2024-03-16T02:28:51",
            "upload_time_iso_8601": "2024-03-16T02:28:51.215115Z",
            "url": "https://files.pythonhosted.org/packages/19/55/e47c7b646366359413bcd798962a2aa6629cf9404c12e01cc69fc4cce4aa/basketball_reference_scraper-2.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-16 02:28:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "vishaalagartha",
    "github_project": "basketball_reference_scraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    "==",
                    "4.8.2"
                ]
            ]
        },
        {
            "name": "bs4",
            "specs": [
                [
                    "==",
                    "0.0.1"
                ]
            ]
        },
        {
            "name": "lxml",
            "specs": [
                [
                    "==",
                    "4.6.5"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.21.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "1.3.5"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.8.1"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2019.3"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.22.0"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.13.0"
                ]
            ]
        },
        {
            "name": "soupsieve",
            "specs": [
                [
                    "==",
                    "1.9.5"
                ]
            ]
        },
        {
            "name": "unidecode",
            "specs": [
                [
                    "==",
                    "1.2.0"
                ]
            ]
        }
    ],
    "lcname": "basketball-reference-scraper"
}
        