sportsball


- Name: sportsball
- Version: 0.3.14
- Home page: https://github.com/8W9aG/sportsball
- Summary: A library for pulling in and normalising sports stats.
- Upload time: 2025-01-28 13:41:23
- Author: Will Sackfield
- License: MIT
- Keywords: sports, data, betting

# sportsball

<a href="https://pypi.org/project/sportsball/">
    <img alt="PyPi" src="https://img.shields.io/pypi/v/sportsball">
</a>

A library for pulling in and normalising sports stats.

<p align="center">
    <img src="sportsball.png" alt="sportsball" width="200"/>
</p>

## Dependencies :globe_with_meridians:

Python 3.11.6:

- [pandas](https://pandas.pydata.org/)
- [requests](https://requests.readthedocs.io/en/latest/)
- [requests-cache](https://requests-cache.readthedocs.io/en/stable/)
- [python-dateutil](https://github.com/dateutil/dateutil)
- [tqdm](https://github.com/tqdm/tqdm)
- [beautifulsoup](https://www.crummy.com/software/BeautifulSoup/)
- [openpyxl](https://openpyxl.readthedocs.io/en/stable/)
- [joblib](https://joblib.readthedocs.io/en/stable/)
- [pyarrow](https://arrow.apache.org/docs/python/index.html)
- [ipython](https://ipython.org/)
- [pytz](https://pythonhosted.org/pytz/)
- [python-dotenv](https://github.com/theskumar/python-dotenv)
- [geocoder](https://geocoder.readthedocs.io/)
- [retry-requests](https://github.com/bustawin/retry-requests)
- [timezonefinder](https://timezonefinder.michelfe.it/gui)
- [nba_api](https://github.com/swar/nba_api)
- [pydantic](https://docs.pydantic.dev/latest/)
- [flatten_json](https://github.com/amirziai/flatten)
- [pygooglenews](https://github.com/kotartemiy/pygooglenews)
- [extruct](https://github.com/scrapinghub/extruct)
- [wikipedia-api](https://github.com/martin-majlis/Wikipedia-API)
- [tweepy](https://www.tweepy.org/)
- [pytest-is-running](https://github.com/adamchainz/pytest-is-running)
- [PySocks](https://github.com/Anorov/PySocks)
- [func-timeout](https://github.com/kata198/func_timeout)
- [tenacity](https://github.com/jd/tenacity)
- [random_user_agent](https://github.com/Luqman-Ud-Din/random_user_agent)
- [wayback](https://github.com/edgi-govdata-archiving/wayback)

## Raison D'être :thought_balloon:

`sportsball` aims to be a library for pulling in historical information about previous sporting games in a standardised fashion for easy data processing.
The models it uses are designed to be used for many different types of sports.

The supported leagues are:

* 🏉 [AFL](https://www.afl.com.au/)
* 🏀 [NBA](https://www.nba.com/)
* 🏀 [NCAAB](https://www.ncaa.com/sports/basketball-men/d1)
* 🏈 [NCAAF](https://www.ncaa.com/sports/football/fbs)
* 🏈 [NFL](https://www.nfl.com/)

## Architecture :triangular_ruler:

`sportsball` is an object-oriented library. The entities are organised like so:

* **Game**: A game within a season.
    * **Team**: The team within the game. Note that in games with individual players a team exists as a wrapper.
        * **Player**: A player within the team.
        * **Odds**: The odds for the team to win the game.
            * **Bookie**: The bookie publishing the odds.
        * **News**: News about the team the day before the game.
        * **Social**: Social posts from the team the day before the game.
    * **Venue**: The venue the game was played in.
        * **Address**: The address information of a venue.
            * **Weather**: The weather at the address.
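The nesting above can be pictured with plain dataclasses. This is only an illustrative sketch: the class and field names mirror the entity names in this README, the `name` field on `Bookie` is a hypothetical addition, and the library itself builds pydantic models whose exact definitions may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Weather:
    temperature: Optional[float] = None
    relative_humidity: Optional[float] = None


@dataclass
class Address:
    city: str
    weather: Optional[Weather] = None


@dataclass
class Venue:
    identifier: str
    address: Optional[Address] = None


@dataclass
class Bookie:
    name: str  # hypothetical field, not documented in this README


@dataclass
class Odds:
    odds: float
    bookie: Bookie


@dataclass
class Player:
    identifier: str


@dataclass
class Team:
    identifier: str
    name: str
    players: list = field(default_factory=list)  # list of Player
    odds: list = field(default_factory=list)  # list of Odds


@dataclass
class Game:
    teams: list  # list of Team
    venue: Optional[Venue] = None


# A game owns its teams and venue; a venue owns its address; an address owns its weather.
game = Game(
    teams=[
        Team(
            identifier="t1",
            name="Home",
            players=[Player("p1")],
            odds=[Odds(odds=1.8, bookie=Bookie("example-bookie"))],
        ),
        Team(identifier="t2", name="Away"),
    ],
    venue=Venue("v1", Address(city="Melbourne", weather=Weather(temperature=18.0))),
)
```

Every value is reached by walking down from the game, for example `game.venue.address.weather.temperature`.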

### Objects

A list of the attributes on each object.

#### Game

A representation of the game within a season.

* **dt**: The timezone aware date/time of the game start.
* **week**: The round of the game within the season.
* **game_number**: The index of the game within the round.
* **venue**: The venue the game took place at.
* **teams**: A list of teams within the game.
* **home_team**: The team representing the home team.
* **away_team**: The team representing the away team.
* **end_dt**: The timezone aware date/time of the game end.
* **attendance**: How many people attended the game.
* **league**: The league the game belongs to.
* **year**: The year the game was in.
* **season_type**: The type of the season the game was played in.
* **postponed**: Whether the game was postponed.
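Because `dt` and `end_dt` are timezone aware, they can be subtracted or converted to another zone without ambiguity. The sketch below uses only the standard library; the dates and the fixed UTC+11 offset are made-up values for illustration and do not come from the library.

```python
from datetime import datetime, timedelta, timezone

# Illustrative fixed offset (UTC+11); real data would carry the venue's own zone.
local_zone = timezone(timedelta(hours=11))

dt = datetime(2024, 3, 7, 19, 30, tzinfo=local_zone)      # game start
end_dt = datetime(2024, 3, 7, 22, 5, tzinfo=local_zone)   # game end

# Subtracting two aware datetimes gives the game length.
duration = end_dt - dt

# Converting to UTC makes games in different zones comparable.
dt_utc = dt.astimezone(timezone.utc)
```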

#### Team

A representation of a team within a game.

* **identifier**: The unique identifier for the team.
* **name**: The name of the team.
* **location**: The home location of the team.
* **players**: A list of players with the team for the game.
* **odds**: A list of odds for the team to win the game.
* **points**: The number of points scored by this team in the game.
* **ladder_rank**: The ladder rank of the team at the beginning of the round of the game.
* **kicks**: The number of kicks a team produced.
* **news**: News articles about the team from the day before the game.
* **social**: Social media posts from the team from the day before the game.
* **field_goals**: The sum of the field goals made by the team in the game.
* **field_goals_attempted**: The sum of the field goals attempted by the team in the game.
* **offensive_rebounds**: The number of rebounds during offense by the team in the game.

#### Player

A representation of a player within a team within a game.

* **identifier**: The unique identifier for the player.
* **jersey**: The jersey identifying the player.
* **kicks**: The number of kicks the player made in the game.
* **fumbles**: The number of times the player fumbled the ball in the game.
* **fumbles_lost**: The number of times the player loses possession of the ball due to a fumble and the opposing team recovers the ball.
* **field_goals**: The number of field goals the player made in the game.
* **field_goals_attempted**: The number of field goal attempts the player made in the game.
* **offensive_rebounds**: The number of offensive rebounds the player made in the game.

#### Odds

A representation of the odds for a team to win within a game.

* **odds**: The decimal odds offered by a bookie for the team to win in the game.
* **bookie**: The bookie offering these odds.
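Decimal odds convert to an implied win probability by taking the reciprocal. The helpers below are a minimal sketch of that arithmetic; the function names are hypothetical and are not part of `sportsball`.

```python
def implied_probability(decimal_odds: float) -> float:
    """Return the win probability implied by decimal odds (1 / odds)."""
    if decimal_odds < 1.0:
        raise ValueError("decimal odds are expected to be at least 1.0")
    return 1.0 / decimal_odds


def normalised_probabilities(all_odds: list) -> list:
    """Rescale implied probabilities so they sum to 1.

    Raw implied probabilities usually sum to more than 1 because of the
    bookie's margin; dividing by the total removes that margin.
    """
    raw = [implied_probability(o) for o in all_odds]
    total = sum(raw)
    return [p / total for p in raw]
```

For example, odds of `2.0` imply a 50% chance, and `normalised_probabilities([1.8, 2.1])` returns two probabilities that sum to 1.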

#### Venue

The venue the game is played at.

* **identifier**: The unique identifier for the venue.
* **names**: The name of the venue.
* **address**: The address of the venue.
* **is_grass**: Whether the venue has a grass field.
* **is_indoor**: Whether the venue is indoors.

#### Address

The address of the venue.

* **city**: The city of the address.
* **state**: The state of the address.
* **zipcode**: The postal/zip code of the address.
* **latitude**: The latitude of the address.
* **longitude**: The longitude of the address.
* **housenumber**: The house/street number of the address.
* **weather**: The weather at the address at the game start time.
* **timezone**: The time zone at the address.
* **country**: The country of the address.

#### Weather

The weather forecast, made one day out, for the address at the game start time.

* **temperature**: The temperature at the address at the game start time.
* **relative_humidity**: The relative humidity at the address at the game start time.

#### News

The news one day out from the game.

* **title**: The title of the article.
* **published**: When the article was published.
* **summary**: The summary of the article.
* **source**: The source of the article.

#### Social

Social media posts one day out from the game.

* **network**: The social network this post was made from.
* **post**: The text of the post.
* **comments**: The number of comments on the post.
* **reposts**: The number of reposts.
* **likes**: The number of likes the post received.
* **views**: The number of views the post has.
* **published**: When the post was published.

## Caching

This library caches aggressively because of the large volume of data it pulls. Requests about a recent game (generally within the last 7 days) bypass the caches. The cache layers are as follows:

1. A joblib disk cache that caches calls to the pydantic model creation functions. This cache changes with every version update to keep the models in sync. This is the fastest cache.
2. A requests cache backed by SQLite that caches requests for 1 year.
3. A lookup against the Wayback Machine; the archived response is used if one is found.

It is strongly recommended that you define proxies in the `PROXIES` environment variable. The more proxies are available, the easier it is to collect data.
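The lookup order described above can be sketched with plain functions. This is a simplified illustration, not the library's code: `fetch`, `is_recent`, and the idea of passing each cache layer as a callable are assumptions, and in the real library the first layer caches model creation rather than raw responses.

```python
from datetime import datetime, timedelta, timezone

RECENT_WINDOW = timedelta(days=7)  # "generally in the last 7 days"


def is_recent(game_dt, now=None):
    """Return True when the game falls inside the cache-bypass window."""
    now = now or datetime.now(timezone.utc)
    return now - game_dt <= RECENT_WINDOW


def fetch(url, game_dt, layers, fetch_live, now=None):
    """Try each cache layer in order unless the game is recent.

    `layers` is a list of callables that return a cached value or None;
    `fetch_live` performs the real request. Both are hypothetical hooks.
    """
    if not is_recent(game_dt, now):
        for layer in layers:
            cached = layer(url)
            if cached is not None:
                return cached
    return fetch_live(url)
```

Ordering the layers from fastest to slowest means a hit in an early layer avoids touching the later ones, and a recent game always goes to the live source.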

## Installation :inbox_tray:

This is a Python package hosted on PyPI, so to install it simply run the following command:

`pip install sportsball`

## Usage example :eyes:

There are many different ways of using sportsball, but we generally recommend the CLI.

### CLI

To fetch a dataframe containing information about a league, you can use the following command:

```
sportsball --league=nfl -
```

The final argument denotes the file to write to; in this case `-` means stdout.

### Python

To pull a dataframe containing all the information for a particular league, the following example can be used:

```python
from sportsball import sportsball as spb

ball = spb.SportsBall()
league = ball.league(spb.League.AFL)
df = league.to_frame()
```

This results in a dataframe where each game is represented by all its features.
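To give a feel for how a nested game can become a single row of features, here is a minimal flattening sketch. The `flatten` helper and the underscore-joined column names are assumptions made for illustration; the actual column naming used by `sportsball` may differ.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts and lists into one dict with joined keys."""
    row = {}
    if isinstance(record, dict):
        for key, value in record.items():
            row.update(flatten(value, f"{prefix}{key}_"))
    elif isinstance(record, list):
        for index, value in enumerate(record):
            row.update(flatten(value, f"{prefix}{index}_"))
    else:
        # Drop the trailing separator added by the parent level.
        row[prefix[:-1]] = record
    return row


# A made-up game record with two teams.
game = {
    "week": 1,
    "teams": [
        {"name": "Home", "points": 90},
        {"name": "Away", "points": 75},
    ],
}
row = flatten(game)
```

Here `row` holds one flat key per leaf value, such as `teams_0_points`, which is the shape a dataframe row needs.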

### Environment

If you wish to use the providers that require API keys, you can create a `.env` file with the following variables inside it:

```
GOOGLE_API_KEY=APIKEY
GRIBSTREAM_API_KEY=APIKEY
X_API_KEY=APIKEY
X_API_SECRET_KEY=APISECRETKEY
X_ACCESS_TOKEN=ACCESSTOKEN
X_ACCESS_TOKEN_SECRET=ACCESSTOKENSECRET
PROXIES=CSVPROXIESLIST
```
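The placeholder `CSVPROXIESLIST` suggests that `PROXIES` holds a comma-separated list. Under that assumption, the value could be read as sketched below; `proxies_from_env` is a hypothetical helper, not a function exported by `sportsball`.

```python
import os


def proxies_from_env(environ=None):
    """Split the PROXIES variable into a list of proxy URLs.

    Assumes a comma-separated value; blank entries are ignored.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("PROXIES", "")
    return [entry.strip() for entry in raw.split(",") if entry.strip()]
```

Passing a plain dict instead of `os.environ` makes the helper easy to try out, for example `proxies_from_env({"PROXIES": "http://a:8080,http://b:8080"})`.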

## License :memo:

The project is available under the [MIT License](LICENSE).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/8W9aG/sportsball",
    "name": "sportsball",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "sports data betting",
    "author": "Will Sackfield",
    "author_email": "will.sackfield@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/ea/49/42d9307782c3d54531911c28431fb364588238aac5be75b002a74cdb1179/sportsball-0.3.14.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A library for pulling in and normalising sports stats.",
    "version": "0.3.14",
    "project_urls": {
        "Homepage": "https://github.com/8W9aG/sportsball"
    },
    "split_keywords": [
        "sports",
        "data",
        "betting"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ea4942d9307782c3d54531911c28431fb364588238aac5be75b002a74cdb1179",
                "md5": "23f9a57f220e7c955d9f0eee2806fe59",
                "sha256": "b696c0d2ac1739ed9fbdab2c0060bfe710c2f7799301d34508bf345df1fb9115"
            },
            "downloads": -1,
            "filename": "sportsball-0.3.14.tar.gz",
            "has_sig": false,
            "md5_digest": "23f9a57f220e7c955d9f0eee2806fe59",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 114978,
            "upload_time": "2025-01-28T13:41:23",
            "upload_time_iso_8601": "2025-01-28T13:41:23.643861Z",
            "url": "https://files.pythonhosted.org/packages/ea/49/42d9307782c3d54531911c28431fb364588238aac5be75b002a74cdb1179/sportsball-0.3.14.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-28 13:41:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "8W9aG",
    "github_project": "sportsball",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "sportsball"
}
        