# sportsball
<a href="https://pypi.org/project/sportsball/">
<img alt="PyPi" src="https://img.shields.io/pypi/v/sportsball">
</a>
A library for pulling in and normalising sports stats.
<p align="center">
<img src="sportsball.png" alt="sportsball" width="200"/>
</p>
## Dependencies :globe_with_meridians:
Python 3.11.6:
- [pandas](https://pandas.pydata.org/)
- [requests](https://requests.readthedocs.io/en/latest/)
- [requests-cache](https://requests-cache.readthedocs.io/en/stable/)
- [python-dateutil](https://github.com/dateutil/dateutil)
- [tqdm](https://github.com/tqdm/tqdm)
- [beautifulsoup](https://www.crummy.com/software/BeautifulSoup/)
- [openpyxl](https://openpyxl.readthedocs.io/en/stable/)
- [joblib](https://joblib.readthedocs.io/en/stable/)
- [pyarrow](https://arrow.apache.org/docs/python/index.html)
- [ipython](https://ipython.org/)
- [pytz](https://pythonhosted.org/pytz/)
- [python-dotenv](https://github.com/theskumar/python-dotenv)
- [geocoder](https://geocoder.readthedocs.io/)
- [retry-requests](https://github.com/bustawin/retry-requests)
- [timezonefinder](https://timezonefinder.michelfe.it/gui)
- [nba_api](https://github.com/swar/nba_api)
- [pydantic](https://docs.pydantic.dev/latest/)
- [flatten_json](https://github.com/amirziai/flatten)
- [pygooglenews](https://github.com/kotartemiy/pygooglenews)
- [extruct](https://github.com/scrapinghub/extruct)
- [wikipedia-api](https://github.com/martin-majlis/Wikipedia-API)
- [tweepy](https://www.tweepy.org/)
- [pytest-is-running](https://github.com/adamchainz/pytest-is-running)
- [PySocks](https://github.com/Anorov/PySocks)
- [func-timeout](https://github.com/kata198/func_timeout)
- [tenacity](https://github.com/jd/tenacity)
- [random_user_agent](https://github.com/Luqman-Ud-Din/random_user_agent)
- [wayback](https://github.com/edgi-govdata-archiving/wayback)
## Raison D'être :thought_balloon:
`sportsball` aims to be a library for pulling in historical data about sporting games in a standardised form for easy data processing.
Its models are designed to work across many different types of sports.
The supported leagues are:
* 🏉 [AFL](https://www.afl.com.au/)
* 🏀 [NBA](https://www.nba.com/)
* 🏀 [NCAAB](https://www.ncaa.com/sports/basketball-men/d1)
* 🏈 [NCAAF](https://www.ncaa.com/sports/football/fbs)
* 🏈 [NFL](https://www.nfl.com/)
## Architecture :triangular_ruler:
`sportsball` is an object-oriented library. The entities are organised like so:
* **Game**: A game within a season.
    * **Team**: A team within the game. In sports with individual players, a team exists as a wrapper around the player.
        * **Player**: A player within the team.
        * **Odds**: The odds for the team to win the game.
            * **Bookie**: The bookie publishing the odds.
        * **News**: News about the team from the day before the game.
        * **Social**: Social posts from the team from the day before the game.
    * **Venue**: The venue the game was played at.
        * **Address**: The address information of the venue.
            * **Weather**: The weather at the address.
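To make the nesting concrete, here is a minimal sketch using plain dataclasses. These are hypothetical stand-ins, not sportsball's actual models (the Caching section notes the library builds pydantic models); the fields are trimmed, their names follow the Objects section, and `Bookie`'s fields are assumed because they are not documented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical, trimmed-down stand-ins for the entities listed above.
# NOT sportsball's own classes; Bookie's fields are assumed.


@dataclass
class Weather:
    temperature: Optional[float] = None
    relative_humidity: Optional[float] = None


@dataclass
class Address:
    city: Optional[str] = None
    country: Optional[str] = None
    weather: Optional[Weather] = None


@dataclass
class Venue:
    identifier: str
    address: Optional[Address] = None


@dataclass
class Bookie:
    identifier: str  # assumed field
    name: str  # assumed field


@dataclass
class Odds:
    odds: float  # decimal odds
    bookie: Bookie


@dataclass
class Player:
    identifier: str


@dataclass
class Team:
    identifier: str
    name: str
    players: list[Player] = field(default_factory=list)
    odds: list[Odds] = field(default_factory=list)


@dataclass
class Game:
    dt: datetime  # timezone-aware start time
    venue: Optional[Venue] = None
    teams: list[Team] = field(default_factory=list)


# Build a tiny example and walk Game -> Team -> Odds -> Bookie.
bookie = Bookie(identifier="bk1", name="Example Bookie")
home = Team(identifier="t1", name="Home FC", odds=[Odds(odds=1.8, bookie=bookie)])
game = Game(
    dt=datetime(2024, 3, 14, 8, 30, tzinfo=timezone.utc),
    venue=Venue(identifier="v1", address=Address(city="Melbourne", country="Australia")),
    teams=[home],
)
first_bookie_name = game.teams[0].odds[0].bookie.name
```

Each level only holds references to the level below it, which mirrors how the attribute lists in the Objects section point from one entity to the next.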
### Objects
A list of the attributes on each object.
#### Game
A representation of the game within a season.
* **dt**: The timezone-aware date/time of the game start.
* **week**: The round of the game within the season.
* **game_number**: The index of the game within the round.
* **venue**: The venue the game took place at.
* **teams**: A list of teams within the game.
* **home_team**: The team representing the home team.
* **away_team**: The team representing the away team.
* **end_dt**: The timezone-aware date/time of the game end.
* **attendance**: How many people attended the game.
* **league**: The league the game belongs to.
* **year**: The year the game was played in.
* **season_type**: The type of the season the game was played in.
* **postponed**: Whether the game was postponed.
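Because `dt` and `end_dt` are both timezone-aware, a game's duration can be computed by plain subtraction even when the two values carry different UTC offsets. The sketch below is illustrative only: `game_duration` is a hypothetical helper, not part of the library, and the times are made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def game_duration(dt: datetime, end_dt: Optional[datetime]) -> Optional[timedelta]:
    """Elapsed time between two timezone-aware datetimes (hypothetical helper)."""
    if end_dt is None:
        return None  # e.g. the end time is not known
    if dt.tzinfo is None or end_dt.tzinfo is None:
        raise ValueError("dt and end_dt must be timezone-aware")
    # Aware datetimes are subtracted in UTC, so differing offsets are handled.
    return end_dt - dt


# Illustrative times: a 7:30pm start at UTC+11, with the end recorded in UTC.
start = datetime(2024, 3, 14, 19, 30, tzinfo=timezone(timedelta(hours=11)))
end = datetime(2024, 3, 14, 11, 30, tzinfo=timezone.utc)
duration = game_duration(start, end)
```

The start is 08:30 UTC, so `duration` is three hours.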
#### Team
A representation of a team within a game.
* **identifier**: The unique identifier for the team.
* **name**: The name of the team.
* **location**: The home location of the team.
* **players**: A list of the team's players in the game.
* **odds**: A list of odds for the team to win the game.
* **points**: The number of points scored by the team in the game.
* **ladder_rank**: The ladder rank of the team at the start of the game's round.
* **kicks**: The number of kicks the team made in the game.
* **news**: News articles about the team from the day before the game.
* **social**: Social media posts from the team from the day before the game.
* **field_goals**: The sum of the field goals made by the team in the game.
* **field_goals_attempted**: The sum of the field goals attempted by the team in the game.
* **offensive_rebounds**: The number of offensive rebounds by the team in the game.
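`field_goals` and `field_goals_attempted` together give a shooting percentage. A minimal sketch of that derived metric; `field_goal_percentage` is a hypothetical helper, not a library attribute, and the numbers are illustrative.

```python
from typing import Optional


def field_goal_percentage(field_goals: int, field_goals_attempted: int) -> Optional[float]:
    """Fraction of attempted field goals that were made; None when nothing was attempted."""
    if field_goals_attempted == 0:
        return None  # avoid dividing by zero for teams with no attempts
    return field_goals / field_goals_attempted


pct = field_goal_percentage(42, 88)  # illustrative team totals
```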
#### Player
A representation of a player within a team within a game.
* **identifier**: The unique identifier for the player.
* **jersey**: The jersey identifying the player.
* **kicks**: The number of kicks the player made in the game.
* **fumbles**: The number of times the player fumbled the ball in the game.
* **fumbles_lost**: The number of times the player lost possession of the ball due to a fumble that the opposing team recovered.
* **field_goals**: The number of field goals the player made in the game.
* **field_goals_attempted**: The number of field goal attempts the player made in the game.
* **offensive_rebounds**: The number of offensive rebounds the player made in the game.
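Team stats such as `field_goals` are described as sums over the game, while players carry the same stats individually. A sketch of how a team total relates to per-player values, assuming missing player stats are represented as `None`; `PlayerLine` and `team_total` are hypothetical, and this is not necessarily how the library computes its team figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerLine:
    """Hypothetical stand-in for a Player's box-score fields."""
    identifier: str
    field_goals: Optional[int] = None
    field_goals_attempted: Optional[int] = None


def team_total(players: list[PlayerLine], attr: str) -> Optional[int]:
    """Sum one per-player stat into a team total; None if no player has a value."""
    values = [getattr(p, attr) for p in players if getattr(p, attr) is not None]
    return sum(values) if values else None


roster = [
    PlayerLine("p1", field_goals=5, field_goals_attempted=11),
    PlayerLine("p2", field_goals=3, field_goals_attempted=4),
    PlayerLine("p3"),  # no recorded stats, e.g. did not take the court
]
team_fg = team_total(roster, "field_goals")
team_fga = team_total(roster, "field_goals_attempted")
```

Returning `None` rather than `0` when no player has a value keeps "not recorded" distinct from "recorded as zero".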
#### Odds
A representation of the odds for a team to win within a game.
* **odds**: The decimal odds offered by a bookie for the team to win in the game.
* **bookie**: The bookie offering these odds.
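Since `odds` are decimal odds, the probability implied for a team winning is `1 / odds`. Across all teams in a game these implied probabilities usually sum to more than 1 (the bookie's margin, or overround). The sketch below removes that margin by proportional rescaling, which is only one of several normalisation methods; the helper names and odds are hypothetical.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds (the payout includes the stake)."""
    if decimal_odds < 1.0:
        raise ValueError("decimal odds cannot be below 1.0")
    return 1.0 / decimal_odds


def normalised_probabilities(odds_by_team: dict[str, float]) -> dict[str, float]:
    """Rescale implied probabilities so they sum to 1, removing the bookie's margin."""
    raw = {team: implied_probability(odds) for team, odds in odds_by_team.items()}
    total = sum(raw.values())
    return {team: p / total for team, p in raw.items()}


market = {"home": 1.80, "away": 2.10}  # illustrative odds from one bookie
probs = normalised_probabilities(market)
margin = sum(implied_probability(o) for o in market.values()) - 1.0  # roughly 0.032
```

For two outcomes, proportional rescaling reduces to `other_odds / (home_odds + away_odds)`, so `probs["home"]` is `2.1 / 3.9`.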
#### Venue
The venue the game is played at.
* **identifier**: The unique identifier for the venue.
* **names**: The name of the venue.
* **address**: The address of the venue.
* **is_grass**: Whether the venue has a grass field.
* **is_indoor**: Whether the venue is indoors.
#### Address
The address of the venue.
* **city**: The city of the address.
* **state**: The state of the address.
* **zipcode**: The postal/zip code of the address.
* **latitude**: The latitude of the address.
* **longitude**: The longitude of the address.
* **housenumber**: The house/street number of the address.
* **weather**: The weather at the address at the game start time.
* **timezone**: The time zone at the address.
* **country**: The country of the address.
#### Weather
The weather forecast, as of one day before the game, for the address at the game start time.
* **temperature**: The temperature at the address at the game start time.
* **relative_humidity**: The relative humidity at the address at the game start time.
#### News
News from the day before the game.
* **title**: The title of the article.
* **published**: When the article was published.
* **summary**: The summary of the article.
* **source**: The source of the article.
#### Social
Social media posts one day out from the game.
* **network**: The social network the post was made on.
* **post**: The text of the post.
* **comments**: The number of comments on the post.
* **reposts**: The number of reposts.
* **likes**: The number of likes the post received.
* **views**: The number of views the post has.
* **published**: When the post was published.
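The News and Social objects cover the day before the game. One way to express that window, assuming it means the 24 hours before the timezone-aware start time `dt` (the library may instead use a calendar day), is the hypothetical filter below.

```python
from datetime import datetime, timedelta, timezone


def within_day_before(published: datetime, game_dt: datetime) -> bool:
    """True if `published` falls in the 24 hours before the game start (assumed window)."""
    return game_dt - timedelta(days=1) <= published < game_dt


game_dt = datetime(2024, 3, 14, 8, 30, tzinfo=timezone.utc)
published_times = [
    datetime(2024, 3, 13, 10, 0, tzinfo=timezone.utc),  # inside the window
    datetime(2024, 3, 12, 10, 0, tzinfo=timezone.utc),  # too early
    datetime(2024, 3, 14, 9, 0, tzinfo=timezone.utc),   # after the start
]
kept = [p for p in published_times if within_day_before(p, game_dt)]
```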
## Caching
This library uses very aggressive caching due to its large data requirements. If a request concerns a recent game (generally one within the last 7 days), caching is bypassed. The cache layers are as follows:
1. A joblib disk cache of calls to the pydantic model creation functions. It is reset on every version update to keep the models in sync, and it is the fastest cache.
2. A requests cache, backed by SQLite, that caches HTTP requests for 1 year.
3. A lookup of the response on the Wayback Machine, which is used if an archived copy is found.
It is strongly recommended to supply proxies via the `PROXIES` environment variable; the more proxies available, the easier it is to collect data.
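The lookup order described above can be sketched as a chain of tiers with a recency bypass. This is a simplified, hypothetical illustration rather than the library's implementation: every tier is modelled as a plain key lookup (in the library the joblib tier caches model construction, not raw responses), and `resolve`, `fetch_live` and the dictionary-backed tiers are made up for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# A tier lookup returns a cached value, or None on a miss (hypothetical interface).
Lookup = Callable[[str], Optional[str]]

# "Generally in the last 7 days" per the text; the exact window is an assumption.
RECENT_WINDOW = timedelta(days=7)


def resolve(key: str, game_dt: datetime, now: datetime,
            tiers: list[tuple[str, Lookup]], live: Lookup) -> tuple[str, Optional[str]]:
    """Return (source, value), trying each cache tier in order unless the game is recent."""
    if now - game_dt > RECENT_WINDOW:
        for name, lookup in tiers:
            value = lookup(key)
            if value is not None:
                return name, value
    # Recent games skip the caches; misses in every tier also fall through to here.
    return "live", live(key)


def fetch_live(key: str) -> Optional[str]:
    return f"fresh:{key}"  # stand-in for a real network request


disk_cache = {"afl/2019/r1": "model-from-disk"}
http_cache = {"afl/2020/r1": "response-from-sqlite"}
tiers = [
    ("disk", disk_cache.get),       # 1. joblib disk cache (fastest)
    ("http", http_cache.get),       # 2. SQLite-backed requests cache
    ("wayback", lambda key: None),  # 3. archive lookup, always a miss here
]
now = datetime(2025, 1, 28, tzinfo=timezone.utc)
old_game = datetime(2020, 3, 19, tzinfo=timezone.utc)
recent_game = datetime(2025, 1, 25, tzinfo=timezone.utc)
```

Ordering tiers from cheapest to most expensive means most historical requests never leave the local machine, while the recency check keeps results for games that may still be updating fresh.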
## Installation :inbox_tray:
This is a Python package hosted on PyPI, so to install it simply run the following command:
`pip install sportsball`
## Usage example :eyes:
There are many different ways of using sportsball, but we generally recommend the CLI.
### CLI
To fetch a dataframe containing information about a league, you can use the following CLI:
```
sportsball --league=nfl -
```
The final argument is the file to write to; here, `-` means stdout.
### Python
To pull a dataframe containing all the information for a particular league, the following example can be used:
```python
from sportsball import sportsball as spb
ball = spb.SportsBall()
league = ball.league(spb.League.AFL)
df = league.to_frame()
```
This results in a dataframe where each game is represented by all its features.
### Environment
If you wish to use the providers that require API keys, you can create a `.env` file with the following variables inside it:
```
GOOGLE_API_KEY=APIKEY
GRIBSTREAM_API_KEY=APIKEY
X_API_KEY=APIKEY
X_API_SECRET_KEY=APISECRETKEY
X_ACCESS_TOKEN=ACCESSTOKEN
X_ACCESS_TOKEN_SECRET=ACCESSTOKENSECRET
PROXIES=CSVPROXIESLIST
```
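`PROXIES` holds a comma-separated list. A sketch of parsing that list and rotating through it round-robin, assuming each entry is a proxy URL (for example `socks5://host:port`, which requests supports when PySocks is installed); the helpers are hypothetical and the library's own proxy handling may differ. If you load the `.env` file yourself, python-dotenv's `load_dotenv()` copies its values into `os.environ`.

```python
import itertools
import os
from typing import Iterator, Mapping


def load_proxies(env: Mapping[str, str] = os.environ, key: str = "PROXIES") -> list[str]:
    """Split the comma-separated proxy list, dropping blanks and whitespace."""
    raw = env.get(key, "")
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


def proxy_rotation(proxies: list[str]) -> Iterator[dict[str, str]]:
    """Yield requests-style proxy mappings, cycling through the list forever.

    With an empty list the iterator is exhausted immediately.
    """
    for proxy in itertools.cycle(proxies):
        yield {"http": proxy, "https": proxy}


# Illustrative values only; a trailing comma and spaces are tolerated.
proxies = load_proxies({"PROXIES": "socks5://10.0.0.1:1080, http://10.0.0.2:8080,"})
rotation = proxy_rotation(proxies)
first, second, third = next(rotation), next(rotation), next(rotation)
```

Passing the environment as a mapping keeps the parser easy to test without mutating `os.environ`.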
## License :memo:
The project is available under the [MIT License](LICENSE).
Raw data
{
"_id": null,
"home_page": "https://github.com/8W9aG/sportsball",
"name": "sportsball",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "sports data betting",
"author": "Will Sackfield",
"author_email": "will.sackfield@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/ea/49/42d9307782c3d54531911c28431fb364588238aac5be75b002a74cdb1179/sportsball-0.3.14.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A library for pulling in and normalising sports stats.",
"version": "0.3.14",
"project_urls": {
"Homepage": "https://github.com/8W9aG/sportsball"
},
"split_keywords": [
"sports",
"data",
"betting"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ea4942d9307782c3d54531911c28431fb364588238aac5be75b002a74cdb1179",
"md5": "23f9a57f220e7c955d9f0eee2806fe59",
"sha256": "b696c0d2ac1739ed9fbdab2c0060bfe710c2f7799301d34508bf345df1fb9115"
},
"downloads": -1,
"filename": "sportsball-0.3.14.tar.gz",
"has_sig": false,
"md5_digest": "23f9a57f220e7c955d9f0eee2806fe59",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 114978,
"upload_time": "2025-01-28T13:41:23",
"upload_time_iso_8601": "2025-01-28T13:41:23.643861Z",
"url": "https://files.pythonhosted.org/packages/ea/49/42d9307782c3d54531911c28431fb364588238aac5be75b002a74cdb1179/sportsball-0.3.14.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-28 13:41:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "8W9aG",
"github_project": "sportsball",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "sportsball"
}