[![PyPI](https://img.shields.io/pypi/v/nba-on-court)](https://pypi.python.org/pypi/nba-on-court)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/shufinskiy/nba-on-court/blob/master/LICENSE)
[![Downloads](https://static.pepy.tech/badge/nba-on-court)](https://pepy.tech/project/nba-on-court)
[![Telegram](https://img.shields.io/badge/telegram-write%20me-blue.svg)](https://t.me/brains14482)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MuUcSj59kl-FO4X-LBRxxOZtEZLfWLDT)
Fast download of play-by-play data and adding data about players on court in NBA games.
================================================
Update [31-05-2024]: Added support for WNBA data from the [nba_data](https://github.com/shufinskiy/nba_data) repository
------
The **nba_on_court** package allows you to:
1. Quickly download play-by-play data from the [nba_data](https://github.com/shufinskiy/nba_data) repository.
2. Add information about which players were on court at any given time to play-by-play data.
3. Merge play-by-play data from different sources.
Installation
-----------
```bash
pip install nba-on-court
```
Tutorial
--------
To understand how the library works, see the tutorials in [Russian](https://github.com/shufinskiy/nba-on-court/blob/main/example/tutorial_ru.ipynb) and [English](https://github.com/shufinskiy/nba-on-court/blob/main/example/tutorial_en.ipynb). There is also an interactive tutorial on **Google Colab**.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MuUcSj59kl-FO4X-LBRxxOZtEZLfWLDT)
### Fast download of play-by-play data from the [nba_data](https://github.com/shufinskiy/nba_data) repository
Previous versions of the library could not fetch play-by-play data themselves; you had to use
third-party solutions such as the [nba_api](https://github.com/swar/nba_api) library. The disadvantage of that approach is speed:
the NBA website has quite strict limits on the number of requests, so collecting play-by-play data for one
season can take several hours.
The [nba_data](https://github.com/shufinskiy/nba_data) repository contains play-by-play data from three sources (stats.nba.com, pbpstats.com, data.nba.com),
as well as shotdetail for all games (regular season and playoffs) since the 1996/97 season
(data from pbpstats.com and data.nba.com are available from the season in which they first appeared).
Because you simply download a file from GitHub, downloading one season of play-by-play data takes a few
seconds (depending on your internet speed). In 5-10 minutes, you can download the entire data set for 28 seasons.
Fast loading of play-by-play data is done with the **load_nba_data** function.
```python
import nba_on_court as noc
noc.load_nba_data(seasons=2022, data='nbastats')
```
### Add information about players on court to play-by-play data
NBA play-by-play data contains information about each event in the game
(shot, substitution, foul, etc.) and the players who took part in it
(PLAYER1_ID, PLAYER2_ID, PLAYER3_ID).
From this data, we get a list of the players who were on court during a given
quarter. Then we need to narrow this list down to the 10 players who started
the quarter. This is done by analyzing the substitutions in the quarter.
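The idea described above can be sketched in plain Python. This is only an illustration, not the library's actual implementation: the `Event` structure, the `quarter_starters` and `lineups` helpers, and the convention that `player1` leaves and `player2` enters on a substitution are all assumptions made for this example, and only three players are used to keep it short.

```python
from typing import List, NamedTuple, Optional, Set


class Event(NamedTuple):
    """One simplified play-by-play row (hypothetical structure for this sketch)."""
    kind: str                      # "sub" for a substitution, any other label otherwise
    player1: Optional[int] = None  # for "sub": assumed to be the player leaving
    player2: Optional[int] = None  # for "sub": assumed to be the player entering
    player3: Optional[int] = None


def quarter_starters(events: List[Event]) -> Set[int]:
    """Players whose first appearance in the quarter is NOT entering via a substitution."""
    seen: Set[int] = set()
    starters: Set[int] = set()
    for ev in events:
        if ev.kind == "sub":
            # A player leaving the court without being seen before must have started.
            if ev.player1 not in seen:
                starters.add(ev.player1)
            # A player first seen while entering is not a starter.
            seen.update((ev.player1, ev.player2))
        else:
            for pid in (ev.player1, ev.player2, ev.player3):
                if pid is not None and pid not in seen:
                    starters.add(pid)
                    seen.add(pid)
    return starters


def lineups(events: List[Event]) -> List[frozenset]:
    """Players on court after each event, starting from the inferred starters."""
    on_court = set(quarter_starters(events))
    result = []
    for ev in events:
        if ev.kind == "sub":
            on_court.discard(ev.player1)
            on_court.add(ev.player2)
        result.append(frozenset(on_court))
    return result


events = [
    Event("shot", player1=1, player2=2),  # shot by 1, assisted by 2
    Event("sub", player1=1, player2=4),   # 1 leaves, 4 enters
    Event("foul", player1=4, player2=3),  # foul by 4 on 3
    Event("sub", player1=2, player2=1),   # 2 leaves, 1 comes back
]

print(sorted(quarter_starters(events)))  # [1, 2, 3]
print(sorted(lineups(events)[-1]))       # [1, 3, 4]
```

A sketch like this misses a player who stays on court for the whole quarter without appearing in any event, so a real implementation needs extra handling for that case.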
**players_on_court** takes play-by-play data as input and returns it with 10
additional columns containing the PLAYER_ID of each player who was on court at the time of each event.
**players_name** replaces a PLAYER_ID with the player's first and last name.
This lets the user see exactly which players were on court (few people know the PLAYER_ID
of every player in the NBA), but it is better not to do this before calculations, because a
player's NAME_SURNAME is not unique, unlike PLAYER_ID.
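To see why names are a risky key, here is a small sketch with invented players: the IDs, the names and the `total_by` helper are all hypothetical and exist only for this illustration. Two different players share a name, so aggregating by name silently merges them, while aggregating by ID keeps them apart.

```python
from collections import defaultdict

# Hypothetical players: two different IDs share the same name
players = {101: "John Smith", 102: "John Smith", 103: "Alex Brown"}

# Hypothetical (player_id, points) rows
points = [(101, 10), (102, 7), (103, 4), (101, 3)]


def total_by(key_func, rows):
    """Sum points per key, where the key is derived from the player ID."""
    totals = defaultdict(int)
    for player_id, pts in rows:
        totals[key_func(player_id)] += pts
    return dict(totals)


by_id = total_by(lambda pid: pid, points)
by_name = total_by(lambda pid: players[pid], points)

print(by_id)    # {101: 13, 102: 7, 103: 4}
print(by_name)  # {'John Smith': 20, 'Alex Brown': 4}
```

Doing the calculation on PLAYER_ID and converting to names only for display avoids this collision.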
```python
import nba_on_court as noc
from nba_api.stats.endpoints import playbyplayv2

# Download play-by-play data for one game via nba_api
pbp = playbyplayv2.PlayByPlayV2(game_id="0022100001").play_by_play.get_data_frame()

# Add 10 columns with the PLAYER_ID of the players on court
pbp_with_players = noc.players_on_court(pbp)
print(len(pbp_with_players.columns) - len(pbp.columns))
# 10

# The on-court columns follow the 34 original columns
players_id = list(pbp_with_players.iloc[0, 34:].reset_index(drop=True))
print(players_id)
# [201142, 1629651, 201933, 201935, 203925, 201572, 201950, 1628960, 203114, 203507]

# Convert the IDs to names for readability
players_name = noc.players_name(players_id)
print(players_name)
# ['Kevin Durant', 'Nic Claxton', 'Blake Griffin', 'James Harden', 'Joe Harris',
#  'Brook Lopez', 'Jrue Holiday', 'Grayson Allen', 'Khris Middleton', 'Giannis Antetokounmpo']
```
You can also replace the PLAYER_ID with the player's name in the entire data frame at once.
```python
cols = ["PLAYER1", "PLAYER2", "PLAYER3", "PLAYER4", "PLAYER5", "PLAYER6", "PLAYER7", "PLAYER8", "PLAYER9", "PLAYER10"]
pbp_with_players.loc[:, cols] = pbp_with_players.loc[:, cols].apply(noc.players_name, result_type="expand")
```
### Merge play-by-play data from different sources
Sometimes you need to combine data from different sources to solve a problem. For example, suppose we want to find out how the
presence or absence of a teammate on the floor affects a player's shot selection. To do this, we need detailed shot data (with
coordinates and shot zones), as well as play-by-play data with information about who was on court, in
order to split the shots by condition. In the nba_data repository, three data sources (stats.nba.com, data.nba.com and
shotdetail) share a single origin: the NBA website. Therefore, it is quite easy to combine them by two keys (the column
names differ between sources):
- Game ID
- Event ID
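A two-key join of this kind can be sketched as follows. This is a minimal illustration in plain Python, not code from the library: the `left_join` helper is hypothetical, the rows are invented, and the column names (`EVENTNUM` for play-by-play, `GAME_EVENT_ID` for shot data) are assumptions that should be checked against the actual files.

```python
def left_join(left, right, left_keys, right_keys):
    """Left-join two lists of dicts on key columns that may be named differently."""
    # Index the right-hand rows by their key values
    index = {tuple(row[k] for k in right_keys): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in left_keys), {})
        # Copy the non-key columns of the matching row; unmatched rows stay as they are
        extra = {k: v for k, v in match.items() if k not in right_keys}
        joined.append({**row, **extra})
    return joined


# Invented rows for illustration only
pbp = [
    {"GAME_ID": 1, "EVENTNUM": 7, "EVENTMSGTYPE": 1},  # a shot
    {"GAME_ID": 1, "EVENTNUM": 8, "EVENTMSGTYPE": 6},  # a foul, no shot data
]
shots = [
    {"GAME_ID": 1, "GAME_EVENT_ID": 7, "LOC_X": -12, "LOC_Y": 45},
]

merged = left_join(
    pbp, shots,
    left_keys=("GAME_ID", "EVENTNUM"),
    right_keys=("GAME_ID", "GAME_EVENT_ID"),
)
print(merged[0])
# {'GAME_ID': 1, 'EVENTNUM': 7, 'EVENTMSGTYPE': 1, 'LOC_X': -12, 'LOC_Y': 45}
```

With pandas data frames, the same idea corresponds to `DataFrame.merge` with `how="left"` and the `left_on` / `right_on` parameters.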
Data from pbpstats.com is more complicated: it has a different structure from the start (grouped by possessions),
so it has no event ID. At the same time, it contains useful information that is not explicitly available
in the other sources (possession time, possession start type, URL of the video clip). The only way to combine them
is to use the event DESCRIPTION. The problem is that the descriptions in the stats.nba.com and pbpstats data do not match exactly either,
so merging them directly loses a certain number of rows.
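The row loss from a direct match on descriptions can be illustrated with a toy example. The description strings below are invented and the `unmatched` helper is hypothetical; the point is only that an exact string comparison drops events that are worded differently in the two sources.

```python
# Invented descriptions, for illustration only
nbastats_descriptions = [
    "Player A 3PT Jump Shot (3 PTS)",
    "Player B REBOUND (Off:0 Def:1)",
    "Player C S.FOUL (P1.T1)",
]
pbpstats_descriptions = [
    "Player A 3PT Jump Shot (3 PTS)",
    "Player B REBOUND (Off:0 Def:1)",
    "Player C Shooting Foul (P1.T1)",  # same event, worded differently
]


def unmatched(left, right):
    """Return the descriptions in `left` that have no exact match in `right`."""
    right_set = set(right)
    return [d for d in left if d not in right_set]


lost = unmatched(nbastats_descriptions, pbpstats_descriptions)
print(lost)  # ['Player C S.FOUL (P1.T1)']
```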
To solve this problem, I created the functions **left_join_nbastats** and **left_join_pbpstats**. They allow you to
merge play-by-play data from stats.nba.com and pbpstats with almost no errors.
```python
import pandas as pd
import nba_on_court as noc

# Download and unpack 2022 playoff data from both sources
noc.load_nba_data(seasons=2022, data=('nbastats', 'pbpstats'), seasontype='po', untar=True)

nbastats = pd.read_csv('nbastats_po_2022.csv')
pbpstats = pd.read_csv('pbpstats_po_2022.csv')

# Keep a single game (the game ID column is named differently in each source)
nbastats = nbastats.loc[nbastats['GAME_ID'] == 42200405].reset_index(drop=True)
pbpstats = pbpstats.loc[pbpstats['GAMEID'] == 42200405].reset_index(drop=True)

print(nbastats.shape, pbpstats.shape)
# (463, 34) (396, 19)

# Left join: every nbastats row is kept and pbpstats columns are added
full_pbp = noc.left_join_nbastats(nbastats, pbpstats)
print(full_pbp.shape)
# (463, 50)
```
### Contact me:
If you have questions or suggestions about the dataset, you can contact me in whichever way is convenient for you.
<div id="header" align="left">
<div id="badges">
<a href="https://www.linkedin.com/in/vladislav-shufinskiy/">
<img src="https://img.shields.io/badge/LinkedIn-blue?style=for-the-badge&logo=linkedin&logoColor=white" alt="LinkedIn Badge"/>
</a>
<a href="https://t.me/brains14482">
<img src="https://img.shields.io/badge/Telegram-blue?style=for-the-badge&logo=telegram&logoColor=white" alt="Telegram Badge"/>
</a>
<a href="https://twitter.com/vshufinskiy">
<img src="https://img.shields.io/badge/Twitter-blue?style=for-the-badge&logo=twitter&logoColor=white" alt="Twitter Badge"/>
</a>
</div>
</div>
Raw data
{
"_id": null,
"home_page": "https://github.com/shufinskiy/nba-on-court",
"name": "nba-on-court",
"maintainer": "Vladislav Shufinsky",
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": "shufinsky.90210@gmail.com",
"keywords": "api, basketball, data, nba, sports, stats",
"author": "Vladislav Shufinsky",
"author_email": "shufinsky.90210@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/29/23/08ef92efe9a8eb2db1d7c7d55c21c8913fe0f7023f0538a3e4c25a9e6663/nba_on_court-0.2.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Adding players on court to play-by-play data",
"version": "0.2.1",
"project_urls": {
"Bug Tracker": "https://github.com/shufinskiy/nba-on-court/issues",
"Documentation": "https://github.com/shufinskiy/nba-on-court/blob/master/README.md",
"Homepage": "https://github.com/shufinskiy/nba-on-court",
"Repository": "https://github.com/shufinskiy/nba-on-court"
},
"split_keywords": [
"api",
" basketball",
" data",
" nba",
" sports",
" stats"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ed9161198a698e3e3647fb0428646f4266e54030371062edaf5b280d19ce8130",
"md5": "9c6034fa17c6e45b0ca9ba0ff84dc516",
"sha256": "f29080ff91646ce7b35d18af0ab5edbb533b05937e4ffffedf4f3a694d186fe4"
},
"downloads": -1,
"filename": "nba_on_court-0.2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9c6034fa17c6e45b0ca9ba0ff84dc516",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 11276,
"upload_time": "2024-05-31T14:29:06",
"upload_time_iso_8601": "2024-05-31T14:29:06.422898Z",
"url": "https://files.pythonhosted.org/packages/ed/91/61198a698e3e3647fb0428646f4266e54030371062edaf5b280d19ce8130/nba_on_court-0.2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "292308ef92efe9a8eb2db1d7c7d55c21c8913fe0f7023f0538a3e4c25a9e6663",
"md5": "9c90a4ef3baf23aa3d312bd66d035aca",
"sha256": "c18384c5db90b6a6168645a33bec2f22d1846603147f51ce3e31349ced5a4960"
},
"downloads": -1,
"filename": "nba_on_court-0.2.1.tar.gz",
"has_sig": false,
"md5_digest": "9c90a4ef3baf23aa3d312bd66d035aca",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 13305,
"upload_time": "2024-05-31T14:30:12",
"upload_time_iso_8601": "2024-05-31T14:30:12.655941Z",
"url": "https://files.pythonhosted.org/packages/29/23/08ef92efe9a8eb2db1d7c7d55c21c8913fe0f7023f0538a3e4c25a9e6663/nba_on_court-0.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-31 14:30:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "shufinskiy",
"github_project": "nba-on-court",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "nba-on-court"
}