Name | CBBpy JSON |
Version |
2.0.2
JSON |
| download |
home_page | |
Summary | A Python-based web scraper for NCAA basketball. |
upload_time | 2024-01-09 04:08:23 |
maintainer | |
docs_url | None |
author | |
requires_python | >=3.9 |
license | MIT License Copyright (c) 2022 Daniel Cowan Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
keywords |
college
basketball
scraper
scraping
web scraper
data
espn
analysis
science
analytics
cbb
cbbpy
ncaa
ncaam
ncaaw
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
[![PyPi Version](https://img.shields.io/pypi/v/cbbpy.svg)](https://pypi.org/project/cbbpy/) [![Downloads](https://img.shields.io/pypi/dm/cbbpy?color=be94e4
)](https://pypistats.org/packages/cbbpy)
# CBBpy: A Python-based web scraper for NCAA basketball
## Purpose
This package is designed to bridge the gap between data and analysis for NCAA D1 basketball. CBBpy can grab play-by-play, boxscore, and other game metadata for any NCAA D1 men's or women's basketball game. Inspired by the [ncaahoopR package](https://github.com/lbenz730/ncaahoopR) by Luke Benz - check that out if you are an R user!
## Installation and import
CBBpy requires Python >= 3.9 as well as the following packages:
* pandas>=1.4.2
* numpy>=1.22.3
* python-dateutil>=2.8.2
* pytz>=2022.1
* tqdm>=4.63.0
* lxml>=4.9.0
* joblib>=1.1.0
* bs4>=0.0.1
* requests>=2.27.0
Install using pip:
```
pip install cbbpy
```
The men's and women's scrapers can be imported as such:
```
import cbbpy.mens_scraper as s
import cbbpy.womens_scraper as s
```
## Functions available in CBBpy
NOTE: game ID, as far as CBBpy is concernced, is a valid **ESPN** game ID
`s.get_game_info(game_id: str)` grabs all the metadata (game date, time, score, teams, referees, etc) for a particular game.
`s.get_game_boxscore(game_id: str)` returns a pandas DataFrame with each player's stats for a particular game.
`s.get_game_pbp(game_id: str)` scrapes the play-by-play tables for a game and returns a pandas DataFrame, with each entry representing a play made during the game.
`s.get_game(game_id: str, info: bool = True, box: bool = True, pbp: bool = True)` gets *all* information about a game (game info, boxscore, PBP) and returns a tuple of results `(game_info, boxscore, pbp)`. `info, box, pbp` are booleans which users can set to `False` if there is any information they wish not to scrape. For example, `box = False` would return an empty DataFrame for the boxscore info, while scraping PBP and metadata info normally.
`s.get_games_season(season: int, info: bool = True, box: bool = True, pbp: bool = True)` scrapes all game information for all games in a particular season. As an example, to scrape games for the 2020-21 season, call `get_games_season(2021)`. Returns a tuple of 3 DataFrames, similar to `get_game`. See `get_game` for an explanation of booleans `info, box, pbp`.
`s.get_games_range(start_date: str, end_date: str, info: bool = True, box: bool = True, pbp: bool = True)` scrapes all game information for all games between `start_date` and `end_date` (inclusive). As an example, to scrape games between November 30, 2022 and December 10, 2022, call `get_games_season('11-30-2022', '12-10-2022')`. Returns a tuple of 3 DataFrames, similar to `get_game`. See `get_game` for an explanation of booleans `info, box, pbp`.
`s.get_game_ids(date: Union[str, datetime])` returns a list of all game IDs for a particular date.
## Examples
Function call:
```
import cbbpy.mens_scraper as s
s.get_game_info('401522202')
```
Returns:
| | game_id | home_team | home_id | home_rank | home_record | home_score | away_team | away_id | away_rank | away_record | away_score | home_win | num_ots | is_conference | is_neutral | is_postseason | tournament | game_day | game_time | game_loc | arena | arena_capacity | attendance | tv_network | referee_1 | referee_2 | referee_3 |
|---:|----------:|:--------------|----------:|------------:|:--------------|-------------:|:-----------------------|----------:|------------:|:--------------|-------------:|:-----------|----------:|:----------------|:-------------|:----------------|:------------------------------------------------------|:---------------|:-------------|:------------|:------------|-----------------:|-------------:|:-------------|:------------|:--------------|:-------------|
| 0 | 401522202 | UConn Huskies | 41 | 4 | 31-8 | 76 | San Diego State Aztecs | 21 | 5 | 32-7 | 59 | True | 0 | False | True | True | Men's Basketball Championship - National Championship | April 03, 2023 | 06:20 PM PDT | Houston, TX | NRG Stadium | 0 | 72423 | CBS | Ron Groover | Terry Oglesby | Keith Kimble |
Function call:
```
import cbbpy.womens_scraper as s
s.get_game_boxscore('401528028')
```
Returns (partially):
| | game_id | team | player | player_id | position | starter | min | fgm | fga | 2pm | 2pa | 3pm | 3pa | ftm | fta | oreb | dreb | reb | ast | stl | blk | to | pf | pts |
|---:|----------:|:-----------|:------------|------------:|:-----------|:----------|------:|------:|------:|------:|------:|------:|------:|------:|------:|-------:|-------:|------:|------:|------:|------:|-----:|-----:|------:|
| 0 | 401528028 | LSU Tigers | A. Reese | 4433402 | F | True | 29 | 5 | 12 | 5 | 12 | 0 | 0 | 5 | 8 | 6 | 4 | 10 | 5 | 3 | 1 | 0 | 3 | 15 |
| 1 | 401528028 | LSU Tigers | L. Williams | 4280886 | F | True | 37 | 9 | 16 | 9 | 16 | 0 | 0 | 2 | 2 | 1 | 4 | 5 | 0 | 3 | 0 | 3 | 4 | 20 |
| 2 | 401528028 | LSU Tigers | F. Johnson | 4698736 | G | True | 37 | 4 | 11 | 3 | 7 | 1 | 4 | 1 | 1 | 2 | 5 | 7 | 4 | 1 | 0 | 4 | 1 | 10 |
| 3 | 401528028 | LSU Tigers | K. Poole | 4433418 | G | True | 24 | 2 | 3 | 0 | 1 | 2 | 2 | 0 | 2 | 0 | 3 | 3 | 1 | 0 | 1 | 1 | 2 | 6 |
| 4 | 401528028 | LSU Tigers | A. Morris | 4281251 | G | True | 33 | 8 | 14 | 7 | 11 | 1 | 3 | 4 | 4 | 1 | 1 | 2 | 9 | 1 | 0 | 2 | 3 | 21 |
Function call:
```
import cbbpy.mens_scraper as s
s.get_game_pbp('401522202')
```
Returns (partially):
| | game_id | home_team | away_team | play_desc | home_score | away_score | half | secs_left_half | secs_left_reg | play_team | play_type | shooting_play | scoring_play | is_three | shooter | is_assisted | assist_player | shot_x | shot_y |
|---:|----------:|:--------------|:-----------------------|:----------------------------------------------------------------------|-------------:|-------------:|-------:|-----------------:|----------------:|:-----------------------|:-------------------|:----------------|:---------------|:-----------|:-----------------|:--------------|:----------------|---------:|---------:|
| 0 | 401522202 | UConn Huskies | San Diego State Aztecs | Jump Ball won by UConn | 0 | 0 | 1 | 1200 | 2400 | UConn Huskies | jump ball | False | False | False | | False | | nan | nan |
| 1 | 401522202 | UConn Huskies | San Diego State Aztecs | Jordan Hawkins made Jumper. Assisted by Adama Sanogo. | 2 | 0 | 1 | 1174 | 2374 | UConn Huskies | jumper | True | True | False | Jordan Hawkins | True | Adama Sanogo | 18 | 15 |
| 2 | 401522202 | UConn Huskies | San Diego State Aztecs | Lamont Butler made Three Point Jumper. Assisted by Matt Bradley. | 2 | 3 | 1 | 1152 | 2352 | San Diego State Aztecs | three point jumper | True | True | True | Lamont Butler | True | Matt Bradley | 39 | 22 |
| 3 | 401522202 | UConn Huskies | San Diego State Aztecs | Tristen Newton Turnover. | 2 | 3 | 1 | 1130 | 2330 | UConn Huskies | turnover | False | False | False | | False | | nan | nan |
| 4 | 401522202 | UConn Huskies | San Diego State Aztecs | Darrion Trammell made Three Point Jumper. Assisted by Keshad Johnson. | 2 | 6 | 1 | 1108 | 2308 | San Diego State Aztecs | three point jumper | True | True | True | Darrion Trammell | True | Keshad Johnson | 1 | 0 |
## Contact
Feel free to reach out to me directly with any questions, requests, or suggestions at <dnlcowan37@gmail.com>.
Raw data
{
"_id": null,
"home_page": "",
"name": "CBBpy",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "",
"keywords": "college,basketball,scraper,scraping,web scraper,data,espn,analysis,science,analytics,cbb,cbbpy,ncaa,ncaam,ncaaw",
"author": "",
"author_email": "Daniel Cowan <dnlcowan37@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/a2/73/9ce9857cb19177dccfa2224f33d911b0b372ca8d37f90457c4664a762d2c/CBBpy-2.0.2.tar.gz",
"platform": null,
"description": "[![PyPi Version](https://img.shields.io/pypi/v/cbbpy.svg)](https://pypi.org/project/cbbpy/) [![Downloads](https://img.shields.io/pypi/dm/cbbpy?color=be94e4\n)](https://pypistats.org/packages/cbbpy)\n\n# CBBpy: A Python-based web scraper for NCAA basketball\n\n## Purpose\nThis package is designed to bridge the gap between data and analysis for NCAA D1 basketball. CBBpy can grab play-by-play, boxscore, and other game metadata for any NCAA D1 men's or women's basketball game. Inspired by the [ncaahoopR package](https://github.com/lbenz730/ncaahoopR) by Luke Benz - check that out if you are an R user!\n\n## Installation and import\nCBBpy requires Python >= 3.9 as well as the following packages:\n* pandas>=1.4.2\n* numpy>=1.22.3\n* python-dateutil>=2.8.2\n* pytz>=2022.1\n* tqdm>=4.63.0\n* lxml>=4.9.0\n* joblib>=1.1.0\n* bs4>=0.0.1\n* requests>=2.27.0\n\n\nInstall using pip:\n```\npip install cbbpy\n```\n\nThe men's and women's scrapers can be imported as such:\n```\nimport cbbpy.mens_scraper as s\nimport cbbpy.womens_scraper as s\n```\n\n## Functions available in CBBpy\nNOTE: game ID, as far as CBBpy is concernced, is a valid **ESPN** game ID\n\n`s.get_game_info(game_id: str)` grabs all the metadata (game date, time, score, teams, referees, etc) for a particular game.\n\n`s.get_game_boxscore(game_id: str)` returns a pandas DataFrame with each player's stats for a particular game.\n\n`s.get_game_pbp(game_id: str)` scrapes the play-by-play tables for a game and returns a pandas DataFrame, with each entry representing a play made during the game.\n\n`s.get_game(game_id: str, info: bool = True, box: bool = True, pbp: bool = True)` gets *all* information about a game (game info, boxscore, PBP) and returns a tuple of results `(game_info, boxscore, pbp)`. `info, box, pbp` are booleans which users can set to `False` if there is any information they wish not to scrape. For example, `box = False` would return an empty DataFrame for the boxscore info, while scraping PBP and metadata info normally.\n\n`s.get_games_season(season: int, info: bool = True, box: bool = True, pbp: bool = True)` scrapes all game information for all games in a particular season. As an example, to scrape games for the 2020-21 season, call `get_games_season(2021)`. Returns a tuple of 3 DataFrames, similar to `get_game`. See `get_game` for an explanation of booleans `info, box, pbp`.\n\n`s.get_games_range(start_date: str, end_date: str, info: bool = True, box: bool = True, pbp: bool = True)` scrapes all game information for all games between `start_date` and `end_date` (inclusive). As an example, to scrape games between November 30, 2022 and December 10, 2022, call `get_games_season('11-30-2022', '12-10-2022')`. Returns a tuple of 3 DataFrames, similar to `get_game`. See `get_game` for an explanation of booleans `info, box, pbp`.\n\n`s.get_game_ids(date: Union[str, datetime])` returns a list of all game IDs for a particular date.\n\n## Examples\n\nFunction call: \n\n```\nimport cbbpy.mens_scraper as s\ns.get_game_info('401522202')\n```\n\nReturns: \n| | game_id | home_team | home_id | home_rank | home_record | home_score | away_team | away_id | away_rank | away_record | away_score | home_win | num_ots | is_conference | is_neutral | is_postseason | tournament | game_day | game_time | game_loc | arena | arena_capacity | attendance | tv_network | referee_1 | referee_2 | referee_3 |\n|---:|----------:|:--------------|----------:|------------:|:--------------|-------------:|:-----------------------|----------:|------------:|:--------------|-------------:|:-----------|----------:|:----------------|:-------------|:----------------|:------------------------------------------------------|:---------------|:-------------|:------------|:------------|-----------------:|-------------:|:-------------|:------------|:--------------|:-------------|\n| 0 | 401522202 | UConn Huskies | 41 | 4 | 31-8 | 76 | San Diego State Aztecs | 21 | 5 | 32-7 | 59 | True | 0 | False | True | True | Men's Basketball Championship - National Championship | April 03, 2023 | 06:20 PM PDT | Houston, TX | NRG Stadium | 0 | 72423 | CBS | Ron Groover | Terry Oglesby | Keith Kimble |\n\nFunction call: \n\n```\nimport cbbpy.womens_scraper as s \ns.get_game_boxscore('401528028')\n```\n\nReturns (partially): \n| | game_id | team | player | player_id | position | starter | min | fgm | fga | 2pm | 2pa | 3pm | 3pa | ftm | fta | oreb | dreb | reb | ast | stl | blk | to | pf | pts |\n|---:|----------:|:-----------|:------------|------------:|:-----------|:----------|------:|------:|------:|------:|------:|------:|------:|------:|------:|-------:|-------:|------:|------:|------:|------:|-----:|-----:|------:|\n| 0 | 401528028 | LSU Tigers | A. Reese | 4433402 | F | True | 29 | 5 | 12 | 5 | 12 | 0 | 0 | 5 | 8 | 6 | 4 | 10 | 5 | 3 | 1 | 0 | 3 | 15 |\n| 1 | 401528028 | LSU Tigers | L. Williams | 4280886 | F | True | 37 | 9 | 16 | 9 | 16 | 0 | 0 | 2 | 2 | 1 | 4 | 5 | 0 | 3 | 0 | 3 | 4 | 20 |\n| 2 | 401528028 | LSU Tigers | F. Johnson | 4698736 | G | True | 37 | 4 | 11 | 3 | 7 | 1 | 4 | 1 | 1 | 2 | 5 | 7 | 4 | 1 | 0 | 4 | 1 | 10 |\n| 3 | 401528028 | LSU Tigers | K. Poole | 4433418 | G | True | 24 | 2 | 3 | 0 | 1 | 2 | 2 | 0 | 2 | 0 | 3 | 3 | 1 | 0 | 1 | 1 | 2 | 6 |\n| 4 | 401528028 | LSU Tigers | A. Morris | 4281251 | G | True | 33 | 8 | 14 | 7 | 11 | 1 | 3 | 4 | 4 | 1 | 1 | 2 | 9 | 1 | 0 | 2 | 3 | 21 |\n\nFunction call: \n\n```\nimport cbbpy.mens_scraper as s\ns.get_game_pbp('401522202')\n```\n\nReturns (partially): \n| | game_id | home_team | away_team | play_desc | home_score | away_score | half | secs_left_half | secs_left_reg | play_team | play_type | shooting_play | scoring_play | is_three | shooter | is_assisted | assist_player | shot_x | shot_y |\n|---:|----------:|:--------------|:-----------------------|:----------------------------------------------------------------------|-------------:|-------------:|-------:|-----------------:|----------------:|:-----------------------|:-------------------|:----------------|:---------------|:-----------|:-----------------|:--------------|:----------------|---------:|---------:|\n| 0 | 401522202 | UConn Huskies | San Diego State Aztecs | Jump Ball won by UConn | 0 | 0 | 1 | 1200 | 2400 | UConn Huskies | jump ball | False | False | False | | False | | nan | nan |\n| 1 | 401522202 | UConn Huskies | San Diego State Aztecs | Jordan Hawkins made Jumper. Assisted by Adama Sanogo. | 2 | 0 | 1 | 1174 | 2374 | UConn Huskies | jumper | True | True | False | Jordan Hawkins | True | Adama Sanogo | 18 | 15 |\n| 2 | 401522202 | UConn Huskies | San Diego State Aztecs | Lamont Butler made Three Point Jumper. Assisted by Matt Bradley. | 2 | 3 | 1 | 1152 | 2352 | San Diego State Aztecs | three point jumper | True | True | True | Lamont Butler | True | Matt Bradley | 39 | 22 |\n| 3 | 401522202 | UConn Huskies | San Diego State Aztecs | Tristen Newton Turnover. | 2 | 3 | 1 | 1130 | 2330 | UConn Huskies | turnover | False | False | False | | False | | nan | nan |\n| 4 | 401522202 | UConn Huskies | San Diego State Aztecs | Darrion Trammell made Three Point Jumper. Assisted by Keshad Johnson. | 2 | 6 | 1 | 1108 | 2308 | San Diego State Aztecs | three point jumper | True | True | True | Darrion Trammell | True | Keshad Johnson | 1 | 0 |\n\n## Contact\nFeel free to reach out to me directly with any questions, requests, or suggestions at <dnlcowan37@gmail.com>.\n",
"bugtrack_url": null,
"license": "MIT License Copyright (c) 2022 Daniel Cowan Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
"summary": "A Python-based web scraper for NCAA basketball.",
"version": "2.0.2",
"project_urls": {
"Homepage": "https://github.com/dcstats/CBBpy/"
},
"split_keywords": [
"college",
"basketball",
"scraper",
"scraping",
"web scraper",
"data",
"espn",
"analysis",
"science",
"analytics",
"cbb",
"cbbpy",
"ncaa",
"ncaam",
"ncaaw"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f416b75ad758ec8450044dbc9eefe57dd70c5d973c04e76c6f405d7cbd414f97",
"md5": "7a5f192214ece723e9b4acb82fa85027",
"sha256": "db3c2e12160b1f32d0858d8c1f9a6ffd0edd792894087b6ebbd73a6cd745dace"
},
"downloads": -1,
"filename": "CBBpy-2.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7a5f192214ece723e9b4acb82fa85027",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 15424,
"upload_time": "2024-01-09T04:08:21",
"upload_time_iso_8601": "2024-01-09T04:08:21.490523Z",
"url": "https://files.pythonhosted.org/packages/f4/16/b75ad758ec8450044dbc9eefe57dd70c5d973c04e76c6f405d7cbd414f97/CBBpy-2.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a2739ce9857cb19177dccfa2224f33d911b0b372ca8d37f90457c4664a762d2c",
"md5": "32581fba00048e800cd464472d644638",
"sha256": "337a08d77741f04048c7f04e5db687e0d046385490509d57972f71b7914203bd"
},
"downloads": -1,
"filename": "CBBpy-2.0.2.tar.gz",
"has_sig": false,
"md5_digest": "32581fba00048e800cd464472d644638",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 13864,
"upload_time": "2024-01-09T04:08:23",
"upload_time_iso_8601": "2024-01-09T04:08:23.135900Z",
"url": "https://files.pythonhosted.org/packages/a2/73/9ce9857cb19177dccfa2224f33d911b0b372ca8d37f90457c4664a762d2c/CBBpy-2.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-09 04:08:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dcstats",
"github_project": "CBBpy",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "cbbpy"
}