# ncaa_stats_py
Allows a user to download and parse data from the National Collegiate Athletic Association (NCAA) and its member sports.
# Basic Setup
## How to Install
This package is available through the
[`pip` package manager](https://en.wikipedia.org/wiki/Pip_(package_manager)),
and can be installed with one of the following commands
in your terminal/shell (Python 3.10 or newer is required):
```bash
pip install ncaa_stats_py
```
OR
```bash
python -m pip install ncaa_stats_py
```
If you are using Linux or macOS,
you may need to use `python3` instead of `python` when installing, as shown below:
```bash
python3 -m pip install ncaa_stats_py
```
Alternatively, `ncaa_stats_py` can be installed directly from
this GitHub repository with one of the following pip commands:
```bash
pip install git+https://github.com/armstjc/ncaa_stats_py
```
OR
```bash
python -m pip install git+https://github.com/armstjc/ncaa_stats_py
```
OR
```bash
python3 -m pip install git+https://github.com/armstjc/ncaa_stats_py
```
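After installing, you can check two things: that your interpreter meets the package's `requires_python` constraint (`>=3.10`, per the package metadata), and that pip actually installed the package. The sketch below does this with the standard library's `importlib.metadata`. The helper name `check_install` is hypothetical and is not part of `ncaa_stats_py`:

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def check_install(dist_name: str = "ncaa_stats_py") -> Tuple[bool, Optional[str]]:
    """Hypothetical helper: return (python_ok, installed_version).

    python_ok is True when the interpreter satisfies ">=3.10";
    installed_version is None when the distribution is not installed.
    """
    python_ok = sys.version_info >= (3, 10)
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        installed = None
    return python_ok, installed


if __name__ == "__main__":
    ok, installed = check_install()
    print(f"Python 3.10+: {ok}")
    print(f"ncaa_stats_py version: {installed or 'not installed'}")
```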
## How to Use
`ncaa_stats_py` sets itself apart by doing the following
when fetching data:
1. Automatically caching any data that has already been parsed.
2. Automatically enforcing a 5-second sleep timer before any HTML request,
   to help ensure that calls made through this package
   won't get your IP address banned
   (you do not *need* to add your own sleep timers when looping through
   and calling functions in this package).
3. Automatically refreshing any cached data that hasn't been refreshed in a while.

For example, the following code works as-is.
The second loop loads the team rosters much faster than the first,
because the rosters fetched during the first loop are cached
on the device running the code.
```python
from timeit import default_timer as timer

from ncaa_stats_py.baseball import (
    get_baseball_team_roster,
    get_baseball_teams
)

start_time = timer()

# Loads in a table with every DI NCAA baseball team in the 2024 season.
# If this is the first time you run this script,
# it may take some time to populate the NCAA baseball team information data.
teams_df = get_baseball_teams(season=2024, level="I")

end_time = timer()
time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:.3f} seconds.\n\n")

# Gets 5 random DI teams from 2024
teams_df = teams_df.sample(5)
print(teams_df)
print()

# Let's send this to a list to make the loop slightly faster
team_ids_list = teams_df["team_id"].to_list()

# First loop
# If the data isn't cached, it should take 35-40 seconds to do this loop
start_time = timer()

for t_id in team_ids_list:
    print(f"On Team ID: {t_id}")
    df = get_baseball_team_roster(team_id=t_id)
    # print(df)

end_time = timer()
time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:.3f} seconds.\n\n")

# Second loop
# Because the data has been parsed and cached,
# this shouldn't take that long to loop through
start_time = timer()

for t_id in team_ids_list:
    print(f"On Team ID: {t_id}")
    df = get_baseball_team_roster(team_id=t_id)
    # print(df)

end_time = timer()
time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:.3f} seconds.\n\n")
```
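To picture how caching, a fixed sleep before each request, and refreshing stale cache entries fit together, here is a minimal, self-contained sketch. It is not the package's implementation. The function `cached_fetch` and the constants `SLEEP_SECONDS` and `MAX_CACHE_AGE_SECONDS` are hypothetical. The one-day refresh interval and the hash-named cache files are assumptions chosen for illustration:

```python
import hashlib
import time
from pathlib import Path
from typing import Callable

# Hypothetical values for illustration only; the package's real cache
# location and refresh interval are not documented in this README.
SLEEP_SECONDS = 5
MAX_CACHE_AGE_SECONDS = 24 * 60 * 60  # assumed: refresh after one day


def cached_fetch(
    url: str,
    fetch: Callable[[str], str],
    cache_dir: Path,
    sleep_seconds: float = SLEEP_SECONDS,
    max_age_seconds: float = MAX_CACHE_AGE_SECONDS,
) -> str:
    """Hypothetical helper: return the body for `url` via a file cache."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # One cache file per URL, named by a SHA-256 hash of the URL.
    cache_file = cache_dir / (hashlib.sha256(url.encode()).hexdigest() + ".html")

    # 1. Serve from the cache if the file exists and is fresh enough.
    if cache_file.exists():
        age = time.time() - cache_file.stat().st_mtime
        if age < max_age_seconds:
            return cache_file.read_text(encoding="utf-8")

    # 2. Otherwise, sleep before the network call to throttle requests.
    time.sleep(sleep_seconds)
    body = fetch(url)

    # 3. Write (or overwrite a stale copy) so the next call hits the cache.
    cache_file.write_text(body, encoding="utf-8")
    return body
```

Passing `fetch` in as a callable keeps the sketch testable without network access. In real use it could be something like `lambda u: requests.get(u, timeout=30).text`.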
# Dependencies
`ncaa_stats_py` depends on the following Python packages:
- [`beautifulsoup4`](https://www.crummy.com/software/BeautifulSoup/): Used to parse HTML data.
- [`lxml`](https://lxml.de/): Used alongside `beautifulsoup4` to parse HTML data.
- [`pandas`](https://github.com/pandas-dev/pandas): Used to create `DataFrame` objects within package functions.
- [`pytz`](https://pythonhosted.org/pytz/): Used to attach timezone information to any date/datetime objects encountered by this package.
- [`requests`](https://github.com/psf/requests): Used to make HTTPS requests.
- [`tqdm`](https://github.com/tqdm/tqdm): Used to show progress bars for functions that are known to take minutes to run.
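The package metadata lists a minimum version for each of these dependencies. The sketch below reports what is installed and whether each one meets its minimum. The names `MINIMUMS`, `simple_version_key`, and `check_dependencies` are hypothetical. The version comparison is deliberately simplified: it ignores pre-release suffixes, so it is looser than PEP 440:

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Minimum versions as listed in the ncaa_stats_py 0.0.5 package metadata.
MINIMUMS = {
    "beautifulsoup4": "4.12.2",
    "lxml": "5.3",
    "pandas": "2.2.3",
    "pytz": "2024.2",
    "requests": "2.32.3",
    "tqdm": "4.67.1",
}


def simple_version_key(v: str) -> Tuple[int, ...]:
    """Simplified parse: keep the leading dot-separated integers.

    Trailing zeros are dropped so "5.3.0" and "5.3" compare equal;
    suffixes such as "rc1" are ignored (unlike PEP 440).
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        return ()
    parts = [int(part) for part in match.group(0).split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check_dependencies() -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each dependency to (installed_version_or_None, meets_minimum)."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        meets = simple_version_key(installed) >= simple_version_key(minimum)
        report[name] = (installed, meets)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_dependencies().items():
        status = "OK" if ok else "missing or too old"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

For a fully PEP 440-compliant comparison, `packaging.version.Version` from the `packaging` library would be the more robust choice.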
# License
This package is licensed under the MIT license. You can view the package's license [here](https://github.com/armstjc/ncaa_stats_py/blob/main/LICENSE).
# Documentation
More information about this package, its functions, and how to use them can be found at [https://armstjc.github.io/ncaa_stats_py/](https://armstjc.github.io/ncaa_stats_py/).
Raw data
{
"_id": null,
"home_page": null,
"name": "ncaa-stats-py",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "Joseph Armstrong <armstrongjoseph08@gmail.com>",
"keywords": "sports, college, college sports, baseball",
"author": null,
"author_email": "Joseph Armstrong <armstrongjoseph08@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/f9/a6/9d8805d25c2e5356d4ae1bad5140a7c0f9612479484875561c6f3146fe84/ncaa_stats_py-0.0.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Allows a user to download and parse data from the National Collegiate Athletics Association (NCAA), and it's member sports.",
"version": "0.0.5",
"project_urls": {
"changelog": "https://github.com/armstjc/ncaa_stats_py/blob/main/CHANGELOG.md",
"documentation": "https://github.com/armstjc/ncaa_stats_py",
"homepage": "https://github.com/armstjc/ncaa_stats_py",
"repository": "https://github.com/armstjc/ncaa_stats_py.git"
},
"split_keywords": [
"sports",
" college",
" college sports",
" baseball"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "711c5864849affcd0ed343720e5694935b70fa51dc76b86217cfdbf5334101ac",
"md5": "38c82cf3907add94d2b828262250c736",
"sha256": "4792ff23099bad3cdfafc5db8ef8d19717d9c291887ec96a2f9872ef4c4280d5"
},
"downloads": -1,
"filename": "ncaa_stats_py-0.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "38c82cf3907add94d2b828262250c736",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 95043,
"upload_time": "2024-12-17T16:08:15",
"upload_time_iso_8601": "2024-12-17T16:08:15.258638Z",
"url": "https://files.pythonhosted.org/packages/71/1c/5864849affcd0ed343720e5694935b70fa51dc76b86217cfdbf5334101ac/ncaa_stats_py-0.0.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f9a69d8805d25c2e5356d4ae1bad5140a7c0f9612479484875561c6f3146fe84",
"md5": "8c11260d7d4a29f1f5bc338b69b93618",
"sha256": "b61bba4b202c3edec387cc2e80bfaaf49660e907a2606d380adf094e341f654e"
},
"downloads": -1,
"filename": "ncaa_stats_py-0.0.5.tar.gz",
"has_sig": false,
"md5_digest": "8c11260d7d4a29f1f5bc338b69b93618",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 88303,
"upload_time": "2024-12-17T16:08:16",
"upload_time_iso_8601": "2024-12-17T16:08:16.730158Z",
"url": "https://files.pythonhosted.org/packages/f9/a6/9d8805d25c2e5356d4ae1bad5140a7c0f9612479484875561c6f3146fe84/ncaa_stats_py-0.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-17 16:08:16",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "armstjc",
"github_project": "ncaa_stats_py",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "beautifulsoup4",
"specs": [
[
">=",
"4.12.2"
]
]
},
{
"name": "lxml",
"specs": [
[
">=",
"5.3"
]
]
},
{
"name": "pandas",
"specs": [
[
">=",
"2.2.3"
]
]
},
{
"name": "pytz",
"specs": [
[
">=",
"2024.2"
]
]
},
{
"name": "requests",
"specs": [
[
">=",
"2.32.3"
]
]
},
{
"name": "tqdm",
"specs": [
[
">=",
"4.67.1"
]
]
}
],
"lcname": "ncaa-stats-py"
}