baskref

- Name: baskref
- Version: 0.0.5
- Summary: BaskRef is a tool to scrape basketball data from the web.
- Upload time: 2022-12-22 11:34:49
- Requires Python: >=3.9
- License: MIT License, Copyright (c) 2022 Dominik Zulovec Sajovic
- Keywords: basketball, web scraper, python
# BaskRef (Basketball Scraper)
BaskRef is a tool to scrape basketball data from the web.

The goal of this project is to provide a data collection utility for
NBA basketball data. The collection strategy is to scrape data from
https://www.basketball-reference.com.
The data can then be saved to a CSV file to be used by other tools.

## About the Package

### What data are we collecting?

- games & game stats (in-depth stats for each game)
- player game stats

All datasets can be collected:
- by day (all games in one day)
- by whole season (regular + playoffs)
- by playoffs

#### Future Collections (not yet implemented)
- player metadata
- game logs


## How to Install & Run the Package?

### Install the project
```bash
pip install baskref

# optionally set the logging level (default: INFO)
export LOG_LEVEL=DEBUG # one of INFO, DEBUG, ERROR
```
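The LOG_LEVEL variable is read at startup and mapped onto Python's standard logging levels. The wiring is roughly as follows (a generic sketch of the pattern, not baskref's actual code):

```python
import logging
import os

# Read the desired level name from the environment, defaulting to INFO.
level_name = os.environ.get("LOG_LEVEL", "INFO").upper()

# Fall back to INFO if the name is not a valid logging level.
level = getattr(logging, level_name, logging.INFO)

logging.basicConfig(level=level)
logging.getLogger("baskref").debug("only visible when LOG_LEVEL=DEBUG")
```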

### Scrape Game Data

Scrape all games for the 7th of January 2022.
```bash
baskref -t g -d 2022-01-07 -fp datasets
# python -c "from baskref import run_baskref; run_baskref()" -t g -d 2022-01-07 -fp datasets
```

Scrape all games for the 2006 NBA season (regular season + playoffs).
```bash
baskref -t gs -y 2006 -fp datasets
# python -c "from baskref import run_baskref; run_baskref()" -t gs -y 2006 -fp datasets
```

Scrape all games for the 2006 NBA playoffs.
```bash
baskref -t gp -y 2006 -fp datasets
# if you don't install the package
# python -c "from baskref import run_baskref; run_baskref()" -t gp -y 2006 -fp datasets
```

### Scrape Game URLs only

```bash
# append "u" to any of the three scraping types:
# g -> gu, gs -> gsu, gp -> gpu
baskref -t gu -d 2022-01-07 -fp datasets
```

### Scrape Player Stats Data

```bash
# append "pl" to any of the three scraping types:
# g -> gpl, gs -> gspl, gp -> gppl
baskref -t gpl -d 2022-01-07 -fp datasets
```

### Scrape Using a Proxy
Pass the -p option to route requests through a proxy.
```bash
baskref -t g -d 2022-01-07 -fp datasets -p http://someproxy.com
```


## How to Use the Package?

Install requirements
```bash
pip install -r requirements.txt
```

### Data Collection Utility
This refers to the scraping functionality.

For any mode of collection, first import and initialize
the classes below.
```python
from baskref.data_collection import (
    BaskRefUrlScraper,
    BaskRefDataScraper,
)

url_scraper = BaskRefUrlScraper()
data_scraper = BaskRefDataScraper()

# optionally you can set a proxy
proxy_url_scraper = BaskRefUrlScraper("http://someproxy.com")
proxy_data_scraper = BaskRefDataScraper("http://someproxy.com")
```
The `BaskRefDataScraper.get_games_data` method returns a list of dictionaries.
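Each element of the returned list is a flat dictionary of game fields, so standard list and dict operations apply directly. The keys below are hypothetical placeholders for illustration, not baskref's documented schema:

```python
# Hypothetical records showing the list-of-dicts shape;
# the real keys depend on the baskref version.
game_data = [
    {"home_team": "BOS", "away_team": "LAL", "home_pts": 110, "away_pts": 102},
    {"home_team": "MIA", "away_team": "CHI", "home_pts": 95, "away_pts": 101},
]

# e.g. filter the games the home team won
home_wins = [g for g in game_data if g["home_pts"] > g["away_pts"]]
print(len(home_wins))  # 1
```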

Collect games for a specific day
```python
from datetime import date

game_urls = url_scraper.get_game_urls_day(date(2022,1,7))
game_data = data_scraper.get_games_data(game_urls)
```

Collect games for a specific season (regular + playoffs)
```python
game_urls = url_scraper.get_game_urls_year(2006)
game_data = data_scraper.get_games_data(game_urls)
```

Collect games for a specific postseason
```python
game_urls = url_scraper.get_game_urls_playoffs(2006)
game_data = data_scraper.get_games_data(game_urls)
```

Collect player stats for a specific day
```python
from datetime import date

game_urls = url_scraper.get_game_urls_day(date(2022,1,7))
pl_stats_data = data_scraper.get_player_stats_data(game_urls)
```

### Data Saving Package
This refers to saving the scraped data.

Save a list of dictionaries to a CSV file.
```python
import os
from baskref.data_saving.file_saver import save_file_from_list

save_path = os.path.join('datasets', 'file_name.csv')
save_file_from_list(game_data, save_path)
```
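Under the hood this amounts to writing the dictionaries with the standard library's csv module. A rough stdlib equivalent of the behaviour (a sketch, not baskref's actual implementation):

```python
import csv
import os
import tempfile

def save_dicts_to_csv(records: list, path: str) -> None:
    """Write a list of dicts to CSV, using the first record's keys as the header."""
    if not records:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)

# Example with a hypothetical game record:
records = [{"home_team": "BOS", "away_team": "LAL", "home_pts": 110}]
path = os.path.join(tempfile.gettempdir(), "file_name.csv")
save_dicts_to_csv(records, path)
```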

## How to Run Tests?

Run all tests with Pytest
```bash
pytest
```

Run all tests with coverage
```bash
coverage run --source=baskref -m pytest
coverage report --omit="*/test*" -m --skip-empty
```

## Code Formatting

The code base uses black for automatic formatting.
The configuration for black is stored in the pyproject.toml file.

```bash
# run black over the entire code base
black .
```

## Linting

The code base uses pylint and mypy for code linting.

### Pylint

The configuration for pylint is stored in the .pylintrc file.

```bash 
# run pylint over the entire code base
pylint --recursive=y ./
```

### MyPy

The configuration for mypy is stored in the pyproject.toml file.

```bash 
# run mypy over the entire code base
mypy baskref
```

## Bonus

### Prepare project for development

1. Create a virtual environment

- You might want to use a virtual environment for running the project.
- This step is optional (if skipping, go straight to step 2).

Create a new virtual environment
```bash
python -m venv venv  # the second argument is the path to the virtual environment
```

Activate the new virtual environment
```
# Windows
.\venv\Scripts\activate

# Unix
source venv/bin/activate
```

Leaving the virtual environment
```bash
deactivate
```

2. Install all the dev requirements

```bash
pip install -r requirements_dev.txt

# uninstall all packages (Windows)
pip freeze > unins && pip uninstall -y -r unins && del unins

# uninstall all packages (Linux)
pip freeze | xargs pip uninstall -y
```

3. Install the pre-commit hook
```bash
pre-commit install
```

### Prepare a New Version
This section describes some of the steps for preparing a new baskref version.

- adjust the pyproject.toml file
    - version
    - dependencies
- install the project locally and test it
```bash
python -m build
pip install .
```
- publish the project to test.pypi
```bash
pip install --upgrade twine
twine upload --repository testpypi dist/*
# install from test.pypi
pip install --index-url https://test.pypi.org/simple/ baskref
```
- publish the new version
```bash
twine upload dist/*
```


## Contributors

1. [Dominik Zulovec Sajovic](https://www.linkedin.com/in/dominik-zulovec-sajovic/)

            
