# swehockey_scraper
This package collects data from stats.swehockey.se by web scraping.
This is the website where the Swedish Ice Hockey Federation stores all its data.
This package is for personal use only.
## Getting started
The package can be installed with pip:
```pip install swehockey_scraper```
In Python, import the module with:
```import swehockey.swehockey_scraper as swe```
See the description of the functions in the package with:
```help(swe)```
The functions are meant to be used together: the output of one function is the input to the next.
## Data structure
Data on stats.swehockey.se is identified by two keys: season_id and game_id.
### Season ID
Each combination of season and league has its own schedule id, which this package calls the season id.
It is found in the URL: for example, in https://stats.swehockey.se/ScheduleAndResults/Schedule/6108 the season id is 6108.
### Game ID
Each game has a URL of the form https://stats.swehockey.se/Game/Events/252961
Here, the game id is the last part of the URL, i.e. 252961.
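
Since both ids are the last part of their URL, they can be read off programmatically. The following is a minimal sketch using only the Python standard library; `id_from_url` is a hypothetical helper for illustration and is not part of this package.

```python
from urllib.parse import urlparse


def id_from_url(url: str) -> int:
    """Return the numeric id that forms the last path segment of a URL.

    Hypothetical helper, not part of swehockey_scraper.
    """
    # Drop a trailing slash, then keep whatever follows the last "/".
    path = urlparse(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    if not last_segment.isdigit():
        raise ValueError(f"no numeric id at the end of {url!r}")
    return int(last_segment)


season_id = id_from_url("https://stats.swehockey.se/ScheduleAndResults/Schedule/6108")
game_id = id_from_url("https://stats.swehockey.se/Game/Events/252961")
print(season_id, game_id)  # prints: 6108 252961
```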
## Functions
### getGames(season_id)
Input is a list of season ids. Returns a dataframe containing all games for the given seasons, together with their results.
### cleanGames(df_games)
Input is a dataframe with the structure returned by getGames().
This step cleans up the data and adds additional columns for further data processing.
### getTeamData(df_games_clean)
Input is a dataframe with the structure returned by cleanGames().
This step builds a dataframe on team level. It calculates season-specific metrics for each team, head-to-head comparisons and table positions.
### getGameData(df_games_clean)
Input is a list of game ids (these can, for example, be extracted from the output of getGames).
This function extracts game-specific data such as penalties, goals and shot statistics.
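
The order in which the functions feed into each other can be sketched as follows. This is only an illustration under stated assumptions: `run_pipeline` is a hypothetical helper that is not part of the package, the module is passed in as `swe`, and this README does not say whether ids should be given as integers or strings. The sketch follows the description of getGameData (a list of game ids) rather than the parameter name in its heading.

```python
def run_pipeline(swe, season_ids, game_ids=None):
    """Chain the package functions in the order described above.

    Hypothetical helper. `swe` is the imported module, i.e. the result of
    `import swehockey.swehockey_scraper as swe`.
    """
    # All games and results for the requested seasons.
    df_games = swe.getGames(season_ids)
    # Cleaned games with additional columns for further processing.
    df_games_clean = swe.cleanGames(df_games)
    # Team-level metrics, head-to-head comparisons and table positions.
    df_teams = swe.getTeamData(df_games_clean)
    # Game-specific data is only fetched when game ids are supplied.
    df_game_data = swe.getGameData(game_ids) if game_ids else None
    return df_games_clean, df_teams, df_game_data


# Example call (needs the package installed and network access):
#   import swehockey.swehockey_scraper as swe
#   games, teams, details = run_pipeline(swe, [6108], game_ids=[252961])
```

Passing the module in as an argument keeps the sketch independent of network access, so the call order can be checked with a stand-in object.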
## Example Notebook
See [this notebook](https://github.com/msjoelin/swehockey_scraper/blob/master/sample_workbook.ipynb) for examples of how to use the package, and in what order you can run the functions.
## Package metadata
- Name: swehockey-scraper
- Version: 1.4
- Summary: Functions to scrape ice hockey data and statistics from swehockey
- Author: Marcus Sjölin
- License: MIT
- Homepage: https://github.com/msjoelin/swehockey_scraper
- Download: https://pypi.org/project/swehockey_scraper/
- Keywords: ice hockey, scraping, sport analytics, shl, analys, ishockey, hockeyallsvenskan
- Release file: swehockey_scraper-1.4-py3-none-any.whl (bdist_wheel, py3, 6040 bytes), uploaded 2024-08-26 21:55:10