# eliteprospect_scraper
Package to scrape ice hockey data from eliteprospects.com.

My aim is to keep the package up to date so that it keeps working when the webpage structure changes.
If something is not working, please reach out so that we can fix it.

Please use collected data for personal use only. There are real APIs for professional use of eliteprospects data.

## Getting started
Install the package with pip:
```pip install eliteprospect_scraper```

Import the module:
```import eliteprospect.eliteprospect_scraper as ep```

Show the function descriptions with:
```help(ep)```

## Functions
Descriptions of the functions in the package.
### getPlayers(league, year)
Gets all players for a given league and season from the page with the URL structure
'https://www.eliteprospects.com/league/' + league + '/stats/' + year

Example: https://www.eliteprospects.com/league/shl/stats/2016-2017

The function takes the input parameters league and year:
* league: a valid league [from eliteprospects](https://www.eliteprospects.com/leagues)
* year: a season in the format 2015-2016, 2016-2017 etc.

Example:
```ep.getPlayers('shl', '2015-2016')```

The stats page is paginated, and the function loops over 10 pages.
This is typically enough to extract all players.
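
The URL structure and season format described above can be sketched in a few lines. This is only an illustration and not the package's own code: the helper names `build_stats_url` and `build_page_urls` are hypothetical, and the `page` query parameter used to select a results page is an assumption about the site.

```python
import re

BASE_URL = "https://www.eliteprospects.com/league/"

# Season strings look like "2015-2016": two four-digit years.
SEASON_PATTERN = re.compile(r"^(\d{4})-(\d{4})$")


def build_stats_url(league: str, year: str) -> str:
    """Build the stats URL for a league and season (hypothetical helper)."""
    match = SEASON_PATTERN.match(year)
    if match is None:
        raise ValueError(f"year must look like '2015-2016', got {year!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if end != start + 1:
        raise ValueError(f"season must span consecutive years, got {year!r}")
    return BASE_URL + league + "/stats/" + year


def build_page_urls(league: str, year: str, pages: int = 10) -> list[str]:
    """List one URL per results page.

    Assumption: a results page is selected with a `page` query parameter.
    """
    url = build_stats_url(league, year)
    return [f"{url}?page={n}" for n in range(1, pages + 1)]


print(build_stats_url("shl", "2016-2017"))
# https://www.eliteprospects.com/league/shl/stats/2016-2017
```

Checking the season string up front makes a typo such as `2015-2017` fail immediately instead of producing an empty result after several requests.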
### getGoalies(league, year)
Same as getPlayers, but returns a dataframe with goalies.
### getPlayerMetadata(dfplayers)
Creates a dataframe with metadata for each player.
The input is a dataframe created with the function getPlayers.
### getPlayerStats(playerlinks)
Creates a dataframe with all statistics [from player pages](https://eliteprospects.com/player/2050/mattias-ritola).
Takes a series of playerlinks as input. Playerlinks are also included in the output returned by ```getPlayerMetadata```.

```ep.getPlayerStats(["https://eliteprospects.com/player/2050/mattias-ritola"])```
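
Since the function works on a series of links, it can help to check that each link has the expected shape before scraping. The sketch below is not part of the package: `parse_player_link` is a hypothetical helper, and the `player/<id>/<slug>` path layout is assumed from the example link above.

```python
from urllib.parse import urlparse


def parse_player_link(link: str) -> tuple[int, str]:
    """Split a player link into (player_id, slug). Hypothetical helper."""
    parts = urlparse(link).path.strip("/").split("/")
    # Assumed path layout: player/<id>/<slug>
    if len(parts) != 3 or parts[0] != "player" or not parts[1].isdigit():
        raise ValueError(f"not a player link: {link!r}")
    return int(parts[1]), parts[2]


links = ["https://eliteprospects.com/player/2050/mattias-ritola"]
print([parse_player_link(link) for link in links])
# [(2050, 'mattias-ritola')]
```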
### dataprep_players(playerstats, league_mapping, players)
```dataprep_players(playerstats, league_mapping, players)```

Takes a series of playerlinks to eliteprospects profiles and returns a dataframe with stats by player and season.
## Example Notebook
[See this notebook](https://github.com/msjoelin/eliteprospect_scraper/blob/master/sample_workbook.ipynb) for examples of how to use the package, and in what order you can run the functions.
# Changelog
## [1.2] - 2024-09-12
### Added
- New function getGoalies() to extract goaltender links, similar to getPlayers().
### Fixed
- getPlayers() now accepts the league input "Hockeyallsvenskan" (Swedish second league) instead of "Allsvenskan".
Allsvenskan is a legacy name used before the 2012 season.