# 🎲 Board Game Scraper 🕸
Scraping data about board games from the web. View the data live at
[Recommend.Games](https://recommend.games/)! Install via
```bash
pip install board-game-scraper
```
## Sources
* [BoardGameGeek](https://boardgamegeek.com/) (`bgg`)
* [DBpedia](https://wiki.dbpedia.org/) (`dbpedia`)
* [Luding.org](https://luding.org/) (`luding`)
* [Spielen.de](https://gesellschaftsspiele.spielen.de/) (`spielen`)
* [Wikidata](https://www.wikidata.org/) (`wikidata`)
## Run scrapers
Requires [Python 3](https://pythonclock.org/) (3.7 or later). Make sure
[Pipenv](https://docs.pipenv.org/) is installed, then create and activate the
virtual environment:
```bash
python3 -m pip install --upgrade pipenv
pipenv install --dev
pipenv shell
```
Run a spider like so:
```bash
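SPIDER='bgg'  # any spider ID from the Sources list
# `date -u` is the portable spelling of GNU's `date --utc`
JOBDIR="jobs/${SPIDER}/$(date -u +'%Y-%m-%dT%H-%M-%S')"
scrapy crawl "${SPIDER}" \
    --output 'feeds/%(name)s/%(time)s/%(class)s.csv' \
    --set "JOBDIR=${JOBDIR}"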
```
where `$SPIDER` is one of the spider IDs listed under [Sources](#sources), e.g. `bgg`.
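To make the paths in the command above concrete, here is a minimal Python sketch of how the job directory and the feed URI template would expand. It assumes Scrapy's printf-style feed URI parameters, where `%(name)s` stands for the spider name and `%(time)s` for a timestamp. How `%(class)s` gets filled in is specific to this project, so the item class name `GameItem` below is purely hypothetical, as are the helper names and the choice to reuse the job directory's timestamp format for the feed.

```python
from datetime import datetime, timezone

# Same template as passed to `--output` in the shell command.
FEED_TEMPLATE = "feeds/%(name)s/%(time)s/%(class)s.csv"
# Same timestamp layout as the `date` call that builds JOBDIR.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H-%M-%S"


def job_dir(spider: str, now: datetime) -> str:
    """Build the job directory path for one spider run (UTC timestamp)."""
    stamp = now.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)
    return f"jobs/{spider}/{stamp}"


def feed_path(spider: str, item_class: str, now: datetime) -> str:
    """Expand the feed template with printf-style named parameters.

    Assumption: `class` is the name of the scraped item class.
    """
    params = {
        "name": spider,
        "time": now.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT),
        "class": item_class,
    }
    return FEED_TEMPLATE % params


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(job_dir("bgg", now))
    print(feed_path("bgg", "GameItem", now))  # "GameItem" is hypothetical
```

Because the timestamp is part of both paths, every fresh run gets its own job directory and its own output folder.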
Run all the spiders with the [`run_scrapers.sh`](run_scrapers.sh) script. Get a
list of the running scrapers' PIDs with the [`processes.sh`](processes.sh)
script. You can stop all the running scrapers via
```bash
./processes.sh stop
```
and resume them later.
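Scrapy resumes a crawl when it is started again with the same `JOBDIR`. A minimal sketch of how one might locate the most recent job directory for a spider follows, assuming the `jobs/<spider>/<timestamp>` layout from the command above. Whether `run_scrapers.sh` picks the directory this way is an assumption, and `latest_job_dir` is a hypothetical helper, not part of this package.

```python
from pathlib import Path
from typing import Optional


def latest_job_dir(spider: str, base: Path = Path("jobs")) -> Optional[Path]:
    """Return the newest job directory for `spider`, or None if there is none.

    The timestamp format YYYY-MM-DDTHH-MM-SS sorts chronologically when
    sorted as plain text, so the last name in sorted order is the newest.
    """
    spider_dir = base / spider
    if not spider_dir.is_dir():
        return None
    candidates = sorted(
        (path for path in spider_dir.iterdir() if path.is_dir()),
        key=lambda path: path.name,
    )
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    job = latest_job_dir("bgg")
    if job is None:
        print("no previous job found")
    else:
        # Passing this path via `--set "JOBDIR=..."` would continue that crawl.
        print(f"resume with JOBDIR={job}")
```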
## Tests
You can run `scrapy check` to perform contract tests for all spiders, or
`scrapy check $SPIDER` to test one particular spider. If a test fails, the
website has most likely changed and the spider needs updating.
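To find out which spiders need attention, the per-spider check can be looped over the source IDs. The following is a rough sketch, assuming that `scrapy check` exits with a non-zero status when a contract fails and that it is run from the project directory inside the Pipenv environment. `failing_spiders` is a hypothetical helper, not part of this package.

```python
import subprocess
from typing import Callable, Iterable, List

# Spider IDs from the Sources section.
SPIDERS = ("bgg", "dbpedia", "luding", "spielen", "wikidata")


def failing_spiders(
    spiders: Iterable[str] = SPIDERS,
    run: Callable = subprocess.run,
) -> List[str]:
    """Run `scrapy check <spider>` for each ID and collect the failures.

    Assumption: a non-zero return code means at least one contract failed.
    `run` is injectable so the function can be exercised without Scrapy.
    """
    failed = []
    for spider in spiders:
        result = run(["scrapy", "check", spider])
        if result.returncode != 0:
            failed.append(spider)
    return failed


if __name__ == "__main__":
    broken = failing_spiders()
    print("spiders needing an update:", ", ".join(broken) or "none")
```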
## Board game datasets
If you are interested in using any of the datasets produced by this scraper,
take a look at the
[BoardGameGeek guild](https://boardgamegeek.com/thread/2287371/boardgamegeek-games-and-ratings-datasets).
A subset of the data can also be found on [Kaggle](https://www.kaggle.com/mshepherd/board-games).
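As a starting point for exploring one of the CSV files, here is a small sketch that reports the column names and row count using only the standard library. The file path and the column names in the usage example are made up for illustration; the actual columns depend on the dataset you download.

```python
import csv
from typing import Dict, List, TextIO, Union


def summarise_feed(handle: TextIO) -> Dict[str, Union[List[str], int]]:
    """Return the header columns and the number of data rows of a CSV file."""
    reader = csv.DictReader(handle)
    rows = sum(1 for _ in reader)  # consumes the file, one dict per row
    return {"columns": list(reader.fieldnames or []), "rows": rows}


if __name__ == "__main__":
    # Hypothetical path; point this at a CSV produced by a spider run.
    with open("feeds/bgg/example.csv", newline="", encoding="utf-8") as file:
        print(summarise_feed(file))
```

Opening the file with `newline=""` follows the `csv` module's recommendation, so quoted fields containing line breaks are parsed correctly.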
## Links
* [board-game-scraper](https://gitlab.com/recommend.games/board-game-scraper):
This repository
* [Recommend.Games](https://recommend.games/): board game recommender using the
scraped data
* [recommend-games-server](https://gitlab.com/recommend.games/recommend-games-server):
Server code for [Recommend.Games](https://recommend.games/)
* [board-game-recommender](https://gitlab.com/recommend.games/board-game-recommender):
Recommender code for [Recommend.Games](https://recommend.games/)
## Raw data
{
"_id": null,
"home_page": "https://recommend.games/",
"name": "board-game-scraper",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7.0",
"maintainer_email": "",
"keywords": "board games,tabletop games,data,datasets,scraper,scrapy,spider,boardgamegeek,bgg,ludoj,ludoj-scraper",
"author": "Markus Shepherd",
"author_email": "markus@recommend.games",
"download_url": "https://files.pythonhosted.org/packages/c4/be/9ec76b6d9c08c37aa3db3f9837f944f35a4e007fec669c1055a449706db7/board-game-scraper-2.22.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Board games data scraping and processing from BoardGameGeek and more!",
"version": "2.22.0",
"project_urls": {
"Documentation": "https://gitlab.com/recommend.games/board-game-scraper/blob/master/README.md",
"Funding": "https://paypal.me/mschepke",
"Homepage": "https://recommend.games/",
"Say Thanks!": "https://saythanks.io/to/mk.schepke%40gmail.com",
"Source": "https://gitlab.com/recommend.games/board-game-scraper",
"Tracker": "https://gitlab.com/recommend.games/board-game-scraper/issues",
"Twitter": "https://twitter.com/recommend_games"
},
"split_keywords": [
"board games",
"tabletop games",
"data",
"datasets",
"scraper",
"scrapy",
"spider",
"boardgamegeek",
"bgg",
"ludoj",
"ludoj-scraper"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "cc48cbb26c02d404bf6a193a6386d8ca3a323b01d6926b92a7f3695b012deb14",
"md5": "169f751dc252547430b37ccbf432058a",
"sha256": "ac53dc7732d16eb99bea28abd5ed30679f2635b1833b9d7f9d23fdf3c7dd7f4b"
},
"downloads": -1,
"filename": "board_game_scraper-2.22.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "169f751dc252547430b37ccbf432058a",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.7.0",
"size": 73792,
"upload_time": "2024-02-11T12:43:13",
"upload_time_iso_8601": "2024-02-11T12:43:13.489411Z",
"url": "https://files.pythonhosted.org/packages/cc/48/cbb26c02d404bf6a193a6386d8ca3a323b01d6926b92a7f3695b012deb14/board_game_scraper-2.22.0-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c4be9ec76b6d9c08c37aa3db3f9837f944f35a4e007fec669c1055a449706db7",
"md5": "e9fa3ff0bfd66e4ec2b7de4b69dcb8cd",
"sha256": "fde27badb99b5f4699a2ce6da776b8511ea25d4242e3d6ac625f2f13e778b691"
},
"downloads": -1,
"filename": "board-game-scraper-2.22.0.tar.gz",
"has_sig": false,
"md5_digest": "e9fa3ff0bfd66e4ec2b7de4b69dcb8cd",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7.0",
"size": 57731,
"upload_time": "2024-02-11T12:43:15",
"upload_time_iso_8601": "2024-02-11T12:43:15.752463Z",
"url": "https://files.pythonhosted.org/packages/c4/be/9ec76b6d9c08c37aa3db3f9837f944f35a4e007fec669c1055a449706db7/board-game-scraper-2.22.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-11 12:43:15",
"github": false,
"gitlab": true,
"bitbucket": false,
"codeberg": false,
"gitlab_user": "recommend.games",
"gitlab_project": "board-game-scraper",
"lcname": "board-game-scraper"
}