# imslp
![pytest](https://github.com/jlumbroso/imslp/workflows/pytest/badge.svg)
[![codecov](https://codecov.io/gh/jlumbroso/imslp/branch/master/graph/badge.svg?token=GX52420WN4)](https://codecov.io/gh/jlumbroso/imslp)
[![Documentation Status](https://readthedocs.org/projects/imslp/badge/?version=latest)](https://imslp.readthedocs.io/en/latest/?badge=latest)
[![Downloads](https://pepy.tech/badge/imslp)](https://pepy.tech/project/imslp)
[![Run on Repl.it](https://repl.it/badge/github/jlumbroso/imslp)](https://repl.it/github/jlumbroso/imslp)
[![Stargazers](https://img.shields.io/github/stars/jlumbroso/imslp?style=social)](https://github.com/jlumbroso/imslp)
🎼 The clean and modern way of accessing IMSLP data and scores programmatically. 🎶
## Installation
The package is available on PyPI and can be installed using your favorite package
manager:
```shell
pip install imslp
```
## Data Sources
This project tries to rely on robust sources of data that do not require web scraping:
- **MediaWiki API.** IMSLP is [one of tens of thousands of websites](https://wikiapiary.com/wiki/IMSLP)
built on top of [MediaWiki](https://www.mediawiki.org/wiki/MediaWiki), the framework created for
[Wikipedia.org](https://en.wikipedia.org/wiki/MediaWiki). As such, it can be accessed through
the [MediaWiki API](https://www.mediawiki.org/wiki/API:Main_page) for which, fortunately,
there exists a fantastic Python wrapper library called [`mwclient`](https://github.com/mwclient/mwclient).
- **IMSLP API.** For convenience, IMSLP provides some *ad-hoc* scripts that can be used to get a
list of people and a list of works, in a variety of formats, including JSON.

Where these sources fall short, the package also uses scraping to collect additional information (such as
the number of pages in a score, the number of times a score was downloaded, or the user-provided ratings).
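
As a rough illustration of the IMSLP API mentioned above, the sketch below builds a query URL for the list of people or the list of works. The endpoint path, the slash-separated parameter layout, and the `type` codes are assumptions based on IMSLP's public API description and should be checked against the [IMSLP:API](https://imslp.org/wiki/IMSLP:API) page; `build_list_url` is a hypothetical helper, not part of this package.

```python
# Assumed endpoint of IMSLP's ad-hoc list scripts (verify against IMSLP:API).
IMSLP_API_BASE = "https://imslp.org/imslpscripts/API.ISCR.php"

# Assumed type codes: 1 for the list of people, 2 for the list of works.
LIST_TYPES = {"people": 1, "works": 2}


def build_list_url(kind, start=0, retformat="json"):
    """Build a query URL for one page of the people or works list.

    Hypothetical helper: the parameters are joined with "/" rather than
    "&", which is the layout assumed here for IMSLP's scripts.
    """
    if kind not in LIST_TYPES:
        raise ValueError(f"kind must be one of {sorted(LIST_TYPES)}")
    if start < 0:
        raise ValueError("start must be non-negative")
    params = [
        "account=worklist",
        "disclaimer=accepted",
        "sort=id",
        f"type={LIST_TYPES[kind]}",
        f"start={start}",
        f"retformat={retformat}",
    ]
    return IMSLP_API_BASE + "?" + "/".join(params)


if __name__ == "__main__":
    # Print the URL for the first page of the people list.
    print(build_list_url("people"))
```

The resulting URL can then be fetched with any HTTP client; paging through results would presumably be done by increasing `start`.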
### Some quirks of IMSLP
Although IMSLP is built on MediaWiki, a widely used open-source wiki platform, it has a
handful of quirks:
- Composers are stored as categories, for instance `Category:Scarlatti, Domenico`. For each composer,
there are usually three tabs: "Compositions", "Collaborations" and "Collections"; these are stored as
separate categories resulting from the concatenation of the composer and subtype, such as
`Category:Scarlatti, Domenico/Collections`.
- PDF files for sheet music are stored as "images"; unfortunately, for the time being, the URLs
computed for these files do not include the URL scheme, so they need to be patched manually.
- The `imslpdisclaimeraccepted` cookie must be set to `"yes"` for files to download properly (otherwise,
downloading any file will result in the disclaimer page). With `mwclient`, this can be specified on login.
    ```python
    cookies = {
        "imslp_wikiLanguageSelectorLanguage": "en",
        "imslpdisclaimeraccepted": "yes",
    }
    ```
- Much of the metadata associated with images, such as the internal ID or the download counter, is stored
separately from the MediaWiki metadata. This makes scraping the rendered HTML page necessary.

Fortunately, all these quirks are handled by this package!
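
To make two of these quirks concrete, here is a minimal sketch using only the standard library. It shows how a composer's category name might be assembled and how a file URL lacking a scheme could be patched. Both `composer_category` and `patch_scheme` are hypothetical helpers written for illustration, not this package's API. The exact form of the scheme-less URLs (protocol-relative or site-relative) is an assumption, as is the sample file path.

```python
from urllib.parse import urljoin

# Assumed base URL used to fill in a missing scheme or host.
IMSLP_BASE_URL = "https://imslp.org/"

# The three tabs described above.
SUBTYPES = ("Compositions", "Collaborations", "Collections")


def composer_category(composer, subtype=None):
    """Return the category name for a composer, optionally with a subtype.

    Follows the naming described above: the composer's category name,
    then "/" and the subtype. Whether a given subtype category actually
    exists for a composer is not checked here.
    """
    name = "Category:" + composer
    if subtype is None:
        return name
    if subtype not in SUBTYPES:
        raise ValueError(f"subtype must be one of {SUBTYPES}")
    return name + "/" + subtype


def patch_scheme(url, base=IMSLP_BASE_URL):
    """Resolve a URL with a missing scheme (or host) against the base.

    urljoin leaves URLs that are already absolute unchanged.
    """
    return urljoin(base, url)


if __name__ == "__main__":
    print(composer_category("Scarlatti, Domenico", "Collections"))
    # The path below is a made-up example, not a real file on IMSLP.
    print(patch_scheme("//imslp.org/images/a/ab/Example.pdf"))
```

Using `urljoin` rather than string concatenation means the same helper handles protocol-relative URLs, site-relative paths, and URLs that already carry a scheme.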
## Related Projects
Here are a handful of other related projects available on GitHub to access the IMSLP data programmatically:
- [jjjake/imslp-scrape](https://github.com/): Last commit in May 2012 (32 commits), mix of Python and shell, scraping
the website for data (people, score links) with HTML parsing.
- [FrankTheCodeMonkey/IMSLP-Scraper](https://github.com/FrankTheCodeMonkey/IMSLP-Scraper): Last commit in June 2020
(6 commits), Python, scraping the website for data and scores, with HTML parsing and Selenium.
- [josefleventon/imslp-api](https://github.com/josefleventon/imslp-api): Last commit in May 2020 (17 commits),
JavaScript, uses [IMSLP's custom API](https://imslp.org/wiki/IMSLP:API) to get the list of people and list of works
programmatically through a web API query.
More recently, and in other languages:
- [IMSLP Instrument Information Parsing Program](https://github.com/yoonlight/imslp): Last commit in July 2020
(47 commits), uses scraping to extract instrumentation information.
## Acknowledgements
Let's be clear that all the heavy lifting is done by [`mwclient`](https://github.com/mwclient/mwclient)—and
the volunteers who uploaded and/or scanned and/or typeset the scores on IMSLP.
## License
This project is licensed under the LGPLv3 license, with the understanding
that importing a Python module is similar in spirit to dynamically linking
against a library.
- You can use the library `imslp` in any project, for any purpose, as long
as you provide some acknowledgement to this original project for use of
the library.
- If you make improvements to `imslp`, you are required to make those
changes publicly available.
Raw data
{
"_id": null,
"home_page": "https://github.com/jlumbroso/imslp",
"name": "imslp",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4.0",
"maintainer_email": "",
"keywords": "IMSLP,IMSLP-scraper,IMSLP-scraping,sheet-music,music-data",
"author": "J\u00e9r\u00e9mie Lumbroso",
"author_email": "lumbroso@cs.princeton.edu",
"download_url": "https://files.pythonhosted.org/packages/a0/39/84eaea22a89d52e479149c3aa92dc5aa85a00da4b793439d689ca550717b/imslp-0.2.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LGPL-3.0-or-later",
"summary": "The clean and modern way of accessing IMSLP data and scores programmatically.",
"version": "0.2.3",
"split_keywords": [
"imslp",
"imslp-scraper",
"imslp-scraping",
"sheet-music",
"music-data"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "ea2adf48f53f75964a3bdc90e68f66df",
"sha256": "f5bb7a3b9a401ea685889dfad21403cfdf4eeac066d2a98e5c6b78939d72e27d"
},
"downloads": -1,
"filename": "imslp-0.2.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ea2adf48f53f75964a3bdc90e68f66df",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<4.0",
"size": 19102,
"upload_time": "2022-12-30T04:01:54",
"upload_time_iso_8601": "2022-12-30T04:01:54.178033Z",
"url": "https://files.pythonhosted.org/packages/a3/ff/d4d989ccaa27861fd9bb33a4bb2144fb9d8733088915b31bb08f2bba4ef5/imslp-0.2.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "88abd376fdb52da174797cfc9c3b1ec7",
"sha256": "ecc731560c0e55460b6506bb3d7db30d9dd7ca15daa7a68f545f24ca59a46277"
},
"downloads": -1,
"filename": "imslp-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "88abd376fdb52da174797cfc9c3b1ec7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4.0",
"size": 14045,
"upload_time": "2022-12-30T04:01:55",
"upload_time_iso_8601": "2022-12-30T04:01:55.263537Z",
"url": "https://files.pythonhosted.org/packages/a0/39/84eaea22a89d52e479149c3aa92dc5aa85a00da4b793439d689ca550717b/imslp-0.2.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-12-30 04:01:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "jlumbroso",
"github_project": "imslp",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "beautifulsoup4",
"specs": [
[
"==",
"4.11.1"
]
]
},
{
"name": "bs4",
"specs": [
[
"==",
"0.0.1"
]
]
},
{
"name": "certifi",
"specs": [
[
"==",
"2022.12.7"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
"==",
"2.1.1"
]
]
},
{
"name": "idna",
"specs": [
[
"==",
"3.4"
]
]
},
{
"name": "mwclient",
"specs": [
[
"==",
"0.10.1"
]
]
},
{
"name": "oauthlib",
"specs": [
[
"==",
"3.2.2"
]
]
},
{
"name": "requests-oauthlib",
"specs": [
[
"==",
"1.3.1"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.28.1"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "soupsieve",
"specs": [
[
"==",
"2.3.2.post1"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"1.26.13"
]
]
}
],
"lcname": "imslp"
}