imslp


- **Name:** imslp
- **Version:** 0.2.3
- **Home page:** https://github.com/jlumbroso/imslp
- **Summary:** The clean and modern way of accessing IMSLP data and scores programmatically.
- **Upload time:** 2022-12-30 04:01:55
- **Author:** Jérémie Lumbroso
- **Requires Python:** `>=3.8,<4.0`
- **License:** LGPL-3.0-or-later
- **Keywords:** imslp, imslp-scraper, imslp-scraping, sheet-music, music-data
- **Requirements:** beautifulsoup4, bs4, certifi, charset-normalizer, idna, mwclient, oauthlib, requests-oauthlib, requests, six, soupsieve, urllib3

# imslp

![pytest](https://github.com/jlumbroso/imslp/workflows/pytest/badge.svg)
 [![codecov](https://codecov.io/gh/jlumbroso/imslp/branch/master/graph/badge.svg?token=GX52420WN4)](https://codecov.io/gh/jlumbroso/imslp)
 [![Documentation Status](https://readthedocs.org/projects/imslp/badge/?version=latest)](https://imslp.readthedocs.io/en/latest/?badge=latest)
 [![Downloads](https://pepy.tech/badge/imslp)](https://pepy.tech/project/imslp)
 [![Run on Repl.it](https://repl.it/badge/github/jlumbroso/imslp)](https://repl.it/github/jlumbroso/imslp)
 [![Stargazers](https://img.shields.io/github/stars/jlumbroso/imslp?style=social)](https://github.com/jlumbroso/imslp)

🎼 The clean and modern way of accessing IMSLP data and scores programmatically. 🎶

## Installation

The package is available on PyPI and can be installed using your favorite package
manager:

```shell
pip install imslp
```

## Data Sources

This project tries to rely on robust data sources that do not require web scraping:

- **MediaWiki API.** IMSLP is [one of tens of thousands of websites](https://wikiapiary.com/wiki/IMSLP)
built on top of [MediaWiki](https://www.mediawiki.org/wiki/MediaWiki), the framework created for
[Wikipedia.org](https://en.wikipedia.org/wiki/MediaWiki). As such, it can be accessed through
the [MediaWiki API](https://www.mediawiki.org/wiki/API:Main_page) for which, fortunately,
there exists a fantastic Python wrapper library called [`mwclient`](https://github.com/mwclient/mwclient).

- **IMSLP API.** For convenience, IMSLP provides some *ad-hoc* scripts that can be used to get a
list of people and a list of works, in a variety of formats, including JSON.

The package also uses scraping to collect additional information (such as the number of pages in a score, the
number of times a score was downloaded, or the user-provided ratings).
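
As a rough illustration of the second source, the sketch below builds a query URL for IMSLP's ad-hoc list scripts. The endpoint path, the slash-separated parameter layout and the meaning of `type` are assumptions drawn from the public IMSLP:API wiki page and should be checked there before use. `build_list_url` is a hypothetical helper, not a function of this package.

```python
# Sketch only: the endpoint and parameter layout are assumptions taken from
# the public IMSLP:API wiki page; verify them before relying on this code.
IMSLP_API_ENDPOINT = "https://imslp.org/imslpscripts/API.ISCR.php"

# Assumed meaning of the `type` parameter: 1 = people, 2 = works.
LIST_TYPES = {"people": 1, "works": 2}


def build_list_url(kind: str, start: int = 0, retformat: str = "json") -> str:
    """Build a (hypothetical) query URL for one page of the people or works list."""
    if kind not in LIST_TYPES:
        raise ValueError(f"kind must be one of {sorted(LIST_TYPES)}")
    if start < 0:
        raise ValueError("start must be non-negative")
    params = [
        ("account", "worklist"),
        ("disclaimer", "accepted"),
        ("sort", "id"),
        ("type", str(LIST_TYPES[kind])),
        ("start", str(start)),
        ("retformat", retformat),
    ]
    # The scripts appear to separate parameters with "/" rather than "&".
    return IMSLP_API_ENDPOINT + "?" + "/".join(f"{k}={v}" for k, v in params)


if __name__ == "__main__":
    print(build_list_url("people"))
```

The function only constructs the URL; fetching and paginating (by increasing `start`) is left to the caller.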

### Some quirks of IMSLP

Although IMSLP is built on MediaWiki, a widely used open-source wiki platform, it has a
handful of quirks:

- Composers are stored as `Category` pages, for instance `Category:Scarlatti, Domenico`. For each composer,
there are usually three tabs: "Compositions", "Collaborations" and "Collections"; these are stored as
separate categories resulting from the concatenation of the composer and subtype, such as
`Category:Scarlatti, Domenico/Collections`.

- PDF files for sheet music are stored as "images"; unfortunately, for the time being, the URL scheme does
not appear in the URLs computed for the files, so these need to be patched manually.

- The `imslpdisclaimeraccepted` cookie must be set to `"yes"` for files to download properly (otherwise,
downloading any file will result in the disclaimer page). With `mwclient`, this can be specified on login.
    ```python
    cookies = {
        "imslp_wikiLanguageSelectorLanguage": "en",
        "imslpdisclaimeraccepted": "yes",
    }
    ```

- Much of the metadata associated with images, such as the internal ID or the download counter, is stored
separately from the MediaWiki metadata. This makes scraping the rendered HTML page necessary.

Fortunately all these quirks are handled by this package!
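
To make the first three quirks concrete, here is a minimal sketch using only the standard library. The helper names (`composer_category`, `ensure_scheme`, `cookie_header`) are hypothetical and are not this package's API. The scheme patch assumes the computed URLs are protocol-relative (they start with `//`), which may not match every URL IMSLP returns.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit

# The three composer tabs mentioned above.
SUBTYPES = ("Compositions", "Collaborations", "Collections")

# Cookies from the snippet above, needed to get past the disclaimer page.
DISCLAIMER_COOKIES = {
    "imslp_wikiLanguageSelectorLanguage": "en",
    "imslpdisclaimeraccepted": "yes",
}


def composer_category(name: str, subtype: Optional[str] = None) -> str:
    """Return the category title for a composer, optionally for one tab."""
    if subtype is None:
        return f"Category:{name}"
    if subtype not in SUBTYPES:
        raise ValueError(f"subtype must be one of {SUBTYPES}")
    return f"Category:{name}/{subtype}"


def ensure_scheme(url: str, scheme: str = "https") -> str:
    """Add a scheme to a URL that lacks one.

    Assumption: scheme-less URLs are protocol-relative ("//host/path").
    """
    parts = urlsplit(url)
    if parts.scheme:
        return url  # already absolute, leave untouched
    return urlunsplit((scheme,) + tuple(parts)[1:])


def cookie_header(cookies: Dict[str, str]) -> str:
    """Serialize cookies into a value for an HTTP "Cookie" request header."""
    return "; ".join(f"{key}={value}" for key, value in cookies.items())


if __name__ == "__main__":
    print(composer_category("Scarlatti, Domenico", "Collections"))
    print(ensure_scheme("//imslp.org/images/example.pdf"))
    print(cookie_header(DISCLAIMER_COOKIES))
```

With `requests`, the same dictionary could instead be passed through the `cookies=` argument of a request rather than serialized by hand.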

## Related Projects

Here are a handful of other related projects available on GitHub to access the IMSLP data programmatically:

- [jjjake/imslp-scrape](https://github.com/): Last commit in May 2012 (32 commits), mix of Python and shell, scraping
the website for data (people, score links) with HTML parsing.

- [FrankTheCodeMonkey/IMSLP-Scraper](https://github.com/FrankTheCodeMonkey/IMSLP-Scraper): Last commit in June 2020 
(6 commits), Python, scraping the website for data and scores, with HTML parsing and Selenium.

- [josefleventon/imslp-api](https://github.com/josefleventon/imslp-api): Last commit in May 2020 (17 commits),
JavaScript, uses [IMSLP's custom API](https://imslp.org/wiki/IMSLP:API) to get the list of people and list of works
programmatically through a web API query. 

More recently, and in other languages:

- [IMSLP Instrument Information Parsing Program](https://github.com/yoonlight/imslp): Last commit in July 2020
(47 commits), uses scraping to extract instrumentation information. 

## Acknowledgements

Let's be clear that all the heavy lifting is done by [`mwclient`](https://github.com/mwclient/mwclient)—and
the volunteers who uploaded and/or scanned and/or typeset the scores on IMSLP. 

## License

This project is licensed under the LGPLv3 license, with the understanding
that importing a Python module is similar in spirit to dynamically linking
against a library.

- You can use the library `imslp` in any project, for any purpose, as long
  as you provide some acknowledgement to this original project for use of
  the library.

- If you make improvements to `imslp`, you are required to make those
  changes publicly available.
  
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/jlumbroso/imslp",
    "name": "imslp",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "",
    "keywords": "IMSLP,IMSLP-scraper,IMSLP-scraping,sheet-music,music-data",
    "author": "J\u00e9r\u00e9mie Lumbroso",
    "author_email": "lumbroso@cs.princeton.edu",
    "download_url": "https://files.pythonhosted.org/packages/a0/39/84eaea22a89d52e479149c3aa92dc5aa85a00da4b793439d689ca550717b/imslp-0.2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "LGPL-3.0-or-later",
    "summary": "The clean and modern way of accessing IMSLP data and scores programmatically.",
    "version": "0.2.3",
    "split_keywords": [
        "imslp",
        "imslp-scraper",
        "imslp-scraping",
        "sheet-music",
        "music-data"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "ea2adf48f53f75964a3bdc90e68f66df",
                "sha256": "f5bb7a3b9a401ea685889dfad21403cfdf4eeac066d2a98e5c6b78939d72e27d"
            },
            "downloads": -1,
            "filename": "imslp-0.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ea2adf48f53f75964a3bdc90e68f66df",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 19102,
            "upload_time": "2022-12-30T04:01:54",
            "upload_time_iso_8601": "2022-12-30T04:01:54.178033Z",
            "url": "https://files.pythonhosted.org/packages/a3/ff/d4d989ccaa27861fd9bb33a4bb2144fb9d8733088915b31bb08f2bba4ef5/imslp-0.2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "88abd376fdb52da174797cfc9c3b1ec7",
                "sha256": "ecc731560c0e55460b6506bb3d7db30d9dd7ca15daa7a68f545f24ca59a46277"
            },
            "downloads": -1,
            "filename": "imslp-0.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "88abd376fdb52da174797cfc9c3b1ec7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 14045,
            "upload_time": "2022-12-30T04:01:55",
            "upload_time_iso_8601": "2022-12-30T04:01:55.263537Z",
            "url": "https://files.pythonhosted.org/packages/a0/39/84eaea22a89d52e479149c3aa92dc5aa85a00da4b793439d689ca550717b/imslp-0.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-30 04:01:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "jlumbroso",
    "github_project": "imslp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    "==",
                    "4.11.1"
                ]
            ]
        },
        {
            "name": "bs4",
            "specs": [
                [
                    "==",
                    "0.0.1"
                ]
            ]
        },
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2022.12.7"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "2.1.1"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.4"
                ]
            ]
        },
        {
            "name": "mwclient",
            "specs": [
                [
                    "==",
                    "0.10.1"
                ]
            ]
        },
        {
            "name": "oauthlib",
            "specs": [
                [
                    "==",
                    "3.2.2"
                ]
            ]
        },
        {
            "name": "requests-oauthlib",
            "specs": [
                [
                    "==",
                    "1.3.1"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.28.1"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "soupsieve",
            "specs": [
                [
                    "==",
                    "2.3.2.post1"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "1.26.13"
                ]
            ]
        }
    ],
    "lcname": "imslp"
}
        