<div align="center">
# guestrrday
#### because for music heads, context matters!
A CLI tool that does one thing and does it well: guess the release date and record label of a set of music tracks with high accuracy and precision. 20,000+ songs guessed so far!
</div>
## Description
The pop music historian's dream: given a directory of music tracks or a list of one or more song names (in a textfile or CLI args), guess the **year of release** and the **record label** of each track.
Example:
```Ike & Tina Turner - A Love Like Yours (Don't Come Knocking Everyday)```
**=>**
```Ike & Tina Turner - A Love Like Yours (Don't Come Knocking Everyday) (London American, 1966)```
The script queries discogs.com and includes some tweaks to increase the precision and accuracy of the search (see [Why This Tool?](#why-this-tool) below).
If the input is a directory, the tool renames the files; if it is a text file, it writes the results to a new text file; if it is a list of CLI arguments, it prints the results to stdout.
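The output format shown in the example can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function names `annotate` and `annotated_filename` are hypothetical, and keeping the original file extension when renaming is an assumption.

```python
import os


def annotate(title: str, label: str, year: int) -> str:
    """Append '(label, year)' to a track title, as in the example above."""
    return f"{title} ({label}, {year})"


def annotated_filename(filename: str, label: str, year: int) -> str:
    """Hypothetical helper: annotate a file name while keeping its extension (an assumption)."""
    stem, ext = os.path.splitext(filename)
    return annotate(stem, label, year) + ext


print(annotate(
    "Ike & Tina Turner - A Love Like Yours (Don't Come Knocking Everyday)",
    "London American",
    1966,
))
```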
## Installation
- *guestrrday* can be installed by running `pip install guestrrday`.
- To update *guestrrday*, run `pip install --upgrade guestrrday`.
> On some systems you might have to change `pip` to `pip3`.
## Usage
**NOTE**: You need a free Discogs user token to use the tool. To get one:
1. Create a free account on [discogs.com](https://discogs.com)
2. Visit the [developer tools section](https://www.discogs.com/de/settings/developers) and create a new token
3. Set the _DISCOGS_TOKEN_ environment variable on your system to that value. *Alternatively*, the tool will prompt you for the token on the first run and then store it in an environment variable.
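
If you want to check from your own Python script that the token is visible before running the tool, a minimal sketch could look like the following. The helper name `get_discogs_token` is hypothetical and not part of guestrrday; only the variable name `DISCOGS_TOKEN` comes from the steps above.

```python
import os


def get_discogs_token(env=os.environ) -> str:
    """Return the Discogs token from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of guestrrday itself.
    """
    token = env.get("DISCOGS_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "DISCOGS_TOKEN is not set; create a token in your Discogs developer settings."
        )
    return token
```

Call `get_discogs_token()` with no arguments to read the real environment; passing a plain dict is handy for testing.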
General Usage:
```sh
guestrrday --input SOURCE
```
Where _SOURCE_ can be any of:
- ```Directory``` containing music files, or
- ```Filename``` for a text file containing a list of song names, one per line, or
- ```"[song1] [, song2 ] [...]"```: Comma-separated list of song names
The tool will automatically detect which one you mean.
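One plausible way to tell the three kinds of _SOURCE_ apart is to check the filesystem first and fall back to treating the argument as a song list. The sketch below is an assumption about how such detection could work, not the tool's actual logic; `classify_source` and `parse_song_list` are hypothetical names.

```python
import os


def classify_source(source: str) -> str:
    """Guess what kind of input SOURCE is (illustrative heuristic only)."""
    if os.path.isdir(source):
        return "directory"
    if os.path.isfile(source):
        return "file"
    # Anything that is not an existing path is treated as a list of song names.
    return "songs"


def parse_song_list(source: str) -> list[str]:
    """Split a comma-separated argument into song names, dropping empty entries."""
    return [name.strip() for name in source.split(",") if name.strip()]


print(parse_song_list("Artist A - Song One, Artist B - Song Two"))
```

Note that a naive split like this would break song names that themselves contain commas.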
You can run _guestrrday_ as a package if running it as a script doesn't work:
```sh
python -m guestrrday --input SOURCE
```
## Why This Tool?
*TL;DR*: The devil is in the details.
Guestrrday scans a list of song titles and queries discogs.com for the year and label of each. This is clearly a simple function, right? Why a whole tool?
I made this tool because I had two strict requirements: *high prediction rate* and *accuracy* **at scale**. So what is the current prediction rate and accuracy, you ask? Well, three variables affect both: (1) the completeness of the discogs.com database, (2) unsanitized track names / filenames, and (3) the limitations of the Discogs search engine (e.g., sensitivity to slight changes in search terms). We can't control (1), but we can control (2) and (3), and this is what the tool focuses on[^1]. The completeness of the Discogs DB, like that of any music DB, varies with the music: for example, data on 90s electronic music singles is much more complete than data on pre-war blues releases (1930s, 40s).
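To give a feel for what sanitizing a track name can involve, here is a small sketch. The rules below (dropping a known audio extension, replacing underscores, stripping a leading track number, collapsing whitespace) are illustrative assumptions and not the tool's actual clean-up rules; `sanitize_trackname` and `AUDIO_EXTENSIONS` are hypothetical names.

```python
import os
import re

# Assumed set of audio extensions, for illustration only.
AUDIO_EXTENSIONS = {".mp3", ".flac", ".wav", ".m4a", ".ogg"}


def sanitize_trackname(name: str) -> str:
    """Clean a raw file or track name before using it as a search term."""
    stem, ext = os.path.splitext(name)
    if ext.lower() in AUDIO_EXTENSIONS:
        name = stem  # drop the audio file extension
    name = name.replace("_", " ")
    # Strip a leading track number such as "01 - " or "3. " (1 to 3 digits).
    name = re.sub(r"^\s*\d{1,3}\s*[-.]\s*", "", name)
    # Collapse runs of whitespace into single spaces.
    name = re.sub(r"\s+", " ", name)
    return name.strip()


print(sanitize_trackname("01 - Ike_&_Tina_Turner - A_Love_Like_Yours.mp3"))
```

Heuristics like the track-number rule can misfire on artist names that begin with digits, which is one reason this kind of clean-up needs care at scale.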
As a rough estimate from experience, the average completeness of the Discogs DB with regard to the *year of release* is around 85-90%. The tool typically detects 95% of those, which gives it a prediction rate of about 95% on tracks whose year is listed on Discogs.
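Combining the two estimates gives a sense of how many tracks out of an arbitrary collection end up with a year. The arithmetic below simply multiplies the rough figures quoted above; it assumes the two rates are independent, and `overall_prediction_rate` is a hypothetical name.

```python
def overall_prediction_rate(db_completeness: float, detection_rate: float) -> float:
    """Share of all tracks that get a year: the DB has it AND the tool finds it."""
    return db_completeness * detection_rate


# Rough estimates from the text: 85-90% completeness, 95% detection.
for completeness in (0.85, 0.90):
    rate = overall_prediction_rate(completeness, 0.95)
    print(f"completeness {completeness:.0%}: about {rate:.0%} of all tracks get a year")
```

Under these assumptions, somewhere between about 81% and 86% of all tracks would receive a year.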
Note, however, that many singles on Discogs have no year listed for the original release; in that case the tool returns the year of the earliest available reissue. For example, for a disco single released in the 1970s whose original release has no year listed but for which a reissue exists (say, one released in 2015), the detected year would be 2015 (and the label would be the reissue label).
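The fallback described above can be sketched as picking the earliest release that has a year at all. The `Release` class and `earliest_dated_release` function are hypothetical stand-ins for illustration; they do not reflect the tool's internal data model or the Discogs API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Release:
    label: str
    year: Optional[int]  # None when no year is listed


def earliest_dated_release(releases: list[Release]) -> Optional[Release]:
    """Return the release with the smallest known year, or None if no release has one."""
    dated = [r for r in releases if r.year is not None]
    return min(dated, key=lambda r: r.year) if dated else None


releases = [
    Release(label="Original Label", year=None),   # 1970s original, year missing
    Release(label="Reissue Label", year=2015),    # later reissue with a year
]
best = earliest_dated_release(releases)
print(best)
```

In this example the undated original is skipped, so both the year and the label come from the 2015 reissue.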
## Contribution
Pull requests are welcome; just create an issue.
## License
This project is licensed under the [MIT](/LICENSE) License.
[^1]: I tried generic fuzzy matching ([thefuzz](https://github.com/seatgeek/thefuzz)) but couldn't get search results that were as accurate: I got more false positives and couldn't find a sweet spot in the numerical score to optimize false negatives. Of course, I could have run a large test set and optimized for that value and fuzzy matching algorithm, but for such a small project it was not worth it.
Raw data
{
"_id": null,
"home_page": "https://github.com/n42r/guestrrday",
"name": "guestrrday",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.10,<4.0",
"maintainer_email": "",
"keywords": "music,crate-digging,music discovery",
"author": "n42r",
"author_email": "n42r.me@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/24/64/05828e82199f353013296761267fcae08a4b6857625bb675b080eb10c0fb/guestrrday-0.1.2.tar.gz",
"platform": null,
"description": "<div align=\"center\">\n\n# guestrrday \n\n#### because for music heads, context matters!\n\nA CLI tool that does one thing and does it well: guess the release date and record label of a set of music tracks with high accuracy and precision. 20,000+ songs guessed so far!\n\n</div>\n\n## Description\n\nThe pop music historian's dream: given a directory of music tracks or a list of one or more song names (in a textfile or CLI args), guess the **year of release** and the **record label** of each track.\n\nExample:\n\n```Ike & Tina Turner - A Love Like Yours (Don't Come Knocking Everyday)``` \n\n**=>** \n\n```Ike & Tina Turner - A Love Like Yours (Don't Come Knocking Everyday) (London American, 1966)```\n\nThe script queries discogs.com and includes some tweaks to increase the precision and accuracy of the search (see [Why This Tool?](#why-this-tool) below).\n\nThe tool will rename files if the input is a directory, and will create a new textfile in the case of file, and will print to stdout in case of cli args.\n\n\n## Installation\n - *guesterrday* can be installed by running `pip install guestrrday`.\n - To update *guesterrday* run `pip install --upgrade guestrrday`.\n\n > On some systems you might have to change `pip` to `pip3`.\n\n\n## Usage\n\n**NOTE**: You need to have a free discogs user token to use the tool. This is how to do it:\n1. Create a free account on [discogs.com](https://discogs.com)\n2. Visit the [developer tools section](https://www.discogs.com/de/settings/developers) and create a new token\n3. Set the _DISCOGS_TOKEN_ environmental variable on your system with that value. 
*Alternatively*, the tool will prompt you for it on the first run and then store in an environmental variable.\n\nGeneral Usage:\n```sh\nguestrrday --input SOURCE\n```\n\nWhere _SOURCE_ can be any of: \n- ```Directory``` containing music files, or\n- ```Filename``` for a text file containing a list song names, one per line, or\n- ```\"[song1] [, song2 ] [...]\"```: Comma-separated list of song names\n\nThe tool will automatically detect which one you mean.\n\nYou can run _guestrrday_ as a package if running it as a script doesn't work:\n```sh\npython -m guestrrday --input SOURCE\n```\n\n\n## Why This Tool?\n\n*TLDR*; The devil in the detail\n\nGuestrrday scans a list of song titles and queries discogs.com for the year and label of each. This is clearly a simple function, right? Why a whole tool?\n\nI made this tool because I had two strict requirements: *high prediction rate* and *accuracy* **at scale**. So what is the current prediction rate and accuracy you ask? Well, there are three variables affecting prediction rate and accuracy: (1) the completeness of the discogs.com database, (2) unsanitized tracknames / filenames, and (3) the limitations of the discogs search engine (ex., sensitivity to slight changes in search terms). We can't control (1) but we can control (2) & (3), and this is what this tool focuses on[^1]. The completeness of the discogs DB and any music DB really varies based on the music: for example, data on 90s electronic music singles is much more complete than pre-war blues releases (1930s, 40s).\n\nTo throw rough estimates from experience, I would say the average completness of the discogs DB with regards to the *year of release* is around 85-90%. 95% of those are typically detected by this tool, which gives it a 95% prediction rate.\n\nHowever, it must be noted, there are many singles on discogs which have no original year of release, in that case the tool will return the year of release of the earliest available reissue. 
For example, a disco single released in 1970s but for which no year is available for the original release, but a reissue exists (say, one released in 2015), the detected year for that track would be 2015 (and the label would be the reissue label).\n\n\n## Contribution\n\nPull requests welcome, just create an issue.\n\n## License\n\nThis project is Licensed under the [MIT](/LICENSE) License.\n\n\n[^1]: I tried generic fuzzy matching ([thefuzz](https://github.com/seatgeek/thefuzz)) but couldn't seem to get as good of an accuracy in search results: I got more false positives and couldn't find a sweetspot in the numerical score to optimize false negatives. Ofcourse I could have run a large test sets and optimized for that value and fuzzy matching algorithm, but for such a small project it was not worth it.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Retrieve the release date and record label of a set of music tracks with high precision & accuracy. 20,000 guessed so far!",
"version": "0.1.2",
"project_urls": {
"Bug Tracker": "https://github.com/n42r/guestrrday/issues",
"Homepage": "https://github.com/n42r/guestrrday"
},
"split_keywords": [
"music",
"crate-digging",
"music discovery"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3c9724e606b7b93741099a1be672fdc0090e1cec8a3540892d956d080e4f6f7f",
"md5": "f54d6ee6bd5921120ca6e7fb1dbd641d",
"sha256": "49da7e8a8fcb47653d42a6799984d7df36a4f53307ac49dd87b281a862706791"
},
"downloads": -1,
"filename": "guestrrday-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f54d6ee6bd5921120ca6e7fb1dbd641d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10,<4.0",
"size": 16504,
"upload_time": "2023-12-09T00:57:30",
"upload_time_iso_8601": "2023-12-09T00:57:30.973690Z",
"url": "https://files.pythonhosted.org/packages/3c/97/24e606b7b93741099a1be672fdc0090e1cec8a3540892d956d080e4f6f7f/guestrrday-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "246405828e82199f353013296761267fcae08a4b6857625bb675b080eb10c0fb",
"md5": "8e365953cfe233be9efd14db15e31ea9",
"sha256": "ac5816d8046478dfcaa4f4149e28a0e1ba60b91e4ff4e24764bd0beef11b4ed2"
},
"downloads": -1,
"filename": "guestrrday-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "8e365953cfe233be9efd14db15e31ea9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10,<4.0",
"size": 16326,
"upload_time": "2023-12-09T00:57:33",
"upload_time_iso_8601": "2023-12-09T00:57:33.155462Z",
"url": "https://files.pythonhosted.org/packages/24/64/05828e82199f353013296761267fcae08a4b6857625bb675b080eb10c0fb/guestrrday-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-09 00:57:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "n42r",
"github_project": "guestrrday",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "guestrrday"
}