discogslearner


Name: discogslearner
Version: 0.21
Home page: https://github.com/Pascallio
Summary: Machine Learning module for Discogs
Upload time: 2021-05-04 21:45:59
Author: Pascal Maas
License: GPL-3.0
Keywords: discogs, machine learning
Requirements: none recorded

# DiscogsLearner - ML library for Discogs

<!--- Version: 0.21 ---> 

## Introduction
This package predicts releases similar to those in your Discogs Wantlist and/or Collection. It works in two steps: (1) data retrieval from the Discogs monthly data dumps, and (2) learning from a list of release identifiers obtained from your Wantlist and/or Collection. The output is a set of release identifiers, each with a probability of being similar to your input. See *Details* for an in-depth explanation. Processing the entire 'Electronic' genre requires about 3 GB of free RAM.

## Installation

    pip install discogslearner
   
## Usage 

1. Obtain a Discogs personal access token. See https://www.discogs.com/settings/developers for instructions on obtaining one.
2. Execute a script like the following:

```python
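import os

import discogslearner

if __name__ == "__main__":
    output_file = "Data/discogs_db.tsv"
    my_genre = "Electronic"
    my_token = "your_token_here"  # replace with the personal access token from step 1

    # Create the output directory in case the extracter does not create it itself.
    os.makedirs(os.path.dirname(output_file), exist_ok=True)

    # Step 1: extract all releases of the chosen genre from the Discogs monthly data dump.
    extracter = discogslearner.Extracter(genre=my_genre)
    extracter.extract(output=output_file)

    # Step 2: learn from your Wantlist and/or Collection and score the extracted releases.
    learner = discogslearner.Learner(db_path=output_file,
                                     use_wantlist=True,
                                     use_collection=True,
                                     token=my_token)

    outcome = learner.learn_and_predict(n_models=10)
    print(outcome)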
```

## Details
To learn from Discogs data, each Release is described by the fields Format, Year, Country, Style(s), and Number of Tracks. The categorical fields (Format, Country, and Styles) are one-hot encoded, with the categories taken only from Releases in the given Wantlist and/or Collection. Next, a PCA transformation is fitted on these Releases and then applied to all Releases extracted from Discogs. During this process, only the Styles present in the Wantlist and/or Collection are kept in the database, since Releases with other styles are unlikely to be of interest.
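
The encoding step can be sketched in plain NumPy as follows. This is a minimal illustration, not the package's own code: the dictionary keys (`format`, `country`, `styles`, `year`, `tracks`), the helper names `multi_hot`, `encode` and `fit`, and the standardisation of Year and Number of Tracks are all assumptions made for the example. The key point it shows is that the category vocabulary and the PCA are fitted on the user's releases only and then reused to transform every extracted release; Style values never seen in the user's releases simply encode as zeros (the database-level style filter is not reproduced here).

```python
import numpy as np

def multi_hot(rows, categories):
    """0/1 matrix: column j is 1 when categories[j] occurs in the row's values.
    Values outside `categories` (never seen in the user's releases) are ignored."""
    index = {c: j for j, c in enumerate(categories)}
    out = np.zeros((len(rows), len(categories)))
    for i, values in enumerate(rows):
        for v in values:
            if v in index:
                out[i, index[v]] = 1.0
    return out

def encode(releases, vocab, num_mean, num_std):
    """One-hot Format/Country, multi-hot Styles, standardised Year and track count (assumed)."""
    fmt = multi_hot([[r["format"]] for r in releases], vocab["format"])
    country = multi_hot([[r["country"]] for r in releases], vocab["country"])
    styles = multi_hot([r["styles"] for r in releases], vocab["styles"])
    nums = np.array([[r["year"], r["tracks"]] for r in releases], dtype=float)
    return np.hstack([fmt, country, styles, (nums - num_mean) / num_std])

def fit(user_releases, n_components=2):
    """Fit the encoding and PCA on the user's releases only; return a transform for any releases."""
    vocab = {
        "format": sorted({r["format"] for r in user_releases}),
        "country": sorted({r["country"] for r in user_releases}),
        "styles": sorted({s for r in user_releases for s in r["styles"]}),
    }
    nums = np.array([[r["year"], r["tracks"]] for r in user_releases], dtype=float)
    num_mean, num_std = nums.mean(axis=0), nums.std(axis=0) + 1e-9
    x = encode(user_releases, vocab, num_mean, num_std)
    # PCA via SVD of the centred matrix; rows of vt are the principal axes.
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    components = vt[:n_components]

    def transform(releases):
        return (encode(releases, vocab, num_mean, num_std) - mean) @ components.T

    return transform

user = [
    {"format": "Vinyl", "country": "UK", "styles": ["Techno"], "year": 1995, "tracks": 4},
    {"format": "CD", "country": "Germany", "styles": ["Techno", "Ambient"], "year": 2001, "tracks": 10},
    {"format": "Vinyl", "country": "Germany", "styles": ["House"], "year": 1998, "tracks": 2},
]
transform = fit(user, n_components=2)
# A release outside the user's list: its unseen Format and Country encode as all zeros.
all_releases = user + [
    {"format": "Cassette", "country": "US", "styles": ["House"], "year": 1980, "tracks": 8},
]
print(transform(all_releases).shape)  # (4, 2)
```

Fitting on the user's releases keeps the feature space small and focused on what the user actually collects, at the cost of discarding information about categories that only appear elsewhere in the dump.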

Artists, Labels, and Companies are treated as groups of Releases. To incorporate them, the mean and variance of the PCA data within each group are computed and appended to the PCA data of each Release in that group. In the current version, collaborations (e.g. two Artists credited together) are treated as a single entity; this is planned to change in future versions.
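
A minimal sketch of the group aggregation, assuming each Release carries a single group key per grouping (for example one Artist string) and that population variance is used; the function name `append_group_stats` is hypothetical. A collaboration such as "Artist A & Artist B" is just another key here, which mirrors how the current version treats it as a single entity.

```python
from collections import defaultdict

import numpy as np

def append_group_stats(pca, group_ids):
    """Append, to each row, the mean and variance of all PCA rows sharing its group id."""
    members = defaultdict(list)
    for row, gid in enumerate(group_ids):
        members[gid].append(row)
    stats = {gid: (pca[rows].mean(axis=0), pca[rows].var(axis=0)) for gid, rows in members.items()}
    means = np.array([stats[g][0] for g in group_ids])
    variances = np.array([stats[g][1] for g in group_ids])
    return np.hstack([pca, means, variances])

pca = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]])
# The collaboration is its own group, separate from "Artist A".
artists = ["Artist A", "Artist A", "Artist A & Artist B"]
features = append_group_stats(pca, artists)
print(features[0])  # [1. 0. 2. 1. 1. 1.]
```

The same function could be applied once per grouping (Artists, Labels, Companies), each call widening the feature matrix by two blocks of PCA-sized columns.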

Releases in the Wantlist and/or Collection are used as positive examples, but negative examples (Releases a user is not interested in) are usually not recorded. Therefore, a random set of Releases, equal in size to the set of positive examples, is drawn as negative examples. Because this random sampling introduces bias, the package trains 10 models, each with a different random set of negative examples, and multiplies their probabilities to obtain a single score for each Release. Releases already in the Wantlist and/or Collection are not included in the predictions.
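
The ensemble idea can be sketched as below. The README does not say which classifier each model uses, so a small gradient-descent logistic regression stands in for it; `train_logreg`, `ensemble_scores` and their parameters are hypothetical names for illustration. Each model gets its own random negatives of the same size as the positives, the per-model probabilities are multiplied, and the positives themselves are excluded from the returned candidates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(x, y, lr=0.1, epochs=500):
    """Plain gradient-descent logistic regression (a stand-in for the real classifier)."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        grad = sigmoid(x @ w + b) - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def ensemble_scores(features, positive_idx, n_models=10, seed=0):
    """Score every non-positive release by multiplying the probabilities of n_models models."""
    rng = np.random.default_rng(seed)
    positive_idx = np.asarray(positive_idx)
    # Candidates exclude the user's own releases, so those are never returned.
    candidates = np.setdiff1d(np.arange(len(features)), positive_idx)
    scores = np.ones(len(candidates))
    for _ in range(n_models):
        # Fresh random negatives, equal in size to the positives, for each model.
        negative_idx = rng.choice(candidates, size=len(positive_idx), replace=False)
        x = features[np.concatenate([positive_idx, negative_idx])]
        y = np.concatenate([np.ones(len(positive_idx)), np.zeros(len(negative_idx))])
        w, b = train_logreg(x, y)
        scores *= sigmoid(features[candidates] @ w + b)
    return candidates, scores

# Synthetic data: 20 releases near (2, 2), 200 near (-2, -2); the first 10 are "owned".
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(2.0, 0.5, size=(20, 2)), rng.normal(-2.0, 0.5, size=(200, 2))])
ids, scores = ensemble_scores(features, positive_idx=np.arange(10))
print(ids[np.argsort(scores)[::-1][:5]])  # highest-scoring candidate ids
```

Multiplying probabilities acts like a soft logical AND: a Release only scores highly if every model, each trained against a different random negative set, agrees it resembles the positives, which dampens the effect of any single unlucky negative sample.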
            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/Pascallio",
    "name": "discogslearner",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "Discogs,Machine Learning",
    "author": "Pascal Maas",
    "author_email": "p.maas92@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/bb/80/f61ba4b0a4e7a25767c6606fca38a41804018d629d4f23226d4d08331970/discogslearner-0.21.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "GPL-3.0",
    "summary": "Machine Learning module for Discogs",
    "version": "0.21",
    "split_keywords": [
        "discogs",
        "machine learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "4a3bafeb9493f3ac140d5b3bac834b4a",
                "sha256": "3c9edab9210b7efe322fb81f2215c80af66712702a9dfb94f1bf448d99ff1e46"
            },
            "downloads": -1,
            "filename": "discogslearner-0.21.tar.gz",
            "has_sig": false,
            "md5_digest": "4a3bafeb9493f3ac140d5b3bac834b4a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 12780,
            "upload_time": "2021-05-04T21:45:59",
            "upload_time_iso_8601": "2021-05-04T21:45:59.751205Z",
            "url": "https://files.pythonhosted.org/packages/bb/80/f61ba4b0a4e7a25767c6606fca38a41804018d629d4f23226d4d08331970/discogslearner-0.21.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-05-04 21:45:59",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "discogslearner"
}