<p align="center">
<img src="https://user-images.githubusercontent.com/26851363/172485577-be6993ef-47c3-4b3c-9187-4988f6c44d94.svg" alt="ClayRS logo" style="width:75%;"/>
</p>
# ClayRS
[![Build Status](https://github.com/swapUniba/ClayRS/actions/workflows/testing_pipeline.yml/badge.svg)](https://github.com/swapUniba/ClayRS/actions/workflows/testing_pipeline.yml)
[![Docs](https://github.com/swapUniba/ClayRS/actions/workflows/docs_building.yml/badge.svg)](https://swapuniba.github.io/ClayRS/)
[![codecov](https://codecov.io/gh/swapUniba/ClayRS/branch/master/graph/badge.svg?token=dftmT3QD8D)](https://codecov.io/gh/swapUniba/ClayRS)
[![Python versions](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)](https://www.python.org/downloads/)
***ClayRS*** is a Python framework for (mainly) content-based recommender systems that covers the full pipeline, from a raw representation of users and items to building and evaluating a recommender system. It also supports graph-based recommendation, with feature selection algorithms and graph manipulation methods.
The framework has three main modules, which you can also use individually:
<p align="center">
<img src="https://user-images.githubusercontent.com/26851363/164490523-00d60efd-7b17-4d20-872a-28eaf2323b03.png" alt="ClayRS" style="width:75%;"/>
</p>
Given a raw source, the ***Content Analyzer***:
* Creates and serializes contents according to the chosen configuration
The ***RecSys*** module allows you to:
* Instantiate a recommender system
  * *Using items and users serialized by the Content Analyzer*
* Make score *predictions* or *recommend* items for the active user(s)
The ***EvalModel*** has the task of evaluating a recommender system, using several state-of-the-art metrics.

Code examples for all three modules follow in the *Usage* section.
## Installation
*ClayRS* requires Python **3.8** or later. Package dependencies are listed in `requirements.txt` and, like *ClayRS* itself, are all installable via `pip`.
To install it execute the following command:
```
pip install clayrs
```
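After installing, a quick import check confirms that the three modules described above are available (module paths taken from the *Usage* examples below):

```python
# Sanity check: the three ClayRS modules used throughout this README
import clayrs.content_analyzer as ca
import clayrs.recsys as rs
import clayrs.evaluation as eva

print("ClayRS modules imported correctly")
```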
## Usage
### Content Analyzer
The first thing to do is to import the Content Analyzer module
* We will access its methods and classes via dot notation
```python
import clayrs.content_analyzer as ca
```
Then, let's point to the source containing raw information to process
```python
raw_source = ca.JSONFile('items_info.json')
```
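If you want to follow along without a real dataset, a toy `items_info.json` can be written by hand. This sketch assumes the raw source is a JSON array of objects, one per item; all field values are made up:

```python
import json

# Toy raw source: each item carries the id field and the 'plot' field
# used later in this walkthrough (all values are illustrative)
toy_items = [
    {"movielens_id": "1", "plot": "A cowboy doll feels threatened by a new space toy."},
    {"movielens_id": "2", "plot": "Two siblings discover a magical board game."}
]

with open('items_info.json', 'w') as f:
    json.dump(toy_items, f)
```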
We can now start building the configuration for the items
* Note that the same operations that can be specified for *items* can also be specified for *users*, via the
`ca.UserAnalyzerConfig` class
```python
# Configuration of item representation
movies_ca_config = ca.ItemAnalyzerConfig(
    source=raw_source,
    id='movielens_id',  # id which uniquely identifies each item
    output_directory='movies_codified/'  # where the complexly represented items will be stored
)
```
Let's represent the *plot* field of each content with a TfIdf representation
* Since the `preprocessing` parameter has been specified, the field is first preprocessed with the specified
operations before the representation is computed
```python
movies_ca_config.add_single_config(
    'plot',
    ca.FieldConfig(ca.SkLearnTfIdf(),
                   preprocessing=ca.NLTK(stopwords_removal=True,
                                         lemmatization=True),
                   id='tfidf')  # Custom id
)
```
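Multiple representations can be codified for the same field (the *RecSys* section below relies on this). A hedged sketch, assuming `add_single_config` can simply be called again on the same field with a different custom id:

```python
# A second, hypothetical representation for 'plot', this time without
# preprocessing; the id 'tfidf_raw' is illustrative
movies_ca_config.add_single_config(
    'plot',
    ca.FieldConfig(ca.SkLearnTfIdf(), id='tfidf_raw')
)
```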
To finalize the Content Analyzer part, let's instantiate the `ContentAnalyzer` class by passing the built configuration
and by calling its `fit()` method
* The items will be created with the specified representations and serialized
```python
ca.ContentAnalyzer(movies_ca_config).fit()
```
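After `fit()` completes, the serialized contents end up in the output directory chosen in the configuration; a quick way to verify:

```python
import os

# The complexly represented items are stored here, one serialized
# content per item (the exact file layout is an implementation detail)
print(os.listdir('movies_codified/'))
```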
### RecSys
Similarly to the above, we must first import the RecSys module
```python
import clayrs.recsys as rs
```
Then we load the rating frame from a TSV file
* In this case the first three columns of our file are *user_id*, *item_id* and *score*, in this order
  * If your file has a different structure, you must specify how to map the columns via parameters; check the
  documentation for more
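If you don't have a ratings file handy, a toy `ratings.tsv` matching this layout can be written first (user/item ids and scores below are made up; item ids should match the `movielens_id` values of the serialized items):

```python
# Toy rating frame: user_id, item_id, score, tab-separated
rows = [
    ("u1", "1", 4.5),
    ("u1", "2", 3.0),
    ("u2", "1", 2.0),
    ("u2", "2", 5.0),
]

with open('ratings.tsv', 'w') as f:
    for user_id, item_id, score in rows:
        f.write(f"{user_id}\t{item_id}\t{score}\n")
```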
```python
ratings = ca.Ratings(ca.CSVFile('ratings.tsv', separator='\t'))
```
Let's split the loaded rating frame into train and test sets with the KFold technique
* Since `n_splits=2`, `train_list` will contain two *train sets* and `test_list` will contain two *test sets*
```python
train_list, test_list = rs.KFoldPartitioning(n_splits=2).split_all(ratings)
```
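Each list now holds one rating frame per fold, so the two can be iterated in parallel:

```python
# With n_splits=2 we get exactly two (train, test) pairs
assert len(train_list) == len(test_list) == 2
```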
In order to recommend items to users, we must choose an algorithm to use
* In this case we are using the `CentroidVector` algorithm, which will work by using the first representation
specified for the *plot* field
* You can freely choose which representation to use among all the representations codified for the fields in the Content
Analyzer phase
```python
centroid_vec = rs.CentroidVector(
    {'plot': 'tfidf'},
    similarity=rs.CosineSimilarity()
)
```
Let's now compute the top-10 ranking for each user of the train set
* By default the candidate items are those in the test set of the user, but you can change this behaviour with the
`methodology` parameter
Since we used the KFold technique, we iterate over the train and test sets
```python
result_list = []

for train_set, test_set in zip(train_list, test_list):
    cbrs = rs.ContentBasedRS(centroid_vec, train_set, 'movies_codified/')
    rank = cbrs.fit_rank(test_set, n_recs=10)

    result_list.append(rank)
```
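At the end of the loop, `result_list` holds one computed ranking per fold, which is exactly what the evaluation module expects:

```python
# One top-10 ranking per fold (two folds in this walkthrough)
print(len(result_list))  # 2
```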
### EvalModel
Similarly to the Content Analyzer and RecSys modules, we must first import the evaluation module
```python
import clayrs.evaluation as eva
```
The Evaluation module needs the following parameters:
* A list of computed rank/predictions (in case multiple splits must be evaluated)
* A list of truths (in case multiple splits must be evaluated)
* A list of metrics to compute

Obviously the list of computed rank/predictions and the list of truths must have the same length,
and the rank/prediction at position *i* will be compared with the truth at position *i*
```python
em = eva.EvalModel(
    pred_list=result_list,
    truth_list=test_list,
    metric_list=[
        eva.NDCG(),
        eva.Precision(),
        eva.RecallAtK(k=5)
    ]
)
```
Then simply call the `fit()` method of the instantiated object
* It will return two pandas DataFrames: the first contains the metrics aggregated for the whole system,
while the second contains the metrics computed for each user (where possible)
```python
sys_result, users_result = em.fit()
```
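Both returned objects are plain pandas DataFrames, so the usual pandas tooling applies (the exact column layout depends on the metrics chosen):

```python
# Aggregated, system-wide value for each metric
print(sys_result)

# Per-user results: peek at the first rows, or export everything
print(users_result.head())
users_result.to_csv('users_eval_results.csv')
```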
Note that the EvalModel can also evaluate recommendations generated by other tools/frameworks; check the
documentation for more