mmsbm

Name: mmsbm
Version: 0.2.0
Home page: https://github.com/eudald-seeslab/mmsbm
Summary: Compute Mixed Membership Stochastic Block Models.
Upload time: 2024-08-02 12:36:40
Maintainer: None
Docs URL: None
Author: Eudald Correig
Requires Python: None
License: BSD-3-Clause License
Keywords: bayesian analysis, recommender systems, network analysis, python
Requirements: No requirements were recorded.
# Mixed Membership Stochastic Block Models

[![Build Status](https://travis-ci.com/eudald-seeslab/mmsbm.svg?token=FgqRjRbiBxssKd9AcHMK&branch=main)](https://travis-ci.com/eudald-seeslab/mmsbm)

This library adapts [this](https://github.com/agodoylo/MMSBMrecommender) work on Mixed Membership Stochastic Block Models for a recommender system [1] into a library that can be used with more generic data.

## Installation

```
pip install mmsbm
```

## Usage

### Input data

You'll need a pandas dataframe with exactly 3 columns: users, items and ratings, e.g.:

```python
import pandas as pd
from random import choice

train = pd.DataFrame(
    {
        "users": [f"user{choice(list(range(5)))}" for _ in range(100)],
        "items": [f"item{choice(list(range(10)))}" for _ in range(100)],
        "ratings": [choice(list(range(1, 6))) for _ in range(100)],
    }
)

test = pd.DataFrame(
    {
        "users": [f"user{choice(list(range(5)))}" for _ in range(50)],
        "items": [f"item{choice(list(range(10)))}" for _ in range(50)],
        "ratings": [choice(list(range(1, 6))) for _ in range(50)],
    }
)

```

### Setup

```python

from mmsbm import MMSBM

# Initialize the MMSBM class:
mmsbm = MMSBM(
    user_groups=2,
    item_groups=4,
    iterations=500,
    sampling=5,
    seed=1,
)
```

### Fit models

Here you have two options. The first is a simple fit, where we run the fitting algorithm `sampling` times and return the results for all runs; you are then in charge of choosing the best one.

```python
mmsbm.fit(train)
```

The other option is the `cv_fit` (cross-validated fit) function, which splits the input data into `folds` folds, fits on each training split, and tests on the excluded fold. We then return all the samplings of the best-performing model. The function returns a list with the accuracy for each fold so that you can compute confidence intervals on them.

```python
accuracies = mmsbm.cv_fit(train, folds=5)
```
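For instance, here is a minimal sketch (plain numpy, not part of the mmsbm API) of turning those per-fold accuracies into a rough 95% confidence interval:

```python
import numpy as np

# accuracies comes from mmsbm.cv_fit above: one accuracy per fold.
acc = np.asarray(accuracies, dtype=float)
mean = acc.mean()
# Standard error of the mean across folds; with only 5 folds this is a rough estimate.
sem = acc.std(ddof=1) / np.sqrt(len(acc))
print(f"accuracy: {mean:.3f} +/- {1.96 * sem:.3f} (approx. 95% CI)")
```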

### Prediction

Once the model is fitted, we can predict on the test data. The `predict` function returns the prediction matrix (the probability of each user belonging to each group) as a numpy array.

```python
pred_matrix = mmsbm.predict(test)
```
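A quick sanity check on the returned array (the exact shape depends on your data and group settings, so treat this as illustrative only):

```python
import numpy as np

print(type(pred_matrix))  # numpy.ndarray
print(pred_matrix.shape)  # depends on the test data and the group settings
# If these are probabilities, as described above, they should lie in [0, 1]:
assert np.all((pred_matrix >= 0) & (pred_matrix <= 1))
```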

### Score

Finally, you can get statistics about the goodness of fit and other parameters of the model, as well as the computed objects: the theta matrix, the eta matrix and the probability distributions.

The `score` function returns a dictionary with two sub-dictionaries: one with statistics about the model (called "stats") and another with the computed objects (called "objects").

```python
results = mmsbm.score()
```
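A minimal sketch of inspecting that dictionary; the individual entries inside "stats" and "objects" depend on the model, so here we only list their keys:

```python
# "stats" and "objects" are the two sub-dictionaries documented above.
print(sorted(results["stats"].keys()))
print(sorted(results["objects"].keys()))
```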

## Performance

Each iteration takes about half a second on an Intel i7, so a 500-iteration run takes around 4 minutes. The computation is vectorized, so, as long as you don't go overboard with the number of observations, the runtime should be approximately the same regardless of training set size. It is also parallelized over samplings, so, as long as you choose fewer samplings than you have cores, performance should be approximately the same regardless of the sampling number.

## Tests

To run the tests, do the following:

```
python -m pytest tests/*
```


## TODO

- Progress bars are not working in Jupyter notebooks.
- Include a user_groups and item_groups optimization procedure.
- The cv_fit test is not working on Travis.

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

## References
[1]: Godoy-Lorite, Antonia, et al. "Accurate and scalable social recommendation 
using mixed-membership stochastic block models." Proceedings of the National 
Academy of Sciences 113.50 (2016): 14207-14212.

            
