similaripy


- Name: similaripy
- Version: 0.2.5
- Summary: High-performance KNN similarity functions in Python, optimized for sparse matrices
- Upload time: 2025-10-25 15:35:39
- Requires Python: >=3.10
- License: MIT
- Keywords: similarity, knn, nearest neighbors, collaborative filtering, normalization, recommender systems

<p align="center">
  <img src="https://raw.githubusercontent.com/bogliosimone/similaripy/master/docs/logo.png" alt="similaripy" width="350"/>
</p>

# SimilariPy

[![PyPI version](https://img.shields.io/pypi/v/similaripy.svg?logo=pypi&logoColor=white)](https://pypi.org/project/similaripy/)
[![Docs](https://img.shields.io/badge/docs-online-blue.svg)](https://bogliosimone.github.io/similaripy/)
[![Build and Test](https://github.com/bogliosimone/similaripy/actions/workflows/python-package.yml/badge.svg)](https://github.com/bogliosimone/similaripy/actions/workflows/python-package.yml)
[![Publish to PyPI](https://github.com/bogliosimone/similaripy/actions/workflows/pypi-publish.yml/badge.svg)](https://github.com/bogliosimone/similaripy/actions/workflows/pypi-publish.yml)
[![Docs Status](https://github.com/bogliosimone/similaripy/actions/workflows/deploy-docs.yml/badge.svg)](https://github.com/bogliosimone/similaripy/actions/workflows/deploy-docs.yml)
[![Python versions](https://img.shields.io/pypi/pyversions/similaripy.svg?logo=python&logoColor=white)](https://pypi.org/project/similaripy/)
[![License](https://img.shields.io/github/license/bogliosimone/similaripy.svg)](https://github.com/bogliosimone/similaripy/blob/master/LICENSE)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2583851.svg)](https://doi.org/10.5281/zenodo.2583851)


High-performance KNN similarity functions in Python, optimized for sparse matrices.

SimilariPy is primarily designed for Recommender Systems and Information Retrieval (IR) tasks, but can be applied to other domains as well.

The package also includes a set of normalization functions useful for pre-processing data before the similarity computation.

The official documentation is available in the **[📘 SimilariPy Guide](https://bogliosimone.github.io/similaripy/)**.

## 🔍 Similarity Functions

SimilariPy provides a range of high-performance similarity functions for sparse matrices.  
All functions are multi-threaded and implemented in Cython + OpenMP for fast parallel computation on CSR matrices.

### Core 

- **Dot Product** – Simple raw inner product between vectors.
- **Cosine** – Normalized dot product based on L2 norm.
- **Asymmetric Cosine** – Skewed cosine similarity using an `alpha` parameter.
- **Jaccard**, **Dice**, **Tversky** – Set-based generalized similarities.
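
The definitions behind these measures can be sketched in a few lines of plain Python. This is an illustrative sketch of the textbook formulas on `dict`-based sparse vectors and Python sets, not SimilariPy's Cython implementation. The helper names (`dot`, `cosine`, `asymmetric_cosine`, `tversky`, `jaccard`, `dice`) are hypothetical, and any additional parameters the library exposes are left out.

```python
import math


def dot(x, y):
    """Inner product of two sparse vectors stored as {index: value} dicts."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the shorter vector
    return sum(v * y[i] for i, v in x.items() if i in y)


def cosine(x, y):
    """Dot product divided by the product of the two L2 norms."""
    denom = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))
    return dot(x, y) / denom if denom else 0.0


def asymmetric_cosine(x, y, alpha=0.5):
    """Cosine with the squared norms weighted by alpha and 1 - alpha."""
    denom = dot(x, x) ** alpha * dot(y, y) ** (1.0 - alpha)
    return dot(x, y) / denom if denom else 0.0


def tversky(a, b, alpha=1.0, beta=1.0):
    """Tversky index on sets; alpha and beta weight the two set differences."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def jaccard(a, b):
    """Tversky index with alpha = beta = 1."""
    return tversky(a, b, alpha=1.0, beta=1.0)


def dice(a, b):
    """Tversky index with alpha = beta = 0.5."""
    return tversky(a, b, alpha=0.5, beta=0.5)
```

In this sketch, `asymmetric_cosine` with `alpha=0.5` reduces to the plain cosine, which is why Jaccard and Dice can likewise be written as special cases of Tversky.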

### Graph-Based

- **P3α** – Graph-based similarity computed through random walk propagation with exponentiation.
- **RP3β** – Similar to P3α but includes popularity penalization using a `beta` parameter.
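
As a rough illustration of the random-walk idea, the sketch below computes item-item scores on a small dense user × item matrix. It follows the commonly used formulation (transition probabilities raised to `alpha`, then divided by the target item's popularity raised to `beta`) and assumes non-negative interactions. The function name `p3_similarity` and its input format are hypothetical; the library's actual interface and exact computation may differ.

```python
def p3_similarity(urm, alpha=1.0, beta=0.0):
    """Item-item scores from a dense user x item list of lists.

    beta == 0 gives a P3-alpha style score; beta > 0 additionally divides
    by the popularity of the target item (RP3-beta style penalization).
    """
    n_users, n_items = len(urm), len(urm[0])
    user_deg = [sum(row) for row in urm]
    item_pop = [sum(urm[u][i] for u in range(n_users)) for i in range(n_items)]

    sim = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            # Self-similarity is skipped here; that is a choice of this sketch.
            if i == j or item_pop[i] == 0 or item_pop[j] == 0:
                continue
            score = 0.0
            for u in range(n_users):
                if urm[u][i] and urm[u][j]:
                    p_iu = urm[u][i] / item_pop[i]  # step: item i -> user u
                    p_uj = urm[u][j] / user_deg[u]  # step: user u -> item j
                    score += (p_iu ** alpha) * (p_uj ** alpha)
            sim[i][j] = score / (item_pop[j] ** beta)
    return sim


# Three users, three items; item 1 is the most popular one.
urm = [[1, 1, 0],
       [0, 1, 1],
       [1, 1, 1]]
plain = p3_similarity(urm, alpha=1.0)
penalized = p3_similarity(urm, alpha=1.0, beta=1.0)
```

With `beta > 0`, scores pointing at the popular item 1 shrink more than scores pointing at items 0 and 2, which is the popularity penalization described above.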

### Advanced 

- **S-Plus** – A hybrid model combining Tversky and Cosine components, with full control over weights and smoothing.

For mathematical definitions and parameter details, see the **[📘 SimilariPy Guide](https://bogliosimone.github.io/similaripy/)**.

## 🧮 Normalization Functions

SimilariPy provides a suite of normalization functions for sparse matrix pre-processing.  
All functions are implemented in Cython and can operate in-place on CSR matrices for maximum performance and memory efficiency.

- **L1, L2** – Applies row- or column-wise normalization.
- **TF-IDF** – Computes TF-IDF weighting with customizable term-frequency and IDF modes.
- **BM25** – Applies classic BM25 weighting used in information retrieval.
- **BM25+** – Variant of BM25 with additive smoothing for low-frequency terms.
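
To make the BM25 and BM25+ weightings concrete, here is a minimal plain-Python sketch that re-weights sparse rows stored as `{column: count}` dicts. It uses one common IDF variant and the usual `k1`, `b`, and `delta` parameters as assumptions; the function name `bm25_weights` is hypothetical, and the defaults and modes used by the library itself may differ.

```python
import math


def bm25_weights(rows, k1=1.2, b=0.75, delta=0.0):
    """Re-weight sparse rows ({column: count} dicts) with BM25.

    delta == 0 is classic BM25; delta > 0 adds the BM25+ additive term.
    """
    if not rows:
        return []
    n_rows = len(rows)
    # Average row length; fall back to 1.0 if every row is empty.
    avg_len = sum(sum(r.values()) for r in rows) / n_rows or 1.0

    # Document frequency: in how many rows each column appears.
    df = {}
    for r in rows:
        for col in r:
            df[col] = df.get(col, 0) + 1

    weighted = []
    for r in rows:
        length_norm = 1.0 - b + b * sum(r.values()) / avg_len
        out = {}
        for col, tf in r.items():
            idf = math.log(1.0 + (n_rows - df[col] + 0.5) / (df[col] + 0.5))
            tf_part = tf * (k1 + 1.0) / (tf + k1 * length_norm)
            out[col] = idf * (tf_part + delta)
        weighted.append(out)
    return weighted


rows = [{0: 2, 1: 1}, {1: 3}]
classic = bm25_weights(rows)          # BM25
plus = bm25_weights(rows, delta=1.0)  # BM25+
```

The only difference between the two calls is the `delta` term, which raises every non-zero weight by `delta * idf` so that rare but present entries are not pushed too close to zero.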

For more details, check the **[📘 SimilariPy Guide](https://bogliosimone.github.io/similaripy/)**.


## 🚀 Getting Started

Here’s a minimal example to get you up and running with SimilariPy:

```python
import similaripy as sim
import scipy.sparse as sps

# Create a random User-Rating Matrix (URM)
urm = sps.random(1000, 2000, density=0.025)

# Normalize the URM using BM25
urm = sim.normalization.bm25(urm)

# Train an item-item cosine similarity model
similarity_matrix = sim.cosine(urm.T, k=50)

# Compute recommendations for users 1, 14, and 8,
# filtering out already-seen items
recommendations = sim.dot_product(
    urm,
    similarity_matrix.T,
    k=100,
    target_rows=[1, 14, 8],
    filter_cols=urm
)
```
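
Conceptually, the last step of the example scores each unseen item by combining the user's ratings with the item's nearest neighbours, then keeps the best `k`. The plain-Python sketch below illustrates that idea for a single user. The names `top_k` and `recommend` and the dict-based data layout are hypothetical, and the sketch only approximates what the example above is assumed to compute; it is not the library's implementation.

```python
import heapq


def top_k(scores, k):
    """Keep the k largest entries of a {key: score} dict, best first."""
    return dict(heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]))


def recommend(user_items, item_sim, k):
    """Score unseen items for one user from an item-item similarity model.

    user_items: {item: rating} for the user.
    item_sim:   {item: {neighbour: similarity}}, already pruned to the
                nearest neighbours of each item.
    """
    scores = {}
    for candidate, neighbours in item_sim.items():
        if candidate in user_items:
            continue  # filter out already-seen items
        score = sum(
            user_items[i] * s for i, s in neighbours.items() if i in user_items
        )
        if score > 0:
            scores[candidate] = score
    return top_k(scores, k)


item_sim = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9},
    "c": {"a": 0.1, "b": 0.4},
}
user = {"a": 1.0}
best = recommend(user, item_sim, k=2)
```

Here item `a` never appears in the result because the user has already seen it, mirroring the role of filtering in the example above.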

For a more in-depth introduction you can check the article [Item and User KNN Recommender models with similaripy in Python](https://www.stepbystepdatascience.com/knn-recommenders-python), an amazing resource from the [StepByStepDataScience](https://www.stepbystepdatascience.com) blog.

## 📦 Installation

SimilariPy can be installed from PyPI with:

```cmd
pip install similaripy
```

### 🔧 GCC Compiler - Required

To install the package and compile the Cython code, a GCC-compatible compiler with OpenMP is required.

#### Ubuntu / Debian

Install the official dev-tools:

```bash
sudo apt update && sudo apt install build-essential
```

#### macOS (Intel & Apple Silicon)

Install GCC with Homebrew:

```bash
brew install gcc
```

#### Windows

Install the official **[Visual C++ Build Tools](https://visualstudio.microsoft.com/en/visual-cpp-build-tools/)**.

⚠️ On Windows, use the default `format_output='coo'` in all similarity functions, as `'csr'` is currently not supported.


#### Optional Optimization: Intel MKL for Intel CPUs

For Intel CPUs, using SciPy/NumPy builds linked against MKL (Math Kernel Library) is highly recommended for best performance.
The easiest way to get them is to install both packages via Anaconda.

## 📦 Requirements

| Package                         | Version        |
| --------------------------------|:--------------:|
| numpy                           |   >= 1.21      |
| scipy                           |   >= 1.10.1    |
| tqdm                            |   >= 4.65.2    |
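
If you want to check an existing environment against these minimums, a small standard-library sketch like the following may help. The helper names (`release_tuple`, `check_requirements`) are hypothetical, and the version comparison is deliberately simplified to the numeric release segment; for full PEP 440 semantics a dedicated library such as `packaging` would be more appropriate.

```python
from importlib import metadata

# Minimum versions copied from the table above.
MINIMUMS = {"numpy": "1.21", "scipy": "1.10.1", "tqdm": "4.65.2"}


def release_tuple(version):
    """Turn '1.10.1' into (1, 10, 1), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, satisfied)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = release_tuple(installed) >= release_tuple(minimum)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_requirements().items():
        status = "OK" if ok else "missing or too old"
        print(f"{package}: {installed} ({status})")
```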

## 📜 History

This library originated during the **[Spotify Recsys Challenge 2018](https://research.atspotify.com/publications/recsys-challenge-2018-automatic-music-playlist-continuation/)**.

Our team, The Creamy Fireflies, faced major challenges computing large similarity models on a dataset with over 66 million interactions. Standard Python/NumPy solutions were too slow: computing a single model took a whole day.

To overcome this, I developed high-performance versions of the core similarity functions in Cython and OpenMP. Encouraged by my teammates, I open-sourced this work to help others solve similar challenges.

Thanks to my Creamy Fireflies friends for the support! 🙏

## 📄 License

This project is released under the MIT License.

## 🔖 Citation

If you use SimilariPy in your research, please cite:

```text
@misc{boglio_simone_similaripy,
  author       = {Boglio Simone},
  title        = {bogliosimone/similaripy},
  doi          = {10.5281/zenodo.2583851},
  url          = {https://doi.org/10.5281/zenodo.2583851}
}
```

            
