nbmf-mm


Name: nbmf-mm
Version: 0.1.4
Summary: Bernoulli (binary) mean-parameterized NMF (NBMF) w/ Majorization–Minimization (MM)
Upload time: 2025-09-05 04:46:16
Requires Python: <3.13,>=3.9
Keywords: nmf, bernoulli, matrix factorization, topic modeling, ica, dirichlet, beta

# nbmf‑mm — Mean‑parameterized Bernoulli NMF via Majorization–Minimization

[![CI](https://github.com/siddC/nbmf_mm/actions/workflows/ci.yml/badge.svg)](https://github.com/siddC/nbmf_mm/actions/workflows/ci.yml)
[![PyPI version](https://img.shields.io/pypi/v/nbmf-mm.svg)](https://pypi.org/project/nbmf-mm/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE.md)

**nbmf‑mm** implements *mean‑parameterized* Bernoulli matrix factorization
\(\Theta = W H \in (0,1)^{M\times N}\) with a **Majorization–Minimization (MM)**
solver, following **Magron & Févotte (2022)**. It exposes a scikit‑learn‑style
API and two symmetric orientations:

- **`orientation="beta-dir"` (default; Original Paper Setting)**  
  **W** rows sum to 1 (simplex constraint); **H** is continuous in [0,1] with Beta prior  
  This matches Magron & Févotte (2022)

- **`orientation="dir-beta"` (Symmetric Alternative)**  
  **W** is continuous in [0,1] with Beta prior; **H** columns sum to 1 (simplex constraint)  
  Mathematically equivalent to `"beta-dir"` on `V.T`
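
The constraints are what keep the product a valid matrix of Bernoulli means: each entry of `W @ H` is a convex combination of entries of the [0,1]-valued factor. A minimal NumPy sketch for the `beta-dir` case follows; the shapes and the random draws are illustrative assumptions, not the package's initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 5, 7, 3  # illustrative sizes (assumption)

# "beta-dir": rows of W on the simplex, entries of H in [0, 1].
W = rng.dirichlet(np.ones(K), size=M)   # shape (M, K), each row sums to 1
H = rng.beta(1.2, 1.2, size=(K, N))     # shape (K, N), entries in (0, 1)

# Each Theta[m, n] is a weighted average of H[:, n] with weights W[m, :],
# so it cannot leave the interval spanned by the entries of H.
Theta = W @ H
print(Theta.min(), Theta.max())
```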

**Projection choices** for the simplex‑constrained factor:

- **`projection_method="normalize"` (default; theory‑first)** – paper‑exact
  multiplicative step with the **/N** (or **/M**) normalizer that preserves the
  simplex in exact arithmetic and enjoys the classical MM monotonicity
  guarantee.

- **`projection_method="duchi"` (fast)** – Euclidean projection to the simplex
  (Duchi et al., 2008) performed **after** the multiplicative step. This is
  typically near‑identical to `"normalize"` numerically, but the formal MM
  monotonicity guarantee applies to `"normalize"`.
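
For reference, the sort-based simplex projection described by Duchi et al. (2008) can be sketched in a few lines of NumPy. The function name `project_to_simplex` is hypothetical and this is not the package's own code; it handles a single vector, whereas a solver would apply it to each row (or column) of the constrained factor.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a 1-D vector onto the probability simplex."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                        # entries in descending order
    css = np.cumsum(u) - 1.0                    # running sums minus the target total of 1
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]   # last index where the condition holds
    theta = css[rho] / (rho + 1)                # uniform shift applied to all entries
    return np.maximum(v - theta, 0.0)           # shift, then clip negatives to zero

p = project_to_simplex([0.5, 0.8, -0.1])
print(p, p.sum())
```

A vector that already lies on the simplex is returned unchanged, which is one reason the two projection methods tend to agree closely when the multiplicative step drifts only slightly.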

> Masked training (matrix completion) is supported: only observed entries
> contribute to the likelihood and updates. In the simplex steps the paper‑exact
> **/N** (or **/M**) normalizer is naturally replaced by **per‑row** (or
> **per‑column**) observed counts, preserving the simplex under masking.
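
The role of the normalizer can be seen in a small sketch of one multiplicative step on a row-stochastic `W` with `H` held fixed. This is derived from the description above rather than copied from the package source; `update_W_simplex` and `masked_nll` are hypothetical names. Dividing by the number of observed entries in each row (which equals N when nothing is masked) is exactly what makes the updated rows sum to 1.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def masked_nll(V, W, H, mask):
    """Bernoulli negative log-likelihood summed over observed entries."""
    P = np.clip(W @ H, EPS, 1.0 - EPS)
    return -np.sum(mask * (V * np.log(P) + (1.0 - V) * np.log(1.0 - P)))

def update_W_simplex(V, W, H, mask):
    """One multiplicative step for row-stochastic W (H fixed, beta-dir)."""
    P = np.clip(W @ H, EPS, 1.0 - EPS)
    ratio_pos = mask * V / P                    # observed v / (WH)
    ratio_neg = mask * (1.0 - V) / (1.0 - P)    # observed (1 - v) / (1 - WH)
    numer = W * (ratio_pos @ H.T + ratio_neg @ (1.0 - H).T)
    counts = mask.sum(axis=1, keepdims=True)    # observed entries per row
    return numer / np.maximum(counts, 1.0)      # avoid 0/0 for fully masked rows

rng = np.random.default_rng(0)
M, N, K = 20, 30, 4                             # illustrative sizes (assumption)
V = (rng.random((M, N)) < 0.3).astype(float)
mask = (rng.random((M, N)) < 0.9).astype(float)
W = rng.dirichlet(np.ones(K), size=M)
H = rng.uniform(0.1, 0.9, size=(K, N))

W_new = update_W_simplex(V, W, H, mask)
print(W_new.sum(axis=1)[:3])
print(masked_nll(V, W, H, mask), masked_nll(V, W_new, H, mask))
```

A row with no observed entries has nothing to update from; the sketch leaves such a row at zero, and how the package treats that edge case is not covered here.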

---

## Installation

```bash
pip install nbmf-mm
```

From source:
```bash
pip install "git+https://github.com/siddC/nbmf_mm"
```

Optional extras:
```bash
# scikit-learn integration & NNDSVD-style init (if you enable it later)
pip install "nbmf-mm[sklearn]"

# docs build stack
pip install "nbmf-mm[docs]"
```

---

## Quick Start

```python
import numpy as np
from nbmf_mm import NBMF

rng = np.random.default_rng(0)
X = (rng.random((100, 500)) < 0.25).astype(float)   # binary {0,1} or probabilities in [0,1]

# Theory-first default: monotone MM with paper-exact normalizer
model = NBMF(n_components=6, orientation="beta-dir",
             alpha=1.2, beta=1.2, random_state=0).fit(X)

W = model.W_                 # shape (n_samples, n_components)
H = model.components_        # shape (n_components, n_features)
Xhat = model.inverse_transform(W)  # probabilities in (0,1)

# Transform new data using fixed components H
Y_new = (rng.random((10, 500)) < 0.25).astype(float)
W_new = model.transform(Y_new)     # shape (10, n_components)

# Masked training / hold-out validation
mask = (rng.random(X.shape) < 0.9).astype(float)  # observe 90% of entries
model = NBMF(n_components=20).fit(X, mask=mask)

print("score (−NLL per observed entry):", model.score(X, mask=mask))
print("perplexity:", model.perplexity(X, mask=mask))
```

To use the fast projection alternative:
```python
model = NBMF(n_components=6, orientation="beta-dir",
             projection_method="duchi", random_state=0).fit(X)
```

---

## Mathematical Formulations

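Both orientations fit the same mean-parameterized Bernoulli model, `V ~ Bernoulli(W @ H)`: the product `W @ H` is itself the matrix of Bernoulli means, with no sigmoid link, and the constraints below keep every entry in [0, 1].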

### Orientation: `beta-dir` (Matches paper's primary formulation)
- **W**: Rows lie on probability simplex (sum to 1)
- **H**: Continuous in [0,1] with Beta(α, β) prior
- Use this to reproduce paper experiments

### Orientation: `dir-beta` (Symmetric formulation)
- **W**: Continuous in [0,1] with Beta(α, β) prior  
- **H**: Columns lie on probability simplex (sum to 1)
- Mathematically equivalent to `beta-dir` on `V.T`
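
As a sketch, the `beta-dir` problem can be written as a MAP objective under the mean parameterization \(\Theta = W H\) stated in the introduction. The index names and the overall scaling are assumptions here; the package may normalize the terms differently, for example per observed entry.

```latex
\Theta_{mn} = \sum_{k=1}^{K} W_{mk} H_{kn}, \qquad
V_{mn} \sim \mathrm{Bernoulli}(\Theta_{mn})

\min_{W,\,H} \;
  -\sum_{m,n} \Big[ V_{mn} \log \Theta_{mn} + (1 - V_{mn}) \log (1 - \Theta_{mn}) \Big]
  -\sum_{k,n} \Big[ (\alpha - 1) \log H_{kn} + (\beta - 1) \log (1 - H_{kn}) \Big]

\text{subject to} \quad W_{mk} \ge 0, \quad \sum_{k=1}^{K} W_{mk} = 1, \quad H_{kn} \in [0, 1]
```

For `dir-beta` the roles swap: the Beta prior term runs over the entries of W, and the sum-to-one constraint applies to the columns of H.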

## Why two orientations?

Different interpretability needs:
- **beta-dir**: W rows are mixture weights over latent factors; H captures factor-feature associations
- **dir-beta**: H columns are mixture weights over latent aspects; W captures sample-aspect propensities

Both solve the same mean-parameterized factorization with symmetric geometric constraints.
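
The symmetry can be checked numerically: transposing a `dir-beta` factorization of `V` yields a `beta-dir` factorization of `V.T`. The sketch below uses plain NumPy with arbitrary random factors (an illustration, not fitted output from the package).

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 4, 6, 3  # illustrative sizes (assumption)

# A "dir-beta" factor pair: W in [0, 1], columns of H on the simplex.
W = rng.random((M, K))
H = rng.dirichlet(np.ones(K), size=N).T   # shape (K, N), each column sums to 1

# Transposing swaps the roles: (H.T, W.T) is a "beta-dir" pair for V.T,
# with row-stochastic H.T and W.T still in [0, 1].
W_t, H_t = H.T, W.T
print(np.allclose(W_t.sum(axis=1), 1.0))
print(np.allclose(W_t @ H_t, (W @ H).T))
```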

---

## API Highlights

NBMF(
  n_components: int,
  orientation: {"dir-beta","beta-dir"} = "beta-dir",
  alpha: float = 1.2,
  beta: float = 1.2,
  projection_method: {"normalize","duchi"} = "normalize",
  max_iter: int = 2000,
  tol: float = 1e-6,
  random_state: int | None = None,
  n_init: int = 1,
  # accepted for compatibility (currently unused in core Python impl)
  use_numexpr: bool = False,
  use_numba: bool = False,
  projection_backend: str = "auto",
)

---

## Reproducibility
- Set `random_state` (int) for reproducible initialization.
- Use `n_init > 1` to run several random restarts and keep the run with the lowest negative log-likelihood (NLL).
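
The restart pattern itself is generic and can be sketched independently of the package. In the snippet below, `best_of_restarts` and `toy_fit` are hypothetical names, and the way child seeds are derived from `random_state` is an assumption; the package's internal seeding may differ.

```python
import numpy as np

def best_of_restarts(fit_once, n_init, random_state):
    """Run fit_once with n_init independent seeds; keep the lowest-NLL result."""
    # Derive one independent child seed per restart from a single integer.
    children = np.random.SeedSequence(random_state).spawn(n_init)
    best = None
    for child in children:
        rng = np.random.default_rng(child)
        nll, params = fit_once(rng)
        if best is None or nll < best[0]:
            best = (nll, params)
    return best

def toy_fit(rng):
    # Stand-in for a real solver: returns (final NLL, fitted parameters).
    start = rng.random(3)
    return float(np.sum(start ** 2)), start

nll_a, params_a = best_of_restarts(toy_fit, n_init=5, random_state=0)
nll_b, params_b = best_of_restarts(toy_fit, n_init=5, random_state=0)
print(nll_a == nll_b)  # same seed, same selected run
```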

## Reproducing Magron & Févotte (2022)

To reproduce the results from the original paper, use these settings:

```python
from nbmf_mm import NBMF

# Use beta-dir to match paper exactly
model = NBMF(
    n_components=10,
    orientation="beta-dir",
    alpha=1.2,
    beta=1.2,
    max_iter=500,
    tol=1e-5
)
model.fit(X)  # X: binary data matrix of shape (n_samples, n_features), e.g. from the Quick Start

# Rows of W sum to 1; entries of H lie in [0, 1]
```

Run the reproduction scripts:

```bash
python examples/reproduce_magron2022.py
python examples/display_figures.py
```

---

## References
- **Simplex projection** (used by `projection_method="duchi"`):
  - J. Duchi, S. Shalev‑Shwartz, Y. Singer, T. Chandra (2008).
    Efficient Projections onto the ℓ₁‑Ball for Learning in High Dimensions. ICML 2008.
  - W. Wang, M. Á. Carreira‑Perpiñán (2013).
    Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application. arXiv:1309.1541.
- **Bayesian NBMF** (related, slower but fully Bayesian):
  - See the `NBMF` project by alumbreras for reference implementations of Bayesian variants.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "nbmf-mm",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.9",
    "maintainer_email": null,
    "keywords": "NMF, Bernoulli, matrix factorization, topic modeling, ICA, Dirichlet, Beta",
    "author": null,
    "author_email": "\"Siddharth M. Chauhan\" <github.chauhan.siddharth@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/11/6c/29db02cb083752decb34b3a093d3e0a0353236eca6d57baadca1d88a6253/nbmf_mm-0.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Bernoulli (binary) mean-parameterized NMF (NBMF) w/ Majorization\u2013Minimization (MM)",
    "version": "0.1.4",
    "project_urls": {
        "Homepage": "https://github.com/siddC/nbmf_mm",
        "Issues": "https://github.com/siddC/nbmf_mm/issues",
        "Repository": "https://github.com/siddC/nbmf_mm"
    },
    "split_keywords": [
        "nmf",
        " bernoulli",
        " matrix factorization",
        " topic modeling",
        " ica",
        " dirichlet",
        " beta"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "deee90a150a81e3be91811b730fa4608315a5b02fc803a70fd33e21768f4a2de",
                "md5": "eb9c6a7ba33730d1061132d1e52fa21f",
                "sha256": "8b9ac6686d966341607a7bcca41e2c1f410ecfc16fa83a67016abc205fef1bcd"
            },
            "downloads": -1,
            "filename": "nbmf_mm-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "eb9c6a7ba33730d1061132d1e52fa21f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.9",
            "size": 11893,
            "upload_time": "2025-09-05T04:46:13",
            "upload_time_iso_8601": "2025-09-05T04:46:13.640592Z",
            "url": "https://files.pythonhosted.org/packages/de/ee/90a150a81e3be91811b730fa4608315a5b02fc803a70fd33e21768f4a2de/nbmf_mm-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "116c29db02cb083752decb34b3a093d3e0a0353236eca6d57baadca1d88a6253",
                "md5": "4db4fb7c2b76eb939aa3ab458c10bb86",
                "sha256": "abedae25f1a563190996325025827d38362f47b9563c61d98fb85fbff19206fb"
            },
            "downloads": -1,
            "filename": "nbmf_mm-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "4db4fb7c2b76eb939aa3ab458c10bb86",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.9",
            "size": 13997149,
            "upload_time": "2025-09-05T04:46:16",
            "upload_time_iso_8601": "2025-09-05T04:46:16.017153Z",
            "url": "https://files.pythonhosted.org/packages/11/6c/29db02cb083752decb34b3a093d3e0a0353236eca6d57baadca1d88a6253/nbmf_mm-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-05 04:46:16",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "siddC",
    "github_project": "nbmf_mm",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "nbmf-mm"
}
        