nbmf-mm

- **Version:** 0.1.2
- **Summary:** Bernoulli (binary) mean-parameterized NMF (NBMF) with Majorization–Minimization (MM)
- **Uploaded:** 2025-08-17 23:29:35
- **Requires Python:** >=3.9, <3.13
- **Keywords:** NMF, Bernoulli, matrix factorization, topic modeling, ICA, Dirichlet, Beta
# NBMF‑MM

[![CI](https://github.com/siddC/nbmf_mm/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/siddC/nbmf_mm/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE.md)
![Python versions](https://img.shields.io/badge/python-3.9–3.12-blue)

**NBMF‑MM** is a fast, scikit‑learn‑style implementation of **mean‑parameterized Bernoulli (binary) matrix factorization** using a **Majorization–Minimization (MM)** solver.

- Two symmetric orientations:
  - **`orientation="dir-beta"`** (default, *Aspect Bernoulli*): **columns of `H`** lie on the simplex; Beta prior on `W`.
  - **`orientation="beta-dir"`** (*Binary ICA*): **rows of `W`** lie on the simplex; Beta prior on `H`.
- **Masked training** for matrix completion / hold‑out validation.
- Optional acceleration: **NumExpr** (elementwise ops) and **Numba** (simplex projection).
- **Projection options**: default **Duchi** simplex projection (fast) with an opt‑in **legacy “normalize”** method for parity with older behavior.

---

## Installation

From PyPI (when released):

```bash
pip install nbmf-mm
```
From source:
```bash
pip install "git+https://github.com/siddC/nbmf_mm"
```

Optional extras:
```bash
# scikit-learn integration & NNDSVD-style init (if you enable it later)
pip install "nbmf-mm[sklearn]"

# docs build stack
pip install "nbmf-mm[docs]"
```

---

## Quick Start

```python
import numpy as np
from nbmf_mm import NBMF

rng = np.random.default_rng(0)
X = (rng.random((100, 500)) < 0.25).astype(float)   # binary {0,1} or probabilities in [0,1]

# Aspect Bernoulli (default): H columns on simplex; W has a Beta prior
model = NBMF(
    n_components=20,
    orientation="dir-beta",
    alpha=1.2, beta=1.2,
    random_state=0,
    max_iter=2000, tol=1e-6,
    # fast defaults:
    projection_method="duchi",      # Euclidean simplex projection (recommended)
    projection_backend="auto",      # prefer Numba if installed
    use_numexpr=True,               # use NumExpr if installed
).fit(X)

W = model.W_                 # shape (n_samples, n_components)
H = model.components_        # shape (n_components, n_features)
Xhat = model.inverse_transform(W)  # probabilities in (0,1)

# Transform new data using fixed components H
X_new = (rng.random((10, 500)) < 0.25).astype(float)
W_new = model.transform(X_new)     # shape (10, n_components)

# Masked training / hold-out validation
mask = (rng.random(X.shape) < 0.9).astype(float)  # observe 90% of entries
model = NBMF(n_components=20).fit(X, mask=mask)

print("score (−NLL per observed entry):", model.score(X, mask=mask))
print("perplexity:", model.perplexity(X, mask=mask))
```

---

## Why two orientations?

- **`dir-beta`** (*Aspect Bernoulli*) — columns of `H` lie on the simplex, so each feature (e.g., a gene) has interpretable mixture memberships across latent aspects; `W` carries sample‑specific propensities with a Beta prior.

- **`beta-dir`** (*Binary ICA*) — rows of `W` lie on the simplex; `H` is Beta‑constrained.

Both solve the same Bernoulli mean‑parameterized factorization with different geometric constraints; pick the one that best matches your interpretability needs.
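The geometric constraints also guarantee valid Bernoulli means. A pure-NumPy illustration (independent of the package) for the `beta-dir` case: when each row of `W` lies on the probability simplex and `H` has entries in `[0, 1]`, every entry of `W @ H` is a convex combination of entries of `H`, hence a valid probability:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 6, 3, 8

# beta-dir geometry: each row of W is on the probability simplex
W = rng.random((n, k))
W /= W.sum(axis=1, keepdims=True)   # rows sum to 1

# H entries in (0, 1) (Beta-constrained in the model)
H = rng.random((k, p))

Xhat = W @ H
# Each entry of Xhat is a convex combination of entries of one column of H,
# so it is automatically a valid Bernoulli mean in [0, 1].
print(Xhat.min() >= 0.0, Xhat.max() <= 1.0)  # True True
```

The same argument applies to `dir-beta` with the roles of `W` and `H` swapped.
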

---

## API (scikit-learn style)
- `NBMF(...).fit(X, mask=None) -> self`
- `fit_transform(X, mask=None) -> W`
- `transform(X, mask=None, max_iter=500, tol=1e-6) -> W` (estimate `W` for new `X` with learned `H` fixed)
- `inverse_transform(W) -> Xhat` (reconstructed probabilities in (0,1))
- `score(X, mask=None) -> float` (negative NLL per observed entry; higher is better)
- `perplexity(X, mask=None) -> float` (exp of average NLL per observed entry; lower is better)
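For reference, the quantities behind `score` and `perplexity` can be sketched in plain NumPy (an illustration of the definitions above, not the package's internal code; `bernoulli_nll` is a hypothetical helper name):

```python
import numpy as np

def bernoulli_nll(X, Xhat, mask=None, eps=1e-10):
    """Average Bernoulli negative log-likelihood per observed entry."""
    if mask is None:
        mask = np.ones_like(X)
    P = np.clip(Xhat, eps, 1 - eps)
    ll = mask * (X * np.log(P) + (1 - X) * np.log1p(-P))
    return -ll.sum() / mask.sum()

rng = np.random.default_rng(0)
X = (rng.random((4, 5)) < 0.5).astype(float)
Xhat = np.full_like(X, 0.5)

nll = bernoulli_nll(X, Xhat)
# score is the negative of this (higher is better);
# perplexity is exp of it (lower is better).
print(nll)  # log 2 ~= 0.6931 when every predicted probability is 0.5
```
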

## Key parameters
- `n_components` (int) — rank `K`.
- `orientation` ∈ {`"dir-beta"`, `"beta-dir"`}.
- `alpha, beta` (float > 0) — Beta prior hyperparameters (on `W` for dir-beta, on `H` for `beta-dir`).
- `projection_method` ∈ {`"duchi"`, `"normalize"`} — default `"duchi"` (fast & stable). `"normalize"` gives legacy behavior (nonnegativity + renormalization).
- `projection_backend` ∈ {`"auto"`, `"numba"`, `"numpy"`} — backend for `"duchi"` projection.
- `use_numexpr` (bool) — use NumExpr if available.

---

## Command-line (CLI)
After installation, a console script `nbmf-mm` is available:
```bash
nbmf-mm fit \
  --input X.npz --rank 30 \
  --orientation dir-beta --alpha 1.2 --beta 1.2 \
  --max-iter 2000 --tol 1e-6 --seed 0 --n-init 1 \
  --mask train_mask.npz \
  --out model_rank30.npz
```
This writes an `.npz` with `W`, `H`, `Xhat`, `objective_history`, and `n_iter`.

Input formats: `.npz` (expects key `arr_0`) or `.npy`. Masks are optional and must match `X` shape.
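To prepare inputs in the expected format, saving an array positionally with `np.savez` stores it under the key `arr_0`, which is the key the CLI expects for `.npz` inputs (a minimal sketch using a temporary directory):

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
X = (rng.random((100, 500)) < 0.25).astype(float)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "X.npz")
    # A positional argument is stored under the key "arr_0".
    np.savez(path, X)
    with np.load(path) as f:
        loaded = f["arr_0"]

print(loaded.shape)  # (100, 500)
```
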

---

## Data requirements
- `X` must be in `[0, 1]` (binary recommended; probabilistic inputs are allowed).
- `mask` (optional) must be the same shape as `X`, with values in `[0, 1]` (typically `{0, 1}`).
- **Sparse inputs** (`scipy.sparse`) and masks are accepted and densified internally in this version.
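These constraints can be checked with a small standalone helper (illustrative only; `check_inputs` is not part of the package API):

```python
import numpy as np

def check_inputs(X, mask=None):
    """Validate NBMF inputs: X in [0, 1]; mask, if given, matches X's shape."""
    X = np.asarray(X, dtype=float)
    if X.min() < 0.0 or X.max() > 1.0:
        raise ValueError("X must lie in [0, 1]")
    if mask is not None:
        mask = np.asarray(mask, dtype=float)
        if mask.shape != X.shape:
            raise ValueError("mask must match X's shape")
    return X, mask

rng = np.random.default_rng(0)
X = (rng.random((5, 4)) < 0.3).astype(float)
check_inputs(X, np.ones_like(X))  # passes silently
```
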

---

## Performance notes
- The default **Duchi** projection performs an O(d log d) per‑row/column simplex projection and is accelerated with **Numba** when installed.
- **NumExpr** speeds up large elementwise expressions.
- Both accelerations are optional and degrade gracefully if not present.
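For reference, the sort-based projection of Duchi et al. can be sketched in a few lines of NumPy (a standalone reference implementation, not the package's internal kernel):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (sort-based O(d log d) algorithm of Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                    # sort descending
    css = np.cumsum(u) - 1.0                # cumulative sums minus the target total
    rho = np.nonzero(u > css / np.arange(1, len(v) + 1))[0][-1]
    theta = css[rho] / (rho + 1.0)          # shift threshold
    return np.maximum(v - theta, 0.0)

p = project_simplex(np.array([0.9, 0.6, -0.2]))
print(p, p.sum())  # nonnegative components summing to 1
```

Vectors already on the simplex project to themselves, so repeated application is a no-op.
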

---

## Reproducibility
- Set `random_state` (int) for reproducible initialization.
- Use `n_init > 1` to run several random restarts and keep the best NLL.

---

## References
- **Simplex projection** (default):
  - J. Duchi, S. Shalev‑Shwartz, Y. Singer, T. Chandra (2008).
  Efficient Projections onto the ℓ₁‑Ball for Learning in High Dimensions. ICML 2008.
  
  - W. Wang, M. Á. Carreira‑Perpiñán (2013).
  Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application. arXiv:1309.1541.
  
- **Bayesian NBMF** (related, slower but fully Bayesian):
  - See the `NBMF` project by alumbreras for reference implementations of Bayesian variants.

            
