# NBMF‑MM
[CI](https://github.com/siddC/nbmf_mm/actions/workflows/ci.yml)
[License](./LICENSE.md)

**NBMF‑MM** is a fast, scikit‑learn‑style implementation of **mean‑parameterized Bernoulli (binary) matrix factorization** using a **Majorization–Minimization (MM)** solver.
- Two symmetric orientations:
  - **`orientation="dir-beta"`** (default, *Aspect Bernoulli*): **columns of `H`** lie on the simplex; Beta prior on `W`.
  - **`orientation="beta-dir"`** (*Binary ICA*): **rows of `W`** lie on the simplex; Beta prior on `H`.
- **Masked training** for matrix completion / hold‑out validation.
- Optional acceleration: **NumExpr** (elementwise ops) and **Numba** (simplex projection).
- **Projection options**: default **Duchi** simplex projection (fast) with an opt‑in **legacy “normalize”** method for parity with older behavior.
---
## Installation
From PyPI:
```bash
pip install nbmf-mm
```
From source:
```bash
pip install "git+https://github.com/siddC/nbmf_mm"
```
Optional extras:
```bash
# scikit-learn integration & NNDSVD-style init (if you enable it later)
pip install "nbmf-mm[sklearn]"
# docs build stack
pip install "nbmf-mm[docs]"
```
---
## Quick Start
```python
import numpy as np
from nbmf_mm import NBMF
rng = np.random.default_rng(0)
X = (rng.random((100, 500)) < 0.25).astype(float) # binary {0,1} or probabilities in [0,1]
# Aspect Bernoulli (default): H columns on simplex; W has a Beta prior
model = NBMF(
    n_components=20,
    orientation="dir-beta",
    alpha=1.2, beta=1.2,
    random_state=0,
    max_iter=2000, tol=1e-6,
    # fast defaults:
    projection_method="duchi",   # Euclidean simplex projection (recommended)
    projection_backend="auto",   # prefer Numba if installed
    use_numexpr=True,            # use NumExpr if installed
).fit(X)
W = model.W_ # shape (n_samples, n_components)
H = model.components_ # shape (n_components, n_features)
Xhat = model.inverse_transform(W) # probabilities in (0,1)
# Transform new data using fixed components H
X_new = (rng.random((10, 500)) < 0.25).astype(float)
W_new = model.transform(X_new) # shape (10, n_components)
# Masked training / hold-out validation
mask = (rng.random(X.shape) < 0.9).astype(float) # observe 90% of entries
model = NBMF(n_components=20).fit(X, mask=mask)
print("score (−NLL per observed entry):", model.score(X, mask=mask))
print("perplexity:", model.perplexity(X, mask=mask))
```
---
## Why two orientations?
- **`dir-beta`** (Aspect Bernoulli) — columns of `H` lie on the simplex, so each feature (e.g., a gene) has interpretable mixture memberships across latent aspects; `W` carries sample‑specific propensities under a Beta prior.
- **`beta-dir`** (Binary ICA) — rows of `W` lie on the simplex; `H` is Beta‑constrained.
Both solve the same Bernoulli mean‑parameterized factorization with different geometric constraints; pick the one that best matches your interpretability needs.
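To make the geometric constraints concrete, here is a minimal NumPy-only sketch (independent of the package internals) of what "on the simplex" means for each orientation, and why the resulting Bernoulli mean `W @ H` stays in [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_samples, n_features = 3, 4, 5

# "dir-beta": each COLUMN of H is a point on the probability simplex,
# i.e. nonnegative and summing to 1 over the K components.
H = rng.random((K, n_features))
H /= H.sum(axis=0, keepdims=True)        # normalize columns
assert np.allclose(H.sum(axis=0), 1.0)

# "beta-dir": each ROW of W is on the simplex instead.
W = rng.random((n_samples, K))
W /= W.sum(axis=1, keepdims=True)        # normalize rows
assert np.allclose(W.sum(axis=1), 1.0)

# Because the other factor's entries lie in [0, 1] (here by construction;
# in the model, via the Beta prior), each entry of W @ H is a convex
# combination of values in [0, 1] and is therefore a valid Bernoulli mean.
P = W @ H
assert P.min() >= 0.0 and P.max() <= 1.0
```

In the actual model only one factor is simplex-constrained per orientation; the sketch applies both at once purely for illustration.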
---
## API (scikit-learn style)
- `NBMF(...).fit(X, mask=None) -> self`
- `fit_transform(X, mask=None) -> W`
- `transform(X, mask=None, max_iter=500, tol=1e-6) -> W` (estimate `W` for new `X` with learned `H` fixed)
- `inverse_transform(W) -> Xhat` (reconstructed probabilities in (0,1))
- `score(X, mask=None) -> float` (negative NLL per observed entry; higher is better)
- `perplexity(X, mask=None) -> float` (exp of average NLL per observed entry; lower is better)
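The relationship between `score` and `perplexity` follows directly from their definitions above: both derive from the average Bernoulli negative log-likelihood over observed entries. A hedged NumPy sketch (illustrative helper, not the library's internal implementation):

```python
import numpy as np

def masked_bernoulli_metrics(X, Xhat, mask=None, eps=1e-12):
    """Average Bernoulli NLL over observed entries, plus the derived
    score (higher is better) and perplexity (lower is better)."""
    if mask is None:
        mask = np.ones_like(X)
    Xhat = np.clip(Xhat, eps, 1 - eps)                # avoid log(0)
    nll = -(X * np.log(Xhat) + (1 - X) * np.log(1 - Xhat))
    avg_nll = (mask * nll).sum() / mask.sum()
    return {"score": -avg_nll, "perplexity": np.exp(avg_nll)}

rng = np.random.default_rng(0)
X = (rng.random((6, 8)) < 0.3).astype(float)
Xhat = np.full_like(X, 0.3)                           # constant-rate baseline
m = masked_bernoulli_metrics(X, Xhat)

# perplexity == exp(-score) by construction
assert np.isclose(m["perplexity"], np.exp(-m["score"]))
```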
## Key parameters
- `n_components` (int) — rank `K` (number of latent components).
- `orientation` ∈ {`"dir-beta"`, `"beta-dir"`}.
- `alpha, beta` (float > 0) — Beta prior hyperparameters (on `W` for `"dir-beta"`, on `H` for `"beta-dir"`).
- `projection_method` ∈ {`"duchi"`, `"normalize"`} — default `"duchi"` (fast & stable). `"normalize"` gives legacy behavior (nonnegativity + renormalization).
- `projection_backend` ∈ {`"auto"`, `"numba"`, `"numpy"`} — backend for `"duchi"` projection.
- `use_numexpr` (bool) — use NumExpr if available.
---
## Command-line (CLI)
After installation, a console script `nbmf-mm` is available:
```bash
nbmf-mm fit \
  --input X.npz --rank 30 \
  --orientation dir-beta --alpha 1.2 --beta 1.2 \
  --max-iter 2000 --tol 1e-6 --seed 0 --n-init 1 \
  --mask train_mask.npz \
  --out model_rank30.npz
```
This writes an `.npz` with `W`, `H`, `Xhat`, `objective_history`, and `n_iter`.
Input formats: `.npz` (expects key `arr_0`) or `.npy`. Masks are optional and must match `X` shape.
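Since the CLI expects the array under the key `arr_0`, one way to produce a compatible `.npz` is to pass the array positionally to `np.savez`, which stores it under exactly that key:

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
X = (rng.random((100, 500)) < 0.25).astype(float)

# Positional arguments to np.savez are saved as "arr_0", "arr_1", ...
# so a single positional array matches the CLI's expected key.
out = os.path.join(tempfile.mkdtemp(), "X.npz")
np.savez(out, X)

with np.load(out) as f:
    assert f.files == ["arr_0"]
    assert f["arr_0"].shape == (100, 500)
```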
---
## Data requirements
- `X` must be in [0, 1] (binary recommended; probabilistic inputs are allowed).
- `mask` (optional) must have the same shape as `X`, with values in [0, 1] (typically {0, 1}).
- **Sparse inputs** (`scipy.sparse`) and sparse masks are accepted but densified internally in this version.
---
## Performance notes
- The default **Duchi** method performs each per-row/column simplex projection in O(d log d) time and is accelerated with **Numba** when installed.
- **NumExpr** speeds up large elementwise expressions.
- Both accelerations are optional; the solver degrades gracefully to pure NumPy when they are not installed.
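For reference, the O(d log d) sort-based projection onto the probability simplex (Duchi et al. 2008; Wang & Carreira-Perpiñán 2013) can be sketched in pure NumPy as follows. This is an illustrative standalone version, not the package's internal (Numba-accelerated) implementation:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1},
    via the O(d log d) sort-based algorithm."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u) - 1.0                   # cumulative sums minus target sum
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1] # last index where threshold holds
    theta = css[rho] / (rho + 1.0)             # shift that enforces sum == 1
    return np.maximum(v - theta, 0.0)

x = project_simplex(np.array([0.5, 1.2, -0.3]))
assert np.isclose(x.sum(), 1.0) and (x >= 0).all()
```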
---
## Reproducibility
- Set `random_state` (int) for reproducible initialization.
- Use `n_init > 1` to run several random restarts and keep the best NLL.
---
## References
- **Simplex projection** (default):
  - J. Duchi, S. Shalev‑Shwartz, Y. Singer, T. Chandra (2008). Efficient Projections onto the ℓ₁‑Ball for Learning in High Dimensions. ICML 2008.
  - W. Wang, M. Á. Carreira‑Perpiñán (2013). Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application. arXiv:1309.1541.
- **Bayesian NBMF** (related, slower but fully Bayesian):
  - See the `NBMF` project by alumbreras for reference implementations of Bayesian variants.