# nbmf‑mm — Mean‑parameterized Bernoulli NMF via Majorization–Minimization
[CI](https://github.com/siddC/nbmf_mm/actions/workflows/ci.yml)
[PyPI](https://pypi.org/project/nbmf-mm/)
[License](LICENSE.md)
**nbmf‑mm** implements *mean‑parameterized* Bernoulli matrix factorization
\(\Theta = W H \in (0,1)^{M\times N}\) with a **Majorization–Minimization (MM)**
solver, following **Magron & Févotte (2022)**. It exposes a scikit‑learn‑style
API and two symmetric orientations:
- **`orientation="beta-dir"` (default; original paper setting)**
  **W** rows sum to 1 (simplex constraint); **H** is continuous in [0,1] with a Beta prior.
  This matches Magron & Févotte (2022).
- **`orientation="dir-beta"` (symmetric alternative)**
  **W** is continuous in [0,1] with a Beta prior; **H** columns sum to 1 (simplex constraint).
  Mathematically equivalent to `"beta-dir"` applied to `V.T`.
**Projection choices** for the simplex‑constrained factor:
- **`projection_method="normalize"` (default; theory‑first)** – paper‑exact
multiplicative step with the **/N** (or **/M**) normalizer that preserves the
simplex in exact arithmetic and enjoys the classical MM monotonicity
guarantee.
- **`projection_method="duchi"` (fast)** – Euclidean projection to the simplex
(Duchi et al., 2008) performed **after** the multiplicative step. This is
typically near‑identical to `"normalize"` numerically, but the formal MM
monotonicity guarantee applies to `"normalize"`.
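The `"duchi"` option refers to the classic sort-and-threshold projection from Duchi et al. (2008). A minimal NumPy sketch of that algorithm (the function name `project_simplex` is illustrative, not part of this package's API):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex
    (Duchi et al., 2008): sort descending, find the pivot, shift and clip."""
    u = np.sort(v)[::-1]                 # sorted descending
    css = np.cumsum(u)
    # largest index k (0-based) with u_k * (k+1) > css_k - 1
    rho = np.nonzero(u * np.arange(1, v.size + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

w = project_simplex(np.array([0.5, 1.2, -0.3]))
# result sums to 1 and is entrywise nonnegative
```

The projection is exact in O(n log n) per vector, which is why it can be faster than renormalization-based steps for large factors.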
> Masked training (matrix completion) is supported: only observed entries
> contribute to the likelihood and updates. In the simplex steps the paper‑exact
> **/N** (or **/M**) normalizer is naturally replaced by **per‑row** (or
> **per‑column**) observed counts, preserving the simplex under masking.
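The masked objective restricts the Bernoulli log-likelihood to observed entries. A minimal sketch of that computation (the helper `masked_nll` is illustrative, not the package's API; `Xhat` stands for the mean parameter `W @ H`):

```python
import numpy as np

def masked_nll(X, Xhat, mask, eps=1e-12):
    """Bernoulli negative log-likelihood per observed entry:
    only entries with mask == 1 contribute."""
    ll = X * np.log(Xhat + eps) + (1 - X) * np.log(1 - Xhat + eps)
    return -np.sum(mask * ll) / np.sum(mask)

rng = np.random.default_rng(0)
X = (rng.random((4, 5)) < 0.5).astype(float)
Xhat = np.full_like(X, 0.5)       # uninformative mean parameter
mask = np.ones_like(X)            # fully observed
nll = masked_nll(X, Xhat, mask)   # equals log(2) when Xhat == 0.5
```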
---
## Installation
```bash
pip install nbmf-mm
```
From source:
```bash
pip install "git+https://github.com/siddC/nbmf_mm"
```
Optional extras:
```bash
# scikit-learn integration & NNDSVD-style init (if you enable it later)
pip install "nbmf-mm[sklearn]"
# docs build stack
pip install "nbmf-mm[docs]"
```
---
## Quick Start
```python
import numpy as np
from nbmf_mm import NBMF
rng = np.random.default_rng(0)
X = (rng.random((100, 500)) < 0.25).astype(float) # binary {0,1} or probabilities in [0,1]
# Theory-first default: monotone MM with paper-exact normalizer
model = NBMF(n_components=6, orientation="beta-dir",
             alpha=1.2, beta=1.2, random_state=0).fit(X)
W = model.W_ # shape (n_samples, n_components)
H = model.components_ # shape (n_components, n_features)
Xhat = model.inverse_transform(W) # probabilities in (0,1)
# Transform new data using fixed components H
Y_new = (rng.random((10, 500)) < 0.25).astype(float)
W_new = model.transform(Y_new) # shape (10, n_components)
# Masked training / hold-out validation
mask = (rng.random(X.shape) < 0.9).astype(float) # observe 90% of entries
model = NBMF(n_components=20).fit(X, mask=mask)
print("score (−NLL per observed entry):", model.score(X, mask=mask))
print("perplexity:", model.perplexity(X, mask=mask))
```
To use the fast projection alternative:
```python
model = NBMF(n_components=6, orientation="beta-dir",
             projection_method="duchi", random_state=0).fit(X)
```
---
## Mathematical Formulations
Both orientations fit the mean-parameterized Bernoulli likelihood `V ~ Bernoulli(W @ H)`, where the entries of `W @ H` lie in (0,1) by construction (no link function is applied).
### Orientation: `beta-dir` (Matches paper's primary formulation)
- **W**: Rows lie on probability simplex (sum to 1)
- **H**: Continuous in [0,1] with Beta(α, β) prior
- Use this to reproduce paper experiments
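Combining the Bernoulli likelihood with the Beta(α, β) prior on **H** gives a MAP objective. A sketch of its assumed form (constant terms of the prior dropped; `map_objective` is illustrative, not the package's API):

```python
import numpy as np

def map_objective(V, W, H, alpha=1.2, beta=1.2, eps=1e-12):
    """Bernoulli NLL of V under mean parameter Theta = W @ H,
    plus the Beta(alpha, beta) negative log-prior on entries of H."""
    Theta = W @ H
    nll = -np.sum(V * np.log(Theta + eps) + (1 - V) * np.log(1 - Theta + eps))
    neg_log_prior = -np.sum((alpha - 1) * np.log(H + eps)
                            + (beta - 1) * np.log(1 - H + eps))
    return nll + neg_log_prior

rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(3), size=6)      # (6, 3): rows on the simplex
H = rng.uniform(0.1, 0.9, size=(3, 8))     # (3, 8): entries in (0, 1)
V = (rng.random((6, 8)) < W @ H).astype(float)
obj = map_objective(V, W, H)
```

The MM solver decreases exactly this kind of objective at every iteration under the `"normalize"` projection.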
### Orientation: `dir-beta` (Symmetric formulation)
- **W**: Continuous in [0,1] with Beta(α, β) prior
- **H**: Columns lie on probability simplex (sum to 1)
- Mathematically equivalent to `beta-dir` on `V.T`
## Why two orientations?
Different interpretability needs:
- **beta-dir**: W rows are mixture weights over latent factors; H captures factor-feature associations
- **dir-beta**: H columns are mixture weights over latent aspects; W captures sample-aspect propensities
Both solve the same mean-parameterized factorization with symmetric geometric constraints.
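The equivalence between orientations is a simple transposition identity. A small NumPy illustration: if `V = W @ H` with the rows of `W` on the simplex (`beta-dir`), then `V.T = H.T @ W.T`, and `W.T` has *columns* on the simplex, which is exactly the `dir-beta` constraint pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(4), size=10)   # (10, 4): rows sum to 1
H = rng.uniform(0.05, 0.95, (4, 20))     # (4, 20): entries in (0, 1)

V = W @ H
same = np.allclose(V.T, H.T @ W.T)               # transposed factorization
cols_simplex = np.allclose(W.T.sum(axis=0), 1.0) # columns of W.T on the simplex
```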
---
## API Highlights
```python
NBMF(
    n_components: int,
    orientation: {"dir-beta", "beta-dir"} = "beta-dir",
    alpha: float = 1.2,
    beta: float = 1.2,
    projection_method: {"normalize", "duchi"} = "normalize",
    max_iter: int = 2000,
    tol: float = 1e-6,
    random_state: int | None = None,
    n_init: int = 1,
    # accepted for compatibility (currently unused in core Python impl)
    use_numexpr: bool = False,
    use_numba: bool = False,
    projection_backend: str = "auto",
)
```
---
## Reproducibility
- Set `random_state` (int) for reproducible initialization.
- Use `n_init > 1` to run several random restarts and keep the best NLL.
## Reproducing Magron & Févotte (2022)
To reproduce the results from the original paper, use these settings:
```python
from nbmf_mm import NBMF
# Use beta-dir to match paper exactly
model = NBMF(
    n_components=10,
    orientation="beta-dir",
    alpha=1.2,
    beta=1.2,
    max_iter=500,
    tol=1e-5,
)
model.fit(X)
# W rows will sum to 1, H will be continuous
```
Run the reproduction scripts:
```bash
python examples/reproduce_magron2022.py
python examples/display_figures.py
```
---
## References
- **Model & MM algorithm**:
  - P. Magron, C. Févotte (2022).
    A Majorization-Minimization Algorithm for Nonnegative Binary Matrix Factorization. IEEE Signal Processing Letters.
- **Simplex projection** (`projection_method="duchi"`):
  - J. Duchi, S. Shalev‑Shwartz, Y. Singer, T. Chandra (2008).
    Efficient Projections onto the ℓ₁‑Ball for Learning in High Dimensions. ICML 2008.
  - W. Wang, M. Á. Carreira‑Perpiñán (2013).
    Projection onto the Probability Simplex: An Efficient Algorithm with a Simple Proof, and an Application. arXiv:1309.1541.
- **Bayesian NBMF** (related; slower but fully Bayesian):
  - See the `NBMF` project by alumbreras for reference implementations of Bayesian variants.