lib-ppca


| Field | Value |
|---|---|
| Name | lib-ppca |
| Version | 1.0.0 |
| Summary | Probabilistic PCA (PPCA) with missing-data support - fast C++ core, clean Python API |
| Upload time | 2025-09-17 09:17:25 |
| Author | brdav |
| Requires Python | >=3.9 |
| License | MIT License, Copyright (c) 2025 brdav |
| Keywords | ppca, probabilistic pca, dimensionality reduction, missing data |
            
# lib-ppca

[![Build Wheels](https://img.shields.io/github/actions/workflow/status/brdav/lib-ppca/.github/workflows/build.yml?branch=main)](https://github.com/brdav/lib-ppca/actions)
[![PyPI version](https://img.shields.io/pypi/v/lib-ppca.svg)](https://pypi.org/project/lib-ppca)
[![Python Versions](https://img.shields.io/pypi/pyversions/lib-ppca.svg)](https://pypi.org/project/lib-ppca)
[![License](https://img.shields.io/github/license/brdav/lib-ppca.svg)](LICENSE)

Probabilistic PCA (PPCA) with missing-data support — fast C++ core, clean Python API.

<img src="https://raw.githubusercontent.com/brdav/lib-ppca/main/docs/teaser.jpg" width="500" alt="teaser"/>

## Overview

**lib-ppca** implements Probabilistic Principal Component Analysis (PPCA) as described by Tipping & Bishop (1999), with a focus on speed, usability, and robust handling of missing data. The core is written in C++ (Armadillo + CARMA + pybind11), exposed via a simple Python interface.
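
For reference, the generative model from Tipping & Bishop (1999) can be written as below. The symbols ($W$ for the $d \times q$ loading matrix, $\mu$ for the mean, $\sigma^2$ for the isotropic noise variance) follow the usual notation of the paper and are not taken from lib-ppca's source code.

```latex
\begin{aligned}
z &\sim \mathcal{N}(0,\ I_q), \\
x \mid z &\sim \mathcal{N}(W z + \mu,\ \sigma^2 I_d), \\
x &\sim \mathcal{N}(\mu,\ W W^\top + \sigma^2 I_d).
\end{aligned}
```

The last line is the marginal obtained by integrating out $z$; it is the distribution under which a log-likelihood of the data is evaluated.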

### Key Features

- **Handles missing values natively:** No need for manual imputation—just use `np.nan` for missing entries.
- **Familiar API:** Drop-in replacement for scikit-learn PCA with attributes like `components_`, `explained_variance_`, etc.
- **Probabilistic modeling:** Compute log-likelihoods, posterior latent variable distributions, multiple imputations, and more.
- **Fast and scalable:** Optimized C++ backend for large datasets.
- **Flexible:** Supports both batch and online (mini-batch) EM.

## Quick Start

```bash
pip install lib-ppca
```

Note: pre-built wheels are produced only for Linux and macOS (CI builds target ubuntu-latest and macos-latest). On other platforms (e.g. Windows), you will need to build from source (see "Installation from Source" below).
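
As a rough aid, the sketch below guesses whether pip is likely to find a pre-built wheel for the current interpreter. The helper name `wheel_probably_available` is hypothetical, and the tag sets are a snapshot read off the 1.0.0 file list in the raw data at the end of this page (CPython 3.9 to 3.11, Linux x86_64 and macOS arm64); later releases may differ, and the check ignores details such as the glibc version required by manylinux wheels.

```python
import platform
import sys

# Snapshot of the 1.0.0 wheel tags listed in the raw data of this page.
# Assumption: these sets describe 1.0.0 only and may change in later releases.
PUBLISHED_PYTHONS = {(3, 9), (3, 10), (3, 11)}
PUBLISHED_PLATFORMS = {("Linux", "x86_64"), ("Darwin", "arm64")}


def wheel_probably_available(system=None, machine=None, version=None):
    """Return True if the (OS, CPU, Python) combination matches a 1.0.0 wheel."""
    system = system or platform.system()
    machine = machine or platform.machine()
    version = tuple(version or sys.version_info[:2])
    return (system, machine) in PUBLISHED_PLATFORMS and version in PUBLISHED_PYTHONS


if __name__ == "__main__":
    if wheel_probably_available():
        print("A pre-built wheel probably exists for this interpreter.")
    else:
        print("pip will probably have to build lib-ppca from source.")
```

If the function returns `False`, the requirements under "Installation from Source" apply.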

Usage example:

```python
import numpy as np
from lib_ppca import PPCA

X_train = np.random.randn(600, 10)
X_train[::7, 3] = np.nan  # introduce missing values
X_test = np.random.randn(100, 10)
X_test[::7, 2] = np.nan  # introduce missing values

model = PPCA(n_components=3, batch_size=200)
model.fit(X_train)

mZ, covZ = model.posterior_latent(X_test) # latent representation
mX, covX = model.likelihood(mZ)           # reconstruction
ll = model.score(X_test)                  # mean log likelihood

# multiple imputation
X_imputed = model.sample_missing(X_test, n_draws=5)

print("Components shape:", model.components_.shape)
print("Explained variance ratio:", model.explained_variance_ratio_)
```

For more examples and PPCA functionalities see the [demo notebook](https://github.com/brdav/lib-ppca/blob/main/notebooks/demo.ipynb).
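
To make the `posterior_latent` step in the usage example more concrete, here is a NumPy sketch of the textbook PPCA posterior over the latent vector when some entries of a row are missing. It is an illustration under the standard model, not lib-ppca's implementation; the function name `posterior_latent_row` and the parameters `W`, `mu`, `sigma2` are assumptions introduced for this example.

```python
import numpy as np


def posterior_latent_row(x, W, mu, sigma2):
    """Textbook PPCA posterior of z given only the observed entries of x.

    x: (d,) array with np.nan marking missing entries
    W: (d, q) loading matrix, mu: (d,) mean, sigma2: noise variance
    """
    obs = ~np.isnan(x)                      # mask of observed entries
    W_o = W[obs]                            # rows of W for observed entries
    q = W.shape[1]
    M = W_o.T @ W_o + sigma2 * np.eye(q)    # q x q matrix of the posterior
    mean = np.linalg.solve(M, W_o.T @ (x[obs] - mu[obs]))
    cov = sigma2 * np.linalg.inv(M)
    return mean, cov


rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))
mu = np.zeros(5)
x = W @ np.array([1.0, -1.0]) + mu
x[3] = np.nan                               # one missing entry
mean, cov = posterior_latent_row(x, W, mu, sigma2=0.1)
print(mean, cov.shape)
```

Only the rows of `W` that correspond to observed entries enter the computation, so no imputation is needed beforehand. If every entry of a row is missing, the formula falls back to the prior (zero mean, identity covariance).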

## Installation from Source

Minimum requirements

* CMake >= 3.18
* Python >= 3.9 (+ development headers)
* C++20-capable compiler (clang on macOS, gcc on Linux, MSVC on Windows)
* BLAS/LAPACK implementation (OpenBLAS, MKL, or Accelerate on macOS)
* git (to fetch submodules)
* ninja recommended (CMake/scikit-build often prefer Ninja)
* Network access (CMake will download Armadillo into extern by default) or provide extern/armadillo-<ver>/ or a system Armadillo install
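
Before starting a source build, it can help to confirm that the command-line tools from the list above are on `PATH`. The following is a small sketch; `check_build_tools` is a hypothetical helper, and it only checks presence, not versions (so it does not verify CMake >= 3.18 or the compiler's C++20 support).

```python
import shutil
import sys


def check_build_tools(tools=("cmake", "git", "ninja")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    print("Python >= 3.9:", sys.version_info >= (3, 9))
    for tool, path in check_build_tools().items():
        print(f"{tool}: {path or 'not found'}")
```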

Quick install (fresh clone)

```bash
git clone https://github.com/brdav/lib-ppca.git
cd lib-ppca
git submodule update --init --recursive   # ensure extern/carma is present
python -m pip install .                   # build and install
```

Editable install for development

```bash
git clone https://github.com/brdav/lib-ppca.git
cd lib-ppca
git submodule update --init --recursive
python -m pip install -e '.[dev]'         # editable install
pre-commit install                        # optional: register hooks
```

### Windows

Builds on Windows are untested in CI. You can attempt a Windows build but expect manual steps:

* Install Visual Studio with the C++ toolchain (or a supported MinGW) and CMake.
* Provide BLAS/LAPACK (OpenBLAS, MKL) and point CMake to their libraries.
* Provide Armadillo sources (extern/armadillo-<ver>/) or install Armadillo system-wide and set ARMADILLO_ROOT_DIR/use find_package.
* You may need to adjust linker/rpath settings or prefer static linking to avoid missing DLLs at runtime.

## Internals

PPCA uses an Expectation-Maximization (EM) algorithm to learn parameters through maximum likelihood estimation. For details see the reference paper listed below. The equations for the EM algorithm in the presence of missing values are listed in [EQUATIONS.md](https://github.com/brdav/lib-ppca/blob/main/EQUATIONS.md).
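
As an illustration of the EM idea, the sketch below implements the classic Tipping & Bishop updates in NumPy for the simpler case of fully observed data and full-batch updates. It is not the library's C++ code and does not handle missing values or mini-batches; the function name `ppca_em` and its parameters are assumptions made for this example.

```python
import numpy as np


def ppca_em(X, q, n_iter=300, seed=0):
    """Full-batch EM for PPCA on complete data (no NaNs). Returns W, mu, sigma2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)                     # ML estimate of the mean
    Xc = X - mu
    W = rng.normal(size=(d, q))             # random initial loadings
    sigma2 = 1.0                            # initial noise variance
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ M_inv                 # rows are E[z_n]
        sum_Ezz = n * sigma2 * M_inv + Ez.T @ Ez   # sum over n of E[z_n z_n^T]
        # M-step: update loadings, then noise variance
        W_new = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (
            np.sum(Xc**2)
            - 2.0 * np.sum((Xc @ W_new) * Ez)
            + np.trace(sum_Ezz @ W_new.T @ W_new)
        ) / (n * d)
        W = W_new
    return W, mu, sigma2


# Synthetic check: 2 latent dimensions, noise standard deviation 0.5.
rng = np.random.default_rng(1)
W_true = np.array([[2, 0], [0, 2], [1, 1], [1, -1], [0.5, 0], [0, 0.5]], dtype=float)
X = rng.normal(size=(2000, 2)) @ W_true.T + 0.5 * rng.normal(size=(2000, 6))
W_hat, mu_hat, sigma2_hat = ppca_em(X, q=2)
print("estimated noise variance:", sigma2_hat)
```

For complete data the maximum-likelihood noise variance equals the mean of the discarded eigenvalues of the sample covariance, which gives a convenient check on the result. With missing values, the E-step has to be carried out per row on the observed entries only, which is the case the linked EQUATIONS.md covers.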

## Citing

If you use this code academically, cite the original PPCA paper:

* M. E. Tipping & C. M. Bishop. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society: Series B, 61(3), 611-622, 1999.

You may also reference the library name or URL.

## License

MIT License — see `LICENSE`.

---
Questions or requests? Open an issue.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "lib-ppca",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "ppca, probabilistic pca, dimensionality reduction, missing data",
    "author": "brdav",
    "author_email": null,
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\n         \n         Copyright (c) 2025 brdav\n         \n         Permission is hereby granted, free of charge, to any person obtaining a copy\n         of this software and associated documentation files (the \"Software\"), to deal\n         in the Software without restriction, including without limitation the rights\n         to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n         copies of the Software, and to permit persons to whom the Software is\n         furnished to do so, subject to the following conditions:\n         \n         The above copyright notice and this permission notice shall be included in all\n         copies or substantial portions of the Software.\n         \n         THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n         IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n         FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n         AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n         LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n         OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n         SOFTWARE.\n         ",
    "summary": "Probabilistic PCA (PPCA) with missing-data support - fast C++ core, clean Python API",
    "version": "1.0.0",
    "project_urls": {
        "Homepage": "https://github.com/brdav/lib-ppca",
        "Source": "https://github.com/brdav/lib-ppca"
    },
    "split_keywords": [
        "ppca",
        " probabilistic pca",
        " dimensionality reduction",
        " missing data"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7fdb73a819c1223688c813e92eb2afeb6007468d231dd5a654964353065bf91a",
                "md5": "72092b6db93b8a0499745a032c057ec3",
                "sha256": "26b072b1dddd9dff17dd0f62e5a9b403d13fa048726b1fef942cfd685bad5ede"
            },
            "downloads": -1,
            "filename": "lib_ppca-1.0.0-cp310-cp310-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "72092b6db93b8a0499745a032c057ec3",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.9",
            "size": 197538,
            "upload_time": "2025-09-17T09:17:25",
            "upload_time_iso_8601": "2025-09-17T09:17:25.541369Z",
            "url": "https://files.pythonhosted.org/packages/7f/db/73a819c1223688c813e92eb2afeb6007468d231dd5a654964353065bf91a/lib_ppca-1.0.0-cp310-cp310-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1fd5b2bb361c162254e5e67c9e2557d566e5618b3da17c5fb7ac9a10c304f63f",
                "md5": "23a801c0b2ed7c30a3d5c808bb87255a",
                "sha256": "5ae669f17565f6826f6d15ef59cdb6acaf70e103aff1759f9fe9acae4cc9b21e"
            },
            "downloads": -1,
            "filename": "lib_ppca-1.0.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "23a801c0b2ed7c30a3d5c808bb87255a",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.9",
            "size": 256377,
            "upload_time": "2025-09-17T09:17:27",
            "upload_time_iso_8601": "2025-09-17T09:17:27.149186Z",
            "url": "https://files.pythonhosted.org/packages/1f/d5/b2bb361c162254e5e67c9e2557d566e5618b3da17c5fb7ac9a10c304f63f/lib_ppca-1.0.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "24173c6b81aad5566c685f097a2b892cb5aabff77d9fca31f19a53f0c595132e",
                "md5": "fc4fa8219e233a52739c6349417d8724",
                "sha256": "e0a1a7028f6e7e6556178e251d3617c32a5b4ed2c14feee8961be2d398f3517b"
            },
            "downloads": -1,
            "filename": "lib_ppca-1.0.0-cp311-cp311-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "fc4fa8219e233a52739c6349417d8724",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.9",
            "size": 198801,
            "upload_time": "2025-09-17T09:17:28",
            "upload_time_iso_8601": "2025-09-17T09:17:28.174144Z",
            "url": "https://files.pythonhosted.org/packages/24/17/3c6b81aad5566c685f097a2b892cb5aabff77d9fca31f19a53f0c595132e/lib_ppca-1.0.0-cp311-cp311-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ce6b4806c9886c70a2017b5101ef4731e77c097f4e30f536bcf1c054cf9a46f7",
                "md5": "57593590292333bb54fb89a321b84e8c",
                "sha256": "b6eb52666a4b179f0f3bc882abe5e51356d8047b69986dcd0869b8492a2e5bb8"
            },
            "downloads": -1,
            "filename": "lib_ppca-1.0.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "57593590292333bb54fb89a321b84e8c",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.9",
            "size": 257168,
            "upload_time": "2025-09-17T09:17:29",
            "upload_time_iso_8601": "2025-09-17T09:17:29.316989Z",
            "url": "https://files.pythonhosted.org/packages/ce/6b/4806c9886c70a2017b5101ef4731e77c097f4e30f536bcf1c054cf9a46f7/lib_ppca-1.0.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7da300ecf1e0852d1ad43addaa13a216fc059a4fb0019295f7bb43abd9a71adb",
                "md5": "38967f154ccbd5b692bdd51518e3d4b9",
                "sha256": "f541bb2b5f7a5b77feca2ee136e438e6742c0b5f6e575962f4139c4de8ce6864"
            },
            "downloads": -1,
            "filename": "lib_ppca-1.0.0-cp39-cp39-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "38967f154ccbd5b692bdd51518e3d4b9",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.9",
            "size": 197681,
            "upload_time": "2025-09-17T09:17:30",
            "upload_time_iso_8601": "2025-09-17T09:17:30.245681Z",
            "url": "https://files.pythonhosted.org/packages/7d/a3/00ecf1e0852d1ad43addaa13a216fc059a4fb0019295f7bb43abd9a71adb/lib_ppca-1.0.0-cp39-cp39-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ac3cdda3c0a9aa23c5e8a4fdea75156e8d1c9f3f06e27c37387da08e24544c7e",
                "md5": "d7e6a815ffaa5f6f86019b1b3a0c9456",
                "sha256": "13f1ce5d8f9655eb785c2b5a857fac61e2aef81f079af9f7ca95c4a6d616ffd6"
            },
            "downloads": -1,
            "filename": "lib_ppca-1.0.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "d7e6a815ffaa5f6f86019b1b3a0c9456",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.9",
            "size": 256447,
            "upload_time": "2025-09-17T09:17:31",
            "upload_time_iso_8601": "2025-09-17T09:17:31.558859Z",
            "url": "https://files.pythonhosted.org/packages/ac/3c/dda3c0a9aa23c5e8a4fdea75156e8d1c9f3f06e27c37387da08e24544c7e/lib_ppca-1.0.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-17 09:17:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "brdav",
    "github_project": "lib-ppca",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "lib-ppca"
}
        