easy-pmf

- Name: easy-pmf
- Version: 0.1.0
- Summary: An easy-to-use package for Positive Matrix Factorization (PMF) analysis of environmental data.
- Upload time: 2025-08-09 19:30:38
- Requires Python: >=3.9
- License: MIT
- Keywords: pmf, positive matrix factorization, environmental, air quality, source apportionment

# Easy PMF

[![PyPI version](https://badge.fury.io/py/easy-pmf.svg)](https://badge.fury.io/py/easy-pmf)
[![Python versions](https://img.shields.io/pypi/pyversions/easy-pmf.svg)](https://pypi.org/project/easy-pmf/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

[![CI/CD](https://github.com/gerritjandebruin/easy-pmf/actions/workflows/ci.yml/badge.svg)](https://github.com/gerritjandebruin/easy-pmf/actions/workflows/ci.yml)
[![Documentation](https://github.com/gerritjandebruin/easy-pmf/actions/workflows/docs.yml/badge.svg)](https://gerritjandebruin.github.io/easy-pmf/)
[![Publish](https://github.com/gerritjandebruin/easy-pmf/actions/workflows/publish.yml/badge.svg)](https://github.com/gerritjandebruin/easy-pmf/actions/workflows/publish.yml)

**Easy PMF** is a comprehensive Python package for Positive Matrix Factorization (PMF) analysis, designed specifically for environmental data analysis such as air quality source apportionment. It provides an easy-to-use interface similar to EPA's PMF software with built-in visualization capabilities.

:warning: This project is in the early stages of development and may not yet be suitable for production use.

:warning: An LLM (Large Language Model) is being used to assist with development and documentation; much of the content was vibe coded without much oversight.

## ✨ Features

- **Simple API**: Easy-to-use interface similar to scikit-learn
- **Comprehensive Visualizations**: EPA PMF-style plots and heatmaps
- **Multiple Dataset Support**: Built-in support for various environmental datasets
- **Robust Error Handling**: Input validation and convergence checking
- **Flexible Data Input**: Support for CSV, TXT, and Excel files
- **Interactive Analysis**: Command-line tools for quick analysis
- **Well Documented**: Extensive documentation with examples

## 🚀 Quick Start

### Installation

```bash
pip install easy-pmf
```

### Basic Usage

```python
import pandas as pd
from easy_pmf import PMF

# Load your concentration and uncertainty data
concentrations = pd.read_csv("concentrations.csv", index_col=0)
uncertainties = pd.read_csv("uncertainties.csv", index_col=0)

# Initialize PMF with 5 factors
pmf = PMF(n_components=5, random_state=42)

# Fit the model
pmf.fit(concentrations, uncertainties)

# Access results
factor_contributions = pmf.contributions_  # Time series of factor contributions
factor_profiles = pmf.profiles_            # Chemical profiles of each factor

# Check model performance
q_value = pmf.score(concentrations, uncertainties)
print(f"Model Q-value: {q_value:.2f}")
print(f"Converged: {pmf.converged_}")
print(f"Iterations: {pmf.n_iter_}")
```

### Command Line Interface

```bash
# Analyze a single dataset interactively
easy-pmf

# Or use the analysis scripts directly
python quick_analysis.py
```

## 📊 Included Example Datasets

The package comes with three real-world datasets:

- **Baton Rouge**: Air quality data (307 samples × 41 species)
- **St. Louis**: Environmental monitoring data (418 samples × 13 species)
- **Baltimore**: PM2.5 composition data (657 samples × 26 species)

## 🎯 Use Cases

- **Air Quality Analysis**: Source apportionment of particulate matter
- **Environmental Monitoring**: Identifying pollution sources
- **Research**: Academic studies requiring PMF analysis
- **Regulatory Compliance**: EPA-style PMF analysis for reporting

## 📈 Visualization Capabilities

Easy PMF automatically generates comprehensive visualizations:

- **Factor Profiles**: Chemical signatures of each source
- **Factor Contributions**: Time series showing source strength
- **Correlation Matrices**: Relationships between factors
- **EPA-style Plots**: Publication-ready visualizations
- **Summary Dashboards**: Quick overview of results

## 📚 Documentation

### PMF Class Parameters

- `n_components` (int): Number of factors to extract
- `max_iter` (int, default=1000): Maximum iterations
- `tol` (float, default=1e-4): Convergence tolerance
- `random_state` (int, optional): Random seed for reproducibility
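
To make the roles of `max_iter`, `tol`, and `random_state` concrete, here is a minimal NumPy sketch of an uncertainty-weighted factorization loop. It uses multiplicative updates, a generic weighted-NMF technique chosen for brevity; this is an assumption for illustration and not necessarily the algorithm Easy PMF implements. The function name `weighted_nmf_sketch` is hypothetical and is not part of the package.

```python
import numpy as np


def weighted_nmf_sketch(X, U, n_components, max_iter=1000, tol=1e-4, random_state=None):
    """Illustrative weighted NMF via multiplicative updates (not Easy PMF's code).

    X and U are 2-D arrays of the same shape; U must be strictly positive.
    """
    rng = np.random.default_rng(random_state)  # the seed fixes the starting point
    n_samples, n_species = X.shape
    W = 1.0 / U**2  # weights derived from the uncertainties
    G = rng.random((n_samples, n_components)) + 1e-3  # contributions
    F = rng.random((n_components, n_species)) + 1e-3  # profiles
    eps = 1e-12  # guards against division by zero
    q = q_prev = np.inf

    for n_iter in range(1, max_iter + 1):
        # Multiplicative updates keep G and F non-negative.
        G *= ((W * X) @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ (W * X)) / (G.T @ (W * (G @ F)) + eps)

        # Weighted sum of squared residuals for the current factors.
        q = float(np.sum(W * (X - G @ F) ** 2))

        # Stop once the relative change in Q drops below tol.
        if np.isfinite(q_prev) and abs(q_prev - q) <= tol * max(q_prev, eps):
            return G, F, q, n_iter, True
        q_prev = q

    # max_iter reached without meeting the tolerance.
    return G, F, q, max_iter, False
```

In this sketch, a smaller `tol` or a larger `max_iter` trades run time for a tighter fit, and reusing the same `random_state` reproduces the same factors.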

### Methods

- `fit(X, U=None)`: Fit PMF model to data
- `transform(X, U=None)`: Apply fitted model to new data
- `score(X, U=None)`: Calculate Q-value for goodness of fit
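
For reference, the Q-value in PMF is conventionally the sum of squared residuals, each scaled by its uncertainty: Q = Σᵢⱼ ((xᵢⱼ − Σₖ gᵢₖ fₖⱼ) / uᵢⱼ)². The sketch below computes that quantity with NumPy. The helper `q_value` is hypothetical, and this README does not state whether `score` uses exactly this form (for example, it might down-weight outliers).

```python
import numpy as np


def q_value(X, G, F, U):
    """Sum of squared, uncertainty-scaled residuals of X against G @ F."""
    residuals = (np.asarray(X) - np.asarray(G) @ np.asarray(F)) / np.asarray(U)
    return float(np.sum(residuals**2))


# Tiny example: 2 samples, 2 species, 1 factor.
X = np.array([[1.0, 2.0], [3.0, 4.0]])  # measured concentrations
G = np.array([[1.0], [2.0]])            # contributions (samples x factors)
F = np.array([[1.0, 2.0]])              # profiles (factors x species)
U = np.full_like(X, 0.5)                # uncertainties

# Only one entry is off (by 1.0, with uncertainty 0.5), so Q = (1.0 / 0.5)**2 = 4.0.
print(q_value(X, G, F, U))
```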

### Data Format Requirements

- **Concentrations**: Rows = time points, Columns = chemical species
- **Uncertainties**: Same format as concentrations (optional)
- **Index**: Date/time information
- **Values**: Non-negative concentrations
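
The requirements above can be checked before fitting. The following is a minimal sketch using pandas; `check_pmf_inputs` is a hypothetical helper, not part of Easy PMF, and the rule that uncertainties must be strictly positive is an assumption (they typically appear as divisors in the Q-value).

```python
import pandas as pd


def check_pmf_inputs(concentrations, uncertainties=None):
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    if concentrations.isna().any().any():
        problems.append("concentrations contain missing values")
    if (concentrations < 0).any().any():
        problems.append("concentrations contain negative values")
    if uncertainties is not None:
        if not concentrations.index.equals(uncertainties.index):
            problems.append("row index differs between concentrations and uncertainties")
        if not concentrations.columns.equals(uncertainties.columns):
            problems.append("columns differ between concentrations and uncertainties")
        elif (uncertainties <= 0).any().any():
            # Assumption: zero or negative uncertainties are treated as invalid.
            problems.append("uncertainties must be strictly positive")
    return problems


# Made-up example data: two time points, two species.
idx = pd.to_datetime(["2024-01-01", "2024-01-02"])
conc = pd.DataFrame({"SO4": [1.2, 0.8], "NO3": [0.4, -0.1]}, index=idx)
unc = pd.DataFrame({"SO4": [0.1, 0.1], "NO3": [0.05, 0.05]}, index=idx)

print(check_pmf_inputs(conc, unc))  # reports the negative NO3 value
```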

## 🛠️ Advanced Usage

### Custom Analysis Pipeline

```python
import matplotlib.pyplot as plt
import pandas as pd
from easy_pmf import PMF

# Load and preprocess data
concentrations = pd.read_csv("data.csv", index_col=0)
uncertainties = pd.read_csv("uncertainties.csv", index_col=0)

# Drop species that never have a positive concentration
concentrations = concentrations.loc[:, (concentrations > 0).any(axis=0)]
uncertainties = uncertainties[concentrations.columns]

# Try different numbers of factors
for n_factors in range(3, 8):
    pmf = PMF(n_components=n_factors, random_state=42)
    pmf.fit(concentrations, uncertainties)
    q_value = pmf.score(concentrations, uncertainties)
    print(f"Factors: {n_factors}, Q-value: {q_value:.2f}")

# Analyze best model
best_pmf = PMF(n_components=5, random_state=42)
best_pmf.fit(concentrations, uncertainties)

# Custom visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Plot factor profiles
best_pmf.profiles_.T.plot(kind='bar', ax=ax1)
ax1.set_title('Factor Profiles')
ax1.set_xlabel('Chemical Species')

# Plot contributions over time
best_pmf.contributions_.plot(ax=ax2)
ax2.set_title('Factor Contributions Over Time')
ax2.set_ylabel('Contribution')

plt.tight_layout()
plt.show()
```
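
When comparing factor counts as in the loop above, Q generally decreases as factors are added, so the raw value alone does not identify the best model. A common rule of thumb in PMF practice is to compare Q against an expected value, approximately the number of data values minus the number of fitted elements in the contribution and profile matrices. The helper `expected_q` below is a hypothetical sketch of that approximation; it is not part of Easy PMF and ignores any down-weighted or excluded species.

```python
def expected_q(n_samples, n_species, n_factors):
    """Approximate expected Q: data values minus fitted elements of G and F."""
    return n_samples * n_species - n_factors * (n_samples + n_species)


# Dimensions of the Baton Rouge example dataset (307 samples x 41 species).
for n_factors in range(3, 8):
    print(f"Factors: {n_factors}, expected Q: {expected_q(307, 41, n_factors)}")
```

A ratio of Q to expected Q far above 1 suggests the model or the uncertainties do not describe the data well; where it levels off as factors are added is one informal guide to a reasonable factor count.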

### Batch Processing Multiple Datasets

```python
import pandas as pd
from easy_pmf import PMF

datasets = {
    "site1": {"conc": "site1_conc.csv", "unc": "site1_unc.csv"},
    "site2": {"conc": "site2_conc.csv", "unc": "site2_unc.csv"},
}

results = {}
for site, files in datasets.items():
    print(f"Analyzing {site}...")

    conc = pd.read_csv(files["conc"], index_col=0)
    unc = pd.read_csv(files["unc"], index_col=0)

    pmf = PMF(n_components=5, random_state=42)
    pmf.fit(conc, unc)

    results[site] = {
        "contributions": pmf.contributions_,
        "profiles": pmf.profiles_,
        "q_value": pmf.score(conc, unc),
        "converged": pmf.converged_
    }

    print(f"  Q-value: {results[site]['q_value']:.2f}")
    print(f"  Converged: {results[site]['converged']}")
```

## 🔧 Development & Infrastructure

### CI/CD Pipeline

This project features a comprehensive CI/CD infrastructure:

- **✅ Automated Testing**: Matrix testing across Python 3.9-3.12 on Ubuntu, macOS, and Windows
- **✅ Code Quality**: Automated linting, formatting, and type checking with pre-commit hooks
- **✅ Security Scanning**: Static security analysis of the code with Bandit
- **✅ Documentation**: Automatic deployment to GitHub Pages
- **✅ Package Publishing**: Automated PyPI publishing on releases
- **✅ Dependency Management**: Weekly dependency updates and maintenance

### Code Quality Standards

- **Type Safety**: Full type annotation coverage with mypy validation
- **Code Style**: Enforced with Ruff (linting and formatting)
- **Testing**: Comprehensive test suite with pytest
- **Documentation**: Auto-generated docs with MkDocs Material
- **Pre-commit Hooks**: Quality checks run on every commit using `uv`

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details on:

- Development setup with `uv` and pre-commit hooks
- Code quality standards and automated checks
- Testing requirements and CI/CD infrastructure
- Documentation guidelines and examples
- Pull request process and review requirements

### Quick Start for Contributors

```bash
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/easy-pmf.git
cd easy-pmf

# Set up development environment
uv sync --all-extras
uv run pre-commit install

# Make changes and test
uv run pytest
uv run pre-commit run --all-files
```

### Development Setup

```bash
git clone https://github.com/gerritjandebruin/easy-pmf.git
cd easy-pmf

# Install uv (modern Python package manager)
# On Windows: https://docs.astral.sh/uv/getting-started/installation/
# On macOS/Linux: curl -LsSf https://astral.sh/uv/install.sh | sh

# Create development environment and install dependencies
uv sync --all-extras

# Install pre-commit hooks for code quality
uv run pre-commit install

# Run tests to verify setup
uv run pytest

# Run type checking
uv run mypy .

# Run code formatting and linting
uv run ruff check --fix
uv run ruff format
```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- EPA PMF software for inspiration
- Marloes van Os for first contributions and ideas

## 📞 Support

- **Documentation**: [https://gerritjandebruin.github.io/easy-pmf/](https://gerritjandebruin.github.io/easy-pmf/)
- **Issues**: [GitHub Issues](https://github.com/gerritjandebruin/easy-pmf/issues)
- **Discussions**: [GitHub Discussions](https://github.com/gerritjandebruin/easy-pmf/discussions)

---

**Easy PMF** - Making positive matrix factorization accessible to everyone! 🌍

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "easy-pmf",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "pmf, positive matrix factorization, environmental, air quality, source apportionment",
    "author": null,
    "author_email": "Gerrit Jan de Bruin <gerritjan.debruin@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/77/19/3f7fa649484dcdf2bf052e097ec907dcad766baf9811b947ec05de8c9dba/easy_pmf-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "An easy-to-use package for Positive Matrix Factorization (PMF) analysis of environmental data.",
    "version": "0.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/easy-pmf/easy-pmf/issues",
        "Changelog": "https://github.com/easy-pmf/easy-pmf/blob/main/CHANGELOG.md",
        "Documentation": "https://easy-pmf.readthedocs.io",
        "Homepage": "https://github.com/easy-pmf/easy-pmf",
        "Repository": "https://github.com/easy-pmf/easy-pmf"
    },
    "split_keywords": [
        "pmf",
        " positive matrix factorization",
        " environmental",
        " air quality",
        " source apportionment"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4c00b914b505866f5dc2999c5116ccaa282c864f382c3e3bc40b8435544c8b78",
                "md5": "ad6514b5922cf02c519440821b041e43",
                "sha256": "010e423f0d1ebd5f3edc2631307ab496de77f6615fa6da4b861e24250f3ab89e"
            },
            "downloads": -1,
            "filename": "easy_pmf-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ad6514b5922cf02c519440821b041e43",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 197717,
            "upload_time": "2025-08-09T19:30:36",
            "upload_time_iso_8601": "2025-08-09T19:30:36.861205Z",
            "url": "https://files.pythonhosted.org/packages/4c/00/b914b505866f5dc2999c5116ccaa282c864f382c3e3bc40b8435544c8b78/easy_pmf-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "77193f7fa649484dcdf2bf052e097ec907dcad766baf9811b947ec05de8c9dba",
                "md5": "a5fa451bad5c5749c3f8974af0659a42",
                "sha256": "338f6c61d761d63d178906c556995c88fd5df8bf64290740950f0a70b74f15ac"
            },
            "downloads": -1,
            "filename": "easy_pmf-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a5fa451bad5c5749c3f8974af0659a42",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 200647,
            "upload_time": "2025-08-09T19:30:38",
            "upload_time_iso_8601": "2025-08-09T19:30:38.175688Z",
            "url": "https://files.pythonhosted.org/packages/77/19/3f7fa649484dcdf2bf052e097ec907dcad766baf9811b947ec05de8c9dba/easy_pmf-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-09 19:30:38",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "easy-pmf",
    "github_project": "easy-pmf",
    "github_not_found": true,
    "lcname": "easy-pmf"
}
        