# Easy PMF
**Easy PMF** is a comprehensive Python package for Positive Matrix Factorization (PMF) analysis, designed specifically for environmental data analysis such as air quality source apportionment. It provides an easy-to-use interface similar to EPA's PMF software with built-in visualization capabilities.
:warning: This project is in the early stages of development and may not yet be suitable for production use.
:warning: An LLM (Large Language Model) is being used to assist with development and documentation; much of the content was vibe coded without much oversight.
## ✨ Features
- **Simple API**: Easy-to-use interface similar to scikit-learn
- **Comprehensive Visualizations**: EPA PMF-style plots and heatmaps
- **Multiple Dataset Support**: Built-in support for various environmental datasets
- **Robust Error Handling**: Input validation and convergence checking
- **Flexible Data Input**: Support for CSV, TXT, and Excel files
- **Interactive Analysis**: Command-line tools for quick analysis
- **Well Documented**: Extensive documentation with examples
## 🚀 Quick Start
### Installation
```bash
pip install easy-pmf
```
### Basic Usage
```python
import pandas as pd
from easy_pmf import PMF
# Load your concentration and uncertainty data
concentrations = pd.read_csv("concentrations.csv", index_col=0)
uncertainties = pd.read_csv("uncertainties.csv", index_col=0)
# Initialize PMF with 5 factors
pmf = PMF(n_components=5, random_state=42)
# Fit the model
pmf.fit(concentrations, uncertainties)
# Access results
factor_contributions = pmf.contributions_ # Time series of factor contributions
factor_profiles = pmf.profiles_ # Chemical profiles of each factor
# Check model performance
q_value = pmf.score(concentrations, uncertainties)
print(f"Model Q-value: {q_value:.2f}")
print(f"Converged: {pmf.converged_}")
print(f"Iterations: {pmf.n_iter_}")
```
### Command Line Interface
```bash
# Analyze a single dataset interactively
easy-pmf
# Or use the analysis scripts directly
python quick_analysis.py
```
## 📊 Included Example Datasets
The package comes with three real-world datasets:
- **Baton Rouge**: Air quality data (307 samples × 41 species)
- **St. Louis**: Environmental monitoring data (418 samples × 13 species)
- **Baltimore**: PM2.5 composition data (657 samples × 26 species)
## 🎯 Use Cases
- **Air Quality Analysis**: Source apportionment of particulate matter
- **Environmental Monitoring**: Identifying pollution sources
- **Research**: Academic studies requiring PMF analysis
- **Regulatory Compliance**: EPA-style PMF analysis for reporting
## 📈 Visualization Capabilities
Easy PMF automatically generates comprehensive visualizations:
- **Factor Profiles**: Chemical signatures of each source
- **Factor Contributions**: Time series showing source strength
- **Correlation Matrices**: Relationships between factors
- **EPA-style Plots**: Publication-ready visualizations
- **Summary Dashboards**: Quick overview of results
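
To illustrate what the correlation matrix item describes, here is a minimal sketch that computes pairwise correlations between factor contribution time series with plain pandas. The `contributions` DataFrame is synthetic and only stands in for `pmf.contributions_`; the factor names and the random data are assumptions for illustration, and this is not necessarily how Easy PMF builds its own plots.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for pmf.contributions_ (rows = time points, columns = factors)
rng = np.random.default_rng(42)
dates = pd.date_range("2024-01-01", periods=50, freq="D")
contributions = pd.DataFrame(
    rng.gamma(shape=2.0, scale=1.0, size=(50, 3)),
    index=dates,
    columns=["Factor 1", "Factor 2", "Factor 3"],
)

# Pearson correlation between every pair of factor time series
corr = contributions.corr()
print(corr.round(2))
```

Strongly correlated factors can hint that two factors describe the same source, which is one reason to inspect this matrix before settling on a number of factors.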
## 📚 Documentation
### PMF Class Parameters
- `n_components` (int): Number of factors to extract
- `max_iter` (int, default=1000): Maximum iterations
- `tol` (float, default=1e-4): Convergence tolerance
- `random_state` (int, optional): Random seed for reproducibility
### Methods
- `fit(X, U=None)`: Fit PMF model to data
- `transform(X, U=None)`: Apply fitted model to new data
- `score(X, U=None)`: Calculate Q-value for goodness of fit
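
For readers unfamiliar with the Q-value, the sketch below shows the conventional PMF definition: the sum of squared residuals of `X ≈ G @ F`, each scaled by its uncertainty. The function name `q_value` and the matrix names `G` and `F` are illustrative assumptions; this is not taken from the Easy PMF source, and `score` may differ in detail.

```python
import numpy as np


def q_value(X, G, F, U):
    """Sum of squared, uncertainty-scaled residuals of X ≈ G @ F."""
    residuals = (X - G @ F) / U
    return float(np.sum(residuals**2))


# Tiny synthetic example: 4 samples, 3 species, 2 factors
rng = np.random.default_rng(0)
G = rng.uniform(0.1, 1.0, size=(4, 2))  # contributions (samples x factors)
F = rng.uniform(0.1, 1.0, size=(2, 3))  # profiles (factors x species)
X = G @ F                               # noise-free data
U = np.full_like(X, 0.1)                # constant uncertainty

print(q_value(X, G, F, U))        # exact reconstruction, so Q is 0.0
print(q_value(X + 0.1, G, F, U))  # every residual equals its uncertainty, so Q is close to 12
```

A lower Q means the factorization reproduces the data more closely relative to the stated uncertainties.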
### Data Format Requirements
- **Concentrations**: Rows = time points, Columns = chemical species
- **Uncertainties**: Same format as concentrations (optional)
- **Index**: Date/time information
- **Values**: Non-negative concentrations
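
The requirements above can be checked before fitting. The helper below is a hypothetical sketch (`check_pmf_inputs` is not part of Easy PMF) that uses only pandas; the species names and values are made up, and the rule that uncertainties must be positive is an assumption that follows from dividing residuals by them.

```python
import pandas as pd


def check_pmf_inputs(conc, unc=None):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if (conc < 0).any().any():
        problems.append("concentrations contain negative values")
    if unc is not None:
        if not conc.index.equals(unc.index):
            problems.append("uncertainties have a different index")
        if not conc.columns.equals(unc.columns):
            problems.append("uncertainties have different columns")
        # Assumption: uncertainties must be positive so residuals can be scaled by them
        if (unc <= 0).any().any():
            problems.append("uncertainties contain non-positive values")
    return problems


# Made-up example with one negative concentration
dates = pd.date_range("2024-01-01", periods=3, freq="D")
conc = pd.DataFrame({"SO4": [1.2, 0.8, 1.5], "NO3": [0.4, -0.1, 0.6]}, index=dates)
unc = pd.DataFrame({"SO4": [0.1, 0.1, 0.1], "NO3": [0.05, 0.05, 0.05]}, index=dates)

print(check_pmf_inputs(conc, unc))  # ['concentrations contain negative values']
```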
## 🛠️ Advanced Usage
### Custom Analysis Pipeline
```python
import matplotlib.pyplot as plt
import pandas as pd

from easy_pmf import PMF

# Load and preprocess data
concentrations = pd.read_csv("data.csv", index_col=0)
uncertainties = pd.read_csv("uncertainties.csv", index_col=0)

# Drop species that never have a positive concentration
concentrations = concentrations.loc[:, (concentrations > 0).any(axis=0)]
uncertainties = uncertainties[concentrations.columns]

# Try different numbers of factors
for n_factors in range(3, 8):
    pmf = PMF(n_components=n_factors, random_state=42)
    pmf.fit(concentrations, uncertainties)
    q_value = pmf.score(concentrations, uncertainties)
    print(f"Factors: {n_factors}, Q-value: {q_value:.2f}")

# Analyze best model (5 factors is used here as an example)
best_pmf = PMF(n_components=5, random_state=42)
best_pmf.fit(concentrations, uncertainties)

# Custom visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Plot factor profiles
best_pmf.profiles_.T.plot(kind="bar", ax=ax1)
ax1.set_title("Factor Profiles")
ax1.set_xlabel("Chemical Species")

# Plot contributions over time
best_pmf.contributions_.plot(ax=ax2)
ax2.set_title("Factor Contributions Over Time")
ax2.set_ylabel("Contribution")

plt.tight_layout()
plt.show()
```
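
Q always decreases as factors are added, so raw Q-values from the loop above are hard to compare directly. A common rule of thumb from the PMF literature is to compare Q with an expected value: the number of data values minus the number of fitted parameters. The sketch below assumes all species are treated as strong; `expected_q` is a hypothetical helper, not part of Easy PMF, and the shape used is that of the Baton Rouge example dataset (307 samples × 41 species).

```python
def expected_q(n_samples, n_species, n_factors):
    """Data values minus fitted parameters (elements of G and F together)."""
    return n_samples * n_species - n_factors * (n_samples + n_species)


# Expected Q for the Baton Rouge dataset shape over the same factor range
for n_factors in range(3, 8):
    print(f"Factors: {n_factors}, expected Q: {expected_q(307, 41, n_factors)}")
```

A ratio of Q to expected Q far above 1 may suggest too few factors or underestimated uncertainties; treat it as a guide rather than a strict criterion.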
### Batch Processing Multiple Datasets
```python
import pandas as pd

from easy_pmf import PMF

# Map each site to its concentration and uncertainty files
datasets = {
    "site1": {"conc": "site1_conc.csv", "unc": "site1_unc.csv"},
    "site2": {"conc": "site2_conc.csv", "unc": "site2_unc.csv"},
}

results = {}
for site, files in datasets.items():
    print(f"Analyzing {site}...")

    conc = pd.read_csv(files["conc"], index_col=0)
    unc = pd.read_csv(files["unc"], index_col=0)

    pmf = PMF(n_components=5, random_state=42)
    pmf.fit(conc, unc)

    # Keep the fitted outputs and diagnostics for each site
    results[site] = {
        "contributions": pmf.contributions_,
        "profiles": pmf.profiles_,
        "q_value": pmf.score(conc, unc),
        "converged": pmf.converged_,
    }

    print(f"  Q-value: {results[site]['q_value']:.2f}")
    print(f"  Converged: {results[site]['converged']}")
```
## 🔧 Development & Infrastructure
### CI/CD Pipeline
This project features a comprehensive CI/CD infrastructure:
- **✅ Automated Testing**: Matrix testing across Python 3.9-3.12 on Ubuntu, macOS, and Windows
- **✅ Code Quality**: Automated linting, formatting, and type checking with pre-commit hooks
- **✅ Security Scanning**: Security scanning with Bandit (a static analyzer for Python source code)
- **✅ Documentation**: Automatic deployment to GitHub Pages
- **✅ Package Publishing**: Automated PyPI publishing on releases
- **✅ Dependency Management**: Weekly dependency updates and maintenance
### Code Quality Standards
- **Type Safety**: Full type annotation coverage with mypy validation
- **Code Style**: Enforced with Ruff (linting and formatting)
- **Testing**: Comprehensive test suite with pytest
- **Documentation**: Auto-generated docs with MkDocs Material
- **Pre-commit Hooks**: Quality checks run on every commit using `uv`
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details on:
- Development setup with `uv` and pre-commit hooks
- Code quality standards and automated checks
- Testing requirements and CI/CD infrastructure
- Documentation guidelines and examples
- Pull request process and review requirements
### Quick Start for Contributors
```bash
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/easy-pmf.git
cd easy-pmf
# Set up development environment
uv sync --all-extras
uv run pre-commit install
# Make changes and test
uv run pytest
uv run pre-commit run --all-files
```
### Development Setup
```bash
git clone https://github.com/gerritjandebruin/easy-pmf.git
cd easy-pmf
# Install uv (modern Python package manager)
# On Windows: https://docs.astral.sh/uv/getting-started/installation/
# On macOS/Linux: curl -LsSf https://astral.sh/uv/install.sh | sh
# Create development environment and install dependencies
uv sync --all-extras
# Install pre-commit hooks for code quality
uv run pre-commit install
# Run tests to verify setup
uv run pytest
# Run type checking
uv run mypy .
# Run code formatting and linting
uv run ruff check --fix
uv run ruff format
```
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- EPA PMF software for inspiration
- Marloes van Os for first contributions and ideas
## 📞 Support
- **Documentation**: [https://gerritjandebruin.github.io/easy-pmf/](https://gerritjandebruin.github.io/easy-pmf/)
- **Issues**: [GitHub Issues](https://github.com/gerritjandebruin/easy-pmf/issues)
- **Discussions**: [GitHub Discussions](https://github.com/gerritjandebruin/easy-pmf/discussions)
---
**Easy PMF** - Making positive matrix factorization accessible to everyone! 🌍
Raw data
{
"_id": null,
"home_page": null,
"name": "easy-pmf",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "pmf, positive matrix factorization, environmental, air quality, source apportionment",
"author": null,
"author_email": "Gerrit Jan de Bruin <gerritjan.debruin@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/77/19/3f7fa649484dcdf2bf052e097ec907dcad766baf9811b947ec05de8c9dba/easy_pmf-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "An easy-to-use package for Positive Matrix Factorization (PMF) analysis of environmental data.",
"version": "0.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/easy-pmf/easy-pmf/issues",
"Changelog": "https://github.com/easy-pmf/easy-pmf/blob/main/CHANGELOG.md",
"Documentation": "https://easy-pmf.readthedocs.io",
"Homepage": "https://github.com/easy-pmf/easy-pmf",
"Repository": "https://github.com/easy-pmf/easy-pmf"
},
"split_keywords": [
"pmf",
" positive matrix factorization",
" environmental",
" air quality",
" source apportionment"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "4c00b914b505866f5dc2999c5116ccaa282c864f382c3e3bc40b8435544c8b78",
"md5": "ad6514b5922cf02c519440821b041e43",
"sha256": "010e423f0d1ebd5f3edc2631307ab496de77f6615fa6da4b861e24250f3ab89e"
},
"downloads": -1,
"filename": "easy_pmf-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ad6514b5922cf02c519440821b041e43",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 197717,
"upload_time": "2025-08-09T19:30:36",
"upload_time_iso_8601": "2025-08-09T19:30:36.861205Z",
"url": "https://files.pythonhosted.org/packages/4c/00/b914b505866f5dc2999c5116ccaa282c864f382c3e3bc40b8435544c8b78/easy_pmf-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "77193f7fa649484dcdf2bf052e097ec907dcad766baf9811b947ec05de8c9dba",
"md5": "a5fa451bad5c5749c3f8974af0659a42",
"sha256": "338f6c61d761d63d178906c556995c88fd5df8bf64290740950f0a70b74f15ac"
},
"downloads": -1,
"filename": "easy_pmf-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "a5fa451bad5c5749c3f8974af0659a42",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 200647,
"upload_time": "2025-08-09T19:30:38",
"upload_time_iso_8601": "2025-08-09T19:30:38.175688Z",
"url": "https://files.pythonhosted.org/packages/77/19/3f7fa649484dcdf2bf052e097ec907dcad766baf9811b947ec05de8c9dba/easy_pmf-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-09 19:30:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "easy-pmf",
"github_project": "easy-pmf",
"github_not_found": true,
"lcname": "easy-pmf"
}