Name | statflow |
Version | 3.5.9 |
home_page | None |
Summary | A versatile statistical toolkit for Python, featuring core statistical methods, time series analysis, signal processing, and climatology tools |
upload_time | 2025-07-28 09:33:20 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.10 |
license | MIT License
Copyright (c) 2024 StatFlow
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. |
keywords | statistics, time series, signal processing, climatology, data analysis |
VCS | |
bugtrack_url | |
requirements | numpy, pandas, scipy, matplotlib, xarray, filewise, pygenutils, scikit-learn |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# statflow
[Python 3.10+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
[PyPI](https://pypi.org/project/statflow/)
**statflow** is a comprehensive Python toolkit for statistical analysis, time series processing, and climatological data analysis. Built with modern scientific computing standards, it provides robust tools for statistical operations, signal processing, and specialised climatology workflows. The package emphasises professional-grade statistical computing with comprehensive type annotations, efficient algorithms, and extensive climatological indicators.
## Features
- **Core Statistical Analysis**:
- Advanced time series analysis with periodic statistics and trend detection
- Statistical hypothesis testing (Z-tests, Chi-squared tests)
- Moving operations (moving averages, window sums) for multi-dimensional data
- Comprehensive interpolation methods (polynomial, spline, linear) for NumPy, pandas, and xarray
- Signal processing with filtering (low-pass, high-pass, band-pass) and whitening techniques
- Regression analysis tools and approximation techniques
- **Climatological Analysis**:
- Climate indicator calculations (WSDI, SU, CSU, FD, TN, RR, CWD, HWD)
- Periodic climatological statistics with multi-frequency support (hourly, daily, monthly, seasonal, yearly)
- Representative series generation including Hourly Design Year (HDY) following ISO 15927-4:2005
- Simple bias correction techniques with absolute and relative delta methods
- Comprehensive meteorological variable calculations (heat index, wind chill, dew point, specific humidity)
- Bioclimatic variable computation (19 standard bioclimatic indicators)
- **Advanced Data Processing**:
- Multi-format data support (pandas DataFrames, xarray Datasets/DataArrays, NumPy arrays)
- Cumulative data decomposition and time series transformation
- Consecutive occurrence analysis for extreme event detection
- Autocorrelation analysis with optimised algorithms for large datasets
- Professional error handling with comprehensive input validation
- **Signal Processing & Filtering**:
- Signal whitening techniques (classic, sklearn PCA, ZCA whitening)
- Multiple filtering approaches with frequency domain processing
- Fourier transform-based band-pass filtering methods
- Noise handling and signal enhancement tools
## Installation
### Prerequisites
- **Python 3.10+**: Required for modern type annotations and features
- **Core Dependencies**: NumPy, pandas, scipy, xarray for scientific computing
- **Additional Dependencies**: filewise, pygenutils (project packages)
### For Regular Users
**For regular users** who want to use the package in their projects:
```bash
pip install statflow
```
This installs `statflow` and all of its dependencies from PyPI.
### Package Updates
To stay up-to-date with the latest version of this package, simply run:
```bash
pip install --upgrade statflow
```
## Development Setup
### For Contributors and Developers
If you're planning to contribute to the project or work with the source code, follow these setup instructions:
#### Quick Setup (Recommended)
```bash
# Clone the repository
git clone https://github.com/EusDancerDev/statflow.git
cd statflow
# Install in editable mode with all dependencies
pip install -e .
```
**Note**: The `-e` flag installs the package in "editable" mode, meaning changes to the source code are immediately reflected without reinstalling.
This will automatically install all dependencies with version constraints.
#### Alternative Setup (Explicit Git Dependencies)
If you prefer to use the explicit development requirements file:
```bash
# Clone the repository
git clone https://github.com/EusDancerDev/statflow.git
cd statflow
# Install development dependencies from requirements-dev.txt
pip install -r requirements-dev.txt
# Install in editable mode
pip install -e .
```
This approach gives you the latest development versions of all interdependent packages for testing and development.
If you encounter import errors after cloning:
1. **For regular users**: Run `pip install statflow` (all dependencies included)
2. **For developers**: Run `pip install -e .[dev]` to include development dependencies
3. **Verify Python environment**: Make sure you're using a compatible Python version (3.10+)
4. **Check scientific computing libraries**: Ensure scipy, xarray, and other scientific packages are available
### Verify Installation
To verify that your installation is working correctly, you can run this quick test:
```python
# Test script to verify installation
try:
    import statflow
    from filewise.general.introspection_utils import get_type_str
    from pygenutils.arrays_and_lists.data_manipulation import flatten_list
    from statflow.core.time_series import periodic_statistics

    print("✅ All imports successful!")
    print(f"✅ statflow version: {statflow.__version__}")
    print("✅ Installation is working correctly.")
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("💡 For regular users: pip install statflow")
    print("💡 For developers: pip install -e .[dev]")
```
### Implementation Notes
This project implements a **dual-approach dependency management** system:
- **Production Dependencies**: Version-constrained dependencies for PyPI compatibility
- **Development Dependencies**: Git-based dependencies for latest development versions
- **Installation Methods**:
- **Regular users**: Simple `pip install statflow` with all dependencies included
- **Developers**: `pip install -e .[dev]` for latest Git versions and development tools
- **PyPI Compatibility**: All packages can be published without Git dependency issues
- **Development Flexibility**: Contributors get access to latest versions for testing and development
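The dual-approach layout described above is typically declared in `pyproject.toml`. The sketch below is hypothetical and only illustrates the pattern; the actual file may differ, and the Git URLs for `filewise` and `pygenutils` are assumptions:

```toml
[project]
name = "statflow"
requires-python = ">=3.10"
# Production dependencies: plain PyPI names, optionally version-constrained
dependencies = [
    "numpy",
    "pandas",
    "scipy",
    "xarray",
    "filewise",
    "pygenutils",
]

[project.optional-dependencies]
# Development extra: Git-based versions of interdependent packages (illustrative URLs)
dev = [
    "pytest",
    "filewise @ git+https://github.com/EusDancerDev/filewise.git",
    "pygenutils @ git+https://github.com/EusDancerDev/pygenutils.git",
]
```

Because the `[project.dependencies]` table contains only PyPI names, the package can be published to PyPI, while `pip install -e .[dev]` pulls the Git versions for contributors.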
## Usage
### Core Statistical Analysis
```python
from statflow.core.time_series import periodic_statistics, autocorrelate
from statflow.core.statistical_tests import z_test_two_means, chi_squared_test
import pandas as pd
import numpy as np
# Load your time series data
df = pd.read_csv("your_data.csv", parse_dates=['date'])
# Calculate periodic statistics
monthly_means = periodic_statistics(
    df,
    statistic="mean",
    freq="M",  # Monthly frequency
    drop_date_idx_col=False
)
# Perform hypothesis testing
sample1 = np.random.normal(10, 2, 100)
sample2 = np.random.normal(12, 2, 100)
z_stat, p_value, result = z_test_two_means(sample1, sample2)
print(f"Z-test result: {result}")
# Autocorrelation analysis
autocorr = autocorrelate(df['temperature'].values, twosided=False)
```
### Signal Processing
```python
from statflow.core.signal_processing import low_pass_filter, band_pass1, signal_whitening
from statflow.core.moving_operations import moving_average, window_sum
# Apply signal filtering
filtered_signal = low_pass_filter(noisy_data, window_size=5)
# Band-pass filtering in frequency domain
band_filtered = band_pass1(
    original_signal,
    timestep=0.1,
    low_freq=0.1,
    high_freq=2.0
)
# Signal whitening for decorrelation
whitened_data = signal_whitening(signal_data, method="classic")
# Moving operations for time series
moving_avg = moving_average(time_series, N=7) # 7-day moving average
cumulative_sum = window_sum(data_array, N=30) # 30-point window sum
```
### Interpolation Methods
```python
from statflow.core.interpolation_methods import interp_np, interp_pd, interp_xr, polynomial_fitting
# NumPy array interpolation
interpolated_np = interp_np(
    data_with_gaps,
    method='spline',
    order=3
)

# Pandas DataFrame interpolation
interpolated_pd = interp_pd(
    df_with_missing,
    method='polynomial',
    order=2
)

# Polynomial fitting with edge preservation
fitted_data = polynomial_fitting(
    y_values,
    poly_ord=3,
    fix_edges=True
)
```
### Climatological Analysis
```python
from statflow.fields.climatology.indicators import calculate_WSDI, calculate_SU, calculate_hwd
from statflow.fields.climatology.periodic_climat_stats import climat_periodic_statistics
from statflow.fields.climatology.variables import calculate_heat_index, calculate_dew_point, biovars
# Climate indicators
# Warm Spell Duration Index
wsdi = calculate_WSDI(
    daily_tmax_data,
    tmax_threshold=30.0,
    min_consec_days=6
)

# Summer Days count
summer_days = calculate_SU(daily_tmax_data, tmax_threshold=25.0)

# Heat wave analysis
hwd_events, total_hwd = calculate_hwd(
    tmax_data, tmin_data,
    max_thresh=35.0, min_thresh=20.0,
    dates=date_index, min_days=3
)

# Climatological statistics
monthly_climat = climat_periodic_statistics(
    climate_data,
    statistic="mean",
    time_freq="monthly",
    keep_std_dates=True
)

# Meteorological calculations
heat_idx = calculate_heat_index(temperature, humidity, unit="celsius")
dew_point = calculate_dew_point(temperature, humidity)

# Bioclimatic variables (19 standard indicators)
bioclim_vars = biovars(
    tmax_monthly_climat,
    tmin_monthly_climat,
    precip_monthly_climat
)
```
### Bias Correction
```python
from statflow.fields.climatology.simple_bias_correction import calculate_and_apply_deltas
# Simple bias correction between observed and reanalysis data
corrected_data = calculate_and_apply_deltas(
    observed_series=obs_data,
    reanalysis_series=reanalysis_data,
    time_freq="monthly",
    delta_type="absolute",  # or "relative"
    statistic="mean",
    preference="observed",  # treat observations as truth
    season_months=[12, 1, 2]  # for seasonal analysis
)
```
### Representative Series (HDY)
```python
from statflow.fields.climatology.representative_series import calculate_HDY, hdy_interpolation
# Calculate Hourly Design Year following ISO 15927-4:2005
hdy_dataframe, selected_years = calculate_HDY(
    hourly_climate_df,
    varlist=['date', 'temperature', 'humidity', 'wind_speed'],
    varlist_primary=['date', 'temperature', 'humidity'],
    drop_new_idx_col=True
)

# Interpolate between months to smooth transitions
hdy_smooth, wind_dir_smooth = hdy_interpolation(
    hdy_dataframe,
    selected_years,
    previous_month_last_time_range="20:23",
    next_month_first_time_range="0:3",
    varlist_to_interpolate=['temperature', 'humidity'],
    polynomial_order=3
)
```
## Project Structure
The package is organised as a comprehensive statistical analysis toolkit:
```text
statflow/
├── core/                             # Core statistical functionality
│   ├── approximation_techniques.py   # Curve fitting and approximation methods
│   ├── interpolation_methods.py      # Multi-format interpolation tools
│   ├── moving_operations.py          # Moving averages and window operations
│   ├── regressions.py                # Regression analysis tools
│   ├── signal_processing.py          # Signal filtering and processing
│   ├── statistical_tests.py          # Hypothesis testing functions
│   └── time_series.py                # Time series analysis and statistics
├── fields/                           # Domain-specific analysis modules
│   └── climatology/                  # Climate data analysis tools
│       ├── indicators.py             # Climate indicators (WSDI, SU, etc.)
│       ├── periodic_climat_stats.py  # Climatological statistics
│       ├── representative_series.py  # HDY and representative data
│       ├── simple_bias_correction.py # Bias correction methods
│       └── variables.py              # Meteorological calculations
├── distributions/                    # Statistical distributions (future expansion)
├── utils/                            # Utility functions and helpers
│   └── helpers.py                    # Support functions for analysis
├── CHANGELOG.md                      # Detailed version history
├── VERSIONING.md                     # Version management documentation
└── README.md                         # Package documentation
```
## Key Capabilities
### 1. Time Series Analysis
- **Periodic Statistics**: Calculate statistics across multiple time frequencies with robust datetime handling
- **Cumulative Data Processing**: Decompose cumulative time series into individual values
- **Consecutive Analysis**: Detect and count consecutive occurrences of extreme events
- **Autocorrelation**: Optimised autocorrelation analysis for pattern detection
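The cumulative-data decomposition above amounts to first differencing. A minimal NumPy sketch of the idea, independent of statflow's own API (which may differ):

```python
import numpy as np

# Cumulative series, e.g. accumulated precipitation readings
cumulative = np.array([0.0, 2.0, 5.0, 5.0, 9.0])

# Decompose into per-step values: keep the first value, then take
# successive differences; cumsum of the result recovers the input
individual = np.diff(cumulative, prepend=cumulative[0])
print(individual)  # [0. 2. 3. 0. 4.]
```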
### 2. Statistical Testing
- **Hypothesis Tests**: Z-tests for mean comparison, Chi-squared tests for independence
- **Robust Validation**: Comprehensive input validation and error handling
- **Multiple Data Types**: Support for NumPy arrays, pandas Series, and more
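The usage examples above demonstrate the Z-test but not the Chi-squared test. The underlying independence test can be sketched with scipy (a core dependency); the contingency table here is invented for illustration, and statflow's own `chi_squared_test` signature may differ:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: extreme-event occurrence vs. season
observed = np.array([[30, 10],
                     [20, 40]])

# Chi-squared test of independence between the two categorical factors
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
```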
### 3. Signal Processing
- **Filtering Suite**: Low-pass, high-pass, and band-pass filters with multiple implementation methods
- **Signal Enhancement**: Whitening techniques for decorrelation and noise reduction
- **Frequency Domain**: Fourier transform-based processing for advanced filtering
### 4. Climatological Indicators
- **Standard Indices**: WSDI, SU, CSU, FD, TN, RR, CWD following international standards
- **Heat Wave Analysis**: Comprehensive heat wave detection with intensity metrics
- **Bioclimatic Variables**: Complete set of 19 bioclimatic indicators for ecological studies
### 5. Meteorological Calculations
- **Atmospheric Variables**: Heat index, wind chill, dew point, specific humidity
- **Magnus Formula**: Accurate saturation vapor pressure calculations
- **Multi-Unit Support**: Celsius/Fahrenheit and metric/imperial unit systems
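The Magnus formula mentioned above relates temperature and relative humidity to dew point. A self-contained sketch using the common Alduchov-Eskridge coefficients; statflow's `calculate_dew_point` may use different constants or a different interface:

```python
import math

def dew_point_magnus(temp_c: float, rh_percent: float) -> float:
    """Dew point (deg C) via the Magnus approximation.

    Coefficients b, c are the Alduchov-Eskridge values (assumption);
    valid roughly for -40 to +50 deg C over water.
    """
    b, c = 17.625, 243.04
    gamma = math.log(rh_percent / 100.0) + (b * temp_c) / (c + temp_c)
    return (c * gamma) / (b - gamma)

print(round(dew_point_magnus(25.0, 60.0), 1))  # ≈ 16.7 °C
```

At 100% relative humidity the dew point equals the air temperature, which is a quick sanity check for any implementation.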
### 6. Data Processing Excellence
- **Multi-Format Support**: Seamless handling of pandas, xarray, and NumPy data structures
- **Type Safety**: Modern PEP-604 type annotations throughout the codebase
- **Error Handling**: Comprehensive validation with descriptive error messages
## Advanced Features
### Professional Climatology Workflows
```python
# Complete climatological analysis workflow
from statflow.fields.climatology import *
# 1. Calculate basic climate indicators
indicators = {
    'summer_days': calculate_SU(daily_tmax, 25.0),
    'frost_days': calculate_FD(daily_tmin, 0.0),
    'tropical_nights': calculate_TN(daily_tmin, 20.0),
    'wet_days': calculate_RR(daily_precip, 1.0)
}

# 2. Generate climatological statistics
climat_stats = climat_periodic_statistics(
    climate_dataframe,
    statistic="mean",
    time_freq="seasonal",
    season_months=[6, 7, 8]  # Summer season
)

# 3. Apply bias correction
corrected_projections = calculate_and_apply_deltas(
    observed_data, model_data,
    time_freq="monthly",
    delta_type="relative",
    preference="observed"
)

# 4. Calculate meteorological variables
heat_stress = calculate_heat_index(temperature, humidity)
comfort_metrics = calculate_wind_chill(temperature, wind_speed)
```
### High-Performance Time Series Processing
```python
# Optimised for large datasets
from statflow.core.time_series import periodic_statistics, consec_occurrences_maxdata
import xarray as xr

# Process multi-dimensional climate data
large_dataset = xr.open_dataset("large_climate_file.nc")

# Efficient periodic statistics with proper memory management
monthly_stats = periodic_statistics(
    large_dataset,
    statistic="mean",
    freq="M",
    groupby_dates=True
)

# Vectorised extreme event analysis
extreme_events = consec_occurrences_maxdata(
    temperature_array,
    max_threshold=35.0,
    min_consec=3,
    calc_max_consec=True
)
```
## Dependencies
### Core Dependencies
- **numpy**: Numerical computing and array operations
- **pandas**: Data manipulation and time series handling
- **scipy**: Statistical functions and signal processing
- **xarray**: Multi-dimensional data handling for climate data
### Project Dependencies
- **filewise**: File operations and introspection utilities
- **pygenutils**: General-purpose utilities for arrays, strings, and time handling
- **paramlib**: Parameter management and global constants
### Optional Dependencies
- **scikit-learn**: For advanced whitening techniques in signal processing
- **matplotlib**: For plotting and visualisation (user's choice)
## Integration Examples
### Climate Data Analysis Pipeline
```python
import statflow as sf
import xarray as xr
import pandas as pd
# Load climate model data
climate_data = xr.open_dataset("climate_model_output.nc")
# 1. Time series analysis
trend_analysis = sf.core.time_series.periodic_statistics(
    climate_data.temperature,
    statistic="mean",
    freq="Y"  # Annual trends
)

# 2. Calculate climate indicators
heat_waves = sf.fields.climatology.indicators.calculate_hwd(
    climate_data.tasmax.values,
    climate_data.tasmin.values,
    max_thresh=35.0,
    min_thresh=20.0,
    dates=climate_data.time,
    min_days=3
)

# 3. Signal processing for trend detection
filtered_temp = sf.core.signal_processing.low_pass_filter(
    climate_data.temperature.values,
    window_size=10
)

# 4. Statistical validation
temp_stats = sf.core.statistical_tests.z_test_two_means(
    historical_period,
    future_period
)
```
### Multi-Scale Statistical Analysis
```python
# Analyse data across multiple temporal scales
import statflow as sf

scales = ['hourly', 'daily', 'monthly', 'seasonal']
results = {}

for scale in scales:
    results[scale] = sf.fields.climatology.climat_periodic_statistics(
        meteorological_data,
        statistic="mean",
        time_freq=scale,
        keep_std_dates=True
    )

# Cross-scale correlation analysis
correlations = {}
for i, scale1 in enumerate(scales):
    for scale2 in scales[i+1:]:
        corr_data = sf.core.time_series.autocorrelate(
            results[scale1].values.flatten()
        )
        correlations[f"{scale1}_{scale2}"] = corr_data
```
## Best Practices
### Data Preparation
- Ensure consistent datetime indexing for time series analysis
- Validate data quality and handle missing values appropriately
- Use appropriate data structures (pandas for tabular, xarray for multi-dimensional)
- Consider memory usage for large climate datasets
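The first two preparation steps can be sketched with pandas; the series below is invented for illustration:

```python
import pandas as pd
import numpy as np

# Hypothetical daily temperature series with a missing value and a missing day
df = pd.DataFrame(
    {"date": ["2024-01-01", "2024-01-02", "2024-01-04"],
     "temperature": [5.0, np.nan, 7.0]}
)

# Consistent datetime indexing with an explicit daily frequency:
# asfreq("D") inserts the missing 2024-01-03 row as NaN
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date").asfreq("D")

# Handle missing values explicitly before any time series analysis
df["temperature"] = df["temperature"].interpolate(method="time")
print(df)
```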
### Statistical Analysis
- Choose appropriate statistical tests based on data distribution and assumptions
- Use robust error handling and validate input parameters
- Consider multiple time scales for comprehensive climate analysis
- Apply proper bias correction techniques for model-observation comparisons
### Performance Optimisation
- Leverage vectorised operations for large datasets
- Use appropriate interpolation methods based on data characteristics
- Consider parallel processing for independent calculations
- Monitor memory usage with large climate model outputs
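Vectorisation typically means replacing per-element Python loops with whole-array NumPy operations. A small sketch of the pattern (statflow's own `moving_average` may be implemented differently):

```python
import numpy as np

rng = np.random.default_rng(0)
temps = rng.normal(20.0, 8.0, size=1_000_000)  # synthetic daily temperatures

# Vectorised threshold count: boolean mask + sum, no Python-level loop
hot_days = int((temps > 35.0).sum())

# Moving average via convolution instead of a per-window loop
window = 7
moving_avg = np.convolve(temps, np.ones(window) / window, mode="valid")
print(hot_days, moving_avg.shape)
```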
### Climatological Standards
- Follow international standards for climate indicator calculations
- Use appropriate thresholds for regional climate conditions
- Document methodology and parameter choices
- Validate results against established climatological references
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request for:
- New statistical methods or climate indicators
- Performance improvements and optimisations
- Enhanced documentation and examples
- Bug fixes and error handling improvements
### Development Guidelines
1. **Follow Type Annotations**: Use modern PEP-604 syntax for type hints
2. **Maintain Documentation**: Comprehensive docstrings with examples
3. **Add Tests**: Unit tests for new functionality
4. **Performance Considerations**: Optimise for large scientific datasets
5. **Compatibility**: Ensure compatibility with multiple data formats
```bash
git clone https://github.com/EusDancerDev/statflow.git
cd statflow
pip install -e ".[dev]"
pytest # Run test suite
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- **Scientific Python Community** for foundational libraries (NumPy, pandas, scipy, xarray)
- **Climate Research Community** for standard definitions of climate indicators
- **International Standards** (ISO 15927-4:2005) for representative weather data methodologies
- **Open Source Contributors** for continuous improvement and feedback
## Citation
If you use statflow in your research, please cite:
```bibtex
@software{statflow2024,
  title={statflow: Statistical Analysis and Climatology Toolkit},
  author={Jon Ander Gabantxo},
  year={2024},
  url={https://github.com/EusDancerDev/statflow},
  version={3.5.9}
}
```
## Contact
For questions, suggestions, or collaboration opportunities:
- **Issues**: Open an issue on GitHub for bug reports or feature requests
- **Discussions**: Use GitHub Discussions for general questions and ideas
- **Email**: Contact the maintainers for collaboration inquiries
## Related Projects
- **climalab**: Climate data analysis and processing tools
- **filewise**: File operations and data manipulation utilities
- **pygenutils**: General-purpose Python utilities
- **paramlib**: Parameter management and configuration constants
## Troubleshooting
### Common Issues
1. **Memory Errors with Large Datasets**:
```python
# Use chunking for large xarray datasets
large_data = xr.open_dataset("huge_file.nc", chunks={'time': 1000})
```
2. **Type Compatibility**:
```python
# Ensure consistent data types
data = data.astype(np.float64) # Convert to consistent numeric type
```
3. **Missing Dependencies**:
```bash
pip install scipy xarray # Install missing scientific computing libraries
```
4. **Performance Issues**:
```python
# Use appropriate methods for data size
if len(data) > 50000:
    autocorr = sf.core.time_series.autocorrelate(data, twosided=False)
```
### Getting Help
- Check the [CHANGELOG.md](CHANGELOG.md) for recent updates and breaking changes
- Review function docstrings for parameter details and examples
- Consult the [VERSIONING.md](VERSIONING.md) for version compatibility information
- Open an issue on GitHub with a minimal reproducible example
---
**statflow** - Professional statistical analysis and climatology toolkit for Python 🌡️📊
Raw data
{
"_id": null,
"home_page": null,
"name": "statflow",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "statistics, time series, signal processing, climatology, data analysis",
"author": null,
"author_email": "Jon Ander Gabantxo <jagabantxo@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/3d/f9/af60d16cb7467ff1e7000dd3690f99c9c00527143ef3b2fdcad107528e07/statflow-3.5.9.tar.gz",
"platform": null
}
Support functions for analysis\n\u251c\u2500\u2500 CHANGELOG.md # Detailed version history\n\u251c\u2500\u2500 VERSIONING.md # Version management documentation\n\u2514\u2500\u2500 README.md # Package documentation\n```\n\n## Key Capabilities\n\n### 1. Time Series Analysis\n\n- **Periodic Statistics**: Calculate statistics across multiple time frequencies with robust datetime handling\n- **Cumulative Data Processing**: Decompose cumulative time series into individual values\n- **Consecutive Analysis**: Detect and count consecutive occurrences of extreme events\n- **Autocorrelation**: Optimised autocorrelation analysis for pattern detection\n\n### 2. Statistical Testing\n\n- **Hypothesis Tests**: Z-tests for mean comparison, Chi-squared tests for independence\n- **Robust Validation**: Comprehensive input validation and error handling\n- **Multiple Data Types**: Support for NumPy arrays, pandas Series, and more\n\n### 3. Signal Processing\n\n- **Filtering Suite**: Low-pass, high-pass, and band-pass filters with multiple implementation methods\n- **Signal Enhancement**: Whitening techniques for decorrelation and noise reduction\n- **Frequency Domain**: Fourier transform-based processing for advanced filtering\n\n### 4. Climatological Indicators\n\n- **Standard Indices**: WSDI, SU, CSU, FD, TN, RR, CWD following international standards\n- **Heat Wave Analysis**: Comprehensive heat wave detection with intensity metrics\n- **Bioclimatic Variables**: Complete set of 19 bioclimatic indicators for ecological studies\n\n### 5. Meteorological Calculations\n\n- **Atmospheric Variables**: Heat index, wind chill, dew point, specific humidity\n- **Magnus Formula**: Accurate saturation vapor pressure calculations\n- **Multi-Unit Support**: Celsius/Fahrenheit and metric/imperial unit systems\n\n### 6. 
Data Processing Excellence\n\n- **Multi-Format Support**: Seamless handling of pandas, xarray, and NumPy data structures\n- **Type Safety**: Modern PEP-604 type annotations throughout the codebase\n- **Error Handling**: Comprehensive validation with descriptive error messages\n\n## Advanced Features\n\n### Professional Climatology Workflows\n\n```python\n# Complete climatological analysis workflow\nfrom statflow.fields.climatology import *\n\n# 1. Calculate basic climate indicators\nindicators = {\n 'summer_days': calculate_SU(daily_tmax, 25.0),\n 'frost_days': calculate_FD(daily_tmin, 0.0),\n 'tropical_nights': calculate_TN(daily_tmin, 20.0),\n 'wet_days': calculate_RR(daily_precip, 1.0)\n}\n\n# 2. Generate climatological statistics\nclimat_stats = climat_periodic_statistics(\n climate_dataframe,\n statistic=\"mean\",\n time_freq=\"seasonal\",\n season_months=[6, 7, 8] # Summer season\n)\n\n# 3. Apply bias correction\ncorrected_projections = calculate_and_apply_deltas(\n observed_data, model_data,\n time_freq=\"monthly\",\n delta_type=\"relative\",\n preference=\"observed\"\n)\n\n# 4. 
Calculate meteorological variables\nheat_stress = calculate_heat_index(temperature, humidity)\ncomfort_metrics = calculate_wind_chill(temperature, wind_speed)\n```\n\n### High-Performance Time Series Processing\n\n```python\n# Optimised for large datasets\nfrom statflow.core.time_series import periodic_statistics, consec_occurrences_maxdata\n\n# Process multi-dimensional climate data\nlarge_dataset = xr.open_dataset(\"large_climate_file.nc\")\n\n# Efficient periodic statistics with proper memory management\nmonthly_stats = periodic_statistics(\n large_dataset,\n statistic=\"mean\",\n freq=\"M\",\n groupby_dates=True\n)\n\n# Vectorised extreme event analysis\nextreme_events = consec_occurrences_maxdata(\n temperature_array,\n max_threshold=35.0,\n min_consec=3,\n calc_max_consec=True\n)\n```\n\n## Dependencies\n\n### Core Dependencies\n\n- **numpy**: Numerical computing and array operations\n- **pandas**: Data manipulation and time series handling\n- **scipy**: Statistical functions and signal processing\n- **xarray**: Multi-dimensional data handling for climate data\n\n### Project Dependencies\n\n- **filewise**: File operations and introspection utilities\n- **pygenutils**: General-purpose utilities for arrays, strings, and time handling\n- **paramlib**: Parameter management and global constants\n\n### Optional Dependencies\n\n- **scikit-learn**: For advanced whitening techniques in signal processing\n- **matplotlib**: For plotting and visualisation (user's choice)\n\n## Integration Examples\n\n### Climate Data Analysis Pipeline\n\n```python\nimport statflow as sf\nimport xarray as xr\nimport pandas as pd\n\n# Load climate model data\nclimate_data = xr.open_dataset(\"climate_model_output.nc\")\n\n# 1. Time series analysis\ntrend_analysis = sf.core.time_series.periodic_statistics(\n climate_data.temperature,\n statistic=\"mean\",\n freq=\"Y\" # Annual trends\n)\n\n# 2. 
Calculate climate indicators\nheat_waves = sf.fields.climatology.indicators.calculate_hwd(\n climate_data.tasmax.values,\n climate_data.tasmin.values,\n max_thresh=35.0,\n min_thresh=20.0,\n dates=climate_data.time,\n min_days=3\n)\n\n# 3. Signal processing for trend detection\nfiltered_temp = sf.core.signal_processing.low_pass_filter(\n climate_data.temperature.values,\n window_size=10\n)\n\n# 4. Statistical validation\ntemp_stats = sf.core.statistical_tests.z_test_two_means(\n historical_period,\n future_period\n)\n```\n\n### Multi-Scale Statistical Analysis\n\n```python\n# Analyse data across multiple temporal scales\nscales = ['hourly', 'daily', 'monthly', 'seasonal']\nresults = {}\n\nfor scale in scales:\n results[scale] = sf.fields.climatology.climat_periodic_statistics(\n meteorological_data,\n statistic=\"mean\",\n time_freq=scale,\n keep_std_dates=True\n )\n\n# Autocorrelation analysis at each temporal scale\nautocorrelations = {}\nfor scale in scales:\n autocorrelations[scale] = sf.core.time_series.autocorrelate(\n results[scale].values.flatten()\n )\n```\n\n## Best Practices\n\n### Data Preparation\n\n- Ensure consistent datetime indexing for time series analysis\n- Validate data quality and handle missing values appropriately\n- Use appropriate data structures (pandas for tabular, xarray for multi-dimensional)\n- Consider memory usage for large climate datasets\n\n### Statistical Analysis\n\n- Choose appropriate statistical tests based on data distribution and assumptions\n- Use robust error handling and validate input parameters\n- Consider multiple time scales for comprehensive climate analysis\n- Apply proper bias correction techniques for model-observation comparisons\n\n### Performance Optimisation\n\n- Leverage vectorised operations for large datasets\n- Use appropriate interpolation methods based on data characteristics\n- Consider parallel processing for independent calculations\n- 
Monitor memory usage with large climate model outputs\n\n### Climatological Standards\n\n- Follow international standards for climate indicator calculations\n- Use appropriate thresholds for regional climate conditions\n- Document methodology and parameter choices\n- Validate results against established climatological references\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request for:\n\n- New statistical methods or climate indicators\n- Performance improvements and optimisations\n- Enhanced documentation and examples\n- Bug fixes and error handling improvements\n\n### Development Guidelines\n\n1. **Follow Type Annotations**: Use modern PEP-604 syntax for type hints\n2. **Maintain Documentation**: Comprehensive docstrings with examples\n3. **Add Tests**: Unit tests for new functionality\n4. **Performance Considerations**: Optimise for large scientific datasets\n5. **Compatibility**: Ensure compatibility with multiple data formats\n\n```bash\ngit clone https://github.com/EusDancerDev/statflow.git\ncd statflow\npip install -e \".[dev]\"\npytest # Run test suite\n```\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\n- **Scientific Python Community** for foundational libraries (NumPy, pandas, scipy, xarray)\n- **Climate Research Community** for standard definitions of climate indicators\n- **International Standards** (ISO 15927-4:2005) for representative weather data methodologies\n- **Open Source Contributors** for continuous improvement and feedback\n\n## Citation\n\nIf you use statflow in your research, please cite:\n\n```bibtex\n@software{statflow2024,\n title={statflow: Statistical Analysis and Climatology Toolkit},\n author={EusDancerDev},\n year={2024},\n url={https://github.com/EusDancerDev/statflow},\n version={3.5.9}\n}\n```\n\n## Contact\n\nFor questions, suggestions, or collaboration opportunities:\n\n- **Issues**: Open an issue on 
GitHub for bug reports or feature requests\n- **Discussions**: Use GitHub Discussions for general questions and ideas\n- **Email**: Contact the maintainers for collaboration inquiries\n\n## Related Projects\n\n- **climalab**: Climate data analysis and processing tools\n- **filewise**: File operations and data manipulation utilities \n- **pygenutils**: General-purpose Python utilities\n- **paramlib**: Parameter management and configuration constants\n\n## Troubleshooting\n\n### Common Issues\n\n1. **Memory Errors with Large Datasets**:\n\n ```python\n # Use chunking for large xarray datasets\n large_data = xr.open_dataset(\"huge_file.nc\", chunks={'time': 1000})\n ```\n\n2. **Type Compatibility**:\n\n ```python\n # Ensure consistent data types\n data = data.astype(np.float64) # Convert to consistent numeric type\n ```\n\n3. **Missing Dependencies**:\n\n ```bash\n pip install scipy xarray # Install missing scientific computing libraries\n ```\n\n4. **Performance Issues**:\n\n ```python\n # Use appropriate methods for data size\n if len(data) > 50000:\n autocorr = sf.core.time_series.autocorrelate(data, twosided=False)\n ```\n\n### Getting Help\n\n- Check the [CHANGELOG.md](CHANGELOG.md) for recent updates and breaking changes\n- Review function docstrings for parameter details and examples\n- Consult the [VERSIONING.md](VERSIONING.md) for version compatibility information\n- Open an issue on GitHub with a minimal reproducible example\n\n---\n\n**statflow** - Professional statistical analysis and climatology toolkit for Python \ud83c\udf21\ufe0f\ud83d\udcca\n",
"bugtrack_url": null,
"license": "MIT License\n \n Copyright (c) 2024 StatFlow\n \n Permission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \"Software\"), to deal\n in the Software without restriction, including without limitation the rights\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the Software is\n furnished to do so, subject to the following conditions:\n \n The above copyright notice and this permission notice shall be included in all\n copies or substantial portions of the Software.\n \n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n SOFTWARE. ",
"summary": "A versatile statistical toolkit for Python, featuring core statistical methods, time series analysis, signal processing, and climatology tools",
"version": "3.5.9",
"project_urls": {
"Bug Reports": "https://github.com/EusDancerDev/statflow/issues",
"Documentation": "https://github.com/EusDancerDev/statflow#readme",
"Homepage": "https://github.com/EusDancerDev/statflow",
"Repository": "https://github.com/EusDancerDev/statflow.git"
},
"split_keywords": [
"statistics",
" time series",
" signal processing",
" climatology",
" data analysis"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0a02e743f9e8b3bbc1ce0dd827639f32a82cbcbaa012d3baaa64246f30f27aff",
"md5": "4e5f2cc91199bf3f5f03f0632c2dd23b",
"sha256": "28a1e642e1890de016805fc835fed8a74b2742b55d823e7c2de66b73c700a223"
},
"downloads": -1,
"filename": "statflow-3.5.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4e5f2cc91199bf3f5f03f0632c2dd23b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 51699,
"upload_time": "2025-07-28T09:33:18",
"upload_time_iso_8601": "2025-07-28T09:33:18.728688Z",
"url": "https://files.pythonhosted.org/packages/0a/02/e743f9e8b3bbc1ce0dd827639f32a82cbcbaa012d3baaa64246f30f27aff/statflow-3.5.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "3df9af60d16cb7467ff1e7000dd3690f99c9c00527143ef3b2fdcad107528e07",
"md5": "aa744f214b0283391befdb4815ce7de6",
"sha256": "d1136e438a787aabae58e9e3cdcc8f5f5fae0ddf9c79d6292d7b8349b91359e2"
},
"downloads": -1,
"filename": "statflow-3.5.9.tar.gz",
"has_sig": false,
"md5_digest": "aa744f214b0283391befdb4815ce7de6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 52969,
"upload_time": "2025-07-28T09:33:20",
"upload_time_iso_8601": "2025-07-28T09:33:20.001845Z",
"url": "https://files.pythonhosted.org/packages/3d/f9/af60d16cb7467ff1e7000dd3690f99c9c00527143ef3b2fdcad107528e07/statflow-3.5.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-28 09:33:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "EusDancerDev",
"github_project": "statflow",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.21.0"
]
]
},
{
"name": "pandas",
"specs": [
[
">=",
"1.3.0"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.7.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.5.0"
]
]
},
{
"name": "xarray",
"specs": [
[
">=",
"2022.1.0"
]
]
},
{
"name": "filewise",
"specs": [
[
">=",
"3.11.6"
]
]
},
{
"name": "pygenutils",
"specs": [
[
">=",
"16.2.3"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.0.0"
]
]
}
],
"lcname": "statflow"
}