# Fisher Scoring with Python
**Author:** [xRiskLab](https://github.com/xRiskLab)<br>
**Version:** v2.0.2<br>
**License:** [MIT License](https://opensource.org/licenses/MIT) (2024)
![Title](https://github.com/xRiskLab/fisher-scoring/raw/main/docs/images/title.png)
This repository contains optimized Python implementations of the Fisher Scoring algorithm for various logistic regression models. With version 2.0, the core algorithms are significantly faster due to optimized matrix operations and reduced memory usage, which speeds up convergence on larger datasets.
```python
# In a notebook, install first with: %pip install fisher-scoring
from fisher_scoring import FisherScoringLogisticRegression

# Initialize and fit the model
model = FisherScoringLogisticRegression(epsilon=1e-5)
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)
```
## Overview
### Introduction
This repository contains a Python package with scikit-learn compatible implementations of the Fisher Scoring algorithm for various logistic regression use cases:
1. Binary classification problems: **Logistic Regression**.
2. Multi-class classification problems: **Multinomial Logistic Regression**.
3. Imbalanced classification problems: **Focal Loss Logistic Regression**.
### Fisher Scoring Algorithm
The Fisher Scoring algorithm is an iterative optimization technique that computes maximum likelihood estimates by leveraging the expected or observed Fisher information matrix. As a second-order method, it avoids the need for a learning rate and typically converges more stably than gradient descent.
There are two types of information matrices used in the Fisher Scoring algorithm:
* **Observed Information Matrix**: Uses ground truth labels to calculate the information matrix, often resulting in more reliable inference metrics.
* **Expected Information Matrix**: Relies on predicted probabilities, providing an efficient approximation for the information matrix.
These information matrices are used to derive standard errors of estimates to calculate detailed model statistics, including Wald statistics, p-values, and confidence intervals at a chosen level.
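For intuition, the Fisher scoring update is β ← β + I(β)⁻¹U(β), where U is the score (the gradient of the log-likelihood) and I is the information matrix. The following is a minimal NumPy sketch for binary logistic regression using the expected information; it is independent of this package's internals and is meant only to illustrate the update:

```python
import numpy as np

def fisher_scoring_logistic(X, y, max_iter=100, epsilon=1e-8):
    """Fit binary logistic regression via Fisher scoring
    with the expected information matrix."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend bias column
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # predicted probabilities
        W = p * (1.0 - p)                       # Bernoulli variances
        score = X.T @ (y - p)                   # score vector U(beta)
        info = X.T @ (X * W[:, None])           # expected information I(beta)
        step = np.linalg.solve(info, score)     # Newton-type step
        beta += step
        if np.max(np.abs(step)) < epsilon:
            break
    se = np.sqrt(np.diag(np.linalg.inv(info)))  # standard errors of estimates
    return beta, se
```

Because the step uses curvature information, no learning rate is required, and the diagonal of I⁻¹ at convergence yields the variance estimates behind Wald statistics and confidence intervals.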
### Implementation Notes
- **Fisher Scoring Multinomial Regression**
  The `FisherScoringMultinomialRegression` model differs from standard statistical multinomial logistic regression by fitting coefficients for all *K* classes rather than *K* − 1. This approach allows a multi-class problem to be reduced to one binary problem per class by computing 1 − P(target class).
- **Fisher Scoring Focal Regression**
The `FisherScoringFocalRegression` class employs a non-standard log-likelihood function in its optimization process.
The focal loss function, originally developed for object detection, prioritizes difficult-to-classify examples—often the minority class—by reducing the contribution of easy-to-classify samples. It introduces a focusing parameter, *gamma*, which down-weights the influence of easily classified instances, thereby concentrating learning on challenging cases.
Source: [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002).
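To make the down-weighting concrete, the focal loss from the paper is FL(p<sub>t</sub>) = −(1 − p<sub>t</sub>)<sup>γ</sup> log(p<sub>t</sub>), where p<sub>t</sub> is the probability assigned to the true class. A NumPy sketch (γ = 2 is the paper's default; γ = 0 recovers standard cross-entropy):

```python
import numpy as np

def focal_loss(y, p, gamma=2.0):
    """Per-sample focal loss, where p_t is the predicted
    probability assigned to the true class."""
    p_t = np.where(y == 1, p, 1.0 - p)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

# A confidently correct sample (p_t = 0.9) keeps only ~1% of its
# cross-entropy loss, while a hard sample (p_t = 0.1) keeps ~81%.
y = np.array([1, 1])
p = np.array([0.9, 0.1])
print(focal_loss(y, p))           # focal loss, gamma = 2
print(focal_loss(y, p, gamma=0))  # plain cross-entropy
```

This is why training concentrates on the hard, often minority-class, examples: easy examples contribute almost nothing to the gradient once γ > 0.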
## Models
### Fisher Scoring Logistic Regression
The `FisherScoringLogisticRegression` class is a custom implementation of logistic regression using the Fisher scoring algorithm. It provides methods for fitting the model, making predictions, and computing model statistics, including standard errors, Wald statistics, p-values, and confidence intervals.
**Parameters:**
- `epsilon`: Convergence threshold for the algorithm.
- `max_iter`: Maximum number of iterations for the algorithm.
- `information`: Type of information matrix to use ('expected' or 'observed').
- `use_bias`: Include a bias term in the model.
- `significance`: Significance level for computing confidence intervals.
**Methods:**
- `fit(X, y)`: Fit the model to the data.
- `predict(X)`: Predict target labels for input data.
- `predict_proba(X)`: Predict class probabilities for input data.
- `get_params()`: Get model parameters.
- `set_params(**params)`: Set model parameters.
- `summary()`: Get a summary of model parameters, standard errors, p-values, and confidence intervals.
- `display_summary()`: Display a summary of model parameters, standard errors, p-values, and confidence intervals.
### Fisher Scoring Multinomial Regression
The `FisherScoringMultinomialRegression` class implements the Fisher Scoring algorithm for multinomial logistic regression, suitable for multi-class classification tasks.
**Parameters:**
- `epsilon`: Convergence threshold for the algorithm.
- `max_iter`: Maximum number of iterations for the algorithm.
- `information`: Type of information matrix to use ('expected' or 'observed').
- `use_bias`: Include a bias term in the model.
- `significance`: Significance level for computing confidence intervals.
- `verbose`: Enable verbose output.
**Methods:**
- `fit(X, y)`: Fit the model to the data.
- `predict(X)`: Predict target labels for input data.
- `predict_proba(X)`: Predict class probabilities for input data.
- `summary(class_idx)`: Get a summary of model parameters, standard errors, p-values, and confidence intervals for a specific class.
- `display_summary(class_idx)`: Display a summary of model parameters, standard errors, p-values, and confidence intervals for a specific class.
The algorithm is in a beta version and may require further testing and optimization to speed up matrix operations.
### Fisher Scoring Focal Loss Regression
The `FisherScoringFocalRegression` class implements the Fisher Scoring algorithm with focal loss, designed for imbalanced classification problems where the positive class is rare.
**Parameters:**
- `gamma`: Focusing parameter for focal loss.
- `epsilon`: Convergence threshold for the algorithm.
- `max_iter`: Maximum number of iterations for the algorithm.
- `information`: Type of information matrix to use ('expected' or 'observed').
- `use_bias`: Include a bias term in the model.
- `verbose`: Enable verbose output.
*Note*: The class does not yet implement a summary method for model statistics.
## Installation
To use the models, clone the repository and install the required dependencies.
```bash
git clone https://github.com/xRiskLab/fisher-scoring.git
cd fisher-scoring
pip install -r requirements.txt
```
Alternatively, install the package directly from PyPI.
```bash
pip install fisher-scoring
```
## Change Log
- **v2.0.2**
- **Bug Fixes**: Fixed the `FisherScoringMultinomialRegression` class to have flexible NumPy data types.
- **v2.0.1**
- **Bug Fixes**: Removed the debug print statement from the `FisherScoringLogisticRegression` class.
- **v2.0**
  - **Performance Improvements**: Optimized matrix calculations for substantial speed and memory efficiency improvements across all models. Leveraging streamlined operations, this version achieves up to 290x faster convergence. Performance gains per model:
- *Multinomial Logistic Regression*: Training time reduced from 125.10s to 0.43s (~290x speedup).
- *Logistic Regression*: Training time reduced from 0.24s to 0.05s (~5x speedup).
- *Focal Loss Logistic Regression*: Training time reduced from 0.26s to 0.01s (~26x speedup).
- **Bug Fixes**: `verbose` parameter in Focal Loss Logistic Regression now functions as expected, providing accurate logging during training.
- **v0.1.4**
- Updated log likelihood for Multinomial Regression and minor changes to Logistic Regression for integration with scikit-learn.
- **v0.1.3**
- Added coefficients, standard errors, p-values, and confidence intervals for Multinomial Regression.
- **v0.1.2**
- Updated NumPy dependency.
- **v0.1.1**
- Added support for Python 3.9+ 🐍.
- **v0.1.0**
- Initial release of Fisher Scoring Logistic, Multinomial, and Focal Loss Regression.