# Robust Gaussian Fitting Library
A library for robust Gaussian fitting of geometric models in the presence of outliers. Many machine learning methods are based on, and limited to, cost functions that are differentiable with respect to their parameters. However, a class of machine learning methods based on "order statistics" does not need derivatives to be known in order to initialize the learning process. This library is based on such methods. Currently, it supports only two core algorithms: FLKOS for finding the average of a Gaussian, and MSSE for finding the scale. More novel methods are on their way.
## Introduction
<img src="images/use_of_lib_lineFitting_2.jpg" width="400">
Try it with 30% outliers:
```
from RobustGaussianFittingLibrary import fitValue
import numpy as np

# 70 inliers drawn from a Gaussian with mean 50 and std 5
inliers = 50 + 5*np.random.randn(70)
# 30 outliers spread uniformly over [-250, 250)
outliers = 500*(np.random.rand(30)-0.5)
inVec = np.hstack((inliers, outliers))
np.random.shuffle(inVec)

# mP[0] is the robust mean and mP[1] the robust std
mP = fitValue(inVec)
print('inliers.mean -> ' + str(inliers.mean()) + ', inliers.std -> ' + str(inliers.std()))
print('robust.mean -> ' + str(mP[0]) + ', robust.std -> ' + str(mP[1]))
```
Are the average and standard deviation good enough as statistics? Are they proper statistics to use for fitting lines and planes to data? What happens to these statistics in the presence of outliers? The median seems to be one solution, but what happens if the number of outliers increases?
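To make these questions concrete, here is a minimal sketch using only the Python standard library. The helper `contaminated_sample` and its toy data (inliers spread evenly over [45, 55], all outliers placed at 500) are assumptions made for illustration and are not part of this library.

```python
import statistics

def contaminated_sample(n_total, outlier_fraction):
    """Deterministic toy data: inliers spread evenly over [45, 55], outliers all at 500."""
    n_out = int(round(n_total * outlier_fraction))
    n_in = n_total - n_out
    inliers = [45.0 + 10.0 * i / (n_in - 1) for i in range(n_in)]
    outliers = [500.0] * n_out
    return inliers + outliers

# Watch how the two classical statistics react as contamination grows.
for fraction in (0.0, 0.1, 0.3, 0.45, 0.6):
    data = contaminated_sample(100, fraction)
    print(f"outliers={fraction:.2f}  mean={statistics.mean(data):7.2f}  "
          f"median={statistics.median(data):7.2f}")
```

In this toy setting the mean moves away from 50 with every added outlier. The median stays inside the inlier range while outliers are the minority, although it drifts toward the edge of that range, and it lands on the outliers once they exceed half of the data.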
In this library, we have put together one of the state-of-the-art methods in robust statistics for curve fitting that is easy to understand and tune. If you are currently using mean and median, you can simply replace them with these methods.
## Prior knowledge: Rough estimate of structure size
In this robust model fitting method, the main assumption is that the Gaussian we are looking for contains the majority of the data points. If it doesn't, the problem turns into a clustering problem. If the structure does not contain the majority of the data and the outliers do not form a structure, the problem reduces back to segmentation where the structure size is smaller than half of the data.
If the structure size cannot be guessed, you can follow MCNC, which uses the covariance of data points to sample from the structure density. However, if that seems hard to implement, you can just run the method with many structure sizes and fuse the models by taking their median. In our opinion, these are among the most efficient and yet accurate methods.
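The idea of fusing models over several structure sizes can be sketched as follows. This is a rough illustration only: `shortest_window_center` is a hypothetical stand-in estimator (the mean of the tightest window of sorted data), not the library's FLKOS, and the list of fractions is an arbitrary assumption.

```python
import statistics

def shortest_window_center(values, structure_fraction):
    """Stand-in robust location (NOT the library's FLKOS): mean of the
    tightest window that holds the requested fraction of the sorted data."""
    data = sorted(values)
    k = max(2, int(round(len(data) * structure_fraction)))
    starts = range(len(data) - k + 1)
    best = min(starts, key=lambda s: data[s + k - 1] - data[s])
    return statistics.mean(data[best:best + k])

def fused_location(values, fractions=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Run the estimator for several assumed structure sizes and take the median."""
    estimates = [shortest_window_center(values, f) for f in fractions]
    return statistics.median(estimates)

# Toy data: 70 inliers in [45, 55] and 30 outliers spread over [-250, 250].
inliers = [45.0 + 10.0 * i / 69 for i in range(70)]
outliers = [-250.0 + 500.0 * i / 29 for i in range(30)]
print(fused_location(inliers + outliers))
```

Taking the median of the per-size estimates means that a few badly chosen structure sizes do not dominate the fused result, as long as most of the tried sizes are reasonable.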
## Usage from Python
You can install this library via pip.
```
pip3 install RobustGaussianFittingLibrary
```
### Importable libraries ###
* __from RobustGaussianFittingLibrary import__: Basic functions for 1D and 2D data, and also for tensors.
	* MSSE : Given a set of residuals, it finds the scale of a Gaussian (Reference: Robust segmentation of visual data using ranked unbiased scale estimate, A. Bab-Hadiashar and D. Suter).
	* fitValue : Given a vector, it finds the average and standard deviation of the Gaussian.
	<img src="images/use_of_lib_valueFitting.jpg" width="400">

	* fitValue2Skewed : Given a vector (weights are also accepted), it finds the mode (as the median of the inliers) and reports it along with a scale, which is the distance of the mode from the edges of the Gaussian (at 3 STDs) divided by 3.
	* fitValueTensor : Given a tensor of size n_F x n_R x n_C, it finds the Gaussian mean and std for each pixel in n_R and n_C.
	* fitLine : Given vectors X and Y, it finds three parameters describing a line: slope, intercept and scale of noise.
	<img src="images/use_of_lib_lineFitting.jpg" width="400">

	* fitLineTensor : Given a tensor, it fits a line for each pixel.
	* fitPlane : Given an image, it returns four parameters for the algebraic fitting of a plane.
	<img src="images/use_of_lib_planeFitting.jpg" width="400">

	* fitBackground : Given an image, it returns the mean and std of the background at each pixel.
	* fitBackgroundTensor : Given a tensor of images of size n_F x n_R x n_C, it returns the background mean and std for each pixel for each frame in n_F.

* __useMultiproc__: In this set, the tensor operations are run using Python multiprocessing.
	* fitValueTensor_MultiProc : Does fitValueTensor using multiprocessing.
	* fitLineTensor_MultiProc : Does fitLineTensor using multiprocessing.
	* fitBackgroundTensor_multiproc : Does fitBackgroundTensor using multiprocessing.
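To give a feel for what a scale estimator such as MSSE does with a set of residuals, here is a simplified sketch of the general idea in the cited paper: sort the squared residuals, grow the inlier set one residual at a time, and stop when the next residual is too large compared with the scale of those already accepted. The function name `msse_scale_sketch`, its parameters and its defaults are assumptions for illustration; this is not the library's C implementation or its Python signature.

```python
import math

def msse_scale_sketch(residuals, start_fraction=0.5, threshold=3.0, n_params=1):
    """Simplified MSSE-style scale estimate (illustrative, not the library code).

    Returns (scale, number_of_inliers)."""
    squared = sorted(r * r for r in residuals)
    n = len(squared)
    # Start from an initial set that is assumed to contain only inliers.
    k = max(n_params + 1, int(round(n * start_fraction)))
    running_sum = sum(squared[:k])
    while k < n:
        variance = running_sum / (k - n_params)
        # Stop when the next residual lies beyond `threshold` standard deviations.
        if squared[k] > threshold ** 2 * variance:
            break
        running_sum += squared[k]
        k += 1
    return math.sqrt(running_sum / (k - n_params)), k

# Toy residuals: 70 inliers in [-5, 5] and 30 gross outliers in [50, 200].
inlier_res = [-5.0 + 10.0 * i / 69 for i in range(70)]
outlier_res = [50.0 + 150.0 * i / 29 for i in range(30)]
scale, n_inliers = msse_scale_sketch(inlier_res + outlier_res)
print(scale, n_inliers)
```

The starting fraction plays the role of the rough structure size discussed earlier: it only needs to be small enough that the initial set is free of outliers.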
### Examples in Python ###
Many test functions are available in the tests.py script. In the script, look for the main function and choose one of them to run.
```
test_fitBackgroundRadiallyTensor_multiproc()
test_fitValueTensor_MultiProc()
test_PDF2Uniform()
test_fitBackgroundTensor()
test_fitBackgroundTensor_multiproc()
test_RobustAlgebraicLineFittingPy()
test_fitBackground()
test_fitValue2Skewed()
test_for_textProgBar()
test_removeIslands()
test_fitValueSmallSample()
test_bigTensor2SmallsInds()
test_RobustAlgebraicPlaneFittingPy()
test_SginleGaussianVec()
test_flatField()
test_fitValue2Skewed_sweep_over_N()
test_fitBackgroundRadially()
test_fitLineTensor_MultiProc()
```
## Compilation into static and shared library
Run the following command to generate a shared .so library:
```
make
```
The Python wrapper looks for the .so shared library file. The wrapper is in the file cWrapper.py and is used by the other Python files.
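If you want to check that a compiled shared library can be loaded before running anything else, a sketch along these lines may help. It assumes a ctypes-style loader; the helper `load_shared_library` and the file name `RGFLib.so` are hypothetical and may differ from what cWrapper.py actually does.

```python
import ctypes
import os

def load_shared_library(path):
    """Return a ctypes handle for the library at `path`, or None if the file is missing."""
    if not os.path.isfile(path):
        return None
    return ctypes.CDLL(path)

# Hypothetical file name; use whatever .so file `make` produced in your checkout.
lib = load_shared_library(os.path.join(os.getcwd(), "RGFLib.so"))
print("loaded" if lib is not None else "shared library not found, run make first")
```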
**Note**: If you are using Windows, you can use MinGW, whose bin folder contains a make executable under a different name (typically mingw32-make). Copy it and rename it to make. You will also need rm from coreutils for Windows.
To test the shared library, simply type in:
```
make test
```
## Usage from MATLAB ##
Currently, only the fitValue function is supported by a mex C code for MATLAB. However, you can request more, or implement them yourself accordingly. Look at the RGFLib_mex_fitValue2Skewed_Test.m file.
# Credits
This library is an effort to implement a set of robust statistical functions. However, part of the core of RGFLib.c (MSSE) was implemented as part of the package [RobustPeakFinder](https://github.com/MarjanHJ/RobustPeakFinder) for crystallography data analysis in 2017 under a free license at La Trobe University, Australia. Afterwards, since robust Gaussian fitting can solve many problems, we put them all together into the current library at CFEL/DESY Hamburg. The RPF project now imports this library, as it serves the purpose of that project well.
## Authors
* Alireza Sadri <Alireza[Dot]Sadri[At]Monash[Dot]edu>
* Marjan Hadian Jazi <Hadian-Jazi.M[At]WEHI[Dot]edu[Dot]au>