pyfirst


- Name: pyfirst
- Version: 0.1.4
- Home page: https://github.com/BillHuang01/pyfirst
- Summary: Factor Importance Ranking and Selection using Total Indices
- Upload time: 2024-08-06 06:20:16
- Maintainer: Chaofan Huang
- Author: Chaofan Huang
- Requires Python: >=3.7.1
- License: MIT

# FIRST: Factor Importance Ranking and Selection for Total Indices

A ``Python3`` module of FIRST, a model-independent factor importance ranking and selection procedure that is based on total Sobol' indices ([Huang and Joseph, 2024][1]). This research is supported by U.S. National Science Foundation grants *DMS-2310637* and *DMREF-1921873*. The ``R`` implementation is also available on [CRAN][2]. 

## Installation

```bash
pip install pyfirst
```

or from source

```bash
pip install git+https://github.com/BillHuang01/pyfirst.git
```

## Usage

### Factor Importance Ranking and Selection

``FIRST`` is the main function of this module. It provides factor importance ranking and selection directly from scattered data without any model fitting, where the importance is computed based on total Sobol' indices ([Sobol', 2001][5]). ``FIRST`` requires the following two arguments:
- a numpy ndarray or a pandas dataframe for the factors/predictors ``X`` 
- a numpy ndarray or a pandas series for the response ``y`` 

``FIRST`` returns a numpy ndarray of factor importances, where a value of *zero* indicates that the factor is not important for predicting the response.

```python
from pyfirst import FIRST
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=10000, n_features=10, noise=1.0, random_state=43)

FIRST(X, y)
```
For more advanced usage of ``FIRST``, e.g., speeding it up for big data, please see [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)][7] or the [API documentation][10].
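
Because ``FIRST`` returns one importance value per factor, with zero marking an unimportant factor, the result can be turned into a ranking and a list of selected factors with a few lines of plain Python. The following is a minimal sketch, not part of ``pyfirst``: the helper names ``rank_factors`` and ``selected_factors`` are hypothetical, and the importance values are made up for illustration rather than actual ``FIRST`` output. The helpers accept any sequence of numbers, including a 1-D numpy ndarray.

```python
def rank_factors(importance, names=None):
    """Return (name, importance) pairs sorted from most to least important."""
    values = [float(v) for v in importance]
    if names is None:
        # Default names x0, x1, ... follow the column order of X.
        names = [f"x{j}" for j in range(len(values))]
    return sorted(zip(names, values), key=lambda pair: pair[1], reverse=True)


def selected_factors(importance, names=None):
    """Return the names of factors with strictly positive importance."""
    return [name for name, value in rank_factors(importance, names) if value > 0]


# Made-up importance values for illustration; not actual FIRST output.
importance = [0.21, 0.0, 0.35, 0.0, 0.08]

ranking = rank_factors(importance)
kept = selected_factors(importance)
print(ranking)
print(kept)
```

Passing the column names of a pandas dataframe as ``names`` would label the ranking with the original factor names.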

To support easy integration with [sklearn.pipeline.Pipeline][3] for a streamlined model training process, we also provide ``SelectByFIRST``, a class built on [sklearn.feature_selection][4].

```python
import numpy as np
from pyfirst import SelectByFIRST
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
X = housing.data
y = np.log(housing.target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=43)

pipe = Pipeline([
    ('selector', SelectByFIRST(regression=True, random_state=43)),
    ('estimator', RandomForestRegressor(random_state=43))
]).fit(X_train, y_train)

pipe.predict(X_test)
```
For more details, please see [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)][8] or the [API documentation][11].
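
A selector works well as a pipeline step because the set of kept factors is learned once on the training data and then reused unchanged on new data. The sketch below illustrates that fit/transform contract in plain Python. It is an assumption-laden toy, not the ``pyfirst`` implementation: the class ``MaskSelector`` and the stand-in ``fake_importance`` function are hypothetical, and in practice the importance function would be something like ``FIRST``.

```python
class MaskSelector:
    """Hypothetical fit/transform selector; not the pyfirst implementation."""

    def __init__(self, importance_fn):
        # importance_fn(X, y) must return one importance value per column of X.
        self.importance_fn = importance_fn

    def fit(self, X, y):
        importance = self.importance_fn(X, y)
        # Keep a factor only if its importance is strictly positive.
        self.support_ = [float(v) > 0 for v in importance]
        return self

    def transform(self, X):
        # Apply the mask learned in fit() to every row, train or test alike.
        return [[v for v, keep in zip(row, self.support_) if keep] for row in X]


def fake_importance(X, y):
    """Stand-in for a real importance measure; returns fixed, made-up values."""
    return [0.4, 0.0, 0.1]


X_train = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
y_train = [0.5, 1.5]
X_test = [[7.0, 8.0, 9.0]]

selector = MaskSelector(fake_importance).fit(X_train, y_train)
X_test_selected = selector.transform(X_test)
print(selector.support_)
print(X_test_selected)
```

Learning the mask only in ``fit`` keeps the test data out of the selection step, which is the same discipline a ``Pipeline`` enforces.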

### Total Sobol' Indices Estimation

This module also provides the function ``TotalSobolKNN`` for consistent estimation of total Sobol' indices ([Sobol', 2001][5]) directly from scattered data. When the response is noiseless, ``TotalSobolKNN`` implements the Nearest-Neighbor estimator from [Broto et al. (2020)][6]. For a noisy response, ``TotalSobolKNN`` implements the Noise-Adjusted Nearest-Neighbor estimator from [Huang and Joseph (2024)][1]. ``TotalSobolKNN`` returns a numpy ndarray of the estimated total Sobol' indices.

```python
from pyfirst import TotalSobolKNN
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=10000, n_features=5, noise=1.0, random_state=43)

TotalSobolKNN(X, y, noise=True)
```
For more details and applications, please see [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)][9] or the [API documentation][12].
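
To give some intuition for nearest-neighbor estimation, recall that the total Sobol' index of factor i is E[Var(Y | X_{-i})] / Var(Y), where X_{-i} denotes all factors except i. The sketch below is a simplified toy for a noiseless response and is not the ``pyfirst`` implementation or the exact estimator from the cited papers: it assumes a single nearest neighbor, brute-force search, and the hypothetical function name ``total_sobol_nn``. For each point it finds the closest other point in the X_{-i} coordinates and uses half the squared difference of the two responses as a local estimate of Var(Y | X_{-i}).

```python
import random
from statistics import pvariance


def total_sobol_nn(X, y):
    """Toy 1-nearest-neighbor estimate of total Sobol' indices (noiseless y)."""
    n, d = len(X), len(X[0])
    var_y = pvariance(y)
    indices = []
    for i in range(d):
        others = [c for c in range(d) if c != i]  # coordinates of X_{-i}
        acc = 0.0
        for j in range(n):
            # Brute-force nearest neighbor of row j in X_{-i}, excluding j itself.
            best_k, best_dist = -1, float("inf")
            for k in range(n):
                if k == j:
                    continue
                dist = sum((X[j][c] - X[k][c]) ** 2 for c in others)
                if dist < best_dist:
                    best_k, best_dist = k, dist
            # Two responses at nearly equal X_{-i} estimate Var(Y | X_{-i}).
            acc += 0.5 * (y[j] - y[best_k]) ** 2
        indices.append(acc / n / var_y)
    return indices


rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(500)]
y = [row[0] + 2.0 * row[1] for row in X]  # the third factor is inert

T = total_sobol_nn(X, y)
print([round(t, 3) for t in T])
```

For this additive test function with independent uniform inputs, the exact total indices are 0.2, 0.8, and 0, so the printed estimates should land near those values. The brute-force search costs O(n^2) per factor, which is only practical for small samples.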

Documentation for this module is also available on [Read the Docs][13].

## References

Huang, C., & Joseph, V. R. (2024). Factor Importance Ranking and Selection using Total Indices. arXiv preprint arXiv:2401.00800.

Sobol', I. M. (2001). Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55(1-3), 271-280.

Broto, B., Bachoc, F., & Depecker, M. (2020). Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2), 693-716.

## Citation

If you find this module useful, please consider citing:

```
@article{huang2024factor,
  title={Factor Importance Ranking and Selection using Total Indices},
  author={Huang, Chaofan and Joseph, V Roshan},
  journal={arXiv preprint arXiv:2401.00800},
  year={2024}
}
```


[1]: https://arxiv.org/abs/2401.00800
[2]: https://cran.r-project.org/web/packages/first/index.html
[3]: https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
[4]: https://scikit-learn.org/stable/modules/feature_selection.html
[5]: https://www.sciencedirect.com/science/article/pii/S0378475400002706
[6]: https://epubs.siam.org/doi/10.1137/18M1234631
[7]: https://colab.research.google.com/github/BillHuang01/pyfirst/blob/main/docs/FIRST.ipynb
[8]: https://colab.research.google.com/github/BillHuang01/pyfirst/blob/main/docs/SelectByFIRST.ipynb
[9]: https://colab.research.google.com/github/BillHuang01/pyfirst/blob/main/docs/TotalSobolKNN.ipynb
[10]: https://pyfirst.readthedocs.io/en/latest/autoapi/pyfirst/index.html#pyfirst.FIRST
[11]: https://pyfirst.readthedocs.io/en/latest/autoapi/pyfirst/index.html#pyfirst.SelectByFIRST
[12]: https://pyfirst.readthedocs.io/en/latest/autoapi/pyfirst/index.html#pyfirst.TotalSobolKNN
[13]: https://pyfirst.readthedocs.io/

            
