horns

Name: horns
Version: 0.6.0
Home page: https://github.com/sammosummo/Horns
Summary: Horn's parallel analysis in Python.
Upload time: 2024-03-01 17:23:28
Author: Sam Mathias
Requires Python: >=3.11,<4.0
License: MIT
Keywords: psychology, psychometrics, factor analysis, parallel analysis, eigenvalues, eigenvectors

# Horns: Horn's parallel analysis in Python

Horns is a Python implementation of Horn's (1965) parallel analysis, the most widely
accepted method for determining the number of components or factors to retain in
principal component analysis (PCA) or common factor analysis (FA). The functionality of
this package is similar to that of the `paran` package in R.

## Background

Parallel analysis involves simulating a large number of random datasets with the same
shape as the original dataset but with no underlying correlation structure. We calculate
the eigenvalues of each random dataset and, for each eigenvalue position, the *q*th
quantile (the value of the percent point function, or ppf, at *q*) of its distribution
across simulations. We also calculate the eigenvalues of the original dataset and compare
them, from largest to smallest, with these quantiles. The number of components or factors
to retain is the number of original eigenvalues that exceed their corresponding quantiles.
Counting starts at the largest eigenvalue and stops at the first one that does not exceed
its quantile.
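
The stopping rule above can be illustrated in a few lines of plain Python. This is a
minimal sketch, not Horns' own code. `n_retained` is a hypothetical helper, and both
inputs are assumed to be sorted from the largest to the smallest eigenvalue.

```python
def n_retained(observed, thresholds):
    """Count leading eigenvalues that exceed their null-distribution quantile.

    Both sequences are assumed to be sorted from largest to smallest eigenvalue.
    Counting stops at the first eigenvalue that does not exceed its threshold,
    even if a later one would.
    """
    count = 0
    for eigval, threshold in zip(observed, thresholds):
        if eigval <= threshold:
            break
        count += 1
    return count


# The 4th eigenvalue (0.85) exceeds its threshold (0.8), but counting has
# already stopped at the 3rd (0.9 <= 1.05), so only 2 components are retained.
observed = [3.1, 1.6, 0.9, 0.85, 0.4]
thresholds = [1.4, 1.2, 1.05, 0.8, 0.7]
print(n_retained(observed, thresholds))  # 2
```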

Horn (1965) originally proposed using the median as the selection criterion (i.e.,
*q* = 0.5), but Glorfeld (1995) recommended *q* = 0.95 (and a large number of
simulations) to reduce the chances of retaining too many components or factors. As in
`paran`, the user can choose *q* and the number of simulations, allowing them to follow
Glorfeld's recommendations or not.
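
To see how *q* enters the procedure, here is a minimal NumPy sketch that simulates null
eigenvalues and takes their quantile at each position. It assumes independent
standard-normal random data, Pearson correlations and PCA (unreduced) eigenvalues.
`null_eigenvalue_quantiles` and its parameters are hypothetical names chosen for
illustration, not part of Horns' API.

```python
import numpy as np


def null_eigenvalue_quantiles(n_obs, n_vars, q=0.95, n_sims=1000, seed=None):
    """Return the q-th quantile of each ordered eigenvalue under the null.

    Each simulated dataset has the same shape as the real one but consists of
    independent standard-normal columns, so it has no correlation structure.
    """
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        x = rng.standard_normal((n_obs, n_vars))
        r = np.corrcoef(x, rowvar=False)
        # eigvalsh returns eigenvalues in ascending order; reverse to largest-first
        sims[i] = np.linalg.eigvalsh(r)[::-1]
    # quantile of each eigenvalue position across all simulations
    return np.quantile(sims, q, axis=0)


median_thresholds = null_eigenvalue_quantiles(200, 6, q=0.5, n_sims=500, seed=1)
glorfeld_thresholds = null_eigenvalue_quantiles(200, 6, q=0.95, n_sims=500, seed=1)
print(np.round(median_thresholds, 2))
print(np.round(glorfeld_thresholds, 2))
```

Both calls use the same seed, so they share the same simulated eigenvalues. The
*q* = 0.95 thresholds are therefore never below the median thresholds and can only lead to
retaining the same number of components or fewer.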

There has been some debate about the best way to simulate random data for parallel
analysis. Hayton et al. (2004) originally claimed that it is necessary to simulate data
with the same values as the original data. Dinno (2009) later demonstrated that
parallel analysis is robust to a wide range of distributional forms of the random data,
and therefore recommended using the most computationally efficient method available.
This may be good advice when performing parallel analysis on Pearson correlation or
covariance matrices, but I'm not sure it makes sense for other kinds of matrices (e.g.,
polyserial or polychoric correlation matrices). Therefore, I have included several
methods of simulating random data, including shuffling and bootstrapping the original
data.
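
The two resampling schemes mentioned above can be sketched as follows. Each column is
resampled independently of the others. This keeps every variable's own values but breaks
the associations between variables, which is what the null datasets need. The function
names are hypothetical, and Horns' own implementation may differ in detail.

```python
import numpy as np


def shuffle_columns(data, rng):
    """Permute each column independently.

    Every variable keeps exactly its original values (and hence its marginal
    distribution), but any association between variables is destroyed.
    """
    return np.column_stack([rng.permutation(col) for col in data.T])


def bootstrap_columns(data, rng):
    """Resample each column independently, with replacement."""
    n = data.shape[0]
    return np.column_stack([rng.choice(col, size=n, replace=True) for col in data.T])


rng = np.random.default_rng(0)
data = np.array([[1, 0, 3], [2, 1, 3], [3, 1, 4], [4, 0, 5]], dtype=float)
shuffled = shuffle_columns(data, rng)
booted = bootstrap_columns(data, rng)
print(shuffled)
print(booted)
```

Resampling whole rows would preserve the correlations between variables. That is why
both sketches work column by column.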

PCA and FA, and therefore parallel analysis, are often performed on Pearson correlation
matrices. However, a Pearson correlation matrix is not the correct choice in all cases.
For example, Tran et al. (2008) showed that parallel analysis on binary data is more
accurate when using polychoric correlation matrices. This package selects the
appropriate correlation estimate for each pair of variables based on the number of
unique values in each variable.
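
A per-pair selection rule of this kind might look like the sketch below. The cut-off of
10 unique values (`max_levels`) is purely an illustrative assumption, not the threshold
Horns uses. The helper names are hypothetical.

```python
def variable_kind(values, max_levels=10):
    """Classify a variable as 'discrete' or 'continuous' by its number of unique values.

    The cut-off max_levels is an illustrative assumption, not Horns' rule.
    """
    return "discrete" if len(set(values)) <= max_levels else "continuous"


def correlation_kind(x, y, max_levels=10):
    """Pick a correlation estimator for one pair of variables."""
    kinds = {variable_kind(x, max_levels), variable_kind(y, max_levels)}
    if kinds == {"continuous"}:
        return "pearson"
    if kinds == {"discrete"}:
        return "polychoric"
    return "polyserial"  # one continuous and one discrete variable


likert = [1, 2, 2, 3, 5, 4, 4, 1, 3, 2, 5, 4]  # 5 unique values
reaction_time = [412.5, 388.1, 501.9, 455.0, 399.7, 430.2,
                 478.3, 420.8, 367.4, 444.6, 492.1, 405.3]  # 12 unique values
print(correlation_kind(likert, reaction_time))  # polyserial
```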

As pointed out by Dinno (2014), some implementations of parallel analysis do not
correctly calculate the eigenvalues for FA, which are different from those for PCA. This
package uses the correct eigenvalues for both PCA and FA, like the `paran` package in R.
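
The difference between the two sets of eigenvalues can be sketched with NumPy. For PCA,
the eigenvalues come from the correlation matrix itself. For FA, a common convention is to
use the reduced correlation matrix, whose unit diagonal is replaced by communality
estimates such as squared multiple correlations (SMCs). Whether Horns uses SMCs
specifically is an assumption here, and the function names are hypothetical.

```python
import numpy as np


def squared_multiple_correlations(r):
    """SMC of each variable with all the others: 1 - 1 / diag(R^-1)."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(r))


def pca_eigenvalues(r):
    """Eigenvalues of the correlation matrix itself (unit diagonal), largest first."""
    return np.linalg.eigvalsh(r)[::-1]


def fa_eigenvalues(r):
    """Eigenvalues of the reduced correlation matrix (SMC diagonal), largest first."""
    reduced = r.copy()
    np.fill_diagonal(reduced, squared_multiple_correlations(r))
    return np.linalg.eigvalsh(reduced)[::-1]


rng = np.random.default_rng(42)
x = rng.standard_normal((300, 5))
x[:, 1] += x[:, 0]  # make the first two variables share some variance
r = np.corrcoef(x, rowvar=False)

print(np.round(pca_eigenvalues(r), 3))
print(np.round(fa_eigenvalues(r), 3))
```

The reduced matrix equals the correlation matrix minus a positive semi-definite diagonal
matrix. Each FA eigenvalue is therefore no larger than the corresponding PCA eigenvalue,
and some FA eigenvalues can be negative. The PCA eigenvalues sum to the number of
variables, whereas the FA eigenvalues sum to the total of the SMCs.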

Horns optionally produces a figure showing the eigenvalues and the quantiles via 
Matplotlib. 

### Performance

Since there are apparently no other Python packages that perform parallel analysis, and
hence nothing to compare against, I haven't profiled or benchmarked the code extensively.
However, many of the package's functions are just-in-time (JIT) compiled via Numba and
parallelised where possible, so it should be reasonably fast. Parallel analysis with
polychoric correlations takes much longer than with Pearson and/or polyserial correlations
because each polychoric correlation is estimated iteratively.
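
The sketch below illustrates why polychoric correlations are slow, using only the Python
standard library. It takes the simplest case: the tetrachoric correlation between two
binary variables. It uses a two-step approach, which is an assumption since Horns may
estimate these differently. The thresholds are fixed from the marginal proportions, and
the correlation is then found by bisection, with a numerical integral evaluated at every
step. The function names are hypothetical, and the sketch assumes neither variable is
constant.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def upper_orthant(a, b, rho, n=2000, upper=8.0):
    """P(X > a, Y > b) for a standard bivariate normal with correlation rho.

    Integrates phi(x) * P(Y > b | X = x) over x in [a, upper] with Simpson's rule.
    """
    s = (1.0 - rho * rho) ** 0.5
    h = (upper - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        f = STD_NORMAL.pdf(x) * (1.0 - STD_NORMAL.cdf((b - rho * x) / s))
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3.0


def tetrachoric(table, tol=1e-6):
    """Two-step tetrachoric correlation for a 2x2 table of counts.

    table[i][j] counts observations with X == i and Y == j.
    Step 1: thresholds are fixed from the marginal proportions.
    Step 2: rho is found iteratively (bisection) so that the model's
    P(X = 1, Y = 1) matches the observed proportion.
    """
    total = sum(sum(row) for row in table)
    p_x1 = (table[1][0] + table[1][1]) / total
    p_y1 = (table[0][1] + table[1][1]) / total
    p_11 = table[1][1] / total
    a = STD_NORMAL.inv_cdf(1.0 - p_x1)
    b = STD_NORMAL.inv_cdf(1.0 - p_y1)
    lo, hi = -0.999, 0.999
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # P(X > a, Y > b) increases with rho, so bisection is valid
        if upper_orthant(a, b, mid) < p_11:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(tetrachoric([[4, 2], [2, 4]]), 3))  # 0.5
```

For this table both thresholds are 0. For zero thresholds,
P(X > 0, Y > 0) = 1/4 + arcsin(rho) / (2 pi). Setting this equal to the observed 1/3
gives rho = 0.5, which the bisection recovers. A full polychoric estimate repeats work of
this kind for every pair of discrete variables in every simulated dataset.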

## Installation

You can install Horns directly from PyPI using pip:

```bash
pip install horns
```

## Quick Start

Here's a quick example to get you started:

```python
import pandas as pd  # <- not required by Horns, but you need to load your data somehow
from horns import parallel_analysis

# load your dataset
data = pd.read_csv("path/to/your/data.csv")

# perform parallel analysis to determine the optimal number of components for PCA
m = parallel_analysis(data)

print(f"Optimal number of components: {m}")
```

There should be no need to call anything other than `parallel_analysis`, but you may 
find some of the ancillary functions useful for other applications.

## Contributing

Contributions to Horns are welcome! Submit an issue or pull request if you have any
suggestions or would like to contribute.

## License

This project is licensed under the MIT License.

## Citation

If you use Horns in your research, please consider citing it:

```bibtex
@misc{horns2024,
  title={Horns: Horn's parallel analysis in Python},
  author={Samuel R. Mathias},
  year={2024},
  howpublished={\url{https://github.com/sammosummo/Horns}},
}
```

## Thanks

Thanks for choosing Horns for your factor analysis needs!
            
