- Name: many
- Version: 0.7.2
- Home page: https://github.com/kevinhu/many
- Summary: Statistical methods for computing many correlations
- Upload time: 2024-04-07 17:44:55
- Author: Kevin Hu
- License: MIT
- Requires Python: <4.0,>=3.9

# many

This package provides a general-use toolkit for frequently implemented statistical and visual methods. See the [blog post](https://kevinhu.io/2020/09/17/many.html) for an explanation of the package's purpose and the methods it uses.

**[Full documentation](https://many.kevinhu.io)**

## Installation

```bash
pip install many
```

Note: to use the CUDA-accelerated statistical methods (i.e., `many.stats.mat_mwu_gpu`), you must also install [cupy](https://github.com/cupy/cupy) separately, choosing the cupy package that matches your CUDA toolkit version (for example, `cupy-cuda12x` for CUDA 12.x).
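
Because cupy is not installed by `pip install many`, code that may run on machines without a GPU can check for it before calling a GPU method. The sketch below shows one way to do this. `gpu_available` is a hypothetical helper and is not part of `many`.

```python
import importlib.util


def gpu_available() -> bool:
    """Hypothetical helper: report whether cupy is installed and can see a CUDA device."""
    # cupy is an optional dependency, so look for it before importing
    if importlib.util.find_spec("cupy") is None:
        return False
    try:
        import cupy

        # getDeviceCount() raises if the CUDA driver or device is unavailable
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        # Treat any import or CUDA runtime failure as "no usable GPU"
        return False


if gpu_available():
    print("cupy and a CUDA device found: many.stats.mat_mwu_gpu should be usable")
else:
    print("no usable cupy/CUDA setup: fall back to a CPU method")
```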

## Components

### Statistical methods

The statistical methods comprise several functions for association mining between variable pairs. These methods are optimized for `pandas` DataFrames and are inspired by the `corrcoef` function provided by `numpy`.
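
To illustrate the `corrcoef`-style approach, the sketch below computes the Pearson correlation between every column of one DataFrame and every column of another using `numpy.corrcoef`. It is not the package's implementation. `corrcoef_cross` is a hypothetical name, and the sketch assumes the rows of both frames are aligned on their index and contain no missing values.

```python
import numpy as np
import pandas as pd


def corrcoef_cross(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Pearson r between every column of a and every column of b."""
    # Keep only observations present in both frames, in the same order
    a, b = a.align(b, join="inner", axis=0)
    # np.corrcoef treats rows as variables by default, so pass the transposes
    full = np.corrcoef(a.to_numpy().T, b.to_numpy().T)
    # The full matrix covers all a + b variables; keep only the a-vs-b block
    k = a.shape[1]
    return pd.DataFrame(full[:k, k:], index=a.columns, columns=b.columns)


rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
y = pd.DataFrame({"y1": 2 * x["x1"] + rng.normal(size=100), "y2": rng.normal(size=100)})
print(corrcoef_cross(x, y).round(2))
```

This sketch also computes the within-frame blocks and then discards them. A dedicated cross-product of standardized columns avoids that extra work when the frames are wide.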

Because these functions rely on vectorized matrix operations in `numpy` rather than Python loops, many of them are orders of magnitude faster than naive loop-based alternatives. This makes them well suited to building large association networks and to feature extraction, both of which are important in areas such as biomarker discovery. Every method also returns estimates of statistical significance.
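
The sketch below shows how a matrix-level method can return significance estimates alongside the statistic. It computes Pearson r for all column pairs with a single matrix product and converts each r to a two-sided p-value using the t-distribution with n − 2 degrees of freedom. It then checks the result against a naive loop over `scipy.stats.pearsonr`. Both function names are hypothetical, and this is a generic illustration rather than the code used by `many`.

```python
import numpy as np
from scipy import stats


def pearson_with_pvalues(a: np.ndarray, b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical vectorized Pearson r and two-sided p-values for all column pairs."""
    n = a.shape[0]
    # Standardize columns; the mean is removed before any products are formed
    za = (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)
    zb = (b - b.mean(axis=0)) / b.std(axis=0, ddof=1)
    r = np.clip(za.T @ zb / (n - 1), -1.0, 1.0)  # guard against rounding past +/-1
    # Under H0 (no correlation), t = r * sqrt((n - 2) / (1 - r^2)) ~ Student's t(n - 2)
    with np.errstate(divide="ignore"):
        t = r * np.sqrt((n - 2) / (1.0 - r**2))
    p = 2 * stats.t.sf(np.abs(t), df=n - 2)
    return r, p


def pearson_loop(a: np.ndarray, b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Naive reference: one scipy.stats.pearsonr call per column pair."""
    r = np.empty((a.shape[1], b.shape[1]))
    p = np.empty_like(r)
    for i in range(a.shape[1]):
        for j in range(b.shape[1]):
            r[i, j], p[i, j] = stats.pearsonr(a[:, i], b[:, j])
    return r, p


rng = np.random.default_rng(1)
a = rng.normal(size=(50, 4))
b = a[:, :3] + rng.normal(size=(50, 3))  # correlated with the first three columns of a
r_vec, p_vec = pearson_with_pvalues(a, b)
r_loop, p_loop = pearson_loop(a, b)
print(np.allclose(r_vec, r_loop), np.allclose(p_vec, p_loop))
```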

In some cases, such as computing correlation coefficients, **these vectorized methods can be [numerically unstable](https://stats.stackexchange.com/questions/94056/instability-of-one-pass-algorithm-for-correlation-coefficient)**. For this reason, "naive" loop-based implementations are also provided for testing and comparison. Any significant result obtained with a vectorized method should be verified with the corresponding loop-based method.
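
The instability described in the linked thread can be reproduced with a small generic example. This example is not code from `many`. The one-pass formula builds r from raw sums, so when the data carry a large offset it subtracts two nearly equal large numbers and loses most of its significant digits. The two-pass version centers the data first. Both function names are hypothetical.

```python
import numpy as np


def pearson_one_pass(x: np.ndarray, y: np.ndarray) -> float:
    """Single-pass textbook formula from raw sums (numerically fragile)."""
    n = len(x)
    sx, sy = x.sum(), y.sum()
    sxx, syy, sxy = (x * x).sum(), (y * y).sum(), (x * y).sum()
    # Each difference below cancels two nearly equal large numbers; the variance
    # terms can even come out negative, in which case sqrt returns nan
    with np.errstate(invalid="ignore", divide="ignore"):
        num = n * sxy - sx * sy
        den = np.sqrt((n * sxx - sx**2) * (n * syy - sy**2))
        return float(num / den)


def pearson_two_pass(x: np.ndarray, y: np.ndarray) -> float:
    """Center first, then accumulate products: far less cancellation."""
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx * dx).sum() * (dy * dy).sum()))


rng = np.random.default_rng(0)
noise = rng.normal(size=1000)
x = 1e8 + noise                          # large offset, unit spread
y = 1e8 + noise + rng.normal(size=1000)  # population r = 1/sqrt(2), about 0.71
print("one-pass:", pearson_one_pass(x, y))
print("two-pass:", pearson_two_pass(x, y))
print("corrcoef:", np.corrcoef(x, y)[0, 1])
```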

The available functions are organized by variable comparison type, and each is benchmarked against the equivalent loop-based method. All methods accept a `melt` option that controls the output format. The output is either a set of matrices, one per statistic, whose rows and columns index the two sets of variables being compared, or a single `DataFrame` with one row per variable pair and each statistic melted into its own column.
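
The two layouts that the `melt` option chooses between can be illustrated with plain pandas. This sketch covers only the reshaping. `melt_stats` and the column names `row_variable` and `col_variable` are hypothetical and may not match what `many` actually returns. The matrix values are made up for illustration.

```python
import numpy as np
import pandas as pd


def melt_stats(matrices: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical helper: combine same-shaped statistic matrices into long format.

    Each input matrix has row variables as its index and column variables as its
    columns; the output has one row per (row variable, column variable) pair.
    """
    parts = []
    for name, mat in matrices.items():
        part = (
            mat.rename_axis("row_variable")
            .reset_index()
            .melt(id_vars="row_variable", var_name="col_variable", value_name=name)
            .set_index(["row_variable", "col_variable"])
        )
        parts.append(part)
    # Align all statistics on the shared (row, column) pair index
    return pd.concat(parts, axis=1).reset_index()


rows, cols = ["x1", "x2"], ["y1", "y2", "y3"]
r = pd.DataFrame(np.array([[0.9, 0.1, -0.2], [0.0, 0.5, 0.3]]), index=rows, columns=cols)
p = pd.DataFrame(np.array([[1e-6, 0.4, 0.1], [0.99, 0.01, 0.05]]), index=rows, columns=cols)
long_df = melt_stats({"r": r, "p": p})
print(long_df)
```

The matrix layout is compact for heatmaps and matrix algebra. The long layout is easier to filter, sort, and join, for example when keeping only the pairs below a significance threshold.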

### Visual methods

Several visual methods are also included for interpreting the results of the statistical methods. Like the statistical methods, they are grouped by the types of variables plotted.

## Development

1. Install dependencies with `poetry install`
2. Activate the virtual environment with `poetry shell` (in Poetry 2.0 and later, this command requires the `poetry-plugin-shell` plugin)
3. Install the pre-commit hooks with `pre-commit install`
