hypex


- **Name**: hypex
- **Version**: 0.1.7
- **Home page**: https://github.com/sb-ai-lab/HypEx
- **Summary**: Fast and customizable framework for Causal Inference
- **Upload time**: 2024-08-27 08:44:54
- **Author**: Dmitry Tikhomirov
- **Requires Python**: <3.13,>=3.8
- **License**: Apache-2.0

# HypEx: Advanced Causal Inference and AB Testing Toolkit

![Last release](https://img.shields.io/badge/pypi-v0.1.5-darkgreen)
[![Telegram](https://img.shields.io/badge/chat-on%20Telegram-2ba2d9.svg)](https://t.me/hypexchat)
![Pypi downloads](https://img.shields.io/badge/downloads-5K-1E782B)
![Python versions](https://img.shields.io/badge/python-3.8_|_3.9_|_3.10_|_3.11_|_3.12-blue)
![Pypi downloads\month](https://img.shields.io/badge/downloads\month->1K-1E782B)

## Introduction

HypEx (Hypotheses and Experiments) is a comprehensive library crafted to streamline the causal inference and AB testing
processes in data analytics. Developed for efficiency and effectiveness, HypEx employs Rubin's Causal Model (RCM) for
matching closely related pairs, ensuring equitable group comparisons when estimating treatment effects.

HypEx provides a fully automated pipeline that calculates the Average Treatment Effect (ATE), the Average Treatment
Effect on the Treated (ATT), and the Average Treatment Effect on the Control (ATC). It offers a standardized interface
for these estimations, giving insight into the impact of interventions across population subgroups.
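To make the three quantities concrete, here is a minimal sketch of how ATT, ATC, and ATE can be derived from nearest-neighbor matching on a single feature. It uses only the standard library and does not call HypEx; the helper names `nearest` and `matching_effects` and the toy data are assumptions made for illustration, and HypEx's own estimator may differ in detail (for example, bias correction and multi-feature distances).

```python
from statistics import mean


def nearest(x, pool):
    """Return the (feature, outcome) pair in pool whose feature is closest to x."""
    return min(pool, key=lambda unit: abs(unit[0] - x))


def matching_effects(treated, control):
    """Estimate ATT, ATC and ATE with 1-nearest-neighbor matching.

    treated, control: lists of (feature, outcome) tuples (hypothetical toy format).
    """
    # Treated units: observed outcome minus the matched control outcome.
    att_diffs = [y - nearest(x, control)[1] for x, y in treated]
    # Control units: matched treated outcome minus the observed outcome.
    atc_diffs = [nearest(x, treated)[1] - y for x, y in control]
    att = mean(att_diffs)
    atc = mean(atc_diffs)
    # ATE averages the unit-level differences over the whole sample.
    ate = mean(att_diffs + atc_diffs)
    return att, atc, ate


treated = [(1.0, 12.0), (2.0, 15.0), (3.0, 17.0)]
control = [(1.2, 10.0), (2.9, 14.0)]
att, atc, ate = matching_effects(treated, control)
print(f"ATT={att:.3f}, ATC={atc:.3f}, ATE={ate:.3f}")
```

ATT averages over treated units only, ATC over control units only, and ATE over everyone, which is why the three values can differ on the same data.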

Beyond causal inference, HypEx is equipped with robust AB testing tools, including Difference-in-Differences (
Diff-in-Diff) and CUPED methods, to rigorously test hypotheses and validate experimental results.
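As a rough illustration of the two methods named above, the sketch below computes a Difference-in-Differences estimate and a CUPED-adjusted metric with the standard library only. The function names and the toy numbers are assumptions for this example, not the HypEx API; in practice the CUPED coefficient is usually estimated on the pooled data of both groups.

```python
from statistics import mean, pvariance


def diff_in_diff(pre_t, post_t, pre_c, post_c):
    """Change in the treated group minus change in the control group."""
    return (mean(post_t) - mean(pre_t)) - (mean(post_c) - mean(pre_c))


def cuped_adjust(y, x):
    """Return y adjusted by the pre-experiment covariate x (CUPED).

    theta = cov(x, y) / var(x); the adjusted metric keeps the mean of y
    but has lower variance when x and y are correlated.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    theta = cov / var
    return [b - theta * (a - mx) for a, b in zip(x, y)]


# Difference-in-Differences on toy group data.
did = diff_in_diff(pre_t=[10, 20], post_t=[15, 25], pre_c=[10, 20], post_c=[12, 22])

# CUPED on a toy metric with a strongly correlated pre-period value.
pre = [10, 20, 30, 40]
post = [12, 21, 33, 42]
adjusted = cuped_adjust(post, pre)
print(did, pvariance(post), pvariance(adjusted))
```

The variance reduction from CUPED is what makes the subsequent test more sensitive; the mean of the metric is left unchanged.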

## Features

- **Faiss KNN Matching**: Utilizes Faiss for efficient and precise nearest neighbor searches, aligning with RCM for
  optimal pair matching.
- **Data Filters**: Built-in outlier and Spearman filters ensure data quality for matching.
- **Result Validation**: Offers multiple validation methods, including random treatment, feature, and subset
  validations.
- **Data Tests**: Incorporates SMD, KS, PSI, and Repeats tests to affirm the robustness of effect estimations.
- **Feature Selection**: Employs LGBM and Catboost feature selection to pinpoint the most impactful features for causal
  analysis.
- **AB Testing Suite**: Features a suite of AB testing tools for comprehensive hypothesis evaluation.
- **Stratification support**: Stratify groups for nuanced analysis.
- **Weights support**: Assign custom weights to features to tune the matching precision to your specific research
  needs.
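Two of the data tests listed above, SMD and KS, are simple enough to sketch by hand. The code below is a standard-library illustration of the usual textbook definitions, assuming a pooled standard deviation for SMD; the function names and sample values are hypothetical, and HypEx's own implementation may use different conventions.

```python
from math import sqrt
from statistics import mean, variance


def smd(treated, control):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the ECDFs."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


# Toy feature values (for example, age) in the matched groups.
treated_age = [30, 35, 40, 45]
control_age = [31, 34, 41, 44]

smd_value = smd(treated_age, control_age)
ks_value = ks_statistic(treated_age, control_age)
print(smd_value, ks_value)
```

Values of both statistics close to 0 indicate that the feature is distributed similarly in the two groups after matching.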

## Warnings

Some HypEx functions help with specific auxiliary tasks but cannot automate decisions about experiment design. The
features discussed below are implemented in HypEx but do not automate the design of experiments.

**Note:** For Matching, it's recommended not to use more than 7 features as it might result in the curse of
dimensionality, making the results unrepresentative.
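The effect behind this recommendation can be seen in a small simulation. The sketch below, which uses only the standard library and uniformly random points (an assumption, not HypEx data), compares the distance to the nearest neighbor with the distance to the farthest one as the number of features grows. When the ratio approaches 1, the "nearest" match is barely closer than an arbitrary unit.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of nearest to farthest Euclidean distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return min(dists) / max(dists)


ratios = {dim: nearest_to_farthest_ratio(dim) for dim in (2, 7, 50)}
for dim, ratio in ratios.items():
    print(f"features={dim:>2}  nearest/farthest={ratio:.2f}")
```

The exact numbers depend on the seed and the data distribution; the point is the trend, not a threshold at exactly 7 features.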

### Feature Selection

**Feature selection** models the significance of features for the accuracy of target approximation. However, it does not
rule out the possibility of overlooked features, the complex impact of features on target description, or the
significance of features from a business logic perspective. The algorithm will not function correctly if there are data
leaks.

Points to consider when selecting features:

* Data leaks - these should not be present.
* Influence on treatment distribution - features should not affect the treatment distribution.
* The target should be describable by features.
* All features significantly affecting the target should be included.
* The business rationale of features.

The feature selection function can help with these tasks, but it does not solve them: it neither justifies the chosen
features nor relieves the user of responsibility for the selection.

[Link to ReadTheDocs](https://hypex.readthedocs.io/en/latest/pages/modules/selectors.html#selector-classes)

### Random Treatment

The **Random Treatment** algorithm randomly shuffles the actual treatment labels. After shuffling, the estimated effect
of the treatment on the target is expected to be close to 0.

On its own, this method is not a sufficiently accurate marker of a successful experiment.
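The idea can be sketched without HypEx. In the example below, the effect is estimated as a plain difference in means rather than by matching, and the data are synthetic; both are simplifying assumptions made for illustration. The treatment labels are shuffled many times, and the average "placebo" effect is compared with the effect observed under the real labels.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treatment):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for y, t in zip(outcomes, treatment) if t == 1]
    control = [y for y, t in zip(outcomes, treatment) if t == 0]
    return mean(treated) - mean(control)


rng = random.Random(42)

# Synthetic data: 100 treated and 100 control units, true effect of 5.
treatment = [1] * 100 + [0] * 100
outcomes = [rng.gauss(10, 1) + 5 * t for t in treatment]

observed = diff_in_means(outcomes, treatment)

# Shuffle the labels repeatedly; the link between treatment and outcome is broken.
placebo_effects = []
for _ in range(200):
    shuffled = treatment[:]
    rng.shuffle(shuffled)
    placebo_effects.append(diff_in_means(outcomes, shuffled))

mean_placebo = mean(placebo_effects)
print(f"observed effect: {observed:.2f}, mean placebo effect: {mean_placebo:.2f}")
```

A placebo effect far from 0 would point to a problem in the estimation pipeline. A placebo effect near 0 is a necessary check but, as noted above, it does not by itself prove that the experiment succeeded.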

[Link to ReadTheDocs](https://hypex.readthedocs.io/en/latest/pages/modules/utils.html#validators)

## Installation

```bash
pip install -U hypex
```

## Quick start

Explore usage examples and tutorials [here](https://github.com/sb-ai-lab/Hypex/blob/master/examples/tutorials/).

### Matching example

```python
from hypex import Matcher
from hypex.utils.tutorial_data_creation import create_test_data

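# Generate the synthetic tutorial dataset (some 'age' and 'gender' values are set to NaN)
df = create_test_data(rs=42, na_step=45, nan_cols=['age', 'gender'])

# Column roles: identifier column, outcome to compare, and treatment flag
info_col = ['user_id']
outcome = 'post_spends'
treatment = 'treat'

# Match treated and control units, then estimate the effects
model = Matcher(input_data=df, outcome=outcome, treatment=treatment, info_col=info_col)
results, quality_results, df_matched = model.estimate()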
```

### AA-test example

```python
from hypex import AATest
from hypex.utils.tutorial_data_creation import create_test_data

data = create_test_data(rs=52, na_step=10, nan_cols=['age', 'gender'])

info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

experiment = AATest(info_cols=info_cols, target_fields=target)
results = experiment.process(data, iterations=1000)
results.keys()
```

### AB-test example

```python
from hypex import ABTest
from hypex.utils.tutorial_data_creation import create_test_data

data = create_test_data(rs=52, na_step=10, nan_cols=['age', 'gender'])

model = ABTest()
results = model.execute(
    data=data,
    target_field='post_spends',
    target_field_before='pre_spends',
    group_field='group'
)

model.show_beautiful_result()
```

## Documentation

For more detailed information about the library and its features, visit
our [documentation on ReadTheDocs](https://hypex.readthedocs.io/en/latest/).

You'll find comprehensive guides and tutorials that will help you get started with HypEx, as well as detailed API
documentation for advanced use cases.

## Contributions

Join our vibrant community! For guidelines on contributing, reporting issues, or seeking support, please refer to
our [Contributing Guidelines](https://github.com/sb-ai-lab/Hypex/blob/master/.github/CONTRIBUTING.md).

## More Information & Resources

[Habr (ru)](https://habr.com/ru/companies/sberbank/articles/778774/) - discover how HypEx is revolutionizing causal
inference in various fields.      
[A/B testing seminar](https://www.youtube.com/watch?v=B9BE_yk8CjA&t=53s&ab_channel=NoML) - Seminar in NoML about
matching and A/B testing       
[Matching with HypEx: Simple Guide](https://www.kaggle.com/code/kseniavasilieva/matching-with-hypex-simple-guide) -
Simple matching guide with explanation           
[Matching with HypEx: Grouping](https://www.kaggle.com/code/kseniavasilieva/matching-with-hypex-grouping) - Matching
with grouping guide    
[HypEx vs Causal Inference and DoWhy](https://www.kaggle.com/code/kseniavasilieva/hypex-vs-causal-inference-and-dowhy) -
discover why HypEx is the best solution for causal inference           
[HypEx vs Causal Inference and DoWhy: part 2](https://www.kaggle.com/code/kseniavasilieva/hypex-vs-causal-inference-part-2) -
discover why HypEx is the best solution for causal inference

### Testing different libraries for the speed of matching

Visit [this notebook](https://www.kaggle.com/code/kseniavasilieva/hypex-vs-causal-inference-part-2) on Kaggle to
reproduce and evaluate the results yourself. The table reports matching time in seconds for each group size.

| Group size             | 32 768 | 65 536 | 131 072 | 262 144 | 524 288 | 1 048 576 | 2 097 152 | 4 194 304 |
|------------------------|--------|--------|---------|---------|---------|-----------|-----------|-----------|
| Causal Inference       | 46s    | 169s   | None    | None    | None    | None      | None      | None      |
| DoWhy                  | 9s     | 19s    | 40s     | 77s     | 159s    | 312s      | 615s      | 1 235s    |
| HypEx with grouping    | 2s     | 6s     | 16s     | 42s     | 167s    | 509s      | 1 932s    | 7 248s    |
| HypEx without grouping | 2s     | 7s     | 21s     | 101s    | 273s    | 982s      | 3 750s    | 14 720s   |

## Join Our Community

Have questions or want to discuss HypEx? Join our [Telegram chat](https://t.me/HypExChat) and connect with the community
and the developers.

## Conclusion

HypEx is a resource for data analysts and researchers working on causal inference and AB testing. It combines an
automated pipeline, Faiss-based matching, and several validation procedures to help estimate causal effects in complex
datasets.

