ab-test-toolkit

- Name: ab-test-toolkit
- Version: 0.0.15
- Homepage: https://github.com/k111git/ab-test-toolkit
- Summary: Toolkit to simulate and analyze AB tests
- Author: Kolja
- License: Apache Software License 2.0
- Requires Python: >=3.7
- Keywords: python, experimentation, ab-testing
- Upload time: 2023-06-20 13:51:18
# ab-test-toolkit

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install

``` sh
pip install ab_test_toolkit
```

## Imports

``` python
from ab_test_toolkit.generator import (
    generate_binary_data,
    generate_continuous_data,
    data_to_contingency,
    contingency_from_counts,
)
from ab_test_toolkit.power import (
    simulate_power_binary,
    sample_size_binary,
    simulate_power_continuous,
    sample_size_continuous,
)
from ab_test_toolkit.plotting import (
    plot_power,
    plot_distribution,
    plot_betas,
    plot_binary_power,
)

from ab_test_toolkit.analyze import p_value_binary
```

## Binary target (e.g. conversion rate experiments)

### Sample size

We can calculate the required sample size with the function
`sample_size_binary`. The inputs needed are:

- Conversion rate of the control: cr0

- Conversion rate of the variant at the minimal detectable effect: cr1
  (for example, if we have a conversion rate of 1% and want to detect a
  relative effect of at least 20%, we would set cr0=0.010 and cr1=0.012)

- Significance threshold: alpha. Usually set to 0.05, this defines our
  tolerance for falsely detecting an effect when in reality there is
  none (alpha=0.05 means that in 5% of cases we will detect an effect
  even though the samples for control and variant are drawn from the
  exact same distribution).

- Statistical power: power. Usually set to 0.8. This means that if the
  true effect is the minimal effect specified above, we have an 80%
  probability of identifying it as statistically significant (and hence
  a 20% probability of not identifying it).

- one_sided: whether the test is one-sided (one_sided=True) or
  two-sided (one_sided=False). As a rule of thumb, if there are very
  strong reasons to believe that the variant cannot be inferior to the
  control, we can use a one-sided test. In case of doubt, a two-sided
  test is the safer choice.

Let us calculate the sample size for the following example:

``` python
n_sample = sample_size_binary(
    cr0=0.01,
    cr1=0.012,
    alpha=0.05,
    power=0.8,
    one_sided=True,
)
print(f"Required sample size per variant is {int(n_sample)}.")
```

    Required sample size per variant is 33560.

``` python
n_sample_two_sided = sample_size_binary(
    cr0=0.01,
    cr1=0.012,
    alpha=0.05,
    power=0.8,
    one_sided=False,
)
print(
    f"For the two-sided experiment, required sample size per variant is {int(n_sample_two_sided)}."
)
```

    For the two-sided experiment, required sample size per variant is 42606.
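
For reference, similar numbers can be obtained with statsmodels'
normal-approximation power solver. This is a cross-check under the
assumption of an arcsine-transformed effect size, not necessarily the
toolkit's own formula, so the results can differ by a few observations:

``` python
# Cross-check with statsmodels (assumption: arcsine effect size with a
# normal-approximation solver; the toolkit's exact formula may differ).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

es = proportion_effectsize(0.012, 0.01)  # standardized (arcsine) effect size
for alternative in ("larger", "two-sided"):  # "larger" = one-sided
    n = NormalIndPower().solve_power(
        effect_size=es, alpha=0.05, power=0.8, ratio=1.0, alternative=alternative
    )
    print(f"{alternative}: ~{n:.0f} per variant")
```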

### Power simulations

What happens if we use a smaller sample size? And how can we build
intuition for this sample size?

Let us analyze the statistical power with synthetic data, using the
simulate_power_binary function. We are using some default arguments
here; see [this
page](https://k111git.github.io/ab-test-simulator/power.html) for more
information.

``` python
# simulation = simulate_power_binary()
```

Note: The simulation object returns the total sample size, so we need
to split it per variant.

``` python
# simulation
```

Finally, we can plot the results (note: the plot function shows the
sample size per variant):

``` python
# plot_power(
#     simulation,
#     added_lines=[{"sample_size": sample_size_binary(), "label": "Chi2"}],
# )
```
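
Since the toolkit calls above are left commented out, here is a minimal
self-contained sketch of the idea behind such a simulation (an
illustration, not the toolkit's implementation): repeatedly draw
synthetic control and variant conversions at a fixed sample size and
measure how often a one-sided two-proportion z-test rejects.

``` python
import numpy as np
from scipy.stats import norm

def simulated_power(cr0=0.01, cr1=0.012, n=33560, alpha=0.05,
                    n_runs=2000, seed=0):
    """Estimate power: the fraction of simulated experiments in which a
    one-sided two-proportion z-test (pooled variance) rejects at alpha."""
    rng = np.random.default_rng(seed)
    c0 = rng.binomial(n, cr0, size=n_runs)  # conversions in control
    c1 = rng.binomial(n, cr1, size=n_runs)  # conversions in variant
    p_pool = (c0 + c1) / (2 * n)
    se = np.sqrt(p_pool * (1 - p_pool) * 2 / n)
    z = (c1 - c0) / (n * se)
    return float(np.mean(norm.sf(z) < alpha))  # norm.sf(z) = one-sided p-value

print(simulated_power())         # close to 0.8 at the computed sample size
print(simulated_power(n=20000))  # noticeably lower power at a smaller n
```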

### Compute p-value

``` python
n0 = 5000
n1 = 5100
c0 = 450
c1 = 495
df_c = contingency_from_counts(n0, c0, n1, c1)
df_c
```

| group | users | converted | not_converted | cvr      |
|-------|-------|-----------|---------------|----------|
| 0     | 5000  | 450       | 4550          | 0.090000 |
| 1     | 5100  | 495       | 4605          | 0.097059 |

``` python
p_value_binary(df_c)
```

    0.11824221841149218
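
For context, the value above appears to match a one-sided reading of
scipy's Yates-corrected chi-squared test on the same table (shown as a
sketch; whether `p_value_binary` uses exactly this test is an
assumption):

``` python
# Sketch: chi-squared test on the contingency table with scipy. For a
# 2x2 table, chi2_contingency applies Yates' continuity correction by
# default.
from scipy.stats import chi2_contingency

table = [[450, 4550], [495, 4605]]  # rows: groups, cols: [converted, not_converted]
chi2, p_two_sided, dof, expected = chi2_contingency(table)
print(p_two_sided / 2)  # one-sided reading, approximately the value above
```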

### The problem of peeking

Work in progress.

## Continuous target (e.g. average)

Here we assume normally distributed data (which usually holds due to the
central limit theorem).

### Sample size

We can calculate the required sample size with the function
`sample_size_continuous`. The inputs needed are:

- mu1: Mean of the control group

- mu2: Mean of the variant group assuming the minimal detectable effect
  (e.g. if the mean is 5 and we want to detect an effect as small as
  0.05, we set mu1=5.00 and mu2=5.05)

- sigma: Standard deviation (we assume the same value for variant and
  control; it should be estimated from historical data)

- alpha, power, one_sided: as in the binary case

Let us calculate an example:

``` python
n_sample = sample_size_continuous(
    mu1=5.0, mu2=5.05, sigma=1, alpha=0.05, power=0.8, one_sided=True
)
print(f"Required sample size per variant is {int(n_sample)}.")
```
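
For reference, the standard closed-form approximation for a two-sample
comparison of means gives a similar estimate (a sketch; whether the
toolkit uses exactly this formula is an assumption):

``` python
# n per variant = 2 * sigma^2 * (z_{1-alpha} + z_{power})^2 / delta^2
from scipy.stats import norm

sigma, delta, alpha, power = 1.0, 0.05, 0.05, 0.8
z_alpha = norm.ppf(1 - alpha)  # one-sided critical value
z_power = norm.ppf(power)
n = 2 * sigma**2 * (z_alpha + z_power) ** 2 / delta**2
print(f"Closed-form estimate per variant: {n:.0f}")  # ~4946
```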

Let us also run some simulations. These show results for the t-test as
well as Bayesian testing (one-sided only).

``` python
# simulation = simulate_power_continuous()
```

``` python
# plot_power(
#     simulation,
#     added_lines=[
#         {"sample_size": continuous_sample_size(), "label": "Formula"}
#     ],
# )
```

## Data Generators

We can also use the data generators to create example data that we can
analyze or visualize as if it came from an experiment.

Distribution without effect:

``` python
df_continuous = generate_continuous_data(effect=0)
# plot_distribution(df_continuous)
```

Distribution with effect:

``` python
df_continuous = generate_continuous_data(effect=1)
# plot_distribution(df_continuous)
```

## Visualizations

Plot beta distributions for a contingency table:

``` python
df = generate_binary_data()
df_contingency = data_to_contingency(df)
# fig = plot_betas(df_contingency, xmin=0, xmax=0.04)
```
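
Since the plotting call is commented out, here is a minimal sketch of
what such a plot conveys, assuming a Beta(1 + converted,
1 + not_converted) posterior per group (the prior is an assumption, and
the counts below are hypothetical, chosen to fit the xmin/xmax range
above):

``` python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

x = np.linspace(0, 0.04, 500)
# hypothetical counts at a roughly 1-1.2% conversion rate
for label, conv, not_conv in [("control", 100, 9900), ("variant", 120, 9880)]:
    plt.plot(x, beta.pdf(x, 1 + conv, 1 + not_conv), label=label)
plt.xlabel("conversion rate")
plt.ylabel("posterior density")
plt.legend()
plt.show()
```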

## False positives

``` python
# simulation = simulate_power_binary(cr0=0.01, cr1=0.01, one_sided=False)
```

``` python
# plot_power(simulation, is_effect=False)
```

``` python
# simulation = simulate_power_binary(cr0=0.01, cr1=0.01, one_sided=True)
# plot_power(simulation, is_effect=False)
```
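
When there is no true effect (cr0 == cr1), the rejection rate estimates
the false positive rate and should come out close to alpha. Reusing the
`simulated_power` sketch from the power simulations section (not a
toolkit function):

``` python
# Under the null, the measured "power" is the false positive rate.
print(simulated_power(cr0=0.01, cr1=0.01, n=33560))  # should be close to 0.05
```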



            
