# ab-test-toolkit
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
## Install
``` sh
pip install ab_test_toolkit
```
## Imports
``` python
from ab_test_toolkit.generator import (
generate_binary_data,
generate_continuous_data,
data_to_contingency,
contingency_from_counts,
)
from ab_test_toolkit.power import (
simulate_power_binary,
sample_size_binary,
simulate_power_continuous,
sample_size_continuous,
)
from ab_test_toolkit.plotting import (
plot_power,
plot_distribution,
plot_betas,
plot_binary_power,
)
from ab_test_toolkit.analyze import p_value_binary
```
## Binary target (e.g. conversion rate experiments)
### Sample size
We can calculate the required sample size with the function
`sample_size_binary`. The inputs are:
- Conversion rate of the control: cr0
- Conversion rate of the variant at the minimal detectable effect: cr1 (for
  example, if we have a conversion rate of 1% and want to detect an
  effect of at least 20% relative, we would set cr0=0.010 and cr1=0.012)
- Significance threshold: alpha. Usually set to 0.05, this defines our
tolerance for falsely detecting an effect if in reality there is none
(alpha=0.05 means that in 5% of the cases we will detect an effect
even though the samples for control and variant are drawn from the
exact same distribution).
- Statistical power: power. Usually set to 0.8. This means that if the effect
  is the minimal effect specified above, we have an 80% probability of
  identifying it as statistically significant (and hence a 20% probability
  of not identifying it).
- one_sided: Whether the test is one-sided (one_sided=True) or
  two-sided (one_sided=False). As a rule of thumb, if there are very
  strong reasons to believe that the variant cannot be inferior to the
  control, we can use a one-sided test. In case of doubt, a two-sided
  test is the safer choice.
Let us calculate the sample size for the following example:
``` python
n_sample = sample_size_binary(
cr0=0.01,
cr1=0.012,
alpha=0.05,
power=0.8,
one_sided=True,
)
print(f"Required sample size per variant is {int(n_sample)}.")
```
    Required sample size per variant is 33560.
``` python
n_sample_two_sided = sample_size_binary(
cr0=0.01,
cr1=0.012,
alpha=0.05,
power=0.8,
one_sided=False,
)
print(
f"For the two-sided experiment, required sample size per variant is {int(n_sample_two_sided)}."
)
```
    For the two-sided experiment, required sample size per variant is 42606.
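For intuition, here is a minimal sketch of the standard two-proportion sample-size formula (normal approximation). This is not necessarily the exact formula `sample_size_binary` implements, so the result is close to, but not identical to, the numbers above.
``` python
from scipy.stats import norm

def approx_sample_size_binary(cr0, cr1, alpha=0.05, power=0.8, one_sided=True):
    """Normal-approximation sample size per variant for two proportions."""
    z_alpha = norm.ppf(1 - alpha) if one_sided else norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p_bar = (cr0 + cr1) / 2  # pooled rate under the null hypothesis
    numerator = (
        z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
        + z_beta * (cr0 * (1 - cr0) + cr1 * (1 - cr1)) ** 0.5
    ) ** 2
    return numerator / (cr1 - cr0) ** 2

print(int(approx_sample_size_binary(0.01, 0.012)))  # ~33600, close to 33560 above
```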
### Power simulations
What happens if we use a smaller sample size? And how can we build
intuition for the required sample size?
Let us analyze the statistical power with synthetic data. We can do
this with the simulate_power_binary function. We are using some default
arguments here; see [this
page](https://k111git.github.io/ab-test-simulator/power.html) for more
information.
``` python
# simulation = simulate_power_binary()
```
Note: The simulation object returns the total sample size, so we need to
split it per variant.
``` python
# simulation
```
Finally, we can plot the results (note: the plot function shows the
sample size per variant):
``` python
# plot_power(
# simulation,
# added_lines=[{"sample_size": sample_size_binary(), "label": "Chi2"}],
# )
```
### Compute p-value
``` python
n0 = 5000  # users in control
n1 = 5100  # users in variant
c0 = 450  # conversions in control
c1 = 495  # conversions in variant
df_c = contingency_from_counts(n0, c0, n1, c1)
df_c
```
| group | users | converted | not_converted | cvr      |
|-------|-------|-----------|---------------|----------|
| 0     | 5000  | 450       | 4550          | 0.090000 |
| 1     | 5100  | 495       | 4605          | 0.097059 |
``` python
p_value_binary(df_c)
```
    0.11824221841149218
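As a sanity check, the same 2x2 table can be tested with scipy's chi-squared test. The exact convention `p_value_binary` uses (test statistic, one- vs. two-sided) is not documented here, so the two p-values need not match.
``` python
from scipy.stats import chi2_contingency

# 2x2 table of (converted, not_converted) per group
table = [[450, 4550], [495, 4605]]
chi2, p, dof, expected = chi2_contingency(table)  # Yates correction by default
print(p)  # two-sided p-value; may differ from p_value_binary's convention
```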
### The problem of peeking
Work in progress.
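While this section is still being written, the core issue is easy to demonstrate with a standalone simulation (plain numpy/scipy, not the toolkit): if we test repeatedly as data accumulates in an A/A experiment and stop at the first significant result, the false positive rate is inflated well above alpha.
``` python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
alpha, cr, batch, n_batches, n_experiments = 0.05, 0.01, 1000, 20, 500
false_positives = 0
for _ in range(n_experiments):
    c0 = c1 = n = 0
    for _ in range(n_batches):
        c0 += rng.binomial(batch, cr)  # control conversions, no true effect
        c1 += rng.binomial(batch, cr)  # variant conversions, same rate
        n += batch
        table = [[c0, n - c0], [c1, n - c1]]
        if chi2_contingency(table)[1] < alpha:  # "peek" after every batch
            false_positives += 1
            break
print(false_positives / n_experiments)  # substantially above alpha=0.05
```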
## Continuous target (e.g. average)
Here we assume normally distributed sample means (which usually holds due
to the central limit theorem).
### Sample size
We can calculate the required sample size with the function
`sample_size_continuous`. The inputs are:
- mu1: Mean of the control group
- mu2: Mean of the variant group assuming the minimal detectable effect
  (e.g. if the mean is 5 and we want to detect an effect as small as
  0.05, mu1=5.00 and mu2=5.05)
- sigma: Standard deviation (we assume the same for variant and control;
  it should be estimated from historical data)
- alpha, power, one_sided: as in the binary case
Let us calculate an example:
``` python
n_sample = sample_size_continuous(
mu1=5.0, mu2=5.05, sigma=1, alpha=0.05, power=0.8, one_sided=True
)
print(f"Required sample size per variant is {int(n_sample)}.")
```
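For reference, the textbook normal-approximation formula for comparing two means is n = 2σ²(z₁₋α + z₁₋β)² / δ² per variant. A minimal sketch (again, not necessarily the exact formula the library uses):
``` python
from scipy.stats import norm

def approx_sample_size_continuous(mu1, mu2, sigma, alpha=0.05, power=0.8, one_sided=True):
    """Normal-approximation sample size per variant for two means."""
    z_alpha = norm.ppf(1 - alpha) if one_sided else norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    delta = abs(mu2 - mu1)  # minimal detectable effect
    return 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2

print(int(approx_sample_size_continuous(5.0, 5.05, 1.0)))  # ~4946 per variant
```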
Let us also do some simulations. These show results for the t-test as
well as Bayesian testing (one-sided only).
``` python
# simulation = simulate_power_continuous()
```
``` python
# plot_power(
# simulation,
# added_lines=[
# {"sample_size": continuous_sample_size(), "label": "Formula"}
# ],
# )
```
## Data Generators
We can also use the data generators to create example data to analyze or
visualize as if it came from a real experiment.
Distribution without effect:
``` python
df_continuous = generate_continuous_data(effect=0)
# plot_distribution(df_continuous)
```
Distribution with effect:
``` python
df_continuous = generate_continuous_data(effect=1)
# plot_distribution(df_continuous)
```
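A generated dataset can also be analyzed directly, for example with scipy's t-test. The sketch below assumes the frame has a group indicator and a value column (hypothetical column names "group" and "value"; inspect the actual output of generate_continuous_data first):
``` python
from scipy.stats import ttest_ind

# Hypothetical column names -- check df_continuous.columns before running.
a = df_continuous[df_continuous["group"] == 0]["value"]
b = df_continuous[df_continuous["group"] == 1]["value"]
print(ttest_ind(a, b).pvalue)  # should be small, since effect=1 above
```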
## Visualizations
Plot beta distributions for a contingency table:
``` python
df = generate_binary_data()
df_contingency = data_to_contingency(df)
# fig = plot_betas(df_contingency, xmin=0, xmax=0.04)
```
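Under a uniform Beta(1, 1) prior, the posterior for each group's conversion rate is Beta(converted + 1, not_converted + 1). Here is a minimal matplotlib sketch of what plot_betas presumably visualizes, using the counts from the p-value example above:
``` python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

x = np.linspace(0.07, 0.12, 500)
for label, conv, not_conv in [("control", 450, 4550), ("variant", 495, 4605)]:
    # Posterior of the conversion rate under a uniform prior
    plt.plot(x, beta.pdf(x, conv + 1, not_conv + 1), label=label)
plt.xlabel("conversion rate")
plt.ylabel("posterior density")
plt.legend()
plt.show()
```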
## False positives
If control and variant share the same conversion rate (cr0=cr1), there is
no true effect, so any detected effect is a false positive. The simulations
below check how often that happens.
``` python
# simulation = simulate_power_binary(cr0=0.01, cr1=0.01, one_sided=False)
```
``` python
# plot_power(simulation, is_effect=False)
```
``` python
# simulation = simulate_power_binary(cr0=0.01, cr1=0.01, one_sided=True)
# plot_power(simulation, is_effect=False)
```
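We can cross-check this with a standalone simulation (plain numpy/scipy rather than the toolkit): with a single test at a fixed sample size, the rejection rate in an A/A setting should stay near alpha, in contrast to the peeking scenario above.
``` python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
n, cr, runs, alpha = 20000, 0.01, 500, 0.05
rejections = 0
for _ in range(runs):
    c0, c1 = rng.binomial(n, cr), rng.binomial(n, cr)  # identical rates
    table = [[c0, n - c0], [c1, n - c1]]
    rejections += chi2_contingency(table)[1] < alpha
print(rejections / runs)  # close to (slightly below) 0.05 with Yates correction
```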