sreg


- Name: sreg
- Version: 1.0.1
- Home page: https://github.com/jutrifonov/sreg.py
- Summary: Stratified Randomized Experiments
- Upload time: 2024-08-19 22:12:41
- Authors: Juri Trifonov, Yuehao Bai, Azeem Shaikh, Max Tabord-Meehan
- Requires Python: >=3.6

# Stratified Randomized Experiments (Python™ Edition) <img src="https://github.com/jutrifonov/sreg.dev/blob/main/logo.png" align="right" height="250" />
![version](https://img.shields.io/badge/sreg-v.1.0.0.9000-green?style=flat&logo=github&labelColor=2A2523)
[![codecov](https://codecov.io/github/jutrifonov/sreg/graph/badge.svg?token=KAUXB0ETCA)](https://app.codecov.io/github/jutrifonov/sreg)

The `sreg` package offers a toolkit for estimating average treatment effects (ATEs) in stratified randomized experiments. It accommodates multiple treatments and cluster-level treatment assignment, and supports optimal linear covariate adjustment based on baseline observable characteristics. The package computes estimators and standard errors based on Bugni, Canay, Shaikh (2018); Bugni, Canay, Shaikh, Tabord-Meehan (2023); and Jiang, Linton, Tang, Zhang (2023).

**Dependencies:** `numpy`, `pandas`, `scipy`

## Authors
- Juri Trifonov jutrifonov@uchicago.edu

- Yuehao Bai yuehao.bai@usc.edu

- Azeem Shaikh amshaikh@uchicago.edu

- Max Tabord-Meehan maxtm@uchicago.edu

## Supplementary files 
-  Sketch of the derivation of the ATE variance estimator under cluster-level treatment assignment: [Download PDF](https://github.com/jutrifonov/sreg.dev/blob/main/cluster.pdf)

-  Expressions for the multiple treatment case (with and without clusters): [Download PDF](https://github.com/jutrifonov/sreg.dev/blob/main/multiple.pdf)

## Installation
The official released version can be installed from `PyPI`.
```
pip install sreg
```

The latest development version can be installed from `GitHub`. 
```
pip install git+https://github.com/jutrifonov/sreg.py
```

## The function `sreg()`
Estimates the ATE(s) and the corresponding standard error(s) for a (collection of) treatment(s) relative to a control.

### Syntax
```python
sreg(Y, S = None, D = None, G_id = None, Ng = None, X = None, HC1 = True)
```
### Arguments
- **`Y (float)` -** a `numpy.array` of the observed outcomes;
- **`S (int)` -** a `numpy.array` of strata indicators $\\{0, 1, 2, \ldots\\}$; if `None` then the estimation is performed assuming no stratification;
- **`D (int)` -** a `numpy.array` of treatments indexed by $\\{0, 1, 2, \ldots\\}$, where `D = 0` denotes the control;
- **`G_id (int)` -** a `numpy.array` of cluster indicators; if `None` then estimation is performed assuming treatment is assigned at the individual level;
- **`Ng (int)` -** a `numpy.array` of cluster sizes; if `None` then `Ng` is assumed to be equal to the number of available observations in every cluster;
- **`X (DataFrame)` -** a `pandas.DataFrame` with columns representing the covariate values for every observation; if `None` then the estimator without linear adjustments is applied [^*];
- **`HC1 (bool)` -** a `True/False` logical argument indicating whether the small sample correction should be applied to the variance estimator.
[^*]: *Note: sreg cannot use individual-level covariates for covariate adjustment in cluster-randomized experiments. Any individual-level covariates will be aggregated to their cluster-level averages.*
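
The note above says that individual-level covariates are aggregated to their cluster-level averages. A minimal sketch of that aggregation in plain Python, assuming one row per observation; the helper `cluster_means` and the toy values are hypothetical and are not part of `sreg`, whose internal implementation may differ:

```python
from collections import defaultdict

def cluster_means(g_id, x):
    """Replace each covariate value by the mean of its cluster (hypothetical helper)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, value in zip(g_id, x):
        totals[g] += value
        counts[g] += 1
    # Every observation receives its own cluster's average.
    return [totals[g] / counts[g] for g in g_id]

# Toy data: two clusters with individual-level covariate values.
g_id = [1, 1, 2, 2, 2]
x = [1.0, 3.0, 2.0, 4.0, 6.0]
x_bar = cluster_means(g_id, x)
print(x_bar)
```

After aggregation the covariate no longer varies within a cluster, which matches the pattern of `X1` and `X2` in the example data frame in this README.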

### Data Structure
Here we provide an example of a data frame that can be used with `sreg`.
```
|       Y      | S | D | G_id | Ng |     X1     |      X2       |
|--------------|---|---|------|----|------------|---------------|
| -0.57773576  | 2 | 0 |  1   | 10 |  1.5597899 |  0.03023334   |
|  1.69495638  | 2 | 0 |  1   | 10 |  1.5597899 |  0.03023334   |
|  2.02033740  | 4 | 2 |  2   | 30 |  0.8747419 | -0.77090031   |
|  1.22020493  | 4 | 2 |  2   | 30 |  0.8747419 | -0.77090031   |
|  1.64466086  | 4 | 2 |  2   | 30 |  0.8747419 | -0.77090031   |
| -0.32365109  | 4 | 2 |  2   | 30 |  0.8747419 | -0.77090031   |
|  2.21008191  | 4 | 2 |  2   | 30 |  0.8747419 | -0.77090031   |
| -2.25064316  | 4 | 2 |  2   | 30 |  0.8747419 | -0.77090031   |
|  0.37962312  | 4 | 2 |  2   | 30 |  0.8747419 | -0.77090031   |
```
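In the table above, `S`, `D`, `Ng`, and the covariates take a single value within each cluster `G_id`. A hedged sketch of a check for that pattern before calling `sreg`; the function `varies_within_cluster` is a hypothetical helper, not part of the package:

```python
def varies_within_cluster(rows, cluster_key, columns):
    """Return the set of (cluster, column) pairs whose values differ within a cluster."""
    first_seen = {}
    problems = set()
    for row in rows:
        g = row[cluster_key]
        for col in columns:
            # Remember the first value seen for this (cluster, column) pair.
            expected = first_seen.setdefault((g, col), row[col])
            if row[col] != expected:
                problems.add((g, col))
    return problems

# First rows of the example table (Y omitted; it varies by observation).
rows = [
    {"G_id": 1, "S": 2, "D": 0, "Ng": 10},
    {"G_id": 1, "S": 2, "D": 0, "Ng": 10},
    {"G_id": 2, "S": 4, "D": 2, "Ng": 30},
    {"G_id": 2, "S": 4, "D": 2, "Ng": 30},
]
problems = varies_within_cluster(rows, "G_id", ["S", "D", "Ng"])
print(problems)  # an empty set means the cluster-level columns are consistent
```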
### Summary

`sreg` prints a *"Stata-style"* table containing the ATE estimates, the corresponding standard errors, $t$-statistics, $p$-values, $95$% asymptotic confidence intervals, and significance indicators for different levels $\alpha$. An example of the printed output is provided below.
```python
Saturated Model Estimation Results under CAR with clusters
Observations: 24680
Clusters: 1000
Number of treatments: 2
Number of strata: 5
---
Coefficients:
     Tau   As.se   T-stat  P-value  CI.left(95%)  CI.right(95%) Significance
-0.03836 0.09008 -0.42585  0.67021      -0.21491        0.13819             
 0.80719 0.09096  8.87453  0.00000       0.62892        0.98546          ***
---
Signif. codes:  0 `***` 0.001 `**` 0.01 `*` 0.05 `.` 0.1 ` ` 1
```
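The columns of this table are linked by standard large-sample formulas. The sketch below recomputes the first row from its `Tau` and `As.se` entries, assuming a two-sided test based on the standard normal distribution. That assumption reproduces the printed values up to rounding, but it is inferred from the output rather than taken from the package source. The function name `wald_summary` is hypothetical, and `statistics.NormalDist` requires Python 3.8 or later:

```python
from statistics import NormalDist

def wald_summary(tau, se, level=0.95):
    """t-statistic, two-sided normal p-value, and symmetric confidence interval."""
    std_normal = NormalDist()
    t_stat = tau / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(t_stat)))
    z = std_normal.inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    return t_stat, p_value, tau - z * se, tau + z * se

# First row of the printed table (inputs are already rounded to five decimals).
t_stat, p_value, ci_left, ci_right = wald_summary(-0.03836, 0.09008)
print(round(t_stat, 3), round(p_value, 3), round(ci_left, 3), round(ci_right, 3))
```

Because the inputs are rounded, the recomputed values can differ from the printed ones in the last decimal place.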
### Return Value

Returns an object of class `Sreg` that is a dictionary containing the following elements:

- **`tau_hat` -**  a numpy array of shape $(1, |A|)$ containing the ATE estimates, where $|A|$ represents the number of treatments;
  
- **`se_rob` -** a numpy array of shape $(1, |A|)$ containing the standard error estimates, where $|A|$ represents the number of treatments;
  
- **`t_stat` -** a numpy array of shape $(1, |A|)$ containing the t-statistics, where $|A|$ represents the number of treatments;
  
- **`p_value` -** a numpy array of shape $(1, |A|)$ containing the corresponding p-values, where $|A|$ represents the number of treatments;
  
- **`CI_left` -** a numpy array of shape $(1, |A|)$ containing the left bounds of the $95$% asymptotic confidence intervals;
  
- **`CI_right` -** a numpy array of shape $(1, |A|)$ containing the right bounds of the $95$% asymptotic confidence intervals;
  
- **`data` -** the original data provided, stored as a `pandas DataFrame` with the columns `[Y, S, D, G_id, Ng, X]`;
  
- **`lin_adj` -** a `pandas DataFrame` representing the covariates that were used in implementing linear adjustments.
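
Since each estimate array is documented as having shape $(1, |A|)$, the estimates for all treatments sit in row 0. The sketch below uses a plain dictionary of nested lists as a stand-in for the returned object (an assumption made so that the snippet runs without the package), filled with values from the printed example in the Summary section. It also assumes that column `j` corresponds to treatment `D = j + 1`, which the documentation does not state explicitly:

```python
# Stand-in for the object returned by sreg(): same keys, nested lists instead of arrays.
result = {
    "tau_hat": [[-0.03836, 0.80719]],
    "se_rob": [[0.09008, 0.09096]],
    "p_value": [[0.67021, 0.00000]],
}

n_treatments = len(result["tau_hat"][0])  # |A|, the number of columns
summary = []
for j in range(n_treatments):
    # Assumed mapping: column j holds treatment D = j + 1 (D = 0 is the control).
    summary.append((j + 1, result["tau_hat"][0][j], result["se_rob"][0][j]))

for treatment, tau, se in summary:
    print(f"D = {treatment}: tau_hat = {tau:.5f}, se = {se:.5f}")
```

With a real result, the same keys would index numpy arrays, so `result["tau_hat"][0, j]` would be the more idiomatic form.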

### Empirical Example

Here we provide an empirical example using the data from Chong et al. (2016), who studied the effect of iron deficiency anemia on school-age children's educational attainment and cognitive ability in Peru. The example replicates the empirical illustration from Bugni et al. (2019). For replication purposes, the data are included in the package and can be accessed by running `AEJapp()`.

``` python
import pandas as pd
from sreg import sreg, sreg_rgen, AEJapp
```
We can load the `AEJapp` dataset into the `Python` session via the `AEJapp()` function:
``` python
data = AEJapp()
```
The data can then be prepared to fit the package syntax:
``` python
Y = data['gradesq34']
D = data['treatment']
S = data['class_level']
pills = data['pills_taken']
age = data['age_months']
data_clean = pd.DataFrame({'Y': Y, 'D': D, 'S': S, 'pills': pills, 'age': age})

# Recode treatment level 3 as 0 so that it serves as the control (D = 0).
data_clean['D'] = data_clean['D'].apply(lambda x: 0 if x == 3 else x)

Y = data_clean['Y']
D = data_clean['D']
S = data_clean['S']
pills = data_clean['pills']
age = data_clean['age']
X = data_clean[['pills', 'age']]
```
We can take a look at the frequency table of `D` and `S`:
``` python
contingency_table = pd.crosstab(data_clean['D'], data_clean['S'])
print(contingency_table)
S   1   2   3   4   5
D                    
0  15  19  16  12  10
1  16  19  15  10  10
2  17  20  15  11  10
```
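As a quick consistency check, the cell counts in this table can be summed directly. The sketch below hard-codes the printed counts and computes the size of each arm, the size of each stratum, and the total sample size:

```python
# Counts copied from the printed contingency table: rows are D, columns are S = 1..5.
counts = {
    0: [15, 19, 16, 12, 10],
    1: [16, 19, 15, 10, 10],
    2: [17, 20, 15, 11, 10],
}

arm_sizes = {d: sum(row) for d, row in counts.items()}
stratum_sizes = [sum(col) for col in zip(*counts.values())]
n_obs = sum(arm_sizes.values())

print(arm_sizes)      # observations per treatment arm
print(stratum_sizes)  # observations per stratum
print(n_obs)          # total number of observations
```

The total comes to 215, and the three arms are close to balanced within every stratum.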
Now we can replicate the results from (Bugni et al., 2019) using `sreg`:
``` python
result = sreg(Y = Y, S = S, D = D, G_id = None, Ng = None, X = None, HC1 = True)
print(result) 
```
``` python
Saturated Model Estimation Results under CAR
Observations: 215
Number of treatments: 2
Number of strata: 5
---
Coefficients:
     Tau   As.se   T-stat  P-value  CI.left(95%)  CI.right(95%) Significance
-0.05113 0.20645 -0.24766  0.80440      -0.45577        0.35351             
 0.40903 0.20651  1.98065  0.04763       0.00427        0.81379            *
---
Signif. codes:  0 `***` 0.001 `**` 0.01 `*` 0.05 `.` 0.1 ` ` 1
```
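The `Signif. codes` legend maps each $p$-value to a marker. A minimal sketch of that mapping, assuming the thresholds are inclusive upper bounds as in R-style output (the legend itself does not say how boundary values are treated); `signif_code` is a hypothetical helper, not a function exported by `sreg`:

```python
def signif_code(p_value):
    """Map a p-value to the marker used in the 'Signif. codes' legend."""
    thresholds = [(0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")]
    for cutoff, marker in thresholds:
        if p_value <= cutoff:
            return marker
    return " "

# p-values from the two rows of the table above.
codes = [signif_code(p) for p in (0.80440, 0.04763)]
print(codes)
```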
In addition, `sreg` supports linear adjustments (covariates) in the estimation procedure:
``` python
result = sreg(Y = Y, S = S, D = D, G_id = None, Ng = None, X = X, HC1 = True)
print(result)
Saturated Model Estimation Results under CAR with linear adjustments
Observations: 215
Number of treatments: 2
Number of strata: 5
Covariates used in linear adjustments: pills, age
---
Coefficients:
     Tau   As.se   T-stat  P-value  CI.left(95%)  CI.right(95%) Significance
-0.02862 0.17964 -0.15929  0.87344      -0.38071        0.32348             
 0.34609 0.18362  1.88477  0.05946      -0.01381        0.70598            .
---
Signif. codes:  0 `***` 0.001 `**` 0.01 `*` 0.05 `.` 0.1 ` ` 1
```
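Comparing the two tables shows what the covariate adjustment changes in this example. The arithmetic below uses only the printed `As.se` values and computes the relative reduction in each standard error:

```python
# As.se values copied from the two printed tables (unadjusted, then adjusted).
se_unadjusted = [0.20645, 0.20651]
se_adjusted = [0.17964, 0.18362]

# Relative reduction in the standard error for each treatment, in percent.
reduction_pct = [
    100.0 * (1.0 - adj / raw) for raw, adj in zip(se_unadjusted, se_adjusted)
]
print([round(r, 1) for r in reduction_pct])
```

In this example the adjusted standard errors are roughly 11 to 13 percent smaller. The second treatment's $p$-value still rises above 0.05, because its point estimate shrinks as well.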
## The function `sreg_rgen()`
Generates the observed outcomes, treatment assignments, strata indicators, cluster indicators, cluster sizes, and covariates for estimating the treatment effect following the stratified block randomization design under covariate-adaptive randomization (CAR).

### Syntax
``` python
sreg_rgen(n, Nmax = 50, n_strata = 5,
          tau_vec = [0], gamma_vec = [0.4, 0.2, 1],
          cluster = True, is_cov = True)
```
### Arguments
- **`n (int)` -** The total number of observations in the sample;
- **`Nmax (int)` -** The maximum size of generated clusters (maximum number of observations in a cluster);
- **`n_strata (int)` -** An `integer` specifying the number of strata;
- **`tau_vec (list of float)` -** A `list` of treatment effects of length |A|, where |A| represents the number of treatments;
- **`gamma_vec (list of float)` -** A `list` of three parameters corresponding to covariates;
- **`cluster (bool)` -** A `boolean` indicating whether the data generation process (DGP) should use cluster-level treatment assignment (`True`) or individual-level treatment assignment (`False`);
- **`is_cov (bool)` -** A `boolean` indicating whether the DGP should include covariates (`True`) or not (`False`).

### Return Value
`pd.DataFrame`: A `DataFrame` with `n` observations containing the generated values of the following variables:
- **`Y (pd.Series of float)` -** A numeric Series of length `n` representing the observed outcomes;
- **`S (pd.Series of int)` -** A numeric Series of length `n` representing the strata indicators;
- **`D (pd.Series of int)` -** A numeric Series of length `n` representing the treatment assignments, indexed by $\{0, 1, 2, ...\}$, where `D = 0` denotes the control group;
- **`G_id (pd.Series of int)` -** A numeric Series of length `n` representing the cluster indicators;
- **`Ng (pd.Series of int)` -** A numeric Series of length `n` representing the cluster sizes;
- **`X (pd.DataFrame)` -** A `DataFrame` with columns representing the covariate values for every observation.

### Example
``` python
from sreg import sreg_rgen
data = sreg_rgen(n = 1000, tau_vec = [0, 0.8], cluster = False, is_cov = True)
print(data)
            Y  S  D        X1        X2
0    4.689501  1  0  7.120830  3.792420
1    3.629002  3  2  3.234888  1.674029
2    0.739461  3  1  4.822114  1.165004
3    0.292031  4  0  4.360900  1.171521
4    0.755504  5  1  6.417946  0.176026
..        ... .. ..       ...       ...
995  1.732812  4  0  3.695054  0.866644
996  2.529121  4  0  5.449032  1.639192
997  2.174121  3  1  4.929872  0.566262
998  2.649385  3  0  3.535942  1.995133
999  4.868684  2  2  7.111149  1.646865
```
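With `tau_vec = [0, 0.8]` there are two treatments, so `D` should take values in `{0, 1, 2}`. The sketch below checks this on the ten rows displayed above. It also checks that the strata labels fall between 1 and the default `n_strata = 5`, a range inferred from the printed sample rather than from the package documentation:

```python
tau_vec = [0, 0.8]
n_strata = 5  # default value in the sreg_rgen() signature

# (S, D) pairs copied from the ten rows printed above.
displayed = [(1, 0), (3, 2), (3, 1), (4, 0), (5, 1),
             (4, 0), (4, 0), (3, 1), (3, 0), (2, 2)]

allowed_d = set(range(len(tau_vec) + 1))  # control plus one level per treatment
d_ok = all(d in allowed_d for _, d in displayed)
s_ok = all(1 <= s <= n_strata for s, _ in displayed)
print(d_ok, s_ok)
```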

## References
Bugni, F. A., Canay, I. A., and Shaikh, A. M. (2018). Inference Under Covariate-Adaptive Randomization. *Journal of the American Statistical Association*, 113(524), 1784–1796, doi:10.1080/01621459.2017.1375934.

Bugni, F., Canay, I., Shaikh, A., and Tabord-Meehan, M. (2024+). Inference for Cluster Randomized Experiments with Non-ignorable Cluster Sizes. *Forthcoming in the Journal of Political Economy: Microeconomics*, doi:10.48550/arXiv.2204.08356.

Jiang, L., Linton, O. B., Tang, H., and Zhang, Y. (2023+). Improving Estimation Efficiency via Regression-Adjustment in Covariate-Adaptive Randomizations with Imperfect Compliance. *Forthcoming in Review of Economics and Statistics*, doi:10.48550/arXiv.2204.08356.

            
