optimalgiv

Name	optimalgiv
Version	0.2.1.post2
Summary	Python ⇄ Julia bridge for the OptimalGIV package
upload_time	2025-07-17 20:21:51
author	Marco Zhang, Julie Z. Fu
requires_python	>=3.9
license	MIT

# optimalgiv

[![CI](https://github.com/FuZhiyu/optimalgiv/actions/workflows/ci.yml/badge.svg)](https://github.com/FuZhiyu/optimalgiv/actions/workflows/ci.yml)

A Python wrapper for the Julia package [OptimalGIV.jl](https://github.com/FuZhiyu/OptimalGIV.jl)

This wrapper uses [PythonCall.jl](https://github.com/JuliaPy/PythonCall.jl) to call the Julia package directly from Python. Julia is automatically installed and all dependencies are resolved without manual setup. 

**This Python package is under active development.** The core algorithms are implemented in Julia and thoroughly tested in simulations, but the documentation is a work in progress and bugs may exist in minor features. Feature requests and bug reports are welcome.

> This README focuses on the Python API. For more technical documentation, please see [the Julia package](https://github.com/FuZhiyu/OptimalGIV.jl/blob/main/README.md) and the [companion paper](https://fuzhiyu.me/TreasuryGIVPaper/Treasury_GIV_draft.pdf).


## Installation

```bash
pip install optimalgiv
```

### First import

The first time you run:

```python
import optimalgiv as og
```

it will:

1. **Install Julia** (if not present; ≈ 1-2 min),
2. **Set up a Julia environment** with `OptimalGIV.jl` and **precompile** (≈ 2–4 min).

Later imports are much faster (≈ 6–10 s), which is typical for Julia project activation: the environment is compiled once and then reused.


---

## Model Specification

The Granular Instrumental Variables (GIV) model estimated by this package follows the specification:

```math
\begin{aligned}
\left.\begin{array}{c}
\begin{array}{cl}
q_{i,t} & =-p_{t}\times\mathbf{C}_{i,t}'\boldsymbol{\zeta}+\mathbf{X}_{i,t}'\boldsymbol{\beta}+u_{i,t},\\
0 & =\sum_{i}S_{i,t}q_{i,t}
\end{array}\end{array}\right\} \implies & p_{t}=\frac{1}{\mathbf{C}_{S,t}'\boldsymbol{\zeta}}\left[\mathbf{X}_{S,t}'\boldsymbol{\beta}+u_{S,t}\right],
\end{aligned}
```


where:

* $q_{i,t}$ and $p_t$ are endogenous,
* $\mathbf{C}_{i,t}$ is a vector of exogenous variables that parameterize the slopes (elasticities),
* $\mathbf{X}_{i,t}$ is a vector of controls,
* $\boldsymbol{\zeta}$, $\boldsymbol{\beta}$ are coefficient vectors,
* $u_{i,t}$ is the idiosyncratic shock, and
* $S_{i,t}$ is the weighting variable.

The equilibrium price $p_t$ is derived by imposing the market clearing condition, and the model is estimated using the moment condition:

$$
\mathbb{E}[u_{i,t} u_{j,t}] = 0
$$

for all $i \neq j$. This implies orthogonality across sectors' residuals.
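
To make the moment condition concrete, the sketch below simulates a market with one homogeneous elasticity and recovers it by finding the root of the sample moment. It is plain Python written for illustration, not the package's estimator. The sizes, the true elasticity `ZETA_TRUE`, and the helpers `simulate_market`, `sample_moment`, and `solve_by_bisection` are all assumptions introduced here.

```python
import random

ZETA_TRUE = 2.0                         # assumed true elasticity (illustrative)
SIZES = [0.5, 0.2, 0.15, 0.1, 0.05]     # assumed constant weights S_i, summing to 1


def simulate_market(n_periods, seed=0):
    """Draw i.i.d. shocks u_it and impose market clearing sum_i S_i q_it = 0."""
    rng = random.Random(seed)
    periods = []
    for _ in range(n_periods):
        u = [rng.gauss(0.0, 1.0) for _ in SIZES]
        u_S = sum(s * x for s, x in zip(SIZES, u))
        p = u_S / ZETA_TRUE                      # p_t = u_{S,t} / zeta because sum_i S_i = 1
        q = [-ZETA_TRUE * p + x for x in u]      # q_it = -zeta * p_t + u_it
        periods.append((p, q))
    return periods


def sample_moment(z, periods):
    """Average over t of sum_i u_i(z) * sum_{j != i} S_j u_j(z)."""
    total = 0.0
    for p, q in periods:
        u = [q_i + z * p for q_i in q]           # residual implied by the guess z
        u_S = sum(s * x for s, x in zip(SIZES, u))
        total += sum(x * (u_S - s * x) for s, x in zip(SIZES, u))
    return total / len(periods)


def solve_by_bisection(periods, lo=0.1, hi=10.0, steps=60):
    """Find z in [lo, hi] where the sample moment changes sign."""
    f_lo = sample_moment(lo, periods)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = sample_moment(mid, periods)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


periods = simulate_market(n_periods=2000, seed=1)
zeta_hat = solve_by_bisection(periods)
print(f"true zeta = {ZETA_TRUE}, estimate = {zeta_hat:.3f}")
```

At the true elasticity the implied residuals are the original shocks, which are uncorrelated across entities, so the sample moment is close to zero there. Any other guess leaves a multiple of the common price in every residual and makes them correlated.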

---

### Panel Data and Coverage

The GIV model supports unbalanced panel data. However, some estimation algorithms (e.g. "scalar_search" and "debiased_ols") **require complete coverage**, meaning:

$$
\sum_i S_{i,t} q_{i,t} = 0
$$

must hold exactly **within the sample**. This ensures internal consistency of the equilibrium condition. 

If the adding-up constraint is not satisfied, the model will adjust accordingly, but **the interpretation of estimated coefficients should be made with caution**, as residual market imbalances may bias elasticities and standard errors. (See the `complete_coverage` argument below for details.)
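
Before choosing an algorithm, the adding-up constraint can be checked by hand. The following is a minimal sketch in plain Python over long-format rows. The helper `coverage_by_period`, the tolerance `tol`, and the toy data are assumptions for illustration; the package's own auto-detection may use a different tolerance.

```python
from collections import defaultdict


def coverage_by_period(rows, tol=1e-8):
    """Return {t: True/False} for whether sum_i S_it * q_it is zero within tol."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["t"]] += row["S"] * row["q"]
    return {t: abs(total) <= tol for t, total in totals.items()}


# Toy long-format panel: period 1 clears, period 2 does not.
rows = [
    {"id": 1, "t": 1, "S": 0.6, "q": 1.0},
    {"id": 2, "t": 1, "S": 0.4, "q": -1.5},
    {"id": 1, "t": 2, "S": 0.6, "q": 1.0},
    {"id": 2, "t": 2, "S": 0.4, "q": 0.5},
]
flags = coverage_by_period(rows)
print(flags)  # {1: True, 2: False}
```

If any period is flagged `False`, the sample does not have complete coverage in the sense used above.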

---

### Internal PC

Internal PC extractions are supported. With internal PCs, the moment conditions become:

$$
\mathbb E[u_{i,t}u_{j,t}] = \Lambda \Lambda'
$$

where $\Lambda$ is the factor loadings estimated internally using [HeteroPCA.jl](https://github.com/FuZhiyu/HeteroPCA.jl) from $u_{i,t}(z) \equiv q_{i,t} + p_{t}\times\mathbf{C}_{i,t}'\boldsymbol{z}$ at each guess of $z$. 

However, in small samples an exact root of the moment conditions may not exist, and users may want to use a minimizer to minimize the error instead. Also note that a model with a fully flexible elasticity specification and fully flexible factor loadings is not theoretically identifiable.


---

## Usage

### Basic Example

```python
import pandas as pd
import numpy as np
from optimalgiv import giv

df = pd.read_csv("./simdata1.csv") # you can find simdata under the git repo examples/
# or simulate using simulate_data below

df['id'] = df['id'].astype('category') # ensure id interactions map to distinct groups

# Define the model formula
formula = "q + id & endog(p) ~ 0 + fe(id) + fe(id) & (η1 + η2)"

# Provide an initial guess (a good guess is critical)
guess = np.ones(5)

# Estimate the model
model = giv(
    df = df,
    formula = formula,
    id = "id",
    t = "t",
    weight = "absS",
    algorithm = "iv",
    guess = guess,
    save = 'all', # saves both fixed‐effects (model.fe) and residuals (model.residual_df)
)

# View the result
model.summary()

##                     GIVModel (Aggregate coef: 2.13)                     
## ─────────────────────────────────────────────────────────────────────────
##            Estimate  Std. Error    t-stat  Pr(>|t|)  Lower 95%  Upper 95%
## ─────────────────────────────────────────────────────────────────────────
## id: 1 & p  1.00723     1.30407   0.772377    0.4405  -1.55923    3.57369
## id: 2 & p  1.77335     0.475171  3.73204     0.0002   0.8382     2.70851
## id: 3 & p  1.36863     0.382177  3.58114     0.0004   0.616491   2.12077
## id: 4 & p  3.3846      0.382352  8.85207     <1e-16   2.63212    4.13709
## id: 5 & p  0.619882    0.161687  3.83385     0.0002   0.301676   0.938087


```
---

### Formula Specification

The model formula follows the convention:

```
q + interactions & endog(p) ~ exog_controls + pc(k)
```

Where:

* `q`: **Response variable** (e.g., quantity).
* `endog(p)`: **Endogenous variable** (e.g., price). Must appear on the **left-hand side**.

  > **Note:** A *positive* estimated coefficient implies a *negative* response of `q` to `p` (i.e., a downward-sloping demand curve).
* `interactions`: Exogenous variables used to parameterize **heterogeneous elasticities**, such as entity identifiers or group characteristics.
* `exog_controls`: Exogenous control variables. Supports **fixed effects** (e.g., `fe(id)`) using the same syntax as `FixedEffectModels.jl`.
* `pc(k)`: Principal component extraction with `k` factors (optional). When specified, `k` common factors are extracted from the residuals using HeteroPCA.jl.


#### Examples of formulas:

```python
# Homogeneous elasticity with entity-specific loadings (estimated) and fixed effects (absorbed)
formula = "q + endog(p) ~ id & η + fe(id)"

# Heterogeneous elasticity by entity
formula = "q + id & endog(p) ~ id & η + fe(id)"

# Multiple interactions
formula = "q + id & endog(p) + category & endog(p) ~ fe(id) & η1 + η2"

# Heterogeneous elasticity with entity-specific loadings and no intercept
formula = "q + id & endog(p) ~ 0 + id & η"

# With PC extraction (2 factors)
formula = "q + endog(p) ~ 0 + pc(2)"

# Exogenous controls with PC extraction
formula = "q + endog(p) ~ fe(id) & η1 + pc(3)"
```
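
When many specifications are estimated in a loop, it can help to assemble the formula string from its parts. The helper below is a hypothetical convenience, not part of `optimalgiv`. It only concatenates the pieces described above (`endog(...)`, `&` interactions, controls, and `pc(k)`).

```python
def build_formula(response, endog, interactions=(), controls=("0",), n_pcs=0):
    """Assemble 'q + a & endog(p) ~ controls + pc(k)' from its parts."""
    endog_term = f"endog({endog})"
    if interactions:
        lhs_terms = [f"{name} & {endog_term}" for name in interactions]
    else:
        lhs_terms = [endog_term]              # homogeneous elasticity
    rhs_terms = list(controls)
    if n_pcs > 0:
        rhs_terms.append(f"pc({n_pcs})")
    return f"{response} + {' + '.join(lhs_terms)} ~ {' + '.join(rhs_terms)}"


print(build_formula("q", "p", interactions=["id"], controls=["0", "fe(id)"]))
# q + id & endog(p) ~ 0 + fe(id)
print(build_formula("q", "p", n_pcs=2))
# q + endog(p) ~ 0 + pc(2)
```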
---

### Key Function: `giv()`
```python
giv(df, formula: str, id: str, t: str, weight: str, **kwargs) -> GIVModel
```

#### Required Arguments

* `df`: `pandas.DataFrame` containing panel data. **Must be balanced** for some algorithms (e.g., `scalar_search`).
* `formula`: A **string** representing the model (Julia-style formula syntax). See examples above.
* `id`: Name of the column identifying entities (e.g., `"firm_id"`).
* `t`: Name of the time variable column.
* `weight`: Name of the weight/size column (e.g., market shares `S_i,t`).

#### Keyword Arguments (Optional)

* `algorithm`: One of `"iv"` (default), `"iv_twopass"`, `"debiased_ols"`, or `"scalar_search"`.
* `guess`: Initial guess for ζ coefficients. (See below for usage details)
* `exclude_pairs`: Dictionary excluding pairs from moment conditions.
    * Example: `{1: [2, 3], 4: [5]}` excludes the entity pairs with codes (1,2), (1,3), and (4,5) from the moment conditions entering the estimation.
* `quiet`: Set `True` to suppress warnings and info messages.
* `save`: `"none"` (default), `"residuals"`, `"fe"`, or `"all"` — controls what is stored on the returned model:

  * `"none"`: neither residuals nor fixed-effects are saved
  * `"residuals"`: saves residuals in `model.residual_df`
  * `"fe"`: saves fixed-effects in `model.fe`
  * `"all"`: saves both `model.residual_df` and `model.fe`

* `save_df`: If `True`, the full estimation dataframe (with residuals, coefficients, fixed effects) is stored in `model.df`.
* `complete_coverage`: Whether the dataset **covers the full market in each time period**, meaning
$\sum_i S_{i,t} q_{i,t} = 0$ holds exactly within the sample.

  * Default is `None`, which triggers auto-detection: the model checks this condition period-by-period and sets the flag to `True` or `False` accordingly.
  * If the condition does not hold (`False`), you can still force estimation by setting `quiet=True`, but results may be biased. Use with caution.
  * Required for `"scalar_search"` and `"debiased_ols"` algorithms.

* `return_vcov`: Whether to compute and return the variance–covariance matrices. (default: `True`)
* `tol`: Convergence tolerance for the solver (default: `1e-6`)
* `iterations`: Maximum number of solver iterations (default: `100`)
* `pca_option`: Dictionary of options for PC extraction when using `pc(k)` in formula:
  * `'algorithm'`: HeteroPCA algorithm - `'DeflatedHeteroPCA'`, `'StandardHeteroPCA'`, or `'DiagonalDeletion'`
  * `'impute_method'`: `'zero'` or `'pairwise'` for handling missing values (default: `'zero'`)
  * `'demean'`: Whether to demean data before PCA (default: `False`)
  * `'maxiter'`: Maximum iterations for PCA algorithm (default: `100`)
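
To see which moment conditions an `exclude_pairs` dictionary refers to, it can be flattened into an explicit list of pairs. This is a small illustrative helper (`expand_exclude_pairs` is a hypothetical name, not a package function). It keeps each pair as written, because this README does not state whether the package treats (1,2) and (2,1) as the same pair.

```python
def expand_exclude_pairs(exclude_pairs):
    """Flatten {i: [j, ...]} into a sorted list of (i, j) pairs."""
    pairs = set()
    for i, partners in exclude_pairs.items():
        for j in partners:
            if i == j:
                raise ValueError(f"entity {i} cannot be paired with itself")
            pairs.add((i, j))
    return sorted(pairs)


excluded = expand_exclude_pairs({1: [2, 3], 4: [5]})
print(excluded)  # [(1, 2), (1, 3), (4, 5)]
```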

#### Advanced keyword arguments (Optional; Use with caution)

* **`solver_options`** (`Dict[str, Any]`)
  Extra options passed to the nonlinear system solver from [`NLsolve.jl`](https://github.com/JuliaNLSolvers/NLsolve.jl).
  The Python dict is converted to a Julia `NamedTuple` with keyword-style arguments.
  Common options include:

  * `"method"`: `"newton"` , `"anderson"`, `"trust_region"`, etc.
  * `"ftol"`: absolute residual tolerance
  * `"xtol"`: absolute solution tolerance
  * `"iterations"`: max iterations
  * `"show_trace"`: verbose output

  **Example:**

  ```python
  solver_opts = {
      "method": "newton",
      "ftol": 1e-8,
      "xtol": 1e-8,
      "iterations": 1000,
      "show_trace": True,
  }

  model = giv(df, formula, id="id", t="t", solver_options=solver_opts)
  ```

  For the full list of options, see the [NLsolve.jl documentation](https://docs.sciml.ai/NonlinearSolve/stable/api/nlsolve/).
---

### Algorithms

The package implements four algorithms for GIV estimation:

1. **`"iv"`**  
   - Default, recommended  
   - Uses the moment condition $\mathbb{E}[u_i\,u_{S,-i}]=0$
   - $O(N)$ implementation
   - Supports `exclude_pairs` (drops selected pairs $(i, j)$, i.e. the conditions $\mathbb{E}[u_i u_j] = 0$, from the moment conditions)
   - Supports flexible elasticity specs, unbalanced panels  

2. **`"iv_twopass"`**: Numerically identical to `iv` but uses a more straightforward O(N²) implementation with two passes over entity pairs. This is useful for:
   - Debugging purposes
   - When the O(N) optimization in `iv` might cause numerical issues
   - When there are many pairs to be excluded, which will slow down the algorithm in `iv`
   - Understanding the computational flow of the moment conditions 

3. **`"debiased_ols"`**  
   - Uses the moment condition $\mathbb{E}[u_{i,t}\,C_{i,t}\,p_{t}] = \sigma_i^2 / \zeta_{S,t}$
   - Requires **complete market coverage**  
   - More efficient but restrictive  

4. **`"scalar_search"`**  
   - Finds a single aggregate elasticity  
   - Requires **balanced panel, constant weights, complete coverage** 
   - Useful for diagnostics or initial-guess formation  
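
The difference between the $O(N)$ and $O(N^2)$ variants can be illustrated with a leave-one-out identity: $u_{S,-i} = u_S - S_i u_i$. The sketch below is a simplified illustration of that idea in plain Python, not the package's implementation. It aggregates the moment over all entities for a single period, and the function names are assumptions.

```python
import random


def moment_two_pass(u, S):
    """O(N^2): loop over every ordered pair (i, j) with i != j."""
    total = 0.0
    for i in range(len(u)):
        for j in range(len(u)):
            if i != j:
                total += u[i] * S[j] * u[j]
    return total


def moment_leave_one_out(u, S):
    """O(N): use u_{S,-i} = u_S - S_i * u_i, so the inner loop disappears."""
    u_S = sum(s * x for s, x in zip(S, u))
    return sum(x * (u_S - s * x) for s, x in zip(S, u))


rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(50)]
S = [rng.random() for _ in range(50)]
slow = moment_two_pass(u, S)
fast = moment_leave_one_out(u, S)
print(abs(slow - fast) < 1e-9)  # True
```

The shortcut relies on summing over all partners at once. That is presumably why a long list of excluded pairs, each of which has to be handled individually, erodes the advantage of `iv` over `iv_twopass`.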

---

### Initial Guesses

A good initial guess is key to stable estimation. If `guess` is omitted, an OLS-based default is used, and it will typically fail. Examples:

```python
import numpy as np
from optimalgiv import giv
# 1) Scalar guess (for homogeneous elasticity)
guess = 1.0
model1 = giv(
    df,
    "q + endog(p) ~ n1 + fe(id)",
    id="id", t="t", weight="S",
    guess=guess
)

# 2) Dict by group name (heterogeneous by id)
guess = {"id": [1.2, 0.8]}
model2 = giv(
    df,
    "q + id & endog(p) ~ 1",
    id="id", t="t", weight="S",
    guess=guess
)

# 3) Dict for multiple interactions
guess = {
    "id": [1.0, 0.9],
    "n1": [0.5, 0.3]
}
model3 = giv(
    df,
    "q + id & endog(p) + n1 & endog(p) ~ fe(id)",
    id="id", t="t", weight="S",
    guess=guess
)

# 4) Dict keyed by exact coefnames
names = model3.coefnames
guess = {name: 0.1 for name in names}
model4 = giv(
    df,
    "q + id & endog(p) + n1 & endog(p) ~ fe(id)",
    id="id", t="t", weight="S",
    guess=guess
)

# 5) Scalar-search with heterogeneous formula
guess = {"Aggregate": 2.5}
model5 = giv(
    df,
    "q + id & endog(p) ~ 0 + fe(id) + fe(id)&(n1 + n2)",
    id="id", t="t", weight="S",
    algorithm="scalar_search",
    guess=guess
)

# 6) Use estimated ζ from model5 as initial guess
guess = model5.endog_coef
model6 = giv(
    df,
    "q + id & endog(p) ~ 0 + fe(id) + fe(id)&(n1 + n2)",
    id="id", t="t", weight="S",
    guess=guess
)

```
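
When only a rough aggregate value is available, one simple way to start a heterogeneous specification is to give every entity the same value, using the dict-by-group form shown in example 2. The helper `flat_guess` below is hypothetical and only builds that dictionary. Whether a flat start is good enough depends on the data.

```python
def flat_guess(levels, value, group="id"):
    """Build {group: [value, value, ...]} with one entry per distinct level."""
    unique_levels = sorted(set(levels))
    return {group: [float(value)] * len(unique_levels)}


ids = [1, 1, 2, 2, 3, 3]            # entity column of a toy panel
guess = flat_guess(ids, 2.5)
print(guess)  # {'id': [2.5, 2.5, 2.5]}
```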
---

### Principal Components (PC) in Formulas

The package supports extracting principal components from residuals to capture unobserved factors:

```python
# Add pc(k) to the formula to extract k principal components
model = giv(
    df,
    "q + endog(p) ~ fe(id) + pc(2)",  # Extract 2 PCs from residuals
    id="id", t="t", weight="S",
    save_df=True  # Needed to access PC factors/loadings in df
)

# Access PC results
model.n_pcs          # Number of PCs extracted
model.pc_factors     # k×T matrix of time factors
model.pc_loadings    # N×k matrix of entity loadings
model.pc_model       # HeteroPCAModel object with details
```

#### Internal PCA

Internal PC extractions are supported. With internal PCs, the moment conditions become $\mathbb E[u_{i,t}u_{j,t}] = \Lambda \Lambda'$, where $\Lambda$ is the matrix of factor loadings estimated internally using [HeteroPCA.jl](https://github.com/FuZhiyu/HeteroPCA.jl) from $u_{i,t}(z) \equiv q_{i,t} + p_{t}\times\mathbf{C}_{i,t}'\boldsymbol{z}$ at each guess of $z$. However, the following caveats apply:

- With internal PC extraction, the weighting scheme is no longer optimal because it does not account for the covariance in the moment conditions induced by common-factor estimation. The standard error formula also no longer applies, so standard errors are not returned. Consider bootstrapping for statistical inference.

- In small samples, an exact root of the moment conditions may not exist, and users may want to use a minimizer to minimize the error instead.

- A model with a fully flexible elasticity specification and fully flexible internal factor loadings is not theoretically identifiable. Hence, some degree of homogeneity must be assumed to estimate factors internally.


You can customize the PC extraction algorithm using the `pca_option` parameter:

```python
# Example with custom PCA options
model = giv(
    df,
    "q + id & endog(p) ~ X + pc(3)",
    id="id", t="t", weight="S",
    pca_option={
        # Preferred: let the wrapper build the constructor for you
        'algorithm': 'DeflatedHeteroPCA',
        'algorithm_options': dict(
            t_block=20,
            condition_number_threshold=5.0,
        ),

        'impute_method': 'zero',   # auto-converted to :zero
        'demean': False,
        'maxiter': 200,
    }
)

```

Available algorithms:
- `'algorithm': 'DeflatedHeteroPCA'`: Deflated algorithm with adaptive block sizing. Supports additional `'algorithm_options'`, e.g. `{'t_block': 10, 'condition_number_threshold': 4.0}`
- `'algorithm': 'StandardHeteroPCA'`: Standard iterative algorithm
- `'algorithm': 'DiagonalDeletion'`: Single-step diagonal deletion method

When `save_df=True`, PC factors and loadings are added to the saved dataframe with columns like `pc_factor_1`, `pc_factor_2`, `pc_loading_1`, etc.
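
Given the shapes documented above (loadings are N×k, factors are k×T), the common component for each entity and period is the product of the two matrices. The sketch below spells out that product with plain Python lists. The numbers are made up, and `common_component` is an illustrative helper, not a package function.

```python
def common_component(loadings, factors):
    """Multiply an N×k loadings matrix by a k×T factor matrix (lists of lists)."""
    n_factors = len(factors)
    n_periods = len(factors[0])
    return [
        [sum(row[k] * factors[k][t] for k in range(n_factors)) for t in range(n_periods)]
        for row in loadings
    ]


loadings = [[1.0, 0.0], [0.5, 2.0], [0.0, -1.0]]          # N=3 entities, k=2
factors = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.0]]   # k=2 factors, T=4 periods
common = common_component(loadings, factors)
print(len(common), len(common[0]))  # 3 4
```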

---


### Working with Results

```python
# Methods
model.summary()            # ▶ print full Julia-style summary
model.residuals()          # ▶ numpy array of the residuals for each observation
model.confint(level=0.95)  # ▶ (n×2) array of confidence intervals
model.coeftable(level=0.95)# ▶ pandas.DataFrame of estimates, SEs, t-stats, p-values

# Fields
model.endog_coef           # ▶ numpy array of ζ coefficients
model.exog_coef            # ▶ numpy array of β coefficients
model.agg_coef             # ▶ float: aggregate elasticity
model.endog_vcov           # ▶ VCOV of ζ coefficients
model.exog_vcov            # ▶ VCOV of β coefficients
model.nobs                 # ▶ int: number of observations
model.dof_residual         # ▶ int: residual degrees of freedom
model.formula              # ▶ str: Julia-style formula
model.formula_schema       # ▶ str: the internal schema of the Julia‐style formula after parsing
model.residual_variance    # ▶ numpy array of the estimated variance of the residuals for each entity (ûᵢ’s variance)
model.N                    # ▶ int: the number of cross‐section entities in the panel
model.T                    # ▶ int: the number of time periods per entity in the panel
model.dof                  # ▶ int: the total number of estimated parameters (length of ζ plus length of β)
model.responsename         # ▶ str: the name of the response variable(s)
model.converged            # ▶ bool: solver convergence status
model.endog_coefnames      # ▶ list[str]: ζ coefficient names
model.exog_coefnames       # ▶ list[str]: β coefficient names
model.idvar                # ▶ str: entity identifier column name
model.tvar                 # ▶ str: time identifier column name
model.weightvar            # ▶ str or None: weight column name
model.exclude_pairs        # ▶ dict: excluded moment-condition pairs
model.n_pcs                # ▶ int: number of principal components extracted
model.pc_factors           # ▶ numpy array (k×T) of PC time factors (if pc(k) used)
model.pc_loadings          # ▶ numpy array (N×k) of PC entity loadings (if pc(k) used)
model.pc_model             # ▶ HeteroPCAModel object with PC details (if pc(k) used)
model.coefdf               # ▶ pandas.DataFrame of entity-specific coefficients
model.fe                   # ▶ pandas.DataFrame of fixed-effects and fixed-effect interaction with exogenous controls (if saved) 
model.residual_df          # ▶ pandas.DataFrame of residuals (if saved)
model.df                   # ▶ pandas.DataFrame of full estimation output (if save_df=True)
model.coef                 # ▶ numpy array of [ζ; β]
model.vcov                 # ▶ full (ζ+β) variance–covariance matrix
model.stderror             # ▶ numpy array of standard errors
model.coefnames            # ▶ list[str]: names of all coefficients (ζ then β)
```
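
As a sanity check on the reported statistics, the t-statistic and an approximate confidence interval can be recomputed from an estimate and its standard error. The sketch below uses a normal critical value from the standard library, which is an assumption made here. The intervals printed in the Basic Example are slightly wider, which would be consistent with a Student-t critical value. `wald_summary` is an illustrative helper, not a package function.

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Return (t-stat, lower, upper) using a normal critical value."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    t_stat = estimate / std_error
    return t_stat, estimate - z_crit * std_error, estimate + z_crit * std_error


# Row "id: 2 & p" from the Basic Example output.
t_stat, lower, upper = wald_summary(1.77335, 0.475171)
print(f"t = {t_stat:.3f}, CI = [{lower:.3f}, {upper:.3f}]")
```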
#### Entity-specific Coefficients DataFrame (coefdf)
The `model.coefdf` field provides a convenient way to access and report coefficients organized by categorical variables (e.g., by sector, entity, or other groupings). This DataFrame contains:

* All categorical variable values used in the model (e.g., entity IDs, sectors)
* Estimated coefficients for each term in the formula, stored in columns named `<term>_coef`
* Fixed effect estimates and fixed-effect interactions with exogenous controls (if `save = 'fe'` or `save = 'all'` was specified)

Example:
```python

# Using the estimated model above as an example
print(model.coefdf)
# id  id & p_coef     fe_id  fe_id&η1  fe_id&η2
# 1     1.007234  0.770445 -0.075198  0.905689
# 2     1.773353 -0.376699  0.452851  0.825657
# 3     1.368630 -0.827939 -1.033757 -0.512825
# 4     3.384603 -0.275443  1.348865   1.37676
# 5     0.619882 -0.419348  0.663217  1.108182

```
---
## Simulation
The package includes utilities for Monte Carlo simulations using the `simulate_data` function:

```python
from optimalgiv import simulate_data, SimParam

# Generate simulated panel datasets
simulated_dfs = simulate_data(
    params = SimParam(
        N=20,      # Number of entities
        T=50,      # Time periods
        K=3,       # Number of factors
        M=0.7,     # Aggregate elasticity
        sigma_zeta=0.5  # Elasticity dispersion
    ),
    nsims=1,      # Number of simulations
    seed=123      # Random seed
)

# Use the first dataset
df = simulated_dfs[0]
```

### Simulation Parameters
The `SimParam` class accepts the following parameters:

| Parameter     | Description | Default |
|---------------|-------------|---------|
| `N`           | Number of entities | 10 |
| `T`           | Number of time periods | 100 |
| `K`           | Number of common factors | 2 |
| `M`           | Aggregate price elasticity | 0.5 |
| `sigma_zeta`  | Standard deviation of entity elasticities | 1.0 |
| `sigma_p`     | Price volatility to target | 2.0 |
| `h`           | Excess HHI for size distribution | 0.2 |
| `ushare`      | Share of price variation from idiosyncratic shocks | 0.2 (if K>0) |
| `sigma_u_curv`| Curvature for size-dependent volatility | 0.1 |
| `nu`          | Degrees of freedom for t-distribution (Inf = Normal) | np.inf |
| `missingperc` | Percentage of missing values | 0.0 |

### Data Generating Process
The simulated data follows this economic model:

```math
\begin{align}
q_{it} &= u_{it} + \Lambda_i \cdot \eta_t - \zeta_i \cdot p_t \\
p_t &= M \cdot \sum_i S_i \cdot (u_{it} + \Lambda_i \cdot \eta_t)
\end{align}
```

Where:
- `q_it`: Quantity for entity i at time t
- `p_t`: Price (common across entities at time t)
- `u_it`: Idiosyncratic shocks
- `η_t`: Common factors
- `Λ_i`: Factor loadings
- `ζ_i`: Entity-specific elasticities
- `S_i`: Entity size/weights

Entity sizes follow a power law distribution calibrated to match the target excess HHI (`h`).
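
The two equations can be traced for a single period with a few lines of plain Python. This is a sketch of the equations as written above, with one common factor and made-up numbers, and it does not show how `simulate_data` works internally. Setting `M` to the inverse of the size-weighted elasticity is an assumption made here. It is the value at which the simulated quantities satisfy the market clearing condition $\sum_i S_i q_{it} = 0$.

```python
import random


def simulate_period(S, zeta, loadings, M, rng):
    """One period of the DGP with a single common factor (K = 1)."""
    eta = rng.gauss(0.0, 1.0)
    u = [rng.gauss(0.0, 1.0) for _ in S]
    shock = [u_i + lam * eta for u_i, lam in zip(u, loadings)]   # u_it + Λ_i η_t
    p = M * sum(s * x for s, x in zip(S, shock))                 # price equation
    q = [x - z * p for x, z in zip(shock, zeta)]                 # quantity equation
    return p, q


S = [0.5, 0.3, 0.2]            # illustrative sizes
zeta = [1.0, 2.0, 4.0]         # illustrative elasticities
loadings = [0.5, -0.2, 1.0]    # illustrative factor loadings
M = 1.0 / sum(s * z for s, z in zip(S, zeta))   # assumption: M = 1 / (size-weighted ζ)

rng = random.Random(123)
p, q = simulate_period(S, zeta, loadings, M, rng)
imbalance = sum(s * q_i for s, q_i in zip(S, q))
print(abs(imbalance) < 1e-12)  # True
```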

### Output DataFrame
Each simulation returns a pandas DataFrame with columns:
- `id`: Entity identifier
- `t`: Time period
- `q`: Quantity (response variable)
- `p`: Price (endogenous regressor)
- `S`: Entity size/weight
- `ζ`: True entity-specific elasticity
- `η1, η2, ...`: Common factor realizations
- `λ1, λ2, ...`: Entity-specific factor loadings

---

## Limitations
- **PC extraction limitations**: Only `iv` and `iv_twopass` algorithms support internal PC extraction. The `debiased_ols` and `scalar_search` algorithms do not support PC extraction.
- **Variance-covariance matrix**: When PC extraction is used (`pc(k)` in the formula), the variance-covariance matrix calculation is automatically disabled because the standard formula no longer applies. Consider bootstrapping instead.
- **Time fixed effects** are not supported directly, but a single factor `pc(1)` can be used instead.
- Some algorithms require **balanced panels**.
- The `debiased_ols` and `scalar_search` algorithms require **complete market coverage**.

---

## To-do List
- Expose `build_error_function` interface.

---

## References

Please cite:

- Gabaix, Xavier, and Ralph S.J. Koijen. Granular Instrumental Variables. Journal of Political Economy, 132(7), 2024, pp. 2274–2303.
- Chaudhary, Manav, Zhiyu Fu, and Haonan Zhou. Anatomy of the Treasury Market: Who Moves Yields? Available at SSRN: https://ssrn.com/abstract=5021055

(default: `True`)\n* `tol`: Convergence tolerance for the solver (default: `1e-6`)\n* `iterations`: Maximum number of solver iterations (default: `100`)\n* `pca_option`: Dictionary of options for PC extraction when using `pc(k)` in the formula:\n  * `'algorithm'`: HeteroPCA algorithm: `'DeflatedHeteroPCA'`, `'StandardHeteroPCA'`, or `'DiagonalDeletion'`\n  * `'impute_method'`: `'zero'` or `'pairwise'` for handling missing values (default: `'zero'`)\n  * `'demean'`: Whether to demean data before PCA (default: `False`)\n  * `'maxiter'`: Maximum iterations for PCA algorithm (default: `100`)\n\n#### Advanced keyword arguments (Optional; Use with caution)\n\n* **`solver_options`** (`Dict[str, Any]`)\n  Extra options passed to the nonlinear system solver from [`NLsolve.jl`](https://github.com/JuliaNLSolvers/NLsolve.jl).\n  The Python dict is converted to a Julia `NamedTuple` with keyword-style arguments.\n  Common options include:\n\n  * `\"method\"`: `\"newton\"`, `\"anderson\"`, `\"trust_region\"`, etc.\n  * `\"ftol\"`: absolute residual tolerance\n  * `\"xtol\"`: absolute solution tolerance\n  * `\"iterations\"`: max iterations\n  * `\"show_trace\"`: verbose output\n\n  **Example:**\n\n  ```python\n  solver_opts = {\n      \"method\": \"newton\",\n      \"ftol\": 1e-8,\n      \"xtol\": 1e-8,\n      \"iterations\": 1000,\n      \"show_trace\": True,\n  }\n\n  model = giv(df, formula, id=\"id\", t=\"t\", solver_options=solver_opts)\n  ```\n\n  For the full list of options, see the [NLsolve.jl documentation](https://docs.sciml.ai/NonlinearSolve/stable/api/nlsolve/).\n\n---\n\n### Algorithms\n\nThe package implements four algorithms for GIV estimation:\n\n1. **`\"iv\"`**  \n   - Default, recommended  \n   - Uses the moment condition $\\mathbb{E}[u_i\\,u_{S,-i}]=0$  \n   - $O(N)$ implementation  \n   - Supports `exclude_pairs` (drops the moment conditions $\\mathbb{E}[u_i u_j] = 0$ for the selected pairs)\n   - Supports flexible elasticity specs, unbalanced panels  \n\n2. 
**`\"iv_twopass\"`**: Numerically identical to `iv` but uses a more straightforward O(N\u00b2) implementation with two passes over entity pairs. This is useful for:\n   - Debugging purposes\n   - When the O(N) optimization in `iv` might cause numerical issues\n   - When there are many pairs to be excluded, which will slow down the algorithm in `iv`\n   - Understanding the computational flow of the moment conditions \n\n3. **`\"debiased_ols\"`**  \n   - Uses $$\\mathbb{E}[u_iC_{it}p_{it}] = \\sigma_i^2 / \\zeta_{St}$$\n   - Requires **complete market coverage**  \n   - More efficient but restrictive  \n\n4. **`\"scalar_search\"`**  \n   - Finds a single aggregate elasticity  \n   - Requires **balanced panel, constant weights, complete coverage** \n   - Useful for diagnostics or initial-guess formation  \n\n---\n\n### Initial Guesses\n\nA good guess is key to stable estimation. If omitted, OLS-based defaults will typically fail. Examples:\n\n```python\nimport numpy as np\nfrom optimalgiv import giv\n# 1) Scalar guess (for homogeneous elasticity)\nguess = 1.0\nmodel1 = giv(\n    df,\n    \"q + endog(p) ~ n1 + fe(id)\",\n    id=\"id\", t=\"t\", weight=\"S\",\n    guess=guess\n)\n\n# 2) Dict by group name (heterogeneous by id)\nguess = {\"id\": [1.2, 0.8]}\nmodel2 = giv(\n    df,\n    \"q + id & endog(p) ~ 1\",\n    id=\"id\", t=\"t\", weight=\"S\",\n    guess=guess\n)\n\n# 3) Dict for multiple interactions\nguess = {\n    \"id\": [1.0, 0.9],\n    \"n1\": [0.5, 0.3]\n}\nmodel3 = giv(\n    df,\n    \"q + id & endog(p) + n1 & endog(p) ~ fe(id)\",\n    id=\"id\", t=\"t\", weight=\"S\",\n    guess=guess\n)\n\n# 4) Dict keyed by exact coefnames\nnames = model3.coefnames\nguess = {name: 0.1 for name in names}\nmodel4 = giv(\n    df,\n    \"q + id & endog(p) + n1 & endog(p) ~ fe(id)\",\n    id=\"id\", t=\"t\", weight=\"S\",\n    guess=guess\n)\n\n# 5) Scalar-search with heterogeneous formula\nguess = {\"Aggregate\": 2.5}\nmodel5 = giv(\n    df,\n    \"q + id & endog(p) ~ 
0 + fe(id) + fe(id)&(n1 + n2)\",\n    id=\"id\", t=\"t\", weight=\"S\",\n    algorithm=\"scalar_search\",\n    guess=guess\n)\n\n# 6) Use estimated \u03b6 from model5 as initial guess\nguess = model5.endog_coef\nmodel6 = giv(\n    df,\n    \"q + id & endog(p) ~ 0 + fe(id) + fe(id)&(n1 + n2)\",\n    id=\"id\", t=\"t\", weight=\"S\",\n    guess=guess\n)\n\n```\n---\n\n### Principal Components (PC) in Formulas\n\nThe package supports extracting principal components from residuals to capture unobserved factors:\n\n```python\n# Add pc(k) to the formula to extract k principal components\nmodel = giv(\n    df,\n    \"q + endog(p) ~ fe(id) + pc(2)\",  # Extract 2 PCs from residuals\n    id=\"id\", t=\"t\", weight=\"S\",\n    save_df=True  # Needed to access PC factors/loadings in df\n)\n\n# Access PC results\nmodel.n_pcs          # Number of PCs extracted\nmodel.pc_factors     # k\u00d7T matrix of time factors\nmodel.pc_loadings    # N\u00d7k matrix of entity loadings\nmodel.pc_model       # HeteroPCAModel object with details\n```\n\n#### Internal PCA\n\nInternal PC extraction is supported. With internal PCs, the moment conditions become $\\mathbb E[u_{i,t}u_{j,t}] = \\Lambda \\Lambda'$, where $\\Lambda$ is the matrix of factor loadings estimated internally using [HeteroPCA.jl](https://github.com/FuZhiyu/HeteroPCA.jl) from $u_{i,t}(z) \\equiv q_{i,t} + p_{t}\\times\\mathbf{C}_{i,t}'\\boldsymbol{z}$ at each guess of $z$. However, the following caveats apply:\n\n- With internal PC extraction, the weighting scheme is no longer optimal because it does not account for the covariance in the moment conditions induced by common factor estimation. The standard error formula also no longer applies, so standard errors are not returned. One can consider bootstrapping for statistical inference.\n\n- In small samples, an exact root of the moment conditions may not exist, and users may want to use a minimizer to minimize the error instead. 
\n\n- A model with fully flexible elasticity specification and fully flexible internal factor loadings is not theoretically identifiable. Hence, one needs to assume a certain level of homogeneity to estimate factors internally. \n\n\nYou can customize the PC extraction algorithm using the `pca_option` parameter:\n\n```python\n# Example with custom PCA options\nmodel = giv(\n    df,\n    \"q + id & endog(p) ~ X + pc(3)\",\n    id=\"id\", t=\"t\", weight=\"S\",\n    pca_option={\n        # Preferred: let the wrapper build the constructor for you\n        'algorithm': 'DeflatedHeteroPCA',\n        'algorithm_options': dict(\n            t_block=20,\n            condition_number_threshold=5.0,\n        ),\n\n        'impute_method': 'zero',   # auto-converted to :zero\n        'demean': False,\n        'maxiter': 200,\n    }\n)\n\n```\n\nAvailable algorithms:\n- `'algorithm': 'DeflatedHeteroPCA'`: Deflated algorithm with adaptive block sizing; supports additional `'algorithm_options'`, e.g. `{'t_block': 10, 'condition_number_threshold': 4.0}`\n- `'algorithm': 'StandardHeteroPCA'`: Standard iterative algorithm\n- `'algorithm': 'DiagonalDeletion'`: Single-step diagonal deletion method\n\nWhen `save_df=True`, PC factors and loadings are added to the saved dataframe with columns like `pc_factor_1`, `pc_factor_2`, `pc_loading_1`, etc.\n\n---\n\n\n### Working with Results\n\n```python\n# Methods\nmodel.summary()            # \u25b6 print full Julia-style summary\nmodel.residuals()          # \u25b6 numpy array of the residuals for each observation\nmodel.confint(level=0.95)  # \u25b6 (n\u00d72) array of confidence intervals\nmodel.coeftable(level=0.95)# \u25b6 pandas.DataFrame of estimates, SEs, t-stats, p-values\n\n# Fields\nmodel.endog_coef           # \u25b6 numpy array of \u03b6 coefficients\nmodel.exog_coef            # \u25b6 numpy array of \u03b2 coefficients\nmodel.agg_coef             # \u25b6 float: aggregate elasticity\nmodel.endog_vcov           # \u25b6 VCOV of 
\u03b6 coefficients\nmodel.exog_vcov            # \u25b6 VCOV of \u03b2 coefficients\nmodel.nobs                 # \u25b6 int: number of observations\nmodel.dof_residual         # \u25b6 int: residual degrees of freedom\nmodel.formula              # \u25b6 str: Julia-style formula\nmodel.formula_schema       # \u25b6 str: the internal schema of the Julia\u2010style formula after parsing\nmodel.residual_variance    # \u25b6 numpy array of the estimated variance of the residuals for each entity (u\u0302\u1d62\u2019s variance)\nmodel.N                    # \u25b6 int: the number of cross\u2010section entities in the panel\nmodel.T                    # \u25b6 int: the number of time periods per entity in the panel\nmodel.dof                  # \u25b6 int: the total number of estimated parameters (length of \u03b6 plus length of \u03b2)\nmodel.responsename         # \u25b6 str: the name of the response variable(s)\nmodel.converged            # \u25b6 bool: solver convergence status\nmodel.endog_coefnames      # \u25b6 list[str]: \u03b6 coefficient names\nmodel.exog_coefnames       # \u25b6 list[str]: \u03b2 coefficient names\nmodel.idvar                # \u25b6 str: entity identifier column name\nmodel.tvar                 # \u25b6 str: time identifier column name\nmodel.weightvar            # \u25b6 str or None: weight column name\nmodel.exclude_pairs        # \u25b6 dict: excluded moment-condition pairs\nmodel.n_pcs                # \u25b6 int: number of principal components extracted\nmodel.pc_factors           # \u25b6 numpy array (k\u00d7T) of PC time factors (if pc(k) used)\nmodel.pc_loadings          # \u25b6 numpy array (N\u00d7k) of PC entity loadings (if pc(k) used)\nmodel.pc_model             # \u25b6 HeteroPCAModel object with PC details (if pc(k) used)\nmodel.coefdf               # \u25b6 pandas.DataFrame of entity-specific coefficients\nmodel.fe                   # \u25b6 pandas.DataFrame of fixed-effects and fixed-effect interaction with exogenous 
controls (if saved) \nmodel.residual_df          # \u25b6 pandas.DataFrame of residuals (if saved)\nmodel.df                   # \u25b6 pandas.DataFrame of full estimation output (if save_df=True)\nmodel.coef                 # \u25b6 numpy array of [\u03b6; \u03b2]\nmodel.vcov                 # \u25b6 full (\u03b6+\u03b2) variance\u2013covariance matrix\nmodel.stderror             # \u25b6 numpy array of standard errors\nmodel.coefnames            # \u25b6 list[str]: names of all coefficients (\u03b6 then \u03b2)\n```\n#### Entity-specific Coefficients DataFrame (coefdf)\nThe `model.coefdf` field provides a convenient way to access and report coefficients organized by categorical variables (e.g., by sector, entity, or other groupings). This DataFrame contains:\n\n* All categorical variable values used in the model (e.g., entity IDs, sectors)\n* Estimated coefficients for each term in the formula, stored in columns named `<term>_coef`\n* Fixed effect estimates and fixed effect interaction with exogenous controls(if `save = 'fe'` or `save = 'all'` was specified)\n\nExample:\n```python\n\n# Using the estimated model above as an example\nprint(model.coefdf)\n# id  id & p_coef     fe_id  fe_id&\u03b71  fe_id&\u03b72\n# 1     1.007234  0.770445 -0.075198  0.905689\n# 2     1.773353 -0.376699  0.452851  0.825657\n# 3     1.368630 -0.827939 -1.033757 -0.512825\n# 4     3.384603 -0.275443  1.348865   1.37676\n# 5     0.619882 -0.419348  0.663217  1.108182\n\n```\n---\n## Simulation\nThe package includes utilities for Monte Carlo simulations using the `simulate_data` function:\n\n```python\nfrom optimalgiv import simulate_data, SimParam\n\n# Generate simulated panel datasets\nsimulated_dfs = simulate_data(\n    params = SimParam(\n        N=20,      # Number of entities\n        T=50,      # Time periods\n        K=3,       # Number of factors\n        M=0.7,     # Aggregate elasticity\n        sigma_zeta=0.5  # Elasticity dispersion\n    ),\n    nsims=1,      # Number of 
simulations\n    seed=123      # Random seed\n)\n\n# Use the first dataset\ndf = simulated_dfs[0]\n```\n\n### Simulation Parameters\nThe `SimParam` class accepts the following parameters:\n\n| Parameter     | Description | Default |\n|---------------|-------------|---------|\n| `N`           | Number of entities | 10 |\n| `T`           | Number of time periods | 100 |\n| `K`           | Number of common factors | 2 |\n| `M`           | Aggregate price elasticity | 0.5 |\n| `sigma_zeta`  | Standard deviation of entity elasticities | 1.0 |\n| `sigma_p`     | Price volatility to target | 2.0 |\n| `h`           | Excess HHI for size distribution | 0.2 |\n| `ushare`      | Share of price variation from idiosyncratic shocks | 0.2 (if K>0) |\n| `sigma_u_curv`| Curvature for size-dependent volatility | 0.1 |\n| `nu`          | Degrees of freedom for t-distribution (Inf = Normal) | np.inf |\n| `missingperc` | Percentage of missing values | 0.0 |\n\n### Data Generating Process\nThe simulated data follows this economic model:\n\n```math\n\\begin{align}\nq_{it} &= u_{it} + \\Lambda_i \\cdot \\eta_t - \\zeta_i \\cdot p_t \\\\\np_t &= M \\cdot \\sum_i S_i \\cdot (u_{it} + \\Lambda_i \\cdot \\eta_t)\n\\end{align}\n```\n\nWhere:\n- `q_it`: Quantity for entity i at time t\n- `p_t`: Price (common across entities at time t)\n- `u_it`: Idiosyncratic shocks\n- `\u03b7_t`: Common factors\n- `\u039b_i`: Factor loadings\n- `\u03b6_i`: Entity-specific elasticities\n- `S_i`: Entity size/weights\n\nEntity sizes follow a power law distribution calibrated to match the target excess HHI (`h`).\n\n### Output DataFrame\nEach simulation returns a pandas DataFrame with columns:\n- `id`: Entity identifier\n- `t`: Time period\n- `q`: Quantity (response variable)\n- `p`: Price (endogenous regressor)\n- `S`: Entity size/weight\n- `\u03b6`: True entity-specific elasticity\n- `\u03b71, \u03b72, ...`: Common factor realizations\n- `\u03bb1, \u03bb2, ...`: Entity-specific factor loadings\n\n---\n\n## 
Limitations\n- **PC extraction limitations**: Only the `iv` and `iv_twopass` algorithms support internal PC extraction. The `debiased_ols` and `scalar_search` algorithms do not support PC extraction.\n- **Variance-covariance matrix**: When PC extraction is used (`pc(k)` in the formula), the variance-covariance matrix calculation is automatically disabled because the standard formula is no longer valid. One should consider bootstrapping instead.\n- **Time fixed effects** are not supported directly, but one can use a single factor `pc(1)` instead.\n- Some algorithms require **balanced panels**.\n- The `debiased_ols` and `scalar_search` algorithms require **complete market coverage**.\n\n---\n\n## To-do List\n- Expose `build_error_function` interface.\n\n---\n\n## References\n\nPlease cite:\n\n- Gabaix, Xavier, and Ralph S.J. Koijen. Granular Instrumental Variables. Journal of Political Economy, 132(7), 2024, pp. 2274\u20132303.\n- Chaudhary, Manav, Zhiyu Fu, and Haonan Zhou. Anatomy of the Treasury Market: Who Moves Yields? Available at SSRN: https://ssrn.com/abstract=5021055\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python \u21c4 Julia bridge for the OptimalGIV package",
    "version": "0.2.1.post2",
    "project_urls": {
        "Source": "https://github.com/FuZhiyu/optimalgiv"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "fb5172bb548d0e12f11ba787305bb54e65fcc89d9a70f58eec3d331d0a0010d5",
                "md5": "6b96b1f4a92fb4e17d2fbcd7a4866139",
                "sha256": "a69478ab26bf896f37c17ab2c3bd3603ff1302eef934679d9b7c0ad47db79a0b"
            },
            "downloads": -1,
            "filename": "optimalgiv-0.2.1.post2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6b96b1f4a92fb4e17d2fbcd7a4866139",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 18764,
            "upload_time": "2025-07-17T20:21:50",
            "upload_time_iso_8601": "2025-07-17T20:21:50.290279Z",
            "url": "https://files.pythonhosted.org/packages/fb/51/72bb548d0e12f11ba787305bb54e65fcc89d9a70f58eec3d331d0a0010d5/optimalgiv-0.2.1.post2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7e0867e1f01f2e3c9df3824694e44754e9cf1da8fec4a45eee975044f96002d8",
                "md5": "452dc507aa88a03f852cf9158f7e86f4",
                "sha256": "fdf63bdbab126332b30efaf6cdacacd665892f01b1730188f58889fb15fb9f53"
            },
            "downloads": -1,
            "filename": "optimalgiv-0.2.1.post2.tar.gz",
            "has_sig": false,
            "md5_digest": "452dc507aa88a03f852cf9158f7e86f4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 30357,
            "upload_time": "2025-07-17T20:21:51",
            "upload_time_iso_8601": "2025-07-17T20:21:51.625874Z",
            "url": "https://files.pythonhosted.org/packages/7e/08/67e1f01f2e3c9df3824694e44754e9cf1da8fec4a45eee975044f96002d8/optimalgiv-0.2.1.post2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-17 20:21:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "FuZhiyu",
    "github_project": "optimalgiv",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "lcname": "optimalgiv"
}

Marco Zhang, Julie Z. Fu