# A non-iterated general implementation of the LPLS estimator for cOLS, TM, and custom cases
The program implements the **LPLS** (linear programming through least squares) estimator with the help of the Moore-Penrose inverse (pseudoinverse), calculated using *singular value decomposition (SVD)*, with emphasis on the estimation of OLS constrained in values (**cOLS**), Transaction Matrix (**TM**), and **custom** (user-defined) cases. The pseudoinverse offers a
unique minimum-norm least-squares solution, which is the best linear unbiased estimator (BLUE); see Albert (1972, Chapter VI). (Over)determined problems are accompanied by *regression* analysis, which is feasible in their case. For such and especially all remaining cases, a Monte Carlo-based
*ttest* of mean **NRMSE** (normalized by the standard deviation of the RHS) is performed, the sample being drawn from a uniform or user-provided distribution (via a *Python function*).
OLS constrained in values (**cOLS**) is an estimation based on constraints in the model and/or data but not in parameters. Typically, such models are of size ≤ **kN**, where **N** is the number of observations, since the number of their constraints may vary in the LHS (e.g., level, derivatives, etc.).
**Example of a cOLS problem:**
*Estimate the trend and the cyclical component of a country's GDP given the textbook or any other definition of its peaks, troughs, and saddles. For a pre-LPLS approach to this problem, see (Bolotov, 2014).*
Transaction Matrix (**TM**) of size (**M x N**) is a formal model of interaction (allocation, assignment, etc.) between **M** and **N** elements in any imaginable system, such as intercompany transactions (netting tables), industries within/between economies (input-output tables), cross-border trade/investment (trade/investment matrices), etc., where **row** and **column sums** are known, but **individual elements** of the TM may not be:
- a netting table is a type of **TM** where **M = N** and the elements are subsidiaries of a MNC;
- an input-output table (IOT) is a type of **TM** where **M = N** and the elements are industries;
- a matrix of trade/investment is a type of **TM** where **M = N** and the elements are countries or (macro)regions, where diagonal elements may be equal to zero;
- a country-product matrix is a type of **TM** where **M ≠ N** and the elements are of different types;
...
**Example of a TM problem:**
*Estimate the matrix of trade/investment with/without zero diagonal elements, the country shares in which are unknown. For a pre-LPLS approach to this problem, see (Bolotov, 2015).*
## Methods and formulas:
The LP problem in the **LPLS** estimator is a matrix equation **`a @ x = b`**, loosely based on the structure of the Simplex tableau, where **`a`** consists of coefficients for CONSTRAINTS, LP-type CHARACTERISTIC and/or SPECIFIC, and for SLACK/SURPLUS VARIABLES (the upper part) as well as for the MODEL (the lower part), as illustrated in Figure 1. Each part of **`a`** can be omitted to accommodate a particular case:
- **cOLS** problems require SPECIFIC CONSTRAINTS, no LP-type CHARACTERISTIC CONSTRAINTS, and a MODEL;
- **TM** requires LP-type CHARACTERISTIC CONSTRAINTS, no SPECIFIC CONSTRAINTS, and an optional MODEL;
- SLACK/SURPLUS VARIABLES are included only for inequality constraints and should be set to **1** or **-1**;
...
**Figure 1: Matrix equation `a @ x = b`**
| `a` || `b` |
| :-: |:-:| :-: |
| CONSTRAINTS: CHARACTERISTIC/SPECIFIC | SL/SU VARIABLES | CONSTRAINTS |
| MODEL || MODEL |
Source: self-prepared
The solution to the equation, **`x = pinv(a) @ b`**, is estimated with the help of SVD and is a **minimum-norm least-squares generalized solution** if the rank of **`a`** is not full. To check if **`a`** is within the computational limits, its (maximum) dimensions can be calculated using the formulas:
- **(k \* N) x (K + K\*)** **cOLS** - without slack/surplus variables;
- **(k \* N) x (K + K\* + l)** **cOLS** - with slack/surplus variables;
- **(M + N) x (M \* N)** **TM** - without slack/surplus variables;
- **(M + N) x (M \* N + l**) **TM** - with slack/surplus variables;
- **M x N** **custom** - without slack/surplus variables;
- **M x (N + l)** **custom** - with slack/surplus variables;
where, in **cOLS** problems, **K** is the number of independent variables in the model (including the constant), **K\*** is the number of eventual extra variables in CONSTRAINTS, and **N** is the number of observations; in **TM**, **M** and **N** are the dimensions of the matrix; and in **custom** cases, **M** and **N** or **M x (N + l)** are the dimensions of **`a`**.
## Parameters:
**Type of the LP problem**
- `cols=False` : *bool*, OLS constrained in values (**cOLS**)
- `tm=False` : *bool*, Transaction Matrix (**TM**), options **cols** and **tm** are mutually exclusive (not specifying any of them equals **custom**)
- `rhs=np.empty((0,0))` : *array_like*, right-hand side of the problem (rowsums first for TM problems)
**Constructing the LHS**
- `model=np.empty((0,0))` : *array_like*, the MODEL part of **`a`**, including an eventual user-provided constant as a matrix column in **cOLS** (see Methods and formulas)
- `constraints=np.empty((0,0))` : *array_like*, the CONSTRAINTS part of **`a`**
- `slackvars=np.empty((0,0))` : *array_like*, the SLACK/SURPLUS VARIABLES part of **`a`**
- `zero_diagonal=False` : *bool*, set all the diagonal elements of **`a`** to 0
**SVD-based estimation**
- `tolerance=None` : *float*, scipy.linalg.lstsq's cutoff for `small' singular values; used to determine effective rank of a. Singular values smaller than cond \* largest_singular_value are considered zero
- `level=95` : *int*, confidence level (by default: **95**)
**Monte-Carlo-based t-test**
- `seed=123456789` : *int*, random-number seed, # is any number between 0 and 2^31-1 (or 2,147,483,647) (by default: **123456789**)
- `iterate=300` : *int*, number of iterations, # must be divisible by 50 (by default: **300**)
- `distribution=runiform`: *fucntion*, random-variable generating function, name of an earlier declared Python object returning an *array_like* **(r x c)** with two arguments, real scalars **r** and **c** (by default: **lppinv.runiform**, see Examples on how to pass np.random.uniform to **lppinv**)
- `mc=True` : *bool*, skip the Monte Carlo-based t-test
- `trace=True`: *bool*, hide any output with the exception of dots
## Results:
**`class lppinv.LPpinvResult(`**
- `OLSResults` : *statsmodels.regression.linear_model.OLSResults* or *None*, statsmodels' results class for for an OLS model
- `TtestResult` : *(scipy.stats._result_classes.TtestResult, scipy.stats._result_classes.TtestResult, scipy.stats._result_classes.TtestResult)* or *(None, None, None)*, scipy's result of a t-test
- `solution` : *array_like*, the solution obtained from the **LPLS** estimator
- `a` : *array_like*, the **`a`** matrix
- `nrmse` : *float*, root mean square error normalized by the standard deviation of **`b'**
- `r2_c` : *float* or *np.nan*, R-squared for CONSTRAINTS in TM
**)**
## Errors:
- `LPpinvError` : *lppinv.LPpinvError*, specific Python-exception-derived object raised by lppinv functions
- `LinAlgError` : *scipy.linalg.LinAlgError*, generic Python-exception-derived object raised by linalg functions
## Examples:
```
import numpy as np
import lppinv as lp
# cOLS problem
print('\n', lp.solve(
cols=True,
rhs=np.array([[0, 0, 0, 0, -1, 0.2, 0.9, 2.1]]).T,
constraints=np.array([np.ones(4), [0, 1, 2, 3], [0, 5, 2, 8]]).T,
model=np.array([np.ones(4), [0, 1, 2, 3], [0, 5, 2, 8]]).T,
slackvars=np.array([[-1, 1, -1, 0]]).T,
mc=False
))
# TM problem
def rnormal(r=None, c=None):
return (np.random.normal(loc=0.0, scale=1.0, size=(r, c)))
print('\n', lp.solve(
tm=True,
rhs=np.array([[4, 5, 3, 4, 6], [1, 2, 0, np.nan, np.nan]]).T,
zero_diagonal=True,
distribution=rnormal
))
# custom problem
print('\n', lp.solve(
rhs=np.array([[2, 3, 9], [5, 7, 9]]).T,
model=np.vstack([[0, 1, 1], [1, 0, 1], [1, 1, 0]]),
tolerance=10**-10,
trace=False
))
```
## References:
1. Albert, A., 1972. *Regression And The Moore-Penrose Pseudoinverse.* New York: Academic Press.
2. Bolotov, I. 2014. *Modeling of Time Series Cyclical Component on a Defined Set of Stationary Points and its Application on the US Business Cycle. [Paper presentation]. The 8th International Days of Statistics and Economics: Prague.* https://msed.vse.cz/msed_2014/article/348-Bolotov-Ilya-paper.pdf
3. Bolotov, I. 2015. *Modeling Bilateral Flows in Economics by Means of Exact Mathematical Methods.* [Paper presentation]. The 9th International Days of Statistics and Economics: Prague. https://msed.vse.cz/msed_2015/article/111-Bolotov-Ilya-paper.pdf
**PS** Please also check the Web of Science (WoS) for new research on LPLS.
Raw data
{
"_id": null,
"home_page": "https://github.com/econcz/lppinv",
"name": "lppinv",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "estimator,linear programming,least squares,OLS constrained in values,transaction matrix,custom,pseudoinverse,singular value decomposition,numpy,scipy,statsmodels",
"author": "econcz",
"author_email": "29724411+econcz@users.noreply.github.com",
"download_url": "https://files.pythonhosted.org/packages/ba/72/088878855e240f7117edf1e3c258fdbd6ef4e4047bf3cb2881b7af649926/lppinv-0.4.2.tar.gz",
"platform": null,
"description": "# A non-iterated general implementation of the LPLS estimator for cOLS, TM, and custom cases\n\nThe program implements the **LPLS** (linear programming through least squares) estimator with the help of the Moore-Penrose inverse (pseudoinverse), calculated using *singular value decomposition (SVD)*, with emphasis on the estimation of OLS constrained in values (**cOLS**), Transaction Matrix (**TM**), and **custom** (user-defined) cases. The pseudoinverse offers a\nunique minimum-norm least-squares solution, which is the best linear unbiased estimator (BLUE); see Albert (1972, Chapter VI). (Over)determined problems are accompanied by *regression* analysis, which is feasible in their case. For such and especially all remaining cases, a Monte Carlo-based\n*ttest* of mean **NRMSE** (normalized by the standard deviation of the RHS) is performed, the sample being drawn from a uniform or user-provided distribution (via a *Python function*).\n\nOLS constrained in values (**cOLS**) is an estimation based on constraints in the model and/or data but not in parameters. Typically, such models are of size \u2264 **kN**, where **N** is the number of observations, since the number of their constraints may vary in the LHS (e.g., level, derivatives, etc.).\n\n**Example of a cOLS problem:** \n*Estimate the trend and the cyclical component of a country's GDP given the textbook or any other definition of its peaks, troughs, and saddles. For a pre-LPLS approach to this problem, see (Bolotov, 2014).*\n\nTransaction Matrix (**TM**) of size (**M x N**) is a formal model of interaction (allocation, assignment, etc.) between **M** and **N** elements in any imaginable system, such as intercompany transactions (netting tables), industries within/between economies (input-output tables), cross-border trade/investment (trade/investment matrices), etc., where **row** and **column sums** are known, but **individual elements** of the TM may not be: \n\n- a netting table is a type of **TM** where **M = N** and the elements are subsidiaries of a MNC;\n- an input-output table (IOT) is a type of **TM** where **M = N** and the elements are industries;\n- a matrix of trade/investment is a type of **TM** where **M = N** and the elements are countries or (macro)regions, where diagonal elements may be equal to zero;\n- a country-product matrix is a type of **TM** where **M \u2260 N** and the elements are of different types; \n...\n\n**Example of a TM problem:** \n*Estimate the matrix of trade/investment with/without zero diagonal elements, the country shares in which are unknown. For a pre-LPLS approach to this problem, see (Bolotov, 2015).*\n\n## Methods and formulas:\nThe LP problem in the **LPLS** estimator is a matrix equation **`a @ x = b`**, loosely based on the structure of the Simplex tableau, where **`a`** consists of coefficients for CONSTRAINTS, LP-type CHARACTERISTIC and/or SPECIFIC, and for SLACK/SURPLUS VARIABLES (the upper part) as well as for the MODEL (the lower part), as illustrated in Figure 1. Each part of **`a`** can be omitted to accommodate a particular case: \n\n- **cOLS** problems require SPECIFIC CONSTRAINTS, no LP-type CHARACTERISTIC CONSTRAINTS, and a MODEL; \n- **TM** requires LP-type CHARACTERISTIC CONSTRAINTS, no SPECIFIC CONSTRAINTS, and an optional MODEL; \n- SLACK/SURPLUS VARIABLES are included only for inequality constraints and should be set to **1** or **-1**; \n...\n\n**Figure 1: Matrix equation `a @ x = b`**\n\n| `a` || `b` |\n| :-: |:-:| :-: |\n| CONSTRAINTS: CHARACTERISTIC/SPECIFIC | SL/SU VARIABLES | CONSTRAINTS |\n| MODEL || MODEL |\n\nSource: self-prepared\n\nThe solution to the equation, **`x = pinv(a) @ b`**, is estimated with the help of SVD and is a **minimum-norm least-squares generalized solution** if the rank of **`a`** is not full. To check if **`a`** is within the computational limits, its (maximum) dimensions can be calculated using the formulas: \n\n- **(k \\* N) x (K + K\\*)** **cOLS** - without slack/surplus variables;\n- **(k \\* N) x (K + K\\* + l)** **cOLS** - with slack/surplus variables;\n- **(M + N) x (M \\* N)** **TM** - without slack/surplus variables;\n- **(M + N) x (M \\* N + l**) **TM** - with slack/surplus variables;\n- **M x N** **custom** - without slack/surplus variables;\n- **M x (N + l)** **custom** - with slack/surplus variables;\n\nwhere, in **cOLS** problems, **K** is the number of independent variables in the model (including the constant), **K\\*** is the number of eventual extra variables in CONSTRAINTS, and **N** is the number of observations; in **TM**, **M** and **N** are the dimensions of the matrix; and in **custom** cases, **M** and **N** or **M x (N + l)** are the dimensions of **`a`**.\n\n## Parameters:\n**Type of the LP problem** \n\n- `cols=False` : *bool*, OLS constrained in values (**cOLS**) \n- `tm=False` : *bool*, Transaction Matrix (**TM**), options **cols** and **tm** are mutually exclusive (not specifying any of them equals **custom**) \n- `rhs=np.empty((0,0))` : *array_like*, right-hand side of the problem (rowsums first for TM problems)\n\n**Constructing the LHS** \n\n- `model=np.empty((0,0))` : *array_like*, the MODEL part of **`a`**, including an eventual user-provided constant as a matrix column in **cOLS** (see Methods and formulas) \n- `constraints=np.empty((0,0))` : *array_like*, the CONSTRAINTS part of **`a`** \n- `slackvars=np.empty((0,0))` : *array_like*, the SLACK/SURPLUS VARIABLES part of **`a`** \n- `zero_diagonal=False` : *bool*, set all the diagonal elements of **`a`** to 0 \n\n**SVD-based estimation** \n\n- `tolerance=None` : *float*, scipy.linalg.lstsq's cutoff for `small' singular values; used to determine effective rank of a. Singular values smaller than cond \\* largest_singular_value are considered zero\n- `level=95` : *int*, confidence level (by default: **95**)\n\n**Monte-Carlo-based t-test** \n\n- `seed=123456789` : *int*, random-number seed, # is any number between 0 and 2^31-1 (or 2,147,483,647) (by default: **123456789**) \n- `iterate=300` : *int*, number of iterations, # must be divisible by 50 (by default: **300**) \n- `distribution=runiform`: *fucntion*, random-variable generating function, name of an earlier declared Python object returning an *array_like* **(r x c)** with two arguments, real scalars **r** and **c** (by default: **lppinv.runiform**, see Examples on how to pass np.random.uniform to **lppinv**) \n- `mc=True` : *bool*, skip the Monte Carlo-based t-test \n- `trace=True`: *bool*, hide any output with the exception of dots\n\n## Results:\n\n**`class lppinv.LPpinvResult(`** \n\n- `OLSResults` : *statsmodels.regression.linear_model.OLSResults* or *None*, statsmodels' results class for for an OLS model \n- `TtestResult` : *(scipy.stats._result_classes.TtestResult, scipy.stats._result_classes.TtestResult, scipy.stats._result_classes.TtestResult)* or *(None, None, None)*, scipy's result of a t-test \n- `solution` : *array_like*, the solution obtained from the **LPLS** estimator \n- `a` : *array_like*, the **`a`** matrix \n- `nrmse` : *float*, root mean square error normalized by the standard deviation of **`b'**\n- `r2_c` : *float* or *np.nan*, R-squared for CONSTRAINTS in TM \n\n**)**\n\n## Errors:\n\n- `LPpinvError` : *lppinv.LPpinvError*, specific Python-exception-derived object raised by lppinv functions \n- `LinAlgError` : *scipy.linalg.LinAlgError*, generic Python-exception-derived object raised by linalg functions \n\n## Examples:\n```\nimport numpy as np\nimport lppinv as lp\n\n# cOLS problem\nprint('\\n', lp.solve(\n cols=True,\n rhs=np.array([[0, 0, 0, 0, -1, 0.2, 0.9, 2.1]]).T,\n constraints=np.array([np.ones(4), [0, 1, 2, 3], [0, 5, 2, 8]]).T,\n model=np.array([np.ones(4), [0, 1, 2, 3], [0, 5, 2, 8]]).T,\n slackvars=np.array([[-1, 1, -1, 0]]).T,\n mc=False\n))\n\n# TM problem\ndef rnormal(r=None, c=None):\n return (np.random.normal(loc=0.0, scale=1.0, size=(r, c)))\n\nprint('\\n', lp.solve(\n tm=True,\n rhs=np.array([[4, 5, 3, 4, 6], [1, 2, 0, np.nan, np.nan]]).T,\n zero_diagonal=True,\n distribution=rnormal\n))\n\n# custom problem\nprint('\\n', lp.solve(\n rhs=np.array([[2, 3, 9], [5, 7, 9]]).T,\n model=np.vstack([[0, 1, 1], [1, 0, 1], [1, 1, 0]]),\n tolerance=10**-10,\n trace=False\n))\n```\n\n## References:\n\n1. Albert, A., 1972. *Regression And The Moore-Penrose Pseudoinverse.* New York: Academic Press. \n\n2. Bolotov, I. 2014. *Modeling of Time Series Cyclical Component on a Defined Set of Stationary Points and its Application on the US Business Cycle. [Paper presentation]. The 8th International Days of Statistics and Economics: Prague.* https://msed.vse.cz/msed_2014/article/348-Bolotov-Ilya-paper.pdf \n\n3. Bolotov, I. 2015. *Modeling Bilateral Flows in Economics by Means of Exact Mathematical Methods.* [Paper presentation]. The 9th International Days of Statistics and Economics: Prague. https://msed.vse.cz/msed_2015/article/111-Bolotov-Ilya-paper.pdf \n\n**PS** Please also check the Web of Science (WoS) for new research on LPLS.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A non-iterated general implementation of the LPLS estimator for cOLS, TM, and custom cases",
"version": "0.4.2",
"project_urls": {
"Download": "https://github.com/econcz/lppinv/archive/pypi-0_4_2.tar.gz",
"Homepage": "https://github.com/econcz/lppinv"
},
"split_keywords": [
"estimator",
"linear programming",
"least squares",
"ols constrained in values",
"transaction matrix",
"custom",
"pseudoinverse",
"singular value decomposition",
"numpy",
"scipy",
"statsmodels"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ba72088878855e240f7117edf1e3c258fdbd6ef4e4047bf3cb2881b7af649926",
"md5": "fcc76a80fb8b47489fc3d878114c482c",
"sha256": "131988b281f59844f62efcbba5c722113f74c3874c8fce9af76e8252296b1d48"
},
"downloads": -1,
"filename": "lppinv-0.4.2.tar.gz",
"has_sig": false,
"md5_digest": "fcc76a80fb8b47489fc3d878114c482c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 8889,
"upload_time": "2024-02-17T18:08:12",
"upload_time_iso_8601": "2024-02-17T18:08:12.117911Z",
"url": "https://files.pythonhosted.org/packages/ba/72/088878855e240f7117edf1e3c258fdbd6ef4e4047bf3cb2881b7af649926/lppinv-0.4.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-17 18:08:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "econcz",
"github_project": "lppinv",
"github_not_found": true,
"lcname": "lppinv"
}