# Economic Complexity and Product Complexity
By the Growth Lab at Harvard's Center for International Development
This package is part of Harvard Growth Lab’s portfolio of software packages, digital products and interactive data visualizations. To browse our entire portfolio, please visit [growthlab.app](https://growthlab.app). To learn more about our research, please visit [Harvard Growth Lab’s home page](https://growthlab.cid.harvard.edu/).
# About
Python package to calculate economic complexity indices.
A STATA implementation of the economic complexity index is available at: <https://github.com/cid-harvard/ecomplexity>
Explore complexity and associated data using Harvard CID's Atlas tool: <http://atlas.cid.harvard.edu>
## Tutorial
**Installation**:
For the latest stable version: `pip install ecomplexity`
To install the latest development version (untested and possibly buggy) directly from GitHub:
`pip install git+https://github.com/cid-harvard/py-ecomplexity@develop`
**Usage**:
```python
import pandas as pd  # needed for pd.read_csv below

from ecomplexity import ecomplexity
from ecomplexity import proximity
# Import trade data from CID Atlas
data_url = "https://intl-atlas-downloads.s3.amazonaws.com/country_hsproduct2digit_year.csv.zip"
data = pd.read_csv(data_url, compression="zip", low_memory=False)
data = data[['year','location_code','hs_product_code','export_value']]
# Calculate complexity
trade_cols = {'time':'year', 'loc':'location_code', 'prod':'hs_product_code', 'val':'export_value'}
cdata = ecomplexity(data, trade_cols)
# Calculate proximity matrix
prox_df = proximity(data, trade_cols)
```
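For intuition about the default RCA-based presence test (see `presence_test` and `rca_mcp_threshold` under Arguments), the following is a minimal sketch of the standard Balassa RCA calculation on toy data. It does not call the package and is not its implementation. The toy values are made up, and treating "beyond the threshold" as `>=` is an assumption.

```python
import pandas as pd

# Toy export data (hypothetical values); column names follow the usage example.
data = pd.DataFrame({
    "location_code": ["A", "A", "B", "B", "C", "C"],
    "hs_product_code": ["01", "02", "01", "02", "01", "02"],
    "export_value": [80.0, 20.0, 10.0, 90.0, 50.0, 50.0],
})

# Location x product matrix of export values.
x = data.pivot_table(index="location_code", columns="hs_product_code",
                     values="export_value", aggfunc="sum", fill_value=0.0)

# Balassa RCA: a product's share in a location's exports,
# divided by the product's share in total exports.
loc_share = x.div(x.sum(axis=1), axis=0)
world_share = x.sum(axis=0) / x.to_numpy().sum()
rca = loc_share.div(world_share, axis=1)

# Presence matrix M_cp: 1 where RCA reaches the threshold (assumed >=).
rca_mcp_threshold = 1
mcp = (rca >= rca_mcp_threshold).astype(int)
print(mcp)
```

In this toy example, location C exports both products equally but only product 01 clears the threshold, because product 02 has a larger share of total exports.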
**Arguments**:
```text
data: pandas dataframe containing production / trade data.
Including variables indicating time, location, product and value
cols_input: dict of column names for time, location, product and value.
Example: {'time':'year', 'loc':'origin', 'prod':'hs92', 'val':'export_val'}
presence_test: str for test used for presence of industry in location.
One of "rca" (default), "rpop", or "manual".
Determines which values are used for M_cp calculations.
If "manual", M_cp is taken as given from the "value" column in data
val_errors_flag: {'coerce','ignore','raise'}. Passed to pd.to_numeric
*default* coerce.
rca_mcp_threshold: numeric indicating RCA threshold beyond which mcp is 1.
*default* 1.
rpop_mcp_threshold: numeric indicating RPOP threshold beyond which mcp is 1.
*default* 1. Only used if presence_test is not "rca".
pop: pandas df, with time, location and corresponding population, in that order.
Not required if presence_test is "rca", which is the default.
continuous: Used to calculate product proximities, indicates whether
to consider correlation of every product pair (True) or product
co-occurrence (False). *default* False.
asymmetric: Used to calculate product proximities, indicates whether
to generate asymmetric proximity matrix (True) or symmetric (False).
*default* False.
proximity_edgelist: pandas df with cols 'prod1', 'prod2', 'proximity'.
If None (default), proximity values are calculated from data.
knn: Number of nearest neighbors from proximity matrix to use to calculate
density. Will use entire proximity matrix if None.
*default* None.
check_logsupermodularity: If True (default), check log-supermodularity.
If int, use roughly that many samples to check log-supermodularity.
report_logsupermodularity: If True, print a warning if the log-supermodularity check fails.
If False (default), don't.
verbose: Print year being processed
```
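To illustrate what the `asymmetric` option distinguishes, here is a minimal sketch of co-occurrence proximity as defined in Hidalgo et al. (2007): the minimum of the two conditional probabilities of presence. The toy `mcp` matrix and all variable names are hypothetical, and the package's internal computation may differ in detail.

```python
import numpy as np

# Toy presence matrix M_cp: rows are locations, columns are products.
mcp = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
])

# Ubiquity: number of locations in which each product is present.
ubiquity = mcp.sum(axis=0)

# Co-occurrence: number of locations in which both products are present.
cooccurrence = mcp.T @ mcp

# Asymmetric version: entry [i, j] is P(product j present | product i present).
proximity_asym = cooccurrence / ubiquity[:, None]

# Symmetric version: divide by the larger of the two ubiquities,
# which equals the minimum of the two conditional probabilities.
proximity_sym = cooccurrence / np.maximum.outer(ubiquity, ubiquity)

print(proximity_sym.round(3))
```

Dividing by the larger ubiquity keeps the symmetric measure conservative: a rare product that always appears alongside a common one does not by itself imply a high proximity.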
## FAQ
- Why are ECI and PCI both normalized using ECI's mean and std. dev?
+ This normalization preserves the property that ECI = (mean of PCI of products for which MCP=1)
- What is log-supermodularity?
+ Refer to `ecomplexity/log_supermodularity.py` for a brief explanation. More at Schetter, U. (2019). A Structural Ranking of Economic Complexity (SSRN Scholarly Paper 3485842). https://doi.org/10.2139/ssrn.3485842.
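
The normalization point in the first FAQ answer can be checked numerically. The sketch below assumes the raw location score is the average of raw product scores over products with M_cp = 1; the product scores `kp` are arbitrary stand-in values, not output from the package.

```python
import numpy as np

# Toy presence matrix M_cp: rows are locations, columns are products.
mcp = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
], dtype=float)

# Stand-in raw product scores (hypothetical values, not real PCI output).
kp = np.array([0.9, -0.2, 1.5, -1.1])

# Raw location score: mean raw product score over products present (assumed).
diversity = mcp.sum(axis=1)
kc = (mcp @ kp) / diversity

# Normalize BOTH indices with the location scores' mean and std. dev.
eci = (kc - kc.mean()) / kc.std()
pci = (kp - kc.mean()) / kc.std()

# ECI equals the mean PCI of the products present in each location.
identity_holds = np.allclose(eci, (mcp @ pci) / diversity)

# Normalizing PCI with its own mean and std. dev. breaks the identity.
pci_own = (kp - kp.mean()) / kp.std()
identity_breaks = not np.allclose(eci, (mcp @ pci_own) / diversity)

print(identity_holds, identity_breaks)
```

The identity survives because the same affine transformation is applied to both vectors, and averaging commutes with a shared shift and scale.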
### References
- Hausmann, R., Hidalgo, C. A., Bustos, S., Coscia, M., Simoes, A., & Yıldırım, M. (2013). The Atlas of Economic Complexity: Mapping Paths to Prosperity (Part 1). Retrieved from <https://growthlab.cid.harvard.edu/files/growthlab/files/atlas_2013_part1.pdf>
- Hidalgo, C. A., Klinger, B., Barabasi, A.-L., & Hausmann, R. (2007). The Product Space Conditions the Development of Nations. Science, 317(5837), 482–487. <http://doi.org/10.1126/science.1144581>
Raw data
{
"_id": null,
"home_page": "https://github.com/cid-harvard/py-ecomplexity",
"name": "ecomplexity",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3",
"maintainer_email": null,
"keywords": "pandas python networks economics complexity",
"author": "Shreyas Gadgin Matha",
"author_email": "shreyas.gm61@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/5b/5e/5efc2dfef2c6fc0e993ad00c02951a11be9b01ab5f32aae0180bf1c38d28/ecomplexity-0.5.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Package to calculate economic complexity and associated variables",
"version": "0.5.3",
"project_urls": {
"Homepage": "https://github.com/cid-harvard/py-ecomplexity"
},
"split_keywords": [
"pandas",
"python",
"networks",
"economics",
"complexity"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e5abb35b670e7ea69fcfd6f5e015edbb47e0e003fa712c54f3481a21ef87aead",
"md5": "08dbc6fef690413f3f281f53bf312bf1",
"sha256": "60581b61bd7e4e32174f344a36918c833d83ab5a4892427dd1f8d034a3dedd54"
},
"downloads": -1,
"filename": "ecomplexity-0.5.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "08dbc6fef690413f3f281f53bf312bf1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3",
"size": 17395,
"upload_time": "2024-04-14T10:26:03",
"upload_time_iso_8601": "2024-04-14T10:26:03.458812Z",
"url": "https://files.pythonhosted.org/packages/e5/ab/b35b670e7ea69fcfd6f5e015edbb47e0e003fa712c54f3481a21ef87aead/ecomplexity-0.5.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5b5e5efc2dfef2c6fc0e993ad00c02951a11be9b01ab5f32aae0180bf1c38d28",
"md5": "ad19916f9231a2903665f9f680484681",
"sha256": "f9eaed2bf58495de4f8648e05c452edc1dc8ced300b1d99856a19dd38c88b06c"
},
"downloads": -1,
"filename": "ecomplexity-0.5.3.tar.gz",
"has_sig": false,
"md5_digest": "ad19916f9231a2903665f9f680484681",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3",
"size": 15447,
"upload_time": "2024-04-14T10:26:05",
"upload_time_iso_8601": "2024-04-14T10:26:05.553556Z",
"url": "https://files.pythonhosted.org/packages/5b/5e/5efc2dfef2c6fc0e993ad00c02951a11be9b01ab5f32aae0180bf1c38d28/ecomplexity-0.5.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-14 10:26:05",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cid-harvard",
"github_project": "py-ecomplexity",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pandas",
"specs": [
[
"==",
"1.5.2"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.23.5"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"==",
"1.2.0"
]
]
}
],
"lcname": "ecomplexity"
}