| Field           | Value                          |
|-----------------|--------------------------------|
| Name            | weightipy                      |
| Version         | 0.3.3                          |
| home_page       | None                           |
| Summary         | None                           |
| upload_time     | 2024-07-12 10:19:40            |
| maintainer      | None                           |
| docs_url        | None                           |
| author          | Remi Sebastian Kits            |
| requires_python | None                           |
| license         | None                           |
| requirements    | No requirements were recorded. |
# Weightipy
Weightipy is a cut-down version of [Quantipy3](https://github.com/Quantipy/quantipy3) for weighting people data using the RIM (iterative raking) algorithm.
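To make the idea of RIM weighting (iterative raking) concrete, here is a minimal pure-Python sketch of the general technique. It is not Weightipy's actual implementation: the function name `rake`, its parameters, and the sample data are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def rake(rows, targets, max_iterations=1000, tolerance=1e-6):
    """Return one weight per row so that weighted shares approach `targets`.

    rows: list of dicts, e.g. {"gender": "Male", "agecat": "young"}
    targets: {column: {category: target share}}
    Assumes every category found in the data has a target (otherwise KeyError).
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iterations):
        largest_change = 0.0
        for column, shares in targets.items():
            # Weighted total currently held by each category of this column.
            totals = defaultdict(float)
            for row, weight in zip(rows, weights):
                totals[row[column]] += weight
            total_share = sum(shares.values())
            # Scale every row so its category reaches the desired total.
            for i, row in enumerate(rows):
                desired = n * shares[row[column]] / total_share
                factor = desired / totals[row[column]]
                largest_change = max(largest_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all columns barely changes any weight.
        if largest_change < tolerance:
            break
    return weights


def weighted_share(rows, weights, column, category):
    """Weighted percentage of rows whose `column` equals `category`."""
    matching = sum(w for r, w in zip(rows, weights) if r[column] == category)
    return 100.0 * matching / sum(weights)


sample_rows = (
    [{"gender": "Male", "agecat": "young"}] * 3
    + [{"gender": "Male", "agecat": "old"}] * 3
    + [{"gender": "Female", "agecat": "young"}] * 1
    + [{"gender": "Female", "agecat": "old"}] * 3
)
sample_targets = {
    "gender": {"Male": 50, "Female": 50},
    "agecat": {"young": 40, "old": 60},
}
sample_weights = rake(sample_rows, sample_targets)
print(weighted_share(sample_rows, sample_weights, "gender", "Male"))
```

Each pass rescales the weights one variable at a time, so adjusting one variable slightly disturbs the others. Repeating the passes until the factors settle near 1 is what makes the procedure iterative.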
### Changes from Quantipy
- Removed all Quantipy overhead. Weightipy supports the latest versions of Pandas and Numpy and is tested for Python 3.7, 3.8, 3.9, 3.10 and 3.11.
- Weightipy runs up to 6 times faster than Quantipy, depending on the dataset.
- The `Rim` class does not generate reports as Quantipy did unless the `verbose` parameter is set to `True` in the `Rim` constructor.
## Installation
`pip install weightipy`
or
`python3 -m pip install weightipy`
#### Create a virtual environment
If you want to create a virtual environment when using Weightipy:
with conda
```bash
conda create -n envwp python=3
```
with venv
```bash
python -m venv [your_env_name]
```
## 5 minutes to Weightipy
**Get started**
Assuming the DataFrame `my_df` has the columns `gender` and `agecat`, we can weight the dataset like this:
```Python
import weightipy as wp

targets = {
    "agecat": {"18-24": 5.0, "25-34": 30.0, "35-49": 26.0, "50-64": 19.0, "65+": 20.0},
    "gender": {"Male": 49, "Female": 51}
}
scheme = wp.scheme_from_dict(targets)

df_weighted = wp.weight_dataframe(
    df=my_df,
    scheme=scheme,
    weight_column="weights"
)
efficiency = wp.weighting_efficiency(df_weighted["weights"])
```
If we have census data that also includes a region variable and want to weight the data by age and gender within each region, we can use the `scheme_from_df` function:
```Python
import weightipy as wp
import pandas as pd

df_data = pd.read_csv("data_to_weight.csv")
df_census = pd.read_csv("census_data.csv")

# build the scheme from the census frequencies, one filter per region
scheme = wp.scheme_from_df(
    df=df_census,
    cols_weighting=["agecat", "gender"],
    col_filter="region",
    col_freq="freq"
)
# weight the survey data, not the census data
df_weighted = wp.weight_dataframe(
    df=df_data,
    scheme=scheme,
    weight_column="weights"
)
efficiency = wp.weighting_efficiency(df_weighted["weights"])
```
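To illustrate what a census frequency column can contribute to a scheme, the sketch below turns per-row census counts into percentage targets for each region. It is an illustration of the idea only, not `scheme_from_df` itself: the helper `targets_by_region` and the sample rows are hypothetical, and it assumes the census table has one row per combination of region, age category and gender.

```python
from collections import defaultdict


def targets_by_region(census_rows, cols_weighting, col_filter, col_freq):
    """Return {region: {column: {category: percent}}} from census counts."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for row in census_rows:
        for column in cols_weighting:
            # Add this row's frequency to its category within its region.
            counts[row[col_filter]][column][row[column]] += row[col_freq]

    result = {}
    for region, columns in counts.items():
        result[region] = {}
        for column, categories in columns.items():
            total = sum(categories.values())
            result[region][column] = {
                category: 100.0 * freq / total
                for category, freq in categories.items()
            }
    return result


# Hypothetical census rows, invented for this example.
census_rows = [
    {"region": "North", "agecat": "18-34", "gender": "Male", "freq": 200},
    {"region": "North", "agecat": "18-34", "gender": "Female", "freq": 300},
    {"region": "North", "agecat": "35+", "gender": "Male", "freq": 250},
    {"region": "North", "agecat": "35+", "gender": "Female", "freq": 250},
    {"region": "South", "agecat": "18-34", "gender": "Male", "freq": 100},
    {"region": "South", "agecat": "18-34", "gender": "Female", "freq": 100},
    {"region": "South", "agecat": "35+", "gender": "Male", "freq": 100},
    {"region": "South", "agecat": "35+", "gender": "Female", "freq": 100},
]
region_targets = targets_by_region(
    census_rows,
    cols_weighting=["agecat", "gender"],
    col_filter="region",
    col_freq="freq",
)
print(region_targets["North"]["gender"])
```

Each region ends up with its own age and gender distribution, which is why a filter column is needed when the population structure differs between regions.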
For more control over the weighting process, we can use the underlying `Rim` and `WeightEngine` classes directly:
```Python
import weightipy as wp

# in this example, agecat and gender are int dtype
age_targets = {'agecat': {1: 5.0, 2: 30.0, 3: 26.0, 4: 19.0, 5: 20.0}}
gender_targets = {'gender': {0: 49, 1: 51}}
scheme = wp.Rim('gender_and_age')
scheme.set_targets(targets=[age_targets, gender_targets])

# `key` names a column that identifies each row
my_df["identity"] = range(len(my_df))
engine = wp.WeightEngine(data=my_df)
engine.add_scheme(scheme=scheme, key="identity", verbose=False)
engine.run()
df_weighted = engine.dataframe()
col_weights = f"weights_{scheme.name}"

efficiency = wp.weighting_efficiency(df_weighted[col_weights])

print(engine.get_report())
# Example report output:
# Weight variable        weights_gender_and_age
# Weight group                   _default_name_
# Weight filter                            None
# Total: unweighted                  582.000000
# Total: weighted                    582.000000
# Weighting efficiency                60.009826
# Iterations required                 14.000000
# Mean weight factor                   1.000000
# Minimum weight factor                0.465818
# Maximum weight factor                6.187700
# Weight factor ratio                 13.283522
```
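The weight-related figures in the report above can be approximated from the weights alone. The sketch below assumes the efficiency is the Kish formula, (sum of weights)² / (n × sum of squared weights) × 100, and that the weight factor ratio is the maximum divided by the minimum. The second assumption is consistent with the sample report, where 6.187700 / 0.465818 ≈ 13.2835. The helper `weight_diagnostics` is hypothetical and is not part of Weightipy.

```python
def weight_diagnostics(weights):
    """Summarise a list of weights in the style of the report above."""
    n = len(weights)
    total = sum(weights)
    sum_of_squares = sum(w * w for w in weights)
    return {
        "mean": total / n,
        "minimum": min(weights),
        "maximum": max(weights),
        "ratio": max(weights) / min(weights),
        # Kish weighting efficiency, expressed as a percentage.
        "efficiency": 100.0 * total ** 2 / (n * sum_of_squares),
    }


diagnostics = weight_diagnostics([0.5, 1.0, 1.5])
for name, value in diagnostics.items():
    print(f"{name:<12}{value:>12.6f}")
```

Under this formula, identical weights give an efficiency of 100, and the value falls as the weights spread out.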
For more information on the underlying classes, refer to the Quantipy [documentation](https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class).
Overview of functions to get started:
| Function | Description |
|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| weight_dataframe | Weights data by scheme, returns modified dataframe with new weight column. |
| weighting_efficiency | Takes weights and returns efficiency of weighting. See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/03_diags.html#the-weighting-efficiency |
| scheme_from_dict | Turns a dict of dicts into a Rim scheme. Keys of the dict are column names and the values are distributions. These are normalized. |
| scheme_from_df | Creates a Rim scheme from a dataframe from specified weighting columns and frequency column. Useful when working with census data. |
| Rim class | Useful for creation of more complex weighting schemas. For example when weighting subregions or groups, which require filters. See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class |
| WeightEngine class | Useful for more specialised manipulation of the weighting process |
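The table notes that the distributions passed to `scheme_from_dict` are normalized. The sketch below shows one plausible reading of that, under the assumption that normalizing means rescaling each variable's targets so they sum to 100. The helper `normalize_targets` is hypothetical and is not a Weightipy function.

```python
def normalize_targets(targets):
    """Rescale each variable's target values so they sum to 100."""
    normalized = {}
    for column, distribution in targets.items():
        total = sum(distribution.values())
        normalized[column] = {
            category: 100.0 * value / total
            for category, value in distribution.items()
        }
    return normalized


# Raw counts are rescaled to percentages (assumed behaviour, see above).
raw_targets = {"gender": {"Male": 490, "Female": 510}}
normalized_targets = normalize_targets(raw_targets)
print(normalized_targets)
```

If this reading is right, targets could be supplied as counts or as proportions and describe the same distribution either way.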
## Planned features
- More utility functions to simplify the weighting process
- More performance improvements, in order to better support batch weighting of many datasets
- Support for multithreaded weighting (possibly using Polars)
- Rewrite of the API to be less oriented towards how Quantipy worked and more in line with simple weighting needs
- Far future: Support for more weighting algorithms
# Contributing
The test suite for Weightipy can be run with the command
`python3 -m pytest tests`
When developing a specific aspect of Weightipy, it may be quicker to run a single test module (e.g. for the Rim class):
`python3 -m unittest tests.test_rim`
We welcome volunteers and supporters. Please include a test case with any pull request, especially those that run calculations.
# Quantipy
#### Origins
- Quantipy was conceived and instigated by Gary Nelson: http://www.datasmoothie.com
### Contributors on Quantipy
- Alexander Buchhammer, Alasdair Eaglestone, James Griffiths, Kerstin Müller: https://yougov.co.uk
- Datasmoothie’s Birgir Hrafn Sigurðsson and [Geir Freysson](http://www.twitter.com/@geirfreysson): http://www.datasmoothie.com
Raw data
{
"_id": null,
"home_page": null,
"name": "weightipy",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Remi Sebastian Kits",
"author_email": "kaitumisuuringute.keskus@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/27/c9/d4e91ebdbe3b92630d6dd9cbe1aefd5797946156ba7a7112c094d5b8b6e7/weightipy-0.3.3.tar.gz",
"platform": null,
"description": "# Weightipy\n\nWeightipy is a cut down version of [Quantipy3](https://github.com/Quantipy/quantipy3) for weighting people data using the RIM (iterative raking) algorithm.\n\n### Changes from Quantipy\n- Removed all quantipy overhead. Weightipy supports the latest versions of Pandas and Numpy and is tested for Python 3.7, 3.8, 3.9, 3.10 and 3.11.\n- Weightipy runs up to 6 times faster than Quantipy, depending on the dataset.\n- Rim class will not generate reports like Quantipy did, unless the parameter verbose is set to True on the Rim constructor.\n\n## Installation\n\n`pip install weightipy`\n\nor\n\n`python3 -m pip install weightipy`\n\n#### Create a virtual envirionment\n\nIf you want to create a virtual environment when using Weightipy:\n\nconda\n```python\nconda create -n envwp python=3\n```\n\nwith venv\n```python\npython -m venv [your_env_name]\n ```\n\n## 5-minutes to Weightipy\n\n**Get started**\n\nAssuming we have the variables `gender` and `agecat` we can weight the dataset like this:\n\n```Python\nimport weightipy as wp\n\ntargets = {\n \"agecat\": {\"18-24\": 5.0, \"25-34\": 30.0, \"35-49\": 26.0, \"50-64\": 19.0, \"65+\": 20.0},\n \"gender\": {\"Male\": 49, \"Female\": 51}\n}\nscheme = wp.scheme_from_dict(targets)\n\ndf_weighted = wp.weight_dataframe(\n df=my_df,\n scheme=scheme,\n weight_column=\"weights\"\n)\nefficiency = wp.weighting_efficiency(df_weighted[\"weights\"])\n```\n\nIn case we are working with census data, which also includes a region variable and we would\nlike to weight the data by age and gender in each region, we can use the `scheme_from_df` function:\n```Python\nimport weightipy as wp\nimport pandas as pd\n\ndf_data = pd.read_csv(\"data_to_weight.csv\")\ndf_census = pd.read_csv(\"census_data.csv\")\n\nscheme = wp.scheme_from_df(\n df=df_census,\n cols_weighting=[\"agecat\", \"gender\"],\n col_filter=\"region\",\n col_freq=\"freq\"\n)\ndf_weighted = wp.weight_dataframe(\n df=d,\n scheme=scheme,\n 
weight_column=\"weights\"\n)\nefficiency = wp.weighting_efficiency(df_weighted[\"weights\"])\n```\n\nOr by using the underlying functions that will give more access to the weighting process, we\ncan use the Rim and WeightEngine classes directly:\n```Python\nimport weightipy as wp\n\n# in this example, agecat and gender are int dtype\n\nage_targets = {'agecat':{1:5.0, 2:30.0, 3:26.0, 4:19.0, 5:20.0}}\ngender_targets = {'gender':{0:49, 1:51}}\nscheme = wp.Rim('gender_and_age')\nscheme.set_targets(targets=[age_targets, gender_targets])\n\nmy_df[\"identity\"] = range(len(my_df))\nengine = wp.WeightEngine(data=df)\nengine.add_scheme(scheme=scheme, key=\"identity\", verbose=False)\nengine.run()\ndf_weighted = engine.dataframe()\ncol_weights = f\"weights_{scheme.name}\"\n\nefficiency = wp.weighting_efficiency(df_weighted[col_weights])\n\nprint(engine.get_report())\n\nWeight variable weights_gender_and_age\nWeight group _default_name_\nWeight filter None\nTotal: unweighted 582.000000\nTotal: weighted 582.000000\nWeighting efficiency 60.009826\nIterations required 14.000000\nMean weight factor 1.000000\nMinimum weight factor 0.465818\nMaximum weight factor 6.187700\nWeight factor ratio 13.283522\n```\n\nFor more references on the underlying classes, refer to the Quantipy \n[documentation](https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class)\n\nOverview of functions to get started:\n\n| Function | Description |\n|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| weight_dataframe | Weights data by scheme, returns modified dataframe with new weight column. |\n| weighting_efficiency | Takes weights and returns efficiency of weighting. 
See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/03_diags.html#the-weighting-efficiency |\n| scheme_from_dict | Turns a dict of dicts into a Rim scheme. Keys of the dict are column names and the values are distributions. These are normalized. |\n| scheme_from_df | Creates a Rim scheme from a dataframe from specified weighting columns and frequency column. Useful when working with census data. |\n| Rim class | Useful for creation of more complex weighting schemas. For example when weighting subregions or groups, which require filters. See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class |\n| WeightEngine class | Useful for more specialised manipulation of the weighting process |\n\n## Planned features\n- More utility functions to simplify the weighting process\n- More performance improvements, in order to better support batch weighting of many datasets\n- Support for multithreaded weighting (possibly using Polars)\n- Rewrite of the API to be less oriented towards how Quantipy worked and more in line with simple weighting needs\n- Far future: Support for more weighting algorithms\n\n\n# Contributing\n\nThe test suite for Weightipy can be run with the command\n\n`python3 -m pytest tests`\n\nBut when developing a specific aspect of Weightipy, it might be quicker to run (e.g. for the Rim class)\n\n`python3 -m unittest tests.test_rim`\n\nWe welcome volunteers and supporters. Please include a test case with any pull request, especially those that run calculations.\n\n# Quantipy\n\n#### Origins\n- Quantipy was concieved of and instigated by Gary Nelson: http://www.datasmoothie.com\n\n\n### Contributors on Quantipy\n- Alexander Buchhammer, Alasdair Eaglestone, James Griffiths, Kerstin M\u00fcller : https://yougov.co.uk\n- Datasmoothie\u2019s Birgir Hrafn Sigur\u00f0sson and [Geir Freysson](http://www.twitter.com/@geirfreysson): http://www.datasmoothie.com\n",
"bugtrack_url": null,
"license": null,
"summary": null,
"version": "0.3.3",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ebdfa2b082ff21de32fa8b8a578e65f392bfcbf66b673c14dc23d5b87d96242a",
"md5": "867c54ee4fb4cbccf20e6470c374f230",
"sha256": "5b51099a5b309e25cf2176c69b0edea049180a190dcf79eb39107c3a499c2d35"
},
"downloads": -1,
"filename": "weightipy-0.3.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "867c54ee4fb4cbccf20e6470c374f230",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 15522,
"upload_time": "2024-07-12T10:19:39",
"upload_time_iso_8601": "2024-07-12T10:19:39.329167Z",
"url": "https://files.pythonhosted.org/packages/eb/df/a2b082ff21de32fa8b8a578e65f392bfcbf66b673c14dc23d5b87d96242a/weightipy-0.3.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "27c9d4e91ebdbe3b92630d6dd9cbe1aefd5797946156ba7a7112c094d5b8b6e7",
"md5": "d5305c99cab7a55b2b796d92c7df6dda",
"sha256": "0020603643974155fa637552ac708fae154f0013472a842405831f6d454eb62b"
},
"downloads": -1,
"filename": "weightipy-0.3.3.tar.gz",
"has_sig": false,
"md5_digest": "d5305c99cab7a55b2b796d92c7df6dda",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 20506,
"upload_time": "2024-07-12T10:19:40",
"upload_time_iso_8601": "2024-07-12T10:19:40.793590Z",
"url": "https://files.pythonhosted.org/packages/27/c9/d4e91ebdbe3b92630d6dd9cbe1aefd5797946156ba7a7112c094d5b8b6e7/weightipy-0.3.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-12 10:19:40",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "weightipy"
}