weightipy


- Name: weightipy
- Version: 0.3.3
- Author: Remi Sebastian Kits
- Upload time: 2024-07-12 10:19:40
- Requirements: none recorded

# Weightipy

Weightipy is a cut down version of [Quantipy3](https://github.com/Quantipy/quantipy3) for weighting people data using the RIM (iterative raking) algorithm.

### Changes from Quantipy
- Removed all Quantipy overhead. Weightipy supports the latest versions of Pandas and NumPy and is tested on Python 3.7, 3.8, 3.9, 3.10 and 3.11.
- Weightipy runs up to 6 times faster than Quantipy, depending on the dataset.
- The `Rim` class does not generate reports as Quantipy did unless the `verbose` parameter is set to `True` in the `Rim` constructor.

## Installation

`pip install weightipy`

or

`python3 -m pip install weightipy`

#### Create a virtual environment

If you want to create a virtual environment when using Weightipy:

With conda:
```bash
conda create -n envwp python=3
```

With venv:
```bash
python -m venv [your_env_name]
```

## 5 minutes to Weightipy

**Get started**

Assuming `my_df` is a pandas DataFrame with the columns `gender` and `agecat`, we can weight the dataset like this:

```Python
import weightipy as wp

targets = {
    "agecat": {"18-24": 5.0, "25-34": 30.0, "35-49": 26.0, "50-64": 19.0, "65+": 20.0},
    "gender": {"Male": 49, "Female": 51}
}
scheme = wp.scheme_from_dict(targets)

df_weighted = wp.weight_dataframe(
    df=my_df,
    scheme=scheme,
    weight_column="weights"
)
efficiency = wp.weighting_efficiency(df_weighted["weights"])
```
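
For intuition, the RIM (iterative raking) idea can be sketched in plain Python. This is only an illustration of the general technique, not Weightipy's implementation: the `rake` function, its parameters and the toy data are hypothetical, and the sketch assumes every target category occurs at least once in the data.

```python
from collections import defaultdict


def rake(rows, targets, max_iter=100, tol=1e-9):
    """Return one weight per row so weighted shares approach `targets`.

    `rows` is a list of dicts, `targets` maps column -> {category: share}.
    Hypothetical helper for illustration; not part of Weightipy.
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for column, shares in targets.items():
            total_share = sum(shares.values())  # normalise, e.g. 49 + 51
            # Current weighted total of each category in this column.
            current = defaultdict(float)
            for row, w in zip(rows, weights):
                current[row[column]] += w
            # Rescale every row so its category reaches the target total.
            for i, row in enumerate(rows):
                category = row[column]
                wanted = n * shares[category] / total_share
                factor = wanted / current[category]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins already match: converged
            break
    return weights


# Toy sample: 6 men, 4 women; 3 younger, 7 older respondents.
rows = (
    [{"gender": "Male", "agecat": "18-34"}] * 2
    + [{"gender": "Male", "agecat": "35+"}] * 4
    + [{"gender": "Female", "agecat": "18-34"}] * 1
    + [{"gender": "Female", "agecat": "35+"}] * 3
)
targets = {
    "gender": {"Male": 50, "Female": 50},
    "agecat": {"18-34": 40, "35+": 60},
}
weights = rake(rows, targets)
```

Each pass rescales the weights one column at a time; repeating the passes lets all margins settle on their targets at once, which is why the report further below lists the number of iterations required.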

If we are working with census data that also includes a region variable, and we would
like to weight the data by age and gender within each region, we can use the `scheme_from_df` function:
```Python
import weightipy as wp
import pandas as pd

df_data = pd.read_csv("data_to_weight.csv")
df_census = pd.read_csv("census_data.csv")

scheme = wp.scheme_from_df(
    df=df_census,
    cols_weighting=["agecat", "gender"],
    col_filter="region",
    col_freq="freq"
)
df_weighted = wp.weight_dataframe(
    df=df_data,
    scheme=scheme,
    weight_column="weights"
)
efficiency = wp.weighting_efficiency(df_weighted["weights"])
```
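
To make the census example more concrete, the sketch below shows one possible shape of such a frequency table and the per-region percentage targets it implies. The layout (one row per region, age and gender cell), the toy numbers and the `targets_by_region` helper are assumptions for illustration; the README does not spell out how `scheme_from_df` processes its input, so this is not a description of its internals.

```python
from collections import defaultdict

# Hypothetical census table: one row per (region, agecat, gender) cell.
census = [
    {"region": "North", "agecat": "18-34", "gender": "Male", "freq": 120},
    {"region": "North", "agecat": "18-34", "gender": "Female", "freq": 80},
    {"region": "North", "agecat": "35+", "gender": "Male", "freq": 100},
    {"region": "North", "agecat": "35+", "gender": "Female", "freq": 100},
    {"region": "South", "agecat": "18-34", "gender": "Male", "freq": 30},
    {"region": "South", "agecat": "18-34", "gender": "Female", "freq": 70},
    {"region": "South", "agecat": "35+", "gender": "Male", "freq": 50},
    {"region": "South", "agecat": "35+", "gender": "Female", "freq": 50},
]


def targets_by_region(rows, cols_weighting, col_filter, col_freq):
    """Percentage targets per region and weighting column (illustrative)."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for row in rows:
        for col in cols_weighting:
            counts[row[col_filter]][col][row[col]] += row[col_freq]
    result = {}
    for region, per_col in counts.items():
        result[region] = {}
        for col, cats in per_col.items():
            total = sum(cats.values())
            # Convert frequencies to percentages within the region.
            result[region][col] = {c: 100.0 * v / total for c, v in cats.items()}
    return result


region_targets = targets_by_region(
    census, cols_weighting=["agecat", "gender"], col_filter="region", col_freq="freq"
)
```

The argument names mirror those of the `scheme_from_df` call above, so `region_targets["North"]["gender"]` holds the gender distribution that the northern respondents would be weighted towards.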

Alternatively, for more control over the weighting process, we
can use the underlying `Rim` and `WeightEngine` classes directly:
```Python
import weightipy as wp

# in this example, agecat and gender are int dtype

age_targets = {'agecat':{1:5.0, 2:30.0, 3:26.0, 4:19.0, 5:20.0}}
gender_targets = {'gender':{0:49, 1:51}}
scheme = wp.Rim('gender_and_age')
scheme.set_targets(targets=[age_targets, gender_targets])

my_df["identity"] = range(len(my_df))
engine = wp.WeightEngine(data=my_df)
engine.add_scheme(scheme=scheme, key="identity", verbose=False)
engine.run()
df_weighted = engine.dataframe()
col_weights = f"weights_{scheme.name}"

efficiency = wp.weighting_efficiency(df_weighted[col_weights])

print(engine.get_report())

# Example output:
# Weight variable       weights_gender_and_age
# Weight group                  _default_name_
# Weight filter                           None
# Total: unweighted                 582.000000
# Total: weighted                   582.000000
# Weighting efficiency               60.009826
# Iterations required                14.000000
# Mean weight factor                  1.000000
# Minimum weight factor               0.465818
# Maximum weight factor               6.187700
# Weight factor ratio                13.283522
```
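
After weighting, it can be worth checking that the weighted distribution of each variable is close to its target. The helper below is a minimal sketch in plain Python; `weighted_shares` and the toy values are hypothetical and not part of Weightipy.

```python
from collections import defaultdict


def weighted_shares(values, weights):
    """Return the weighted percentage of each category in `values`."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {cat: 100.0 * t / grand_total for cat, t in totals.items()}


# Toy data standing in for a weighted column and its weights.
gender = ["Male", "Male", "Female", "Female", "Female"]
toy_weights = [1.5, 1.0, 0.5, 1.0, 1.0]
shares = weighted_shares(gender, toy_weights)
```

With real data the same function should work on two columns of the weighted dataframe, for example `weighted_shares(df_weighted["gender"], df_weighted[col_weights])`, since it only iterates over the values.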

For more details on the underlying classes, refer to the Quantipy
[documentation](https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class).

Overview of functions to get started:

| Function             | Description                                                                                                                                                                                                                                  |
|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| weight_dataframe     | Weights data by scheme, returns modified dataframe with new weight column.                                                                                                                                                                   |
| weighting_efficiency | Takes weights and returns efficiency of weighting. See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/03_diags.html#the-weighting-efficiency                                                                      |
| scheme_from_dict     | Turns a dict of dicts into a Rim scheme. Keys of the dict are column names and the values are distributions. These are normalized.                                                                                                           |
| scheme_from_df       | Creates a Rim scheme from a dataframe from specified weighting columns and frequency column. Useful when working with census data.                                                                                                           |
| Rim class            | Useful for creation of more complex weighting schemas. For example when weighting subregions or groups, which require filters. See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class |
| WeightEngine class   | Useful for more specialised manipulation of the weighting process                                                                                                                                                                            |
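
As a rough guide to what a weighting efficiency figure expresses, the sketch below computes the commonly used Kish efficiency, `100 * (sum of w)^2 / (n * sum of w^2)`. That `weighting_efficiency` uses exactly this formula is an assumption here (see the linked Quantipy documentation for the definition), and `kish_efficiency` is a hypothetical name.

```python
def kish_efficiency(weights):
    """Kish weighting efficiency in percent (illustrative helper)."""
    weights = list(weights)
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return 100.0 * total ** 2 / (n * total_sq)


uniform = kish_efficiency([1.0, 1.0, 1.0, 1.0])  # identical weights
uneven = kish_efficiency([0.5, 1.5])             # spread-out weights
```

Identical weights give an efficiency of 100, and the value drops as the weights become more uneven, which indicates a smaller effective sample size.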

## Planned features
- More utility functions to simplify the weighting process
- More performance improvements, in order to better support batch weighting of many datasets
- Support for multithreaded weighting (possibly using Polars)
- Rewrite of the API to be less oriented towards how Quantipy worked and more in line with simple weighting needs
- Far future: Support for more weighting algorithms


# Contributing

The test suite for Weightipy can be run with the command

`python3 -m pytest tests`

When developing a specific aspect of Weightipy, it might be quicker to run only the relevant tests (e.g. for the Rim class):

`python3 -m unittest tests.test_rim`

We welcome volunteers and supporters. Please include a test case with any pull request, especially those that run calculations.

# Quantipy

#### Origins
- Quantipy was conceived and instigated by Gary Nelson: http://www.datasmoothie.com


### Contributors on Quantipy
- Alexander Buchhammer, Alasdair Eaglestone, James Griffiths, Kerstin Müller: https://yougov.co.uk
- Datasmoothie’s Birgir Hrafn Sigurðsson and [Geir Freysson](http://www.twitter.com/@geirfreysson): http://www.datasmoothie.com

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "weightipy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Remi Sebastian Kits",
    "author_email": "kaitumisuuringute.keskus@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/27/c9/d4e91ebdbe3b92630d6dd9cbe1aefd5797946156ba7a7112c094d5b8b6e7/weightipy-0.3.3.tar.gz",
    "platform": null,
    "description": "# Weightipy\n\nWeightipy is a cut down version of [Quantipy3](https://github.com/Quantipy/quantipy3) for weighting people data using the RIM (iterative raking) algorithm.\n\n### Changes from Quantipy\n- Removed all quantipy overhead. Weightipy supports the latest versions of Pandas and Numpy and is tested for Python 3.7, 3.8, 3.9, 3.10 and 3.11.\n- Weightipy runs up to 6 times faster than Quantipy, depending on the dataset.\n- Rim class will not generate reports like Quantipy did, unless the parameter verbose is set to True on the Rim constructor.\n\n## Installation\n\n`pip install weightipy`\n\nor\n\n`python3 -m pip install weightipy`\n\n#### Create a virtual envirionment\n\nIf you want to create a virtual environment when using Weightipy:\n\nconda\n```python\nconda create -n envwp python=3\n```\n\nwith venv\n```python\npython -m venv [your_env_name]\n ```\n\n## 5-minutes to Weightipy\n\n**Get started**\n\nAssuming we have the variables `gender` and `agecat` we can weight the dataset like this:\n\n```Python\nimport weightipy as wp\n\ntargets = {\n    \"agecat\": {\"18-24\": 5.0, \"25-34\": 30.0, \"35-49\": 26.0, \"50-64\": 19.0, \"65+\": 20.0},\n    \"gender\": {\"Male\": 49, \"Female\": 51}\n}\nscheme = wp.scheme_from_dict(targets)\n\ndf_weighted = wp.weight_dataframe(\n    df=my_df,\n    scheme=scheme,\n    weight_column=\"weights\"\n)\nefficiency = wp.weighting_efficiency(df_weighted[\"weights\"])\n```\n\nIn case we are working with census data, which also includes a region variable and we would\nlike to weight the data by age and gender in each region, we can use the `scheme_from_df` function:\n```Python\nimport weightipy as wp\nimport pandas as pd\n\ndf_data = pd.read_csv(\"data_to_weight.csv\")\ndf_census = pd.read_csv(\"census_data.csv\")\n\nscheme = wp.scheme_from_df(\n    df=df_census,\n    cols_weighting=[\"agecat\", \"gender\"],\n    col_filter=\"region\",\n    col_freq=\"freq\"\n)\ndf_weighted = wp.weight_dataframe(\n    df=d,\n    
scheme=scheme,\n    weight_column=\"weights\"\n)\nefficiency = wp.weighting_efficiency(df_weighted[\"weights\"])\n```\n\nOr by using the underlying functions that will give more access to the weighting process, we\ncan use the Rim and WeightEngine classes directly:\n```Python\nimport weightipy as wp\n\n# in this example, agecat and gender are int dtype\n\nage_targets = {'agecat':{1:5.0, 2:30.0, 3:26.0, 4:19.0, 5:20.0}}\ngender_targets = {'gender':{0:49, 1:51}}\nscheme = wp.Rim('gender_and_age')\nscheme.set_targets(targets=[age_targets, gender_targets])\n\nmy_df[\"identity\"] = range(len(my_df))\nengine = wp.WeightEngine(data=df)\nengine.add_scheme(scheme=scheme, key=\"identity\", verbose=False)\nengine.run()\ndf_weighted = engine.dataframe()\ncol_weights = f\"weights_{scheme.name}\"\n\nefficiency = wp.weighting_efficiency(df_weighted[col_weights])\n\nprint(engine.get_report())\n\nWeight variable       weights_gender_and_age\nWeight group                  _default_name_\nWeight filter                           None\nTotal: unweighted                 582.000000\nTotal: weighted                   582.000000\nWeighting efficiency               60.009826\nIterations required                14.000000\nMean weight factor                  1.000000\nMinimum weight factor               0.465818\nMaximum weight factor               6.187700\nWeight factor ratio                13.283522\n```\n\nFor more references on the underlying classes, refer to the Quantipy \n[documentation](https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class)\n\nOverview of functions to get started:\n\n| Function             | Description                                                                                                                                                                                                                                  
|\n|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| weight_dataframe     | Weights data by scheme, returns modified dataframe with new weight column.                                                                                                                                                                   |\n| weighting_efficiency | Takes weights and returns efficiency of weighting. See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/03_diags.html#the-weighting-efficiency                                                                      |\n| scheme_from_dict     | Turns a dict of dicts into a Rim scheme. Keys of the dict are column names and the values are distributions. These are normalized.                                                                                                           |\n| scheme_from_df       | Creates a Rim scheme from a dataframe from specified weighting columns and frequency column. Useful when working with census data.                                                                                                           |\n| Rim class            | Useful for creation of more complex weighting schemas. For example when weighting subregions or groups, which require filters. 
See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class |\n| WeightEngine class   | Useful for more specialised manipulation of the weighting process                                                                                                                                                                            |\n\n## Planned features\n- More utility functions to simplify the weighting process\n- More performance improvements, in order to better support batch weighting of many datasets\n- Support for multithreaded weighting (possibly using Polars)\n- Rewrite of the API to be less oriented towards how Quantipy worked and more in line with simple weighting needs\n- Far future: Support for more weighting algorithms\n\n\n# Contributing\n\nThe test suite for Weightipy can be run with the command\n\n`python3 -m pytest tests`\n\nBut when developing a specific aspect of Weightipy, it might be quicker to run (e.g. for the Rim class)\n\n`python3 -m unittest tests.test_rim`\n\nWe welcome volunteers and supporters. Please include a test case with any pull request, especially those that run calculations.\n\n# Quantipy\n\n#### Origins\n- Quantipy was concieved of and instigated by Gary Nelson: http://www.datasmoothie.com\n\n\n### Contributors on Quantipy\n- Alexander Buchhammer, Alasdair Eaglestone, James Griffiths, Kerstin M\u00fcller : https://yougov.co.uk\n- Datasmoothie\u2019s Birgir Hrafn Sigur\u00f0sson and [Geir Freysson](http://www.twitter.com/@geirfreysson): http://www.datasmoothie.com\n",
    "bugtrack_url": null,
    "license": null,
    "summary": null,
    "version": "0.3.3",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ebdfa2b082ff21de32fa8b8a578e65f392bfcbf66b673c14dc23d5b87d96242a",
                "md5": "867c54ee4fb4cbccf20e6470c374f230",
                "sha256": "5b51099a5b309e25cf2176c69b0edea049180a190dcf79eb39107c3a499c2d35"
            },
            "downloads": -1,
            "filename": "weightipy-0.3.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "867c54ee4fb4cbccf20e6470c374f230",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 15522,
            "upload_time": "2024-07-12T10:19:39",
            "upload_time_iso_8601": "2024-07-12T10:19:39.329167Z",
            "url": "https://files.pythonhosted.org/packages/eb/df/a2b082ff21de32fa8b8a578e65f392bfcbf66b673c14dc23d5b87d96242a/weightipy-0.3.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "27c9d4e91ebdbe3b92630d6dd9cbe1aefd5797946156ba7a7112c094d5b8b6e7",
                "md5": "d5305c99cab7a55b2b796d92c7df6dda",
                "sha256": "0020603643974155fa637552ac708fae154f0013472a842405831f6d454eb62b"
            },
            "downloads": -1,
            "filename": "weightipy-0.3.3.tar.gz",
            "has_sig": false,
            "md5_digest": "d5305c99cab7a55b2b796d92c7df6dda",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 20506,
            "upload_time": "2024-07-12T10:19:40",
            "upload_time_iso_8601": "2024-07-12T10:19:40.793590Z",
            "url": "https://files.pythonhosted.org/packages/27/c9/d4e91ebdbe3b92630d6dd9cbe1aefd5797946156ba7a7112c094d5b8b6e7/weightipy-0.3.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-12 10:19:40",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "weightipy"
}
        