instate: predict the state of residence from last name
=============================================================

.. image:: https://img.shields.io/pypi/v/instate.svg
    :target: https://pypi.python.org/pypi/instate
.. image:: https://readthedocs.org/projects/instate/badge/?version=latest
    :target: http://instate.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status
.. image:: https://pepy.tech/badge/instate
    :target: https://pepy.tech/project/instate

Using the Indian electoral rolls data (2017), we provide a Python package that takes the last name of a person and gives its distribution across states.
Potential Use Cases
---------------------
India has 22 official languages, and serving such a diverse language base is a challenge for businesses and surveyors. When a business knows a person's last name and has no other data with which to model the language the person speaks, the distribution of the last name across states is the best available signal.

Installation
-------------
We strongly recommend installing `instate` inside a Python virtual environment
(see `venv documentation <https://docs.python.org/3/library/venv.html#creating-virtual-environments>`__).

::

    pip install instate

Examples
--------
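::

    import pandas as pd
    from instate import last_state

    # Assumes last_dat.csv has a column named "last_name" that holds the last names
    last_dat = pd.read_csv("last_dat.csv")
    last_state_dat = last_state(last_dat, "last_name")
    print(last_state_dat)
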
API
----------
instate exposes 3 functions.

- **last_state**

  - takes a pandas dataframe and the name of the column that holds the last names, and returns the dataframe with 31 more columns, one for each state for which we have data.

::

    import pandas as pd
    from instate import last_state

    df = pd.DataFrame({'last_name': ['Dhingra', 'Sood', 'Gowda']})
    last_state(df, "last_name").iloc[:, :5]

      last_name __last_name   andaman    andhra  arunachal
    0   Dhingra     dhingra  0.001737  0.000744   0.000000
    1      Sood        sood  0.000258  0.002492   0.000043
    2     Gowda       gowda  0.000000  0.528533   0.000000

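
The ``__last_name`` column in the output suggests that names are normalized before they are matched. A lookup of this kind can be sketched as a lowercase-and-join step in pandas. This is a minimal sketch only: the ``lookup`` table, its values, and the helper ``lookup_state_shares`` are hypothetical and are not the package's actual data or implementation.

::

    import pandas as pd

    # Hypothetical lookup table: one row per normalized last name,
    # one column per state. The shares are made up for illustration.
    lookup = pd.DataFrame({
        "__last_name": ["dhingra", "gowda"],
        "delhi": [0.40, 0.01],
        "andhra": [0.02, 0.70],
    })

    def lookup_state_shares(df, col):
        """Append per-state shares to df by matching on a lowercased copy of df[col]."""
        out = df.copy()
        # Normalize case and surrounding whitespace so ' GOWDA' matches 'gowda'.
        out["__last_name"] = out[col].str.strip().str.lower()
        # A left join keeps every input row; unmatched names get NaN shares.
        return out.merge(lookup, on="__last_name", how="left")

    df = pd.DataFrame({"last_name": ["Dhingra", " GOWDA", "Unknownname"]})
    result = lookup_state_shares(df, "last_name")
    print(result)

Under these assumptions, names missing from the lookup table come back with empty state columns rather than being dropped, so the output keeps one row per input row.
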
- **pred_last_state**

  - takes a pandas dataframe and the name of the column that holds the last names, and returns the dataframe with 1 more column (pred_state) that holds the top-3 predictions from a GRU model.

::

    import pandas as pd
    from instate import pred_last_state

    df = pd.DataFrame({'last_name': ['Dhingra', 'Sood', 'Gowda']})
    pred_last_state(df, "last_name")

      last_name  pred_state
    0   dhingra  [Daman and Diu, Andaman and Nicobar Islands, Puducherry]
    1      sood  [Meghalaya, Chandigarh, Punjab]
    2     gowda  [Puducherry, Nagaland, Daman and Diu]

- **state_to_lang**

  - takes a pandas dataframe and the name of the column that holds the state, and appends census mappings from state to languages.

::

    import pandas as pd
    from instate import last_state, state_to_lang

    df = pd.DataFrame({'last_name': ['dhingra', 'sood', 'gowda']})
    state_last = last_state(df, "last_name")
    # Keep only the state columns, then pick the state with the largest share
    small_state = state_last.loc[:, "andaman":"utt"]
    state_last["modal_state"] = small_state.idxmax(axis=1)
    state_to_lang(state_last, "modal_state")[["last_name", "modal_state", "official_languages"]]

      last_name modal_state official_languages
    0   dhingra       delhi     Hindi, English
    1      sood      punjab            Punjabi
    2     gowda      andhra             Telugu

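
The example above keeps only the modal state. When a name is spread across several states, another option is to weight each state's languages by the name's share in that state. The sketch below illustrates the idea in plain Python. The ``state_languages`` mapping is copied from the three rows of output above, the shares are made up, and splitting a state's share evenly across its languages is an assumption rather than anything the package does.

::

    from collections import defaultdict

    # State-to-language mapping taken from the sample output above (illustrative only).
    state_languages = {
        "delhi": ["Hindi", "English"],
        "punjab": ["Punjabi"],
        "andhra": ["Telugu"],
    }

    def language_weights(state_shares):
        """Split each state's share evenly over its languages and sum per language."""
        weights = defaultdict(float)
        for state, share in state_shares.items():
            langs = state_languages.get(state, [])
            for lang in langs:
                weights[lang] += share / len(langs)
        return dict(weights)

    # Made-up distribution of one last name across three states.
    shares = {"delhi": 0.5, "punjab": 0.4, "andhra": 0.1}
    weights = language_weights(shares)
    top_language = max(weights, key=weights.get)
    print(weights, top_language)

With these made-up numbers the modal state is delhi, yet the most heavily weighted language is Punjabi, which shows how the two approaches can disagree.
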
Data
----
The underlying data for the package can be accessed at: https://doi.org/10.7910/DVN/ZXMVTJ
Evaluation
----------
The GRU model has a top-3 accuracy of 85.3% on unseen names.
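
Top-3 accuracy counts a prediction as correct when the true state appears anywhere among the three predicted states. A minimal sketch of the metric, using made-up predictions and labels (the helper ``top_k_accuracy`` is hypothetical and is not part of the package):

::

    def top_k_accuracy(predictions, truths, k=3):
        """Share of rows whose true label appears among the first k predictions."""
        hits = sum(truth in preds[:k] for preds, truth in zip(predictions, truths))
        return hits / len(truths)

    # Made-up ranked predictions and true labels, for illustration only.
    predictions = [
        ["Punjab", "Delhi", "Haryana"],
        ["Karnataka", "Andhra Pradesh", "Tamil Nadu"],
        ["Kerala", "Goa", "Maharashtra"],
        ["Delhi", "Punjab", "Bihar"],
    ]
    truths = ["Delhi", "Karnataka", "Punjab", "Bihar"]

    top3 = top_k_accuracy(predictions, truths, k=3)
    top1 = top_k_accuracy(predictions, truths, k=1)
    print(top3, top1)
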
Authors
-------
Atul Dhingra and Gaurav Sood
Contributor Code of Conduct
---------------------------------
The project welcomes contributions from everyone! In fact, it depends on
it. To maintain this welcoming atmosphere, and to collaborate in a fun
and productive way, we expect contributors to the project to abide by
the `Contributor Code of
Conduct <http://contributor-covenant.org/version/1/0/0/>`__.
License
----------
The package is released under the `MIT
License <https://opensource.org/licenses/MIT>`__.