# geoLid
Geographically-informed language identification

This Python package performs language identification using geographic priors, which improves performance for low-resource and under-represented languages.

A description and evaluation of this approach can be found here: https://jdunn.name/2024/03/13/geographically-informed-language-identification/

A complete list of language codes and names per regional model can be found in the *language_names* directory.

**Downloading models**
geoLid contains a baseline non-geographic model as well as models for 16 specific regions, as shown below:

    baseline (916 languages)
    africa_north (44 languages)
    africa_southern (58 languages)
    africa_sub (166 languages)
    america_brazil (88 languages)
    america_central (188 languages)
    america_north (68 languages)
    america_south (129 languages)
    asia_central (54 languages)
    asia_east (46 languages)
    asia_south (60 languages)
    asia_southeast (325 languages)
    europe_east (65 languages)
    europe_russia (65 languages)
    europe_west (108 languages)
    middle_east (53 languages)
    oceania (49 languages)

To download models, use this command:

    from geoLid import download_model
    download_model("baseline")

The model name "all" will download all region-specific models.
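
When several models are needed, it can help to check the names against the list above before downloading. The following is a minimal sketch, not part of geoLid itself: `KNOWN_MODELS`, `validate_model_names`, and `download_models` are hypothetical helpers written for illustration. The only geoLid call used is `download_model`, which is imported lazily so the validation logic can also be exercised with a stand-in function.

```python
# Model names copied from the list in this README, plus the special name "all".
KNOWN_MODELS = {
    "baseline", "africa_north", "africa_southern", "africa_sub",
    "america_brazil", "america_central", "america_north", "america_south",
    "asia_central", "asia_east", "asia_south", "asia_southeast",
    "europe_east", "europe_russia", "europe_west", "middle_east",
    "oceania", "all",
}


def validate_model_names(names):
    """Return names as a list, raising ValueError if any is not in KNOWN_MODELS."""
    names = list(names)
    unknown = [name for name in names if name not in KNOWN_MODELS]
    if unknown:
        raise ValueError(f"Unknown model name(s): {', '.join(unknown)}")
    return names


def download_models(names, downloader=None):
    """Download each named model (hypothetical helper, not part of geoLid)."""
    names = validate_model_names(names)
    if downloader is None:
        # Imported here so the validation logic works without geoLid installed.
        from geoLid import download_model as downloader
    for name in names:
        downloader(name)
    return names
```

Under these assumptions, `download_models(["baseline", "oceania"])` would request the baseline model and the Oceania model in turn, and a misspelled name would fail before any download starts.
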
**Usage**
Run language identification as shown below:

    from geoLid import geoLid

    # data is a list of one or more strings to identify
    data = ["This is a sample text."]

    lid = geoLid(model_location = "models")
    labels = lid.predict(data = data, region = "baseline")

The *model_location* argument, set at initialization, points to the directory containing the LID models.

The input variable *data* is a list containing at least one string; each string is a text to make a prediction about.

The *region* variable indicates which region-specific model should be used. By default, the non-geographic baseline model is used.

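
Because *data* must be a list, a small wrapper can make single-string calls less error-prone. The sketch below is illustrative only: `label_texts` is a hypothetical helper, and it assumes that `predict` returns one label per input text in input order, which this README does not state explicitly.

```python
def label_texts(lid, texts, region="baseline"):
    """Pair each input text with its predicted label.

    Assumption: lid.predict returns one label per input text, in input order.
    """
    if isinstance(texts, str):
        texts = [texts]  # predict expects a list, so wrap a single string
    texts = list(texts)
    labels = list(lid.predict(data=texts, region=region))
    if len(labels) != len(texts):
        # Guard against the ordering assumption above being wrong.
        raise ValueError("expected one label per input text")
    return list(zip(texts, labels))
```

With a `lid` object created as shown above, `label_texts(lid, ["first text", "second text"], region="oceania")` would return a list of `(text, label)` pairs, provided the assumption about the return value holds.
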
Raw data
{
"_id": null,
"home_page": "https://github.com/jonathandunn/geoLid",
"name": "geoLid",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "lid,language identification,geographic,geography",
"author": "Jonathan Dunn",
"author_email": "jedunn@illinois.edu",
"download_url": "https://files.pythonhosted.org/packages/6b/6d/bc009965a0dde8be84b41bdf83774d7835f207997325412a927e4f7516be/geoLid-1.0.tar.gz",
"platform": null,
"description": "# geoLid\r\nGeographically-informed language identification\r\n\r\nThis Python package carries out language identification with geographic priors to increase performance for low-resource and under-represented languages.\r\n\r\nA description and evaluation of this approach can be found here: https://jdunn.name/2024/03/13/geographically-informed-language-identification/\r\n\r\nA complete list of language codes and names per regional model can be found in the *language_names* directory.\r\n\r\n**Downloading models**\r\n\r\ngeoLid contains a baseline non-geographic model as well as models for 16 specific regions, as shown below:\r\n\r\n baseline (916 languages)\r\n africa_north (44 languages)\r\n africa_southern (58 languages)\r\n africa_sub (166 languages)\r\n america_brazil (88 languages)\r\n america_central (188 languages)\r\n america_north (68 languages)\r\n america_south (129 languages)\r\n asia_central (54 languages)\r\n asia_east (46 languages)\r\n asia_south (60 languages)\r\n asia_southeast (325 languages)\r\n europe_east (65 languages)\r\n europe_russia (65 languages)\r\n europe_west (108 languages)\r\n middle_east (53 languages)\r\n oceania (49 languages)\r\n\r\nTo download models, use this command:\r\n\r\n from geoLid import download_model\r\n download_model(\"baseline\")\r\n\r\nThe model name \"all\" will download all region-specific models.\r\n\r\n**Usage**\r\n\r\nLanguage identification can be used as shown below:\r\n\r\n from geoLid import geoLid\r\n lid = geoLid(model_location = \"models\")\r\n labels = lid.predict(data = data, region = \"baseline\")\r\n\r\nThe *model_location* during initialization points to the directory containing the LID models.\r\n\r\nThe input variable *data* is a list containing at least one string that represents a text to make predictions about.\r\n\r\nThe *region* variable indicates which region-specific model should be used. The default is to use the non-geographic baseline model.\r\n",
"bugtrack_url": null,
"license": "GNU GENERAL PUBLIC LICENSE v3",
"summary": "Geographically-informed language identification",
"version": "1.0",
"project_urls": {
"Homepage": "https://github.com/jonathandunn/geoLid"
},
"split_keywords": [
"lid",
"language identification",
"geographic",
"geography"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7a4f8660f683f89d4d158e666f2178d9c01bf731100834d572b9bed7f31c2775",
"md5": "18b81bfaf5731ac222b75bb111d5bad0",
"sha256": "a5d43ab29f4c11d7884e03f337b4fe520b5d351bd298769023352408d6dd5c4c"
},
"downloads": -1,
"filename": "geoLid-1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "18b81bfaf5731ac222b75bb111d5bad0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 16257,
"upload_time": "2024-03-14T21:47:45",
"upload_time_iso_8601": "2024-03-14T21:47:45.292943Z",
"url": "https://files.pythonhosted.org/packages/7a/4f/8660f683f89d4d158e666f2178d9c01bf731100834d572b9bed7f31c2775/geoLid-1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6b6dbc009965a0dde8be84b41bdf83774d7835f207997325412a927e4f7516be",
"md5": "85cba871c29a29f2c60e4e11d6929b88",
"sha256": "40e4ef3a4ee2df6482db3ed883931e9338a3e8014c7374e4191324e4dc49e002"
},
"downloads": -1,
"filename": "geoLid-1.0.tar.gz",
"has_sig": false,
"md5_digest": "85cba871c29a29f2c60e4e11d6929b88",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 16183,
"upload_time": "2024-03-14T21:47:46",
"upload_time_iso_8601": "2024-03-14T21:47:46.681476Z",
"url": "https://files.pythonhosted.org/packages/6b/6d/bc009965a0dde8be84b41bdf83774d7835f207997325412a927e4f7516be/geoLid-1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-14 21:47:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jonathandunn",
"github_project": "geoLid",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "geolid"
}