geoLid


Name: geoLid
Version: 1.0
Home page: https://github.com/jonathandunn/geoLid
Summary: Geographically-informed language identification
Upload time: 2024-03-14 21:47:46
Author: Jonathan Dunn
License: GNU GENERAL PUBLIC LICENSE v3
Keywords: lid, language identification, geographic, geography
Requirements: No requirements were recorded.

# geoLid
Geographically-informed language identification

This Python package carries out language identification with geographic priors to increase performance for low-resource and under-represented languages.

A description and evaluation of this approach can be found here: https://jdunn.name/2024/03/13/geographically-informed-language-identification/

A complete list of language codes and names per regional model can be found in the *language_names* directory.

**Downloading models**

geoLid contains a baseline non-geographic model as well as models for 16 specific regions, as shown below:

    baseline (916 languages)
    africa_north (44 languages)
    africa_southern (58 languages)
    africa_sub (166 languages)
    america_brazil (88 languages)
    america_central (188 languages)
    america_north (68 languages)
    america_south (129 languages)
    asia_central (54 languages)
    asia_east (46 languages)
    asia_south (60 languages)
    asia_southeast (325 languages)
    europe_east (65 languages)
    europe_russia (65 languages)
    europe_west (108 languages)
    middle_east (53 languages)
    oceania (49 languages)
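
For scripting, the list above can be kept as a small lookup table. The sketch below copies the model names and language counts from that list; `MODEL_LANGUAGE_COUNTS`, `REGIONAL_MODELS`, and `is_known_model` are illustrative names and are not part of the geoLid API.

```python
# Model names and language counts, copied from the list above.
MODEL_LANGUAGE_COUNTS = {
    "baseline": 916,
    "africa_north": 44,
    "africa_southern": 58,
    "africa_sub": 166,
    "america_brazil": 88,
    "america_central": 188,
    "america_north": 68,
    "america_south": 129,
    "asia_central": 54,
    "asia_east": 46,
    "asia_south": 60,
    "asia_southeast": 325,
    "europe_east": 65,
    "europe_russia": 65,
    "europe_west": 108,
    "middle_east": 53,
    "oceania": 49,
}

# Every entry except the non-geographic baseline is a regional model.
REGIONAL_MODELS = [name for name in MODEL_LANGUAGE_COUNTS if name != "baseline"]


def is_known_model(name: str) -> bool:
    """Return True if `name` appears in the model list above."""
    return name in MODEL_LANGUAGE_COUNTS


print(len(REGIONAL_MODELS))  # 16
print(max(REGIONAL_MODELS, key=MODEL_LANGUAGE_COUNTS.get))  # asia_southeast
```

Checking a name with `is_known_model` before downloading or predicting catches typos such as `asia_south_east` early.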

To download a model, call *download_model* with the model name:

    from geoLid import download_model
    download_model("baseline")

The model name "all" will download all region-specific models.
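
When several models are needed, the call can be repeated once per name. The helper below is a minimal sketch, not part of geoLid: `download_models` is a hypothetical name, and it takes the downloader as an argument so that the loop can be tried without network access. It assumes only what the example above shows, namely that `download_model` accepts a single model name.

```python
from typing import Callable, Iterable, List


def download_models(
    names: Iterable[str], downloader: Callable[[str], object]
) -> List[str]:
    """Call `downloader` once per distinct model name, keeping the given order."""
    requested: List[str] = []
    for name in names:
        if name in requested:
            continue  # skip duplicates so nothing is fetched twice
        downloader(name)
        requested.append(name)
    return requested


# Stand-in downloader that only records the names it receives.
calls: List[str] = []
download_models(["baseline", "europe_west", "baseline"], calls.append)
print(calls)  # ['baseline', 'europe_west']

# With geoLid installed, the real function can be passed in instead:
#     from geoLid import download_model
#     download_models(["baseline", "all"], download_model)
```

The commented call requests "baseline" alongside "all" because the text above says only that "all" covers the region-specific models.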

**Usage**

Language identification can be run as shown below:

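    from geoLid import geoLid
    # data must be a list of one or more strings (sample text for illustration)
    data = ["This is a text whose language should be identified."]
    lid = geoLid(model_location = "models")
    labels = lid.predict(data = data, region = "baseline")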

The *model_location* argument, given at initialization, is the path to the directory containing the LID models.

The *data* argument is a list of one or more strings, each a text to make a prediction about.
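
Because the input must be a list even for a single text, a small normalizing step can help avoid passing a bare string by mistake. The function below is a sketch under that reading; `as_text_list` is a hypothetical helper and not part of geoLid.

```python
from typing import Iterable, List, Union


def as_text_list(texts: Union[str, Iterable[str]]) -> List[str]:
    """Return the input as a non-empty list of strings."""
    if isinstance(texts, str):
        # A single text is wrapped so that it is not treated as a
        # sequence of characters.
        return [texts]
    items = list(texts)
    if not items:
        raise ValueError("expected at least one text")
    for item in items:
        if not isinstance(item, str):
            raise TypeError(f"expected str, got {type(item).__name__}")
    return items


print(as_text_list("A single sentence."))  # ['A single sentence.']
print(as_text_list(("first", "second")))  # ['first', 'second']
```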

The *region* argument selects which model to use. It defaults to the non-geographic baseline model.
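
To keep predictions next to their inputs, the call can be wrapped as sketched below. This is an illustration, not geoLid code: `label_texts` and `fake_predict` are hypothetical names, and the wrapper assumes that *predict* returns one label per input text, in input order, which this page does not state explicitly.

```python
from typing import Callable, List, Sequence, Tuple


def label_texts(
    predict: Callable[..., Sequence[str]],
    texts: Sequence[str],
    region: str = "baseline",
) -> List[Tuple[str, str]]:
    """Pair each text with its label (assumes one label per text, in order)."""
    labels = predict(data=list(texts), region=region)
    if len(labels) != len(texts):
        raise ValueError("predictor returned a different number of labels than texts")
    return list(zip(texts, labels))


# Stand-in predictor for demonstration only; it is NOT a geoLid model.
def fake_predict(data, region):
    return [f"{region}:{len(text)}" for text in data]


pairs = label_texts(fake_predict, ["hello", "kia ora"])
print(pairs)  # [('hello', 'baseline:5'), ('kia ora', 'baseline:7')]

# With geoLid installed and a model downloaded, the real method could be
# passed instead, for example:
#     label_texts(lid.predict, data, region="oceania")
```

Leaving `region` unset mirrors the documented default of the baseline model.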

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/jonathandunn/geoLid",
    "name": "geoLid",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "lid,language identification,geographic,geography",
    "author": "Jonathan Dunn",
    "author_email": "jedunn@illinois.edu",
    "download_url": "https://files.pythonhosted.org/packages/6b/6d/bc009965a0dde8be84b41bdf83774d7835f207997325412a927e4f7516be/geoLid-1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GNU GENERAL PUBLIC LICENSE v3",
    "summary": "Geographically-informed language identification",
    "version": "1.0",
    "project_urls": {
        "Homepage": "https://github.com/jonathandunn/geoLid"
    },
    "split_keywords": [
        "lid",
        "language identification",
        "geographic",
        "geography"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7a4f8660f683f89d4d158e666f2178d9c01bf731100834d572b9bed7f31c2775",
                "md5": "18b81bfaf5731ac222b75bb111d5bad0",
                "sha256": "a5d43ab29f4c11d7884e03f337b4fe520b5d351bd298769023352408d6dd5c4c"
            },
            "downloads": -1,
            "filename": "geoLid-1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "18b81bfaf5731ac222b75bb111d5bad0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 16257,
            "upload_time": "2024-03-14T21:47:45",
            "upload_time_iso_8601": "2024-03-14T21:47:45.292943Z",
            "url": "https://files.pythonhosted.org/packages/7a/4f/8660f683f89d4d158e666f2178d9c01bf731100834d572b9bed7f31c2775/geoLid-1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6b6dbc009965a0dde8be84b41bdf83774d7835f207997325412a927e4f7516be",
                "md5": "85cba871c29a29f2c60e4e11d6929b88",
                "sha256": "40e4ef3a4ee2df6482db3ed883931e9338a3e8014c7374e4191324e4dc49e002"
            },
            "downloads": -1,
            "filename": "geoLid-1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "85cba871c29a29f2c60e4e11d6929b88",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 16183,
            "upload_time": "2024-03-14T21:47:46",
            "upload_time_iso_8601": "2024-03-14T21:47:46.681476Z",
            "url": "https://files.pythonhosted.org/packages/6b/6d/bc009965a0dde8be84b41bdf83774d7835f207997325412a927e4f7516be/geoLid-1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-14 21:47:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jonathandunn",
    "github_project": "geoLid",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "geolid"
}
        