- Name: langdetect
- Version: 1.0.9
- Home page: https://github.com/Mimino666/langdetect
- Summary: Language detection library ported from Google's language-detection.
- Upload time: 2021-05-07 07:54:13
- Author: Michal Mimino Danilak
- License: MIT
- Keywords: language detection library
- Requirements: six
- Coveralls test coverage: No coveralls.

langdetect
==========

[![Build Status](https://travis-ci.org/Mimino666/langdetect.svg?branch=master)](https://travis-ci.org/Mimino666/langdetect)

Port of Nakatani Shuyo's [language-detection](https://github.com/shuyo/language-detection) library (version from 03/03/2014) to Python.


Installation
============

    $ pip install langdetect

Supported Python versions: 2.7 and 3.4+.


Languages
=========

``langdetect`` supports 55 languages out of the box ([ISO 639-1 codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)):

    af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he,
    hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl,
    pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw


Basic usage
===========

To detect the language of a text:

```python
>>> from langdetect import detect
>>> detect("War doesn't show who's right, just who's left.")
'en'
>>> detect("Ein, zwei, drei, vier")
'de'
```

To find out the probabilities for the top languages:

```python
>>> from langdetect import detect_langs
>>> detect_langs("Otec matka syn.")
[sk:0.572770823327, pl:0.292872522702, cs:0.134356653968]
```
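
The probabilities above can be used to reject uncertain guesses. The following is a minimal sketch, not part of the library: `top_language`, `detect_confident`, the `Candidate` stand-in, and the `min_prob` threshold are hypothetical names chosen for illustration. It assumes the objects returned by `detect_langs` expose `lang` and `prob` attributes, which matches the `sk:0.57...` representation shown above.

```python
from collections import namedtuple

# Stand-in with the same two attributes (`lang`, `prob`) that the results of
# detect_langs() are assumed to expose; it lets the helper run without langdetect.
Candidate = namedtuple("Candidate", ["lang", "prob"])


def top_language(results, min_prob=0.5):
    """Return the most probable language code, or None if it is below min_prob."""
    if not results:
        return None
    best = max(results, key=lambda r: r.prob)
    return best.lang if best.prob >= min_prob else None


def detect_confident(text, min_prob=0.5):
    """Hypothetical wrapper: detect the language, but only accept confident guesses."""
    # Imported lazily so the helper above stays usable without the package installed.
    from langdetect import detect_langs
    return top_language(detect_langs(text), min_prob)


if __name__ == "__main__":
    # Probabilities copied from the detect_langs() example above.
    sample = [
        Candidate("sk", 0.572770823327),
        Candidate("pl", 0.292872522702),
        Candidate("cs", 0.134356653968),
    ]
    print(top_language(sample))        # sk
    print(top_language(sample, 0.9))   # None
```

A threshold of 0.5 is an arbitrary choice here; a suitable value depends on how short or ambiguous the input texts are.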

**NOTE**

The language detection algorithm is non-deterministic: if you run it on a text that is too short or too ambiguous, you might get different results every time you run it.

To enforce consistent results, run the following code before the first language detection:

```python
from langdetect import DetectorFactory
DetectorFactory.seed = 0
```
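
To see why fixing a seed makes results repeatable, here is a toy illustration that uses only the standard library. It is not langdetect's actual algorithm: `noisy_vote`, its parameters, and the scores are hypothetical, and the sketch only assumes that detection involves some randomness, as the note above states. With close scores, random noise can flip the winner between runs unless the random generator is seeded.

```python
import random


def noisy_vote(scores, trials=7, noise=0.05, seed=None):
    """Pick a winner by majority vote over several randomly perturbed trials.

    Toy model only: each trial adds Gaussian noise to every score and
    records which language comes out on top.
    """
    rng = random.Random(seed)  # seed=None draws from system entropy
    wins = {}
    for _ in range(trials):
        perturbed = {lang: s + rng.gauss(0.0, noise) for lang, s in scores.items()}
        winner = max(perturbed, key=perturbed.get)
        wins[winner] = wins.get(winner, 0) + 1
    return max(wins, key=wins.get)


if __name__ == "__main__":
    ambiguous = {"sk": 0.50, "cs": 0.49}
    # Unseeded: the winner may differ from run to run.
    print(noisy_vote(ambiguous))
    # Seeded: the same seed always reproduces the same winner.
    print(noisy_vote(ambiguous, seed=0) == noisy_vote(ambiguous, seed=0))  # True
```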

How to add a new language?
==========================

You need to create a new language profile. The easiest way to do it is to use the [langdetect.jar](https://github.com/shuyo/language-detection/raw/master/lib/langdetect.jar) tool, which can generate language profiles from Wikipedia abstract database files or plain text.

Wikipedia abstract database files can be retrieved from "Wikipedia Downloads" ([http://download.wikimedia.org/](http://download.wikimedia.org/)). Their filenames have the form '(language code)wiki-(version)-abstract.xml' (e.g. 'enwiki-20101004-abstract.xml').

Usage: ``java -jar langdetect.jar --genprofile -d [directory path] [language codes]``

- Use the ``-d`` option to specify the directory that contains the abstract databases.
- The tool can handle gzip-compressed files.

Remark: The Chinese database filenames look like 'zhwiki-(version)-abstract-zh-cn.xml' or 'zhwiki-(version)-abstract-zh-tw.xml', so they must be renamed to 'zh-cnwiki-(version)-abstract.xml' or 'zh-twwiki-(version)-abstract.xml', respectively.
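
The renaming described in the remark can be scripted. The sketch below is one possible way to do it, not something shipped with the library or the jar: `renamed` and `rename_chinese_dumps` are hypothetical helper names, and the pattern only covers uncompressed '.xml' files named exactly as in the remark.

```python
import re
from pathlib import Path

# Matches e.g. 'zhwiki-20101004-abstract-zh-cn.xml'.
_CHINESE_DUMP = re.compile(
    r"^zhwiki-(?P<version>.+)-abstract-(?P<variant>zh-cn|zh-tw)\.xml$"
)


def renamed(filename):
    """Return the filename expected by the tool, or None if no rename is needed."""
    match = _CHINESE_DUMP.match(filename)
    if match is None:
        return None
    return "{}wiki-{}-abstract.xml".format(match.group("variant"), match.group("version"))


def rename_chinese_dumps(directory):
    """Rename every matching Chinese abstract file inside `directory` in place."""
    for path in sorted(Path(directory).iterdir()):
        new_name = renamed(path.name)
        if new_name is not None:
            path.rename(path.with_name(new_name))


if __name__ == "__main__":
    print(renamed("zhwiki-20101004-abstract-zh-cn.xml"))  # zh-cnwiki-20101004-abstract.xml
    print(renamed("enwiki-20101004-abstract.xml"))        # None
```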

To generate a language profile from plain text, use the ``--genprofile-text`` command.

Usage: ``java -jar langdetect.jar --genprofile-text -l [language code] [text file path]``
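
If profile generation needs to be driven from Python, the two command lines above can be assembled as argument lists and passed to `subprocess`. This is only a sketch: the function names and the `jar_path` default are assumptions made for illustration, and only the flags shown in the usage lines are used. It requires a Java runtime and a local copy of langdetect.jar.

```python
import subprocess


def genprofile_command(directory, lang_codes, jar_path="langdetect.jar"):
    """Build the command that generates profiles from Wikipedia abstract files."""
    return ["java", "-jar", jar_path, "--genprofile", "-d", directory] + list(lang_codes)


def genprofile_text_command(lang_code, text_path, jar_path="langdetect.jar"):
    """Build the command that generates one profile from a plain text file."""
    return ["java", "-jar", jar_path, "--genprofile-text", "-l", lang_code, text_path]


def run(command):
    """Run the command and raise CalledProcessError if java exits with an error."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Printing instead of running, so the example works without Java installed.
    print(genprofile_command("abstracts", ["en", "sk"]))
    print(genprofile_text_command("sk", "slovak.txt"))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.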

For more details see [language-detection Wiki](https://code.google.com/archive/p/language-detection/wikis/Tools.wiki).


Original project
================

This library is a direct port of Google's [language-detection](https://code.google.com/p/language-detection/) library from Java to Python. All the classes and methods are unchanged, so for more information see the project's website or wiki.

Presentation of the language detection algorithm: [http://www.slideshare.net/shuyo/language-detection-library-for-java](http://www.slideshare.net/shuyo/language-detection-library-for-java).



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Mimino666/langdetect",
    "name": "langdetect",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "language detection library",
    "author": "Michal Mimino Danilak",
    "author_email": "michal.danilak@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/0e/72/a3add0e4eec4eb9e2569554f7c70f4a3c27712f40e3284d483e88094cc0e/langdetect-1.0.9.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Language detection library ported from Google's language-detection.",
    "version": "1.0.9",
    "split_keywords": [
        "language",
        "detection",
        "library"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "bde70216a07accec2406672e2204fe31",
                "sha256": "7cbc0746252f19e76f77c0b1690aadf01963be835ef0cd4b56dddf2a8f1dfc2a"
            },
            "downloads": -1,
            "filename": "langdetect-1.0.9-py2-none-any.whl",
            "has_sig": false,
            "md5_digest": "bde70216a07accec2406672e2204fe31",
            "packagetype": "bdist_wheel",
            "python_version": "py2",
            "requires_python": null,
            "size": 993221,
            "upload_time": "2021-05-07T07:54:10",
            "upload_time_iso_8601": "2021-05-07T07:54:10.843555Z",
            "url": "https://files.pythonhosted.org/packages/d0/04/31f2f175c475c0274a7fb20552e18c4f5b1ef04b211dd4d7301ca6bf4534/langdetect-1.0.9-py2-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "6569d39fc69c5b5104ca268ce1a0de51",
                "sha256": "cbc1fef89f8d062739774bd51eda3da3274006b3661d199c2655f6b3f6d605a0"
            },
            "downloads": -1,
            "filename": "langdetect-1.0.9.tar.gz",
            "has_sig": false,
            "md5_digest": "6569d39fc69c5b5104ca268ce1a0de51",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 981474,
            "upload_time": "2021-05-07T07:54:13",
            "upload_time_iso_8601": "2021-05-07T07:54:13.562344Z",
            "url": "https://files.pythonhosted.org/packages/0e/72/a3add0e4eec4eb9e2569554f7c70f4a3c27712f40e3284d483e88094cc0e/langdetect-1.0.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-05-07 07:54:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "Mimino666",
    "github_project": "langdetect",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "six",
            "specs": []
        }
    ],
    "lcname": "langdetect"
}
        