py3langid

Name	py3langid
Version	0.3.0
home_page	None
Summary	Fork of the language identification tool langid.py, featuring a modernized codebase and faster execution times.
upload_time	2024-06-18 11:31:04
maintainer	None
docs_url	None
author	Marco Lui
requires_python	>=3.8
license	BSD
keywords	language detection language identification langid langid.py
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

=============
``py3langid``
=============


``py3langid`` is a fork of the standalone language identification tool ``langid.py`` by Marco Lui.

Original license: BSD-2-Clause. Fork license: BSD-3-Clause.



Changes in this fork
--------------------

Execution speed has been improved and the code base has been optimized for Python 3.6+:

- Import: Loading the package (``import py3langid``) is about 30% faster
- Startup: Loading the default classification model is 25-30x faster
- Execution: Language detection with ``langid.classify`` is 5-6x faster on paragraphs (less on longer texts)

For implementation details see this blog post: `How to make language detection with langid.py faster <https://adrien.barbaresi.eu/blog/language-detection-langid-py-faster.html>`_.

For more information, including support for older Python versions, see the `changelog <https://github.com/adbar/py3langid/blob/master/HISTORY.rst>`_.


Usage
-----

Drop-in replacement
~~~~~~~~~~~~~~~~~~~


1. Install the package:

   * ``pip3 install py3langid`` (or ``pip`` where applicable)

2. Use it:

   * with Python: ``import py3langid as langid``
   * on the command-line: ``langid``


With Python
~~~~~~~~~~~

Basics:

.. code-block:: python

    >>> import py3langid as langid
    
    >>> text = 'This text is in English.'
    # identified language and its score (unnormalized by default)
    >>> langid.classify(text)
    ('en', -56.77429)
    # unpack the result tuple into variables
    >>> lang, prob = langid.classify(text)
    # all potential languages
    >>> langid.rank(text)
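
Because ``classify`` returns a ``(language, score)`` tuple, it is straightforward to apply it to a collection of texts. The following is a minimal sketch, not part of the package: ``group_by_language`` is a hypothetical helper, and the classifier is passed in as an argument so that any callable returning such a tuple can be used (for example ``langid.classify``).

.. code-block:: python

    from collections import defaultdict


    def group_by_language(texts, classify):
        """Group texts by the language code returned by ``classify``.

        ``classify`` is any callable returning a (language, score) tuple,
        for example ``py3langid.classify``.
        """
        groups = defaultdict(list)
        for text in texts:
            lang, _score = classify(text)
            groups[lang].append(text)
        return dict(groups)

With the package installed, this could be called as ``group_by_language(texts, langid.classify)``.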


More options:

.. code-block:: python

    >>> from py3langid.langid import LanguageIdentifier, MODEL_FILE

    # subset of target languages
    >>> identifier = LanguageIdentifier.from_pickled_model(MODEL_FILE)
    >>> identifier.set_languages(['de', 'en', 'fr'])
    # this won't work well...
    >>> identifier.classify('这样不好')
    ('en', -81.831665)

    # normalization of probabilities to an interval between 0 and 1
    >>> identifier = LanguageIdentifier.from_pickled_model(MODEL_FILE, norm_probs=True)
    >>> identifier.classify('This should be enough text.')
    ('en', 1.0)
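
With ``norm_probs=True`` the score lies between 0 and 1, which makes it possible to reject low-confidence guesses. The following is a minimal sketch under stated assumptions: ``confident_language`` and the ``0.9`` threshold are illustrative choices that are not part of the package, and ``classify`` stands for the ``classify`` method of an identifier created with ``norm_probs=True``.

.. code-block:: python

    def confident_language(text, classify, threshold=0.9):
        """Return the detected language code, or None below the threshold.

        ``classify`` must return (language, probability) with the probability
        normalized to the 0-1 range; ``threshold`` is an arbitrary cut-off.
        """
        lang, prob = classify(text)
        return lang if prob >= threshold else None

A call could then look like ``confident_language(text, identifier.classify)``.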


Note: the NumPy data type for the feature vector has been changed to optimize for speed. If results are inconsistent, try restoring the original setting:

.. code-block:: python

    >>> langid.classify(text, datatype='uint32')
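
One way to check whether the data type matters for a given input is to compare both settings. The sketch below assumes only what is shown here, namely that ``classify`` accepts a ``datatype`` keyword; ``same_language`` is a hypothetical helper, and the classifier is passed in explicitly.

.. code-block:: python

    def same_language(text, classify):
        """Check that the default and 'uint32' settings agree on the language.

        ``classify`` is expected to behave like ``py3langid.classify`` and to
        accept the ``datatype`` keyword argument.
        """
        default_lang, _ = classify(text)
        uint32_lang, _ = classify(text, datatype='uint32')
        return default_lang == uint32_lang
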


On the command-line
~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    # basic usage with probability normalization
    $ echo "This should be enough text." | langid -n
    ('en', 1.0)

    # define a subset of target languages
    $ echo "This won't be recognized properly." | langid -n -l fr,it,tr
    ('it', 0.97038305)
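
In the sample output above, each result is printed as a Python tuple literal, so a script that captures this output can turn it back into a tuple with ``ast.literal_eval``. This is a minimal sketch: ``parse_langid_line`` is a hypothetical helper name, and it assumes one result per line in exactly the format shown.

.. code-block:: python

    import ast


    def parse_langid_line(line):
        """Parse one line of output such as "('en', 1.0)" into (lang, score)."""
        lang, score = ast.literal_eval(line.strip())
        return str(lang), float(score)
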


Legacy documentation
--------------------


**The docs below are provided for reference; only some of the functions are currently tested and maintained.**


Introduction
------------

``langid.py`` is a standalone Language Identification (LangID) tool.

The design principles are as follows:

1. Fast
2. Pre-trained over a large number of languages (currently 97)
3. Not sensitive to domain-specific features (e.g. HTML/XML markup)
4. Single .py file with minimal dependencies
5. Deployable as a web service

All that is required to run ``langid.py`` is Python >= 3.6 and numpy. 

The accompanying training tools are still Python2-only.

``langid.py`` is WSGI-compliant.  ``langid.py`` will use ``fapws3`` as a web server if 
available, and default to ``wsgiref.simple_server`` otherwise.

``langid.py`` comes pre-trained on 97 languages (ISO 639-1 codes given):

    af, am, an, ar, as, az, be, bg, bn, br, 
    bs, ca, cs, cy, da, de, dz, el, en, eo, 
    es, et, eu, fa, fi, fo, fr, ga, gl, gu, 
    he, hi, hr, ht, hu, hy, id, is, it, ja, 
    jv, ka, kk, km, kn, ko, ku, ky, la, lb, 
    lo, lt, lv, mg, mk, ml, mn, mr, ms, mt, 
    nb, ne, nl, nn, no, oc, or, pa, pl, ps, 
    pt, qu, ro, ru, rw, se, si, sk, sl, sq, 
    sr, sv, sw, ta, te, th, tl, tr, ug, uk, 
    ur, vi, vo, wa, xh, zh, zu

The training data was drawn from 5 different sources:

* JRC-Acquis 
* ClueWeb 09
* Wikipedia
* Reuters RCV2
* Debian i18n


Usage
-----

    langid [options]

optional arguments:
  -h, --help            show this help message and exit
  -s, --serve           launch web service
  --host=HOST           host/ip to bind to
  --port=PORT           port to listen on
  -v                    increase verbosity (repeat for greater effect)
  -m MODEL              load model from file
  -l LANGS, --langs=LANGS
                        comma-separated set of target ISO639 language codes
                        (e.g en,de)
  -r, --remote          auto-detect IP address for remote access
  -b, --batch           specify a list of files on the command line
  -d, --dist            show full distribution over languages
  -u URL, --url=URL     langid of URL
  --line                process pipes line-by-line rather than as a document
  -n, --normalize       normalize confidence scores to probability values


The simplest way to use ``langid.py`` is as a command-line tool, which you can 
invoke with ``python langid.py``. If you installed ``langid.py`` as a Python 
module (e.g. via ``pip install langid``), you can invoke ``langid`` instead of 
``python langid.py`` (the two are equivalent). This displays a prompt.
Enter text to identify and press Enter::

  >>> This is a test
  ('en', -54.41310358047485)
  >>> Questa e una prova
  ('it', -35.41771221160889)


``langid.py`` can also detect when the input is redirected (only tested under Linux); in this
case it processes the input until EOF rather than until a newline as in interactive mode::

  python langid.py < README.rst 
  ('en', -22552.496054649353)


The value returned is the unnormalized probability estimate for the language (a log-space score, hence the
negative values). Calculating the exact probability estimate is disabled by default, but can be enabled with the ``-n`` flag::

  python langid.py -n < README.rst 
  ('en', 1.0)

More details are provided in the section on `Probability Normalization`_.

You can also use ``langid.py`` as a Python library::

  # python
  Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
  [GCC 4.6.1] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import langid
  >>> langid.classify("This is a test")
  ('en', -54.41310358047485)
  
Finally, ``langid.py`` can use Python's built-in ``wsgiref.simple_server`` (or ``fapws3`` if available) to
provide language identification as a web service. To do this, launch ``python langid.py -s``, and
access http://localhost:9008/detect . The web service supports GET, POST and PUT. If GET is performed
with no data, a simple HTML forms interface is displayed.

The response is generated in JSON; here is an example::

  {"responseData": {"confidence": -54.41310358047485, "language": "en"}, "responseDetails": null, "responseStatus": 200}

A utility such as curl can be used to access the web service::

  # curl -d "q=This is a test" localhost:9008/detect
  {"responseData": {"confidence": -54.41310358047485, "language": "en"}, "responseDetails": null, "responseStatus": 200}

You can also use HTTP PUT::

  # curl -T readme.rst localhost:9008/detect
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  100  2871  100   119  100  2752    117   2723  0:00:01  0:00:01 --:--:--  2727
  {"responseData": {"confidence": -22552.496054649353, "language": "en"}, "responseDetails": null, "responseStatus": 200}

If no "q=XXX" key-value pair is present in the HTTP POST payload, ``langid.py`` will interpret the entire
file as a single query. This allows for redirection via curl::

  # echo "This is a test" | curl -d @- localhost:9008/detect
  {"responseData": {"confidence": -54.41310358047485, "language": "en"}, "responseDetails": null, "responseStatus": 200}
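
The same requests can be prepared from Python with the standard library. The sketch below is illustrative only: ``build_detect_request`` and ``parse_detect_response`` are hypothetical helper names, and the default host and port are simply those used in the examples above. The request is built but not sent; passing it to ``urllib.request.urlopen`` would perform the call against a running service.

.. code-block:: python

    import json
    from urllib.parse import urlencode
    from urllib.request import Request


    def build_detect_request(text, host="localhost", port=9008):
        """Build a POST request carrying the text as the ``q`` form field."""
        data = urlencode({"q": text}).encode("utf-8")
        return Request(f"http://{host}:{port}/detect", data=data, method="POST")


    def parse_detect_response(body):
        """Extract (language, confidence) from the JSON response body."""
        payload = json.loads(body)
        result = payload["responseData"]
        return result["language"], result["confidence"]
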

``langid.py`` will attempt to discover the host IP address automatically. Often, this is set to localhost (127.0.1.1), even 
though the machine has a different external IP address. To have ``langid.py`` try to discover the external
IP address instead, start it with the ``-r`` flag.

``langid.py`` supports constraining the output language set using the ``-l`` flag and a comma-separated list of ISO 639-1 
language codes (the ``-n`` flag enables probability normalization)::

  # python langid.py -n -l it,fr
  >>> Io non parlo italiano
  ('it', 0.99999999988965627)
  >>> Je ne parle pas français
  ('fr', 1.0)
  >>> I don't speak english
  ('it', 0.92210605672341062)

When using ``langid.py`` as a library, the ``set_languages`` method can be used to constrain the language set::

  python                      
  Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
  [GCC 4.6.1] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import langid
  >>> langid.classify("I do not speak english")
  ('en', 0.57133487679900674)
  >>> langid.set_languages(['de','fr','it'])
  >>> langid.classify("I do not speak english")
  ('it', 0.99999835791478453)
  >>> langid.set_languages(['en','it'])
  >>> langid.classify("I do not speak english")
  ('en', 0.99176190378750373)


Batch Mode
----------

``langid.py`` supports batch mode processing, which can be invoked with the ``-b`` flag.
In this mode, ``langid.py`` reads a list of paths to files to classify as arguments.
If no arguments are supplied, ``langid.py`` reads the list of paths from ``stdin``,
which is useful for combining ``langid.py`` with UNIX utilities such as ``find``.

In batch mode, ``langid.py`` uses ``multiprocessing`` to invoke multiple instances of
the classifier, utilizing all available CPUs to classify documents in parallel. 


Probability Normalization
-------------------------

The probabilistic model implemented by ``langid.py`` involves the multiplication of a
large number of probabilities. For computational reasons, the actual calculations are
implemented in the log-probability space (a common numerical technique for dealing with
vanishingly small probabilities). One side-effect of this is that it is not necessary to
compute a full probability in order to determine the most probable language in a set
of candidate languages. However, users sometimes find it helpful to have a "confidence"
score for the probability prediction. Thus, ``langid.py`` implements a re-normalization
that produces an output in the 0-1 range.
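
To illustrate the idea, log-space scores can be turned into values between 0 and 1 with the standard log-sum-exp (softmax) technique. This is a sketch of the general technique and not necessarily the exact computation performed by ``langid.py``; ``normalize_log_scores`` and the example scores are made up for illustration.

.. code-block:: python

    import math


    def normalize_log_scores(log_scores):
        """Map {language: log-score} to {language: probability} summing to 1."""
        # Subtract the maximum first so that exp() never overflows and the
        # best candidate contributes exp(0) = 1 to the sum.
        top = max(log_scores.values())
        log_total = top + math.log(
            sum(math.exp(score - top) for score in log_scores.values())
        )
        return {lang: math.exp(score - log_total)
                for lang, score in log_scores.items()}


    scores = {"en": -54.4, "it": -70.1, "fr": -75.3}  # hypothetical values
    probs = normalize_log_scores(scores)
    best = max(probs, key=probs.get)  # same winner as with the raw scores

The ranking is unchanged by the normalization, which is why the raw scores suffice for picking the most probable language.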

``langid.py`` disables probability normalization by default. For
command-line usages of ``langid.py``, it can be enabled by passing the ``-n`` flag. For
probability normalization in library use, the user must instantiate their own 
``LanguageIdentifier``. An example of such usage is as follows::
  
  >>> from py3langid.langid import LanguageIdentifier, MODEL_FILE
  >>> identifier = LanguageIdentifier.from_pickled_model(MODEL_FILE, norm_probs=True)
  >>> identifier.classify("This is a test")
  ('en', 0.9999999909903544)


Training a model
----------------

Training is so far supported on Python 2.7 only; see the `original instructions <https://github.com/saffsd/langid.py#training-a-model>`_.


Read more
---------

``langid.py`` is based on published research. [1] describes the LD feature selection technique in detail,
and [2] provides more detail about the module ``langid.py`` itself.

[1] Lui, Marco and Timothy Baldwin (2011) Cross-domain Feature Selection for Language Identification, 
In Proceedings of the Fifth International Joint Conference on Natural Language Processing (IJCNLP 2011), 
Chiang Mai, Thailand, pp. 553-561. Available from http://www.aclweb.org/anthology/I11-1062

[2] Lui, Marco and Timothy Baldwin (2012) langid.py: An Off-the-shelf Language Identification Tool, 
In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012), 
Demo Session, Jeju, Republic of Korea. Available from www.aclweb.org/anthology/P12-3005

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "py3langid",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "language detection, language identification, langid, langid.py",
    "author": "Marco Lui",
    "author_email": "Adrien Barbaresi <barbaresi@bbaw.de>",
    "download_url": "https://files.pythonhosted.org/packages/99/43/c3f7a3c5150c56a0ca70c3039e53cc58046698b7ce0913bb8fa86d71abcb/py3langid-0.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "Fork of the language identification tool langid.py, featuring a modernized codebase and faster execution times.",
    "version": "0.3.0",
    "project_urls": {
        "Blog": "https://adrien.barbaresi.eu/blog/language-detection-langid-py-faster.html",
        "Homepage": "https://github.com/adbar/py3langid",
        "Tracker": "https://github.com/adbar/py3langid/issues"
    },
    "split_keywords": [
        "language detection",
        " language identification",
        " langid",
        " langid.py"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9d1c8212ea872d236af0aea37043fb6feeaa9a43449183782b19d342f8ddd343",
                "md5": "3af78872b7419e22d74a93f799a1eb84",
                "sha256": "38f022eec31cf9a2bf6f142acb2a9b350fd7d0d5ae7762b1392c6d3567401fd3"
            },
            "downloads": -1,
            "filename": "py3langid-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3af78872b7419e22d74a93f799a1eb84",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 746125,
            "upload_time": "2024-06-18T11:30:51",
            "upload_time_iso_8601": "2024-06-18T11:30:51.265961Z",
            "url": "https://files.pythonhosted.org/packages/9d/1c/8212ea872d236af0aea37043fb6feeaa9a43449183782b19d342f8ddd343/py3langid-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9943c3f7a3c5150c56a0ca70c3039e53cc58046698b7ce0913bb8fa86d71abcb",
                "md5": "7e45e4e22f94a8308a115ffb58859750",
                "sha256": "0a875a031a58aaf9dbda7bb8285fd75e801a7bd276216ffabe037901d4b449ec"
            },
            "downloads": -1,
            "filename": "py3langid-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "7e45e4e22f94a8308a115ffb58859750",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 752935,
            "upload_time": "2024-06-18T11:31:04",
            "upload_time_iso_8601": "2024-06-18T11:31:04.301853Z",
            "url": "https://files.pythonhosted.org/packages/99/43/c3f7a3c5150c56a0ca70c3039e53cc58046698b7ce0913bb8fa86d71abcb/py3langid-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-18 11:31:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "adbar",
    "github_project": "py3langid",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "py3langid"
}
