:Name: pycantonese
:Version: 3.4.0
:Home page: https://pycantonese.org
:Summary: Cantonese Linguistics and NLP in Python
:Upload time: 2021-12-28 21:31:47
:Author: Jackson L. Lee
:Requires Python: >=3.7
:License: MIT License
:Keywords: computational linguistics, natural language processing, NLP, Cantonese, linguistics, corpora, speech, language, Chinese, Jyutping

PyCantonese: Cantonese Linguistics and NLP in Python
====================================================

.. image:: https://jacksonllee.com/logos/pycantonese-logo.png
   :width: 250px

Full Documentation: https://pycantonese.org

|

.. image:: https://badge.fury.io/py/pycantonese.svg
   :target: https://pypi.python.org/pypi/pycantonese
   :alt: PyPI version

.. image:: https://img.shields.io/pypi/pyversions/pycantonese.svg
   :target: https://pypi.python.org/pypi/pycantonese
   :alt: Supported Python versions

.. image:: https://circleci.com/gh/jacksonllee/pycantonese.svg?style=shield
   :target: https://circleci.com/gh/jacksonllee/pycantonese
   :alt: CircleCI Builds

|

.. start-sphinx-website-index-page

PyCantonese is a Python library for Cantonese linguistics and natural language
processing (NLP). Currently implemented features (more to come!):

- Accessing and searching corpus data
- Parsing and conversion tools for Jyutping romanization
- Parsing Cantonese text
- Stop words
- Word segmentation
- Part-of-speech tagging
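
To give a feel for what "parsing Jyutping" involves, here is a minimal sketch
that uses only the Python standard library to split a Jyutping string into
onset, nucleus, coda, and tone. It is not PyCantonese's implementation:
the names ``Syllable`` and ``split_jyutping`` and the simplified sound
inventory are assumptions made for illustration. For real work, use the
library functions described in the full documentation.

.. code-block:: python

   import re
   from typing import List, NamedTuple


   class Syllable(NamedTuple):
       """One Jyutping syllable (hypothetical helper type for this sketch)."""

       onset: str
       nucleus: str
       coda: str
       tone: str


   # Longer alternatives come first so that "gw" is tried before "g", etc.
   _SYLLABLE = re.compile(
       r"(gw|kw|ng|[bpmfdtnlgkhwzcsj])?"  # optional onset
       r"(aa|oe|eo|yu|[aeiou]|ng|m)"      # nucleus, or a syllabic nasal
       r"(ng|[ptkmniu])?"                 # optional coda
       r"([1-6])"                         # tone number
   )


   def split_jyutping(text: str) -> List[Syllable]:
       """Split a string such as "gwong2dung1waa2" into syllables."""
       syllables = []
       pos = 0
       while pos < len(text):
           match = _SYLLABLE.match(text, pos)
           if match is None:
               raise ValueError(
                   f"cannot parse Jyutping at position {pos}: {text!r}"
               )
           # Missing optional parts are reported as empty strings.
           onset, nucleus, coda, tone = match.groups(default="")
           syllables.append(Syllable(onset, nucleus, coda, tone))
           pos = match.end()
       return syllables


   print(split_jyutping("gwong2dung1waa2"))

The sketch raises ``ValueError`` on the first position that does not start a
syllable, rather than silently skipping unparseable input.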

.. _download_install:

Download and Install
--------------------

To download and install the most recent stable version::

    $ pip install --upgrade pycantonese

Ready for more?
Check out the `Quickstart <https://pycantonese.org/quickstart.html>`_ page.

Consulting
----------

If your team would like professional assistance in using PyCantonese,
freelance consulting and training services are available for both academic and commercial groups.
Please email `Jackson L. Lee <https://jacksonllee.com>`_.

Support
-------

If you have found PyCantonese useful and would like to offer support,
`buying me a coffee <https://www.buymeacoffee.com/pycantonese>`_ would go a long way!

Links
-----

* Source code: https://github.com/jacksonllee/pycantonese
* Bug tracker: https://github.com/jacksonllee/pycantonese/issues
* Social media:
  `Facebook <https://www.facebook.com/pycantonese>`_
  and `Twitter <https://twitter.com/pycantonese>`_

How to Cite
-----------

PyCantonese is authored and maintained by `Jackson L. Lee <https://jacksonllee.com>`_.

A talk introducing PyCantonese:

Lee, Jackson L. 2015. PyCantonese: Cantonese linguistic research in the age of big data.
Talk at the Childhood Bilingualism Research Centre, Chinese University of Hong Kong. September 15, 2015.
`Notes+slides <https://pycantonese.org/papers/Lee-pycantonese-2015.html>`_

License
-------

MIT License. Please see ``LICENSE.txt`` in the GitHub source code for details.

The HKCanCor dataset included in PyCantonese is substantially modified from
its source in terms of format. The original dataset has a CC BY license.
Please see ``pycantonese/data/hkcancor/README.md``
in the GitHub source code for details.

The rime-cantonese data (release 2021.05.16) is
incorporated into PyCantonese for word segmentation and
characters-to-Jyutping conversion.
This data has a CC BY 4.0 license.
Please see ``pycantonese/data/rime_cantonese/README.md``
in the GitHub source code for details.

Logo
----

The PyCantonese logo is the Chinese character 粵 meaning Cantonese,
with artistic design by albino.snowman (Instagram handle).

Acknowledgments
---------------

Wonderful resources with a permissive license that have been incorporated into PyCantonese:

- HKCanCor
- rime-cantonese

Individuals who have contributed feedback, bug reports, etc.
(in alphabetical order of last names):

- @cathug
- Litong Chen
- Jenny Chim
- @g-traveller
- Rachel Han
- Ryan Lai
- Charles Lam
- Chaak Ming Lau
- Hill Ma
- @richielo
- @rylanchiu
- Stephan Stiller
- Tsz-Him Tsui
- Robin Yuen

.. end-sphinx-website-index-page

Changelog
---------

Please see ``CHANGELOG.md``.

Setting up a Development Environment
------------------------------------

The latest code under development is available on GitHub at
`jacksonllee/pycantonese <https://github.com/jacksonllee/pycantonese>`_.
You need to have `Git LFS <https://git-lfs.github.com/>`_ installed on your system
(run ``brew install git-lfs`` if you have Homebrew installed on macOS,
or run ``sudo apt-get install git-lfs`` if you're on Ubuntu).
To obtain this version for experimental features or for development:

.. code-block:: bash

   $ git clone https://github.com/jacksonllee/pycantonese.git
   $ cd pycantonese
   $ git lfs pull
   $ pip install -r dev-requirements.txt
   $ pip install -e .

To run tests and styling checks:

.. code-block:: bash

   $ pytest -vv --doctest-modules --cov=pycantonese pycantonese docs/source
   $ flake8 pycantonese
   $ black --check pycantonese

To build the documentation website files:

.. code-block:: bash

    $ python docs/source/build_docs.py



            
