PyCantonese: Cantonese Linguistics and NLP in Python
====================================================
.. image:: https://jacksonllee.com/logos/pycantonese-logo.png
   :width: 250px

Full Documentation: https://pycantonese.org

|

.. image:: https://badge.fury.io/py/pycantonese.svg
   :target: https://pypi.python.org/pypi/pycantonese
   :alt: PyPI version

.. image:: https://img.shields.io/pypi/pyversions/pycantonese.svg
   :target: https://pypi.python.org/pypi/pycantonese
   :alt: Supported Python versions

.. image:: https://circleci.com/gh/jacksonllee/pycantonese.svg?style=shield
   :target: https://circleci.com/gh/jacksonllee/pycantonese
   :alt: CircleCI Builds

|

.. start-sphinx-website-index-page
PyCantonese is a Python library for Cantonese linguistics and natural language
processing (NLP). Currently implemented features (more to come!):
- Accessing and searching corpus data
- Parsing and conversion tools for Jyutping romanization
- Parsing Cantonese text
- Stop words
- Word segmentation
- Part-of-speech tagging
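
As a rough illustration of a few of these features, here is a minimal sketch.
It assumes PyCantonese 3.x is installed and uses the top-level functions
``segment``, ``pos_tag``, ``characters_to_jyutping``, ``parse_jyutping``, and
``stop_words``. The example sentence is arbitrary, and outputs are omitted
because they depend on the bundled data. Please treat the full documentation
as the authoritative API reference.

.. code-block:: python

    import pycantonese

    text = "廣東話好難學"  # arbitrary example sentence

    # Word segmentation: returns a list of word strings.
    words = pycantonese.segment(text)

    # Part-of-speech tagging: takes the segmented words and
    # returns (word, tag) pairs.
    tagged = pycantonese.pos_tag(words)

    # Characters-to-Jyutping conversion: returns (word, jyutping) pairs.
    romanized = pycantonese.characters_to_jyutping(text)

    # Jyutping parsing: splits each syllable into onset, nucleus,
    # coda, and tone.
    syllables = pycantonese.parse_jyutping("hoeng1gong2")

    # Stop words: a set of common Cantonese function words.
    stops = pycantonese.stop_words()

    print(words, tagged, romanized, syllables, len(stops), sep="\n")
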
.. _download_install:
Download and Install
--------------------
To download and install the most recent stable version::

    $ pip install --upgrade pycantonese

Ready for more?
Check out the `Quickstart <https://pycantonese.org/quickstart.html>`_ page.
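
To confirm which version ended up installed, one option is the standard
library's ``importlib.metadata`` (available from Python 3.8). This is only a
sketch: the helper name ``installed_version`` is made up for illustration and
is not part of PyCantonese.

.. code-block:: python

    from importlib import metadata


    def installed_version(package):
        """Return the installed version of ``package``, or None if absent."""
        try:
            return metadata.version(package)
        except metadata.PackageNotFoundError:
            return None


    # Prints a version string, or None if the package is not installed.
    print(installed_version("pycantonese"))
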
Consulting
----------
If your team would like professional assistance in using PyCantonese,
freelance consulting and training services are available for both academic and commercial groups.
Please email `Jackson L. Lee <https://jacksonllee.com>`_.
Support
-------
If you have found PyCantonese useful and would like to offer support,
`buying me a coffee <https://www.buymeacoffee.com/pycantonese>`_ would go a long way!
Links
-----
* Source code: https://github.com/jacksonllee/pycantonese
* Bug tracker: https://github.com/jacksonllee/pycantonese/issues
* Social media:
  `Facebook <https://www.facebook.com/pycantonese>`_
  and `Twitter <https://twitter.com/pycantonese>`_
How to Cite
-----------
PyCantonese is authored and maintained by `Jackson L. Lee <https://jacksonllee.com>`_.
A talk introducing PyCantonese:
Lee, Jackson L. 2015. PyCantonese: Cantonese linguistic research in the age of big data.
Talk at the Childhood Bilingualism Research Centre, Chinese University of Hong Kong. September 15, 2015.
`Notes+slides <https://pycantonese.org/papers/Lee-pycantonese-2015.html>`_
License
-------
MIT License. Please see ``LICENSE.txt`` in the GitHub source code for details.
The HKCanCor dataset included in PyCantonese is substantially modified from
its source in terms of format. The original dataset has a CC BY license.
Please see ``pycantonese/data/hkcancor/README.md``
in the GitHub source code for details.
The rime-cantonese data (release 2021.05.16) is
incorporated into PyCantonese for word segmentation and
characters-to-Jyutping conversion.
This data has a CC BY 4.0 license.
Please see ``pycantonese/data/rime_cantonese/README.md``
in the GitHub source code for details.
Logo
----
The PyCantonese logo is the Chinese character 粵 meaning Cantonese,
with artistic design by albino.snowman (Instagram handle).
Acknowledgments
---------------
Wonderful resources with a permissive license that have been incorporated into PyCantonese:
- HKCanCor
- rime-cantonese
Individuals who have contributed feedback, bug reports, etc.
(in alphabetical order of last names):
- @cathug
- Litong Chen
- Jenny Chim
- @g-traveller
- Rachel Han
- Ryan Lai
- Charles Lam
- Chaak Ming Lau
- Hill Ma
- @richielo
- @rylanchiu
- Stephan Stiller
- Tsz-Him Tsui
- Robin Yuen
.. end-sphinx-website-index-page
Changelog
---------
Please see ``CHANGELOG.md``.
Setting up a Development Environment
------------------------------------
The latest code under development is available on GitHub at
`jacksonllee/pycantonese <https://github.com/jacksonllee/pycantonese>`_.
You need to have `Git LFS <https://git-lfs.github.com/>`_ installed on your system
(run ``brew install git-lfs`` if you have Homebrew installed on macOS,
or run ``sudo apt-get install git-lfs`` if you're on Ubuntu).
To obtain this version for experimental features or for development:

.. code-block:: bash

    $ git clone https://github.com/jacksonllee/pycantonese.git
    $ cd pycantonese
    $ git lfs pull
    $ pip install -r dev-requirements.txt
    $ pip install -e .

To run tests and styling checks:

.. code-block:: bash

    $ pytest -vv --doctest-modules --cov=pycantonese pycantonese docs/source
    $ flake8 pycantonese
    $ black --check pycantonese

To build the documentation website files:

.. code-block:: bash

    $ python docs/source/build_docs.py

Raw data
{
"_id": null,
"home_page": "https://pycantonese.org",
"name": "pycantonese",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "computational linguistics,natural language processing,NLP,Cantonese,linguistics,corpora,speech,language,Chinese,Jyutping",
"author": "Jackson L. Lee",
"author_email": "jacksonlunlee@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/44/28/4b0cbc503f0be6dd4c55306d85643a785f64cd733778a79d472c779eb491/pycantonese-3.4.0.tar.gz",
"platform": "",
"bugtrack_url": null,
"license": "MIT License",
"summary": "Cantonese Linguistics and NLP in Python",
"version": "3.4.0",
"project_urls": {
"Changelog": "https://pycantonese.org/changelog.html",
"Download": "https://pypi.org/project/pycantonese/#files",
"Homepage": "https://pycantonese.org",
"Source": "https://github.com/jacksonllee/pycantonese",
"Tracker": "https://github.com/jacksonllee/pycantonese/issues"
},
"split_keywords": [
"computational linguistics",
"natural language processing",
"nlp",
"cantonese",
"linguistics",
"corpora",
"speech",
"language",
"chinese",
"jyutping"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d1b8bb21891cc1cc0466d15e211896b614c73b494434b837e326008b501851c0",
"md5": "f4a0b519cd6f29010ddc2b56717f173b",
"sha256": "2585ae8070cc6a3a32f1cf0fd395c93f10aa531272e5292c4d082215104d7958"
},
"downloads": -1,
"filename": "pycantonese-3.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f4a0b519cd6f29010ddc2b56717f173b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 3903038,
"upload_time": "2021-12-28T21:30:25",
"upload_time_iso_8601": "2021-12-28T21:30:25.944310Z",
"url": "https://files.pythonhosted.org/packages/d1/b8/bb21891cc1cc0466d15e211896b614c73b494434b837e326008b501851c0/pycantonese-3.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "44284b0cbc503f0be6dd4c55306d85643a785f64cd733778a79d472c779eb491",
"md5": "70aea9b4210540826362d0ba9ae6a753",
"sha256": "8c0768bbfbc9862b9a149525edfd24dc34f380d5d654fae3597da3f0951a0752"
},
"downloads": -1,
"filename": "pycantonese-3.4.0.tar.gz",
"has_sig": false,
"md5_digest": "70aea9b4210540826362d0ba9ae6a753",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 3831061,
"upload_time": "2021-12-28T21:31:47",
"upload_time_iso_8601": "2021-12-28T21:31:47.530384Z",
"url": "https://files.pythonhosted.org/packages/44/28/4b0cbc503f0be6dd4c55306d85643a785f64cd733778a79d472c779eb491/pycantonese-3.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2021-12-28 21:31:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jacksonllee",
"github_project": "pycantonese",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"circle": true,
"lcname": "pycantonese"
}