:Name: corpy
:Version: 0.6.1
:Summary: Tools for processing language data.
:Upload time: 2023-04-05 13:44:59
:Requires Python: >=3.10
:License: GPL-3.0-or-later
:Keywords: corpus, linguistics, nlp

=====
CorPy
=====

.. image:: https://readthedocs.org/projects/corpy/badge/?version=stable
   :target: https://corpy.readthedocs.io/en/stable/?badge=stable
   :alt: Documentation status

.. image:: https://badge.fury.io/py/corpy.svg
   :target: https://badge.fury.io/py/corpy
   :alt: PyPI package

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/python/black
   :alt: Code style

Installation
============

.. code:: bash

   $ python3 -m pip install corpy

Only recent versions of Python 3 (3.10+) are supported by design.
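
A quick way to confirm that an interpreter meets this requirement before
installing is sketched below. It relies only on the standard library; the helper
name ``meets_minimum`` is made up for this illustration and is not part of
``corpy``.

.. code:: python

   import sys

   # Minimum version taken from the requirement stated above.
   MINIMUM = (3, 10)


   def meets_minimum(version=None, minimum=MINIMUM):
       """Return True if the (major, minor) version is at least the minimum."""
       if version is None:
           version = sys.version_info
       return tuple(version[:2]) >= minimum


   if __name__ == "__main__":
       print("OK" if meets_minimum() else "Python 3.10+ is required")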

Help and feedback
=================

If you get stuck, it's always a good idea to start by searching the
documentation, which is available at the short URL https://corpy.rtfd.io/.

The project is developed on GitHub_. You can ask for help via `GitHub
discussions`_ and report bugs and give other kinds of feedback via `GitHub
issues`_. Support is provided gladly, time and other engagements permitting, but
cannot be guaranteed.

.. _GitHub: https://github.com/dlukes/corpy
.. _GitHub discussions: https://github.com/dlukes/corpy/discussions
.. _GitHub issues: https://github.com/dlukes/corpy/issues

What is CorPy?
==============

A fancy plural for *corpus* ;) Also, a collection of handy but not especially
mutually integrated tools for dealing with linguistic data. It abstracts away
functionality which is often needed in practice for teaching and/or day-to-day
work at the `Czech National Corpus <https://korpus.cz>`__, without aspiring to
be a fully featured or consistent NLP framework.

Here's an idea of what you can do with CorPy:

- add linguistic annotation to raw textual data using either `UDPipe
  <https://corpy.rtfd.io/en/stable/guides/udpipe.html>`__ or `MorphoDiTa
  <https://corpy.rtfd.io/en/stable/guides/morphodita.html>`__
- `easily generate word clouds
  <https://corpy.rtfd.io/en/stable/guides/vis.html>`__
- run code in `a sanitized global environment
  <https://corpy.rtfd.io/en/stable/guides/no_globals.html>`__ (useful for
  debugging in interactive sessions, e.g. with Jupyter notebooks in `JupyterLab
  <https://jupyterlab.rtfd.io>`__)
- `generate phonetic transcripts of Czech texts
  <https://corpy.rtfd.io/en/stable/guides/phonetics_cs.html>`__
- `wrangle corpora in the vertical format
  <https://corpy.rtfd.io/en/stable/guides/vertical.html>`__ devised originally
  for `CWB <http://cwb.sourceforge.net/>`__, used also by `(No)SketchEngine
  <https://nlp.fi.muni.cz/trac/noske/>`__
- plus some `command line utilities
  <https://corpy.rtfd.io/en/stable/guides/cli.html>`__
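
To give a rough idea of what the vertical format mentioned in the list above
looks like, here is a minimal standard-library sketch. It does *not* use
CorPy's own API (see the linked guide for that); the sample data, the attribute
names ``word``, ``lemma`` and ``tag``, and the helper ``read_sentences`` are all
assumptions made for the sake of illustration. The sketch also assumes that no
token line looks like a structure tag.

.. code:: python

   import re

   # Structure lines look like XML tags: <doc id="1">, <s>, </s>, ...
   STRUCTURE = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^>]*>")

   # Hand-written sample: one token per line, attributes separated by tabs.
   SAMPLE = (
       '<doc id="example">\n'
       "<s>\n"
       "Dogs\tdog\tNNS\n"
       "bark\tbark\tVBP\n"
       ".\t.\t.\n"
       "</s>\n"
       "</doc>\n"
   )


   def read_sentences(vertical, attrs=("word", "lemma", "tag")):
       """Yield each <s>...</s> span as a list of {attribute: value} dicts."""
       sentence = None
       for line in vertical.splitlines():
           if not line:
               continue
           match = STRUCTURE.fullmatch(line)
           if match:
               closing, name = match.groups()
               if name == "s":
                   if closing:
                       if sentence is not None:
                           yield sentence
                       sentence = None
                   else:
                       sentence = []
               continue  # other structures (e.g. <doc>) are ignored here
           if sentence is not None:
               # Token line: tab-separated positional attributes.
               sentence.append(dict(zip(attrs, line.split("\t"))))


   for sent in read_sentences(SAMPLE):
       print([tok["word"] for tok in sent])

Real corpora typically carry more positional attributes and nested structures,
which is where a dedicated tool becomes more convenient than a hand-rolled loop
like this one.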

.. note::

   **Should I pick UDPipe or MorphoDiTa?**

   Both are developed at `ÚFAL MFF UK`_. UDPipe_ has more features at the cost
   of being somewhat more complex: it does both `morphological tagging
   (including lemmatization) and syntactic parsing
   <https://corpy.rtfd.io/en/stable/guides/udpipe.html>`__, and it handles a
   number of different input and output formats. You can also download
   `pre-trained models <http://ufal.mff.cuni.cz/udpipe/models>`__ for many
   different languages.

   By contrast, MorphoDiTa_ only has `pre-trained models for Czech and English
   <http://ufal.mff.cuni.cz/morphodita/users-manual>`__, and only performs
   `morphological tagging (including lemmatization)
   <https://corpy.rtfd.io/en/stable/guides/morphodita.html>`__. However, its
   output is more straightforward -- it just splits your text into tokens and
   annotates them, whereas UDPipe can (depending on the model) introduce
   additional tokens necessary for a more explicit analysis, add multi-word
   tokens, and so on. This is because UDPipe is tailored to the type of linguistic
   analysis conducted within the UniversalDependencies_ project, using the
   CoNLL-U_ data format.

   MorphoDiTa can also help you if you just want to tokenize text and don't have
   a language model available.

.. _`ÚFAL MFF UK`: https://ufal.mff.cuni.cz/
.. _UDPipe: https://ufal.mff.cuni.cz/udpipe
.. _MorphoDiTa: https://ufal.mff.cuni.cz/morphodita
.. _UniversalDependencies: https://universaldependencies.org
.. _CoNLL-U: https://universaldependencies.org/format.html
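
The multi-word tokens mentioned in the note above can be illustrated with a
small standard-library sketch. In CoNLL-U, each row has ten tab-separated
columns; a multi-word token gets a row whose ID is a range such as ``2-3``,
followed by one row per syntactic word. The sample sentence below is
hand-written for illustration (it is not actual UDPipe output), and the helper
``split_rows`` is a hypothetical name, not part of CorPy or UDPipe.

.. code:: python

   # Column names as defined by the CoNLL-U format.
   FIELDS = ("id", "form", "lemma", "upos", "xpos",
             "feats", "head", "deprel", "deps", "misc")

   # Spanish "del" is one surface token standing for two words: "de" + "el".
   SAMPLE = "\n".join([
       "# text = Vengo del mar",
       "1\tVengo\tvenir\tVERB\t_\t_\t0\troot\t_\t_",
       "2-3\tdel\t_\t_\t_\t_\t_\t_\t_\t_",
       "2\tde\tde\tADP\t_\t_\t4\tcase\t_\t_",
       "3\tel\tel\tDET\t_\t_\t4\tdet\t_\t_",
       "4\tmar\tmar\tNOUN\t_\t_\t1\tobl\t_\t_",
   ])


   def split_rows(conllu):
       """Separate multi-word token rows (ID like "2-3") from word rows."""
       multiword, words = [], []
       for line in conllu.splitlines():
           if not line or line.startswith("#"):
               continue  # skip blank lines and sentence-level comments
           row = dict(zip(FIELDS, line.split("\t")))
           if "-" in row["id"]:
               multiword.append(row)
           elif "." not in row["id"]:  # ignore empty nodes such as "8.1"
               words.append(row)
       return multiword, words


   multiword, words = split_rows(SAMPLE)
   print([row["form"] for row in multiword])
   print([row["form"] for row in words])

A tokenize-and-annotate tool in the style of MorphoDiTa would, by contrast,
yield just one annotated item per surface token.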

.. development-marker

Development
===========

Dependencies and building the docs
----------------------------------

``corpy`` needs to be installed in the ReadTheDocs virtualenv for ``autodoc`` to
work. The optional dependencies in the ``doc`` group are also needed. This is
all configured in ``.readthedocs.yml``.

.. license-marker

License
=======

Copyright © 2016--present `ÚČNK <http://korpus.cz>`__/David Lukeš

Distributed under the `GNU General Public License v3
<http://www.gnu.org/licenses/gpl-3.0.en.html>`__.
