:Name: pybiber
:Version: 0.1.1
:Summary: Extract Biber features from a document parsed and annotated by spaCy.
:Author: David Brown <dwb2@andrew.cmu.edu>
:Upload time: 2025-02-21 20:59:17
:Requires Python: >=3.9
:Keywords: nlp, language

pybiber: Aggregate counts of linguistic features retrieved from spaCy parsing based on Biber's taxonomy
=======================================================================================================
|pypi| |pypi_downloads|

The pybiber package aggregates the lexicogrammatical and functional features described by `Biber (1988) <https://books.google.com/books?id=CVTPaSSYEroC&dq=variation+across+speech+and+writing&lr=&source=gbs_navlinks_s>`_ and widely used for text-type, register, and genre classification tasks.
The package uses `spaCy <https://spacy.io/models>`_ part-of-speech tagging and dependency parsing to summarize and aggregate patterns.
Because feature extraction builds on the outputs of probabilistic taggers, the accuracy of the resulting counts depends on the accuracy of those models. Thus, texts with irregular spellings, non-normative punctuation, and the like will likely produce unreliable outputs unless the taggers have been tuned specifically for such data.
See `the documentation <https://browndw.github.io/pybiber>`_ for a description of the package's full functionality.
See `pseudobibeR <https://cran.r-project.org/web/packages/pseudobibeR/index.html>`_ for the R implementation.
Installation
------------
You can install the released version of pybiber from `PyPI <https://pypi.org/project/pybiber/>`_:

.. code-block:: console

   pip install pybiber

Install a `spaCy model <https://spacy.io/usage/models#download>`_:

.. code-block:: console

   python -m spacy download en_core_web_sm

Usage
-----
To use the pybiber package, you must first import `spaCy <https://spacy.io/models>`_ and initialize a model instance. You will also need to create a corpus. The :code:`biber` function expects a `polars DataFrame <https://docs.pola.rs/api/python/stable/reference/dataframe/index.html>`_ with a :code:`doc_id` column and a :code:`text` column. This follows the convention for `readtext <https://readtext.quanteda.io/articles/readtext_vignette.html>`_ and corpus processing with `quanteda <https://quanteda.io/>`_ in R.

.. code-block:: python

   import spacy
   import pybiber as pb
   from pybiber.data import micusp_mini

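The corpus itself is just a two-column table: one row per document. A minimal sketch of the expected shape using plain Python data (the document IDs here are invented for illustration; with polars installed, :code:`pl.DataFrame(corpus)` turns such a mapping directly into the frame :code:`spacy_parse` expects):

```python
# A corpus in the shape pybiber expects: one row per document,
# with a unique identifier and the raw text.
corpus = {
    "doc_id": ["BIO.G0.01.1", "ENG.G0.02.1"],
    "text": [
        "The mitochondrion is the powerhouse of the cell.",
        "Call me Ishmael. Some years ago, I went to sea.",
    ],
}

# Both columns must be the same length, and every doc_id should be unique.
assert len(corpus["doc_id"]) == len(corpus["text"])
assert len(set(corpus["doc_id"])) == len(corpus["doc_id"])
```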
The pybiber package requires a model that will carry out part-of-speech tagging and `dependency parsing <https://spacy.io/usage/linguistic-features>`_.

.. code-block:: python

   nlp = spacy.load("en_core_web_sm", disable=["ner"])

To process the corpus, use :code:`spacy_parse`. Processing the :code:`micusp_mini` corpus should take between 20 and 30 seconds.

.. code-block:: python

   df_spacy = pb.spacy_parse(micusp_mini, nlp)

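The parse result is a token-level table in long format: one row per token, grouped by document. The sketch below illustrates the idea with plain Python dictionaries; the column names beyond :code:`doc_id` (:code:`token`, :code:`pos`, :code:`dep`) are assumptions for illustration only, so check :code:`df_spacy.columns` for the actual schema.

```python
# Illustrative long-format parse rows: one row per token, tagged with
# its part of speech and dependency relation. Column names other than
# doc_id are assumptions, not pybiber's documented schema.
rows = [
    {"doc_id": "doc_1", "token": "The",     "pos": "DET",  "dep": "det"},
    {"doc_id": "doc_1", "token": "cell",    "pos": "NOUN", "dep": "nsubj"},
    {"doc_id": "doc_1", "token": "divides", "pos": "VERB", "dep": "ROOT"},
]

# Downstream feature extraction scans rows like these for tag and
# dependency patterns, grouped by doc_id.
nouns_per_doc = {}
for r in rows:
    if r["pos"] == "NOUN":
        nouns_per_doc[r["doc_id"]] = nouns_per_doc.get(r["doc_id"], 0) + 1
```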
After parsing the corpus, features can then be aggregated using :code:`biber`.

.. code-block:: python

   df_biber = pb.biber(df_spacy)

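Conceptually, :code:`biber` collapses the token-level parse into one row per document of feature counts, and Biber-style counts are conventionally normalized to a common base such as per 1,000 tokens. A stdlib-only sketch of that aggregation idea (not pybiber's actual implementation):

```python
from collections import Counter

def normalize_counts(tags, per=1000):
    """Count tag occurrences and scale them to a rate per `per` tokens."""
    counts = Counter(tags)
    total = len(tags)
    return {tag: n * per / total for tag, n in counts.items()}

# Tags for a toy 10-token document: 4 nouns, 3 verbs, 3 adjectives.
doc_tags = ["NOUN"] * 4 + ["VERB"] * 3 + ["ADJ"] * 3
rates = normalize_counts(doc_tags)
# 4 nouns in 10 tokens -> a rate of 400 per 1,000 tokens.
```

Normalizing by document length makes feature rates comparable across documents of different sizes, which is what allows the aggregated table to feed register and genre classification.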
License
-------
Code licensed under `Apache License 2.0 <https://www.apache.org/licenses/LICENSE-2.0>`_.
See the `LICENSE <https://github.com/browndw/pybiber/blob/main/LICENSE>`_ file.
.. |pypi| image:: https://badge.fury.io/py/pybiber.svg
   :target: https://badge.fury.io/py/pybiber
   :alt: PyPI Version

.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/pybiber
   :target: https://pypi.org/project/pybiber/
   :alt: Downloads from PyPI