:Name: pybiber
:Version: 0.1.1
:Summary: Extract Biber features from a document parsed and annotated by spaCy.
:Author: David Brown <dwb2@andrew.cmu.edu>
:Upload time: 2025-02-21 20:59:17
:Requires Python: >=3.9
:Keywords: nlp, language

pybiber: Aggregate counts of linguistic features retrieved from spaCy parsing based on Biber's taxonomy
=======================================================================================================
|pypi| |pypi_downloads|

The pybiber package aggregates the lexicogrammatical and functional features described by `Biber (1988) <https://books.google.com/books?id=CVTPaSSYEroC&dq=variation+across+speech+and+writing&lr=&source=gbs_navlinks_s>`_ and widely used for text-type, register, and genre classification tasks.
The package uses `spaCy <https://spacy.io/models>`_ part-of-speech tagging and dependency parsing to summarize and aggregate patterns.
Because feature extraction builds on the outputs of probabilistic taggers, the accuracy of the resulting counts depends on the accuracy of those models. Thus, texts with irregular spellings, non-normative punctuation, and the like will likely produce unreliable outputs unless the taggers have been tuned specifically for such data.
See `the documentation <https://browndw.github.io/pybiber>`_ for a description of the package's full functionality.
See `pseudobibeR <https://cran.r-project.org/web/packages/pseudobibeR/index.html>`_ for the R implementation.
Installation
------------
You can install the released version of pybiber from `PyPI <https://pypi.org/project/pybiber/>`_:

.. code-block:: console

   pip install pybiber

Install a `spaCy model <https://spacy.io/usage/models#download>`_:

.. code-block:: console

   python -m spacy download en_core_web_sm

Usage
-----
To use the pybiber package, you must first import `spaCy <https://spacy.io/models>`_ and initialize a model instance. You will also need to create a corpus. The :code:`biber` function expects a `polars DataFrame <https://docs.pola.rs/api/python/stable/reference/dataframe/index.html>`_ with a :code:`doc_id` column and a :code:`text` column. This follows the convention for `readtext <https://readtext.quanteda.io/articles/readtext_vignette.html>`_ and corpus processing with `quanteda <https://quanteda.io/>`_ in R.

.. code-block:: python

   import spacy
   import pybiber as pb
   from pybiber.data import micusp_mini

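The corpus itself is just a two-column table: one row per document. A minimal sketch of the expected shape using plain Python data (the document IDs here are invented for illustration; with polars installed, :code:`pl.DataFrame(corpus)` turns such a mapping directly into the frame :code:`spacy_parse` expects):

```python
# A corpus in the shape pybiber expects: one row per document,
# with a unique identifier and the raw text.
corpus = {
    "doc_id": ["BIO.G0.01.1", "ENG.G0.02.1"],
    "text": [
        "The mitochondrion is the powerhouse of the cell.",
        "Call me Ishmael. Some years ago, I went to sea.",
    ],
}

# Both columns must be the same length, and every doc_id should be unique.
assert len(corpus["doc_id"]) == len(corpus["text"])
assert len(set(corpus["doc_id"])) == len(corpus["doc_id"])
```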
The pybiber package requires a model that will carry out part-of-speech tagging and `dependency parsing <https://spacy.io/usage/linguistic-features>`_.

.. code-block:: python

   nlp = spacy.load("en_core_web_sm", disable=["ner"])

To process the corpus, use :code:`spacy_parse`. Processing the :code:`micusp_mini` corpus should take between 20 and 30 seconds.

.. code-block:: python

   df_spacy = pb.spacy_parse(micusp_mini, nlp)

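The parse result is a token-level table in long format: one row per token, grouped by document. The sketch below illustrates the idea with plain Python dictionaries; the column names beyond :code:`doc_id` (:code:`token`, :code:`pos`, :code:`dep`) are assumptions for illustration only, so check :code:`df_spacy.columns` for the actual schema.

```python
# Illustrative long-format parse rows: one row per token, tagged with
# its part of speech and dependency relation. Column names other than
# doc_id are assumptions, not pybiber's documented schema.
rows = [
    {"doc_id": "doc_1", "token": "The",     "pos": "DET",  "dep": "det"},
    {"doc_id": "doc_1", "token": "cell",    "pos": "NOUN", "dep": "nsubj"},
    {"doc_id": "doc_1", "token": "divides", "pos": "VERB", "dep": "ROOT"},
]

# Downstream feature extraction scans rows like these for tag and
# dependency patterns, grouped by doc_id.
nouns_per_doc = {}
for r in rows:
    if r["pos"] == "NOUN":
        nouns_per_doc[r["doc_id"]] = nouns_per_doc.get(r["doc_id"], 0) + 1
```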
After parsing the corpus, features can then be aggregated using :code:`biber`.

.. code-block:: python

   df_biber = pb.biber(df_spacy)

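Conceptually, :code:`biber` collapses the token-level parse into one row per document of feature counts, and Biber-style counts are conventionally normalized to a common base such as per 1,000 tokens. A stdlib-only sketch of that aggregation idea (not pybiber's actual implementation):

```python
from collections import Counter

def normalize_counts(tags, per=1000):
    """Count tag occurrences and scale them to a rate per `per` tokens."""
    counts = Counter(tags)
    total = len(tags)
    return {tag: n * per / total for tag, n in counts.items()}

# Tags for a toy 10-token document: 4 nouns, 3 verbs, 3 adjectives.
doc_tags = ["NOUN"] * 4 + ["VERB"] * 3 + ["ADJ"] * 3
rates = normalize_counts(doc_tags)
# 4 nouns in 10 tokens -> a rate of 400 per 1,000 tokens.
```

Normalizing by document length makes feature rates comparable across documents of different sizes, which is what allows the aggregated table to feed register and genre classification.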
License
-------
Code licensed under `Apache License 2.0 <https://www.apache.org/licenses/LICENSE-2.0>`_.
See the `LICENSE <https://github.com/browndw/pybiber/blob/main/LICENSE>`_ file.
.. |pypi| image:: https://badge.fury.io/py/pybiber.svg
   :target: https://badge.fury.io/py/pybiber
   :alt: PyPI Version

.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/pybiber
   :target: https://pypi.org/project/pybiber/
   :alt: Downloads from PyPI