spacy-utils


Namespacy-utils JSON
Version 0.0.1 PyPI version JSON
download
home_page
SummarySome basic spaCy utility functions.
upload_time2023-03-28 14:38:31
maintainer
docs_urlNone
authorWJB Mattingly
requires_python
license
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # About

This repository houses a series of utility functions for spaCy. I designed these during the course of my work at the Smithsonian Institution and United States Holocaust Memorial Museum.


# Functions

This will serve as the basic docs for the functions found in this repository.

## `count_spans`

spaCy comes built with a `count_by()` method for the Doc container. It can count anything that sits in a the nlp object's vocab. This works for NER because the position of ENT_TYPE is 78 in the nlp.vocab. Span labels, however, are large numbers that cannot be processed via this approach because of how spaCy uses NumPy. This is a naive approach to resolve this issue.

This function takes a Spacy Doc object and a string span_key as input. The span_key argument is used to specify the type of span to count such as "named_entities" or "noun_chunks".

The function then initializes two empty lists, spans and labels. It loops through all the spans of the specified type in the doc object, and for each span, it appends a tuple of (span.text, span.label_) to the spans list, where span.text is the actual text of the span and span.label_ is the label of the span. The function also appends the span.label_ to the labels list.

The function then creates two Counter objects: span_counts and label_counts. The Counter object span_counts counts the frequency of each unique (span.text, span.label_) tuple in the spans list, while the Counter object label_counts counts the frequency of each unique span.label_ in the labels list.

Finally, the function returns a tuple containing the two Counter objects span_counts and label_counts. The span_counts object maps each unique (span.text, span.label_) tuple to its frequency in the document, while the label_counts object maps each unique span.label_ to its frequency in the document.

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "spacy-utils",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "WJB Mattingly",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/c7/20/54fa1cf650b6de850ba002a2035182f14d2cc8b870645417540f43301ea2/spacy-utils-0.0.1.tar.gz",
    "platform": null,
    "description": "# About\n\nThis repository houses a series of utility functions for spaCy. I designed these during the course of my work at the Smithsonian Institution and United States Holocaust Memorial Museum.\n\n\n# Functions\n\nThis will serve as the basic docs for the functions found in this repository.\n\n## `count_spans`\n\nspaCy comes built with a `count_by()` method for the Doc container. It can count anything that sits in a the nlp object's vocab. This works for NER because the position of ENT_TYPE is 78 in the nlp.vocab. Span labels, however, are large numbers that cannot be processed via this approach because of how spaCy uses NumPy. This is a naive approach to resolve this issue.\n\nThis function takes a Spacy Doc object and a string span_key as input. The span_key argument is used to specify the type of span to count such as \"named_entities\" or \"noun_chunks\".\n\nThe function then initializes two empty lists, spans and labels. It loops through all the spans of the specified type in the doc object, and for each span, it appends a tuple of (span.text, span.label_) to the spans list, where span.text is the actual text of the span and span.label_ is the label of the span. The function also appends the span.label_ to the labels list.\n\nThe function then creates two Counter objects: span_counts and label_counts. The Counter object span_counts counts the frequency of each unique (span.text, span.label_) tuple in the spans list, while the Counter object label_counts counts the frequency of each unique span.label_ in the labels list.\n\nFinally, the function returns a tuple containing the two Counter objects span_counts and label_counts. The span_counts object maps each unique (span.text, span.label_) tuple to its frequency in the document, while the label_counts object maps each unique span.label_ to its frequency in the document.\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "Some basic spaCy utility functions.",
    "version": "0.0.1",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bb9520622623b9186d49dffd72f15eedf88f92b9f255bcd833407fb25af05be3",
                "md5": "dd0655bc71e8fab7ad329f6765efd17f",
                "sha256": "f7646de8e61dd31a2b683e30f0afcd7076f36d20ab30b4c21f5b4b4fa40ce2c4"
            },
            "downloads": -1,
            "filename": "spacy_utils-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "dd0655bc71e8fab7ad329f6765efd17f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 1817,
            "upload_time": "2023-03-28T14:38:29",
            "upload_time_iso_8601": "2023-03-28T14:38:29.414072Z",
            "url": "https://files.pythonhosted.org/packages/bb/95/20622623b9186d49dffd72f15eedf88f92b9f255bcd833407fb25af05be3/spacy_utils-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c72054fa1cf650b6de850ba002a2035182f14d2cc8b870645417540f43301ea2",
                "md5": "bc754baed3e5fbcdab5b2528256e60c2",
                "sha256": "daa4778aab3a04d474f5ab1ca4f801c409178ffeb1e0091b1f2d9c41e405d1d3"
            },
            "downloads": -1,
            "filename": "spacy-utils-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "bc754baed3e5fbcdab5b2528256e60c2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 1853,
            "upload_time": "2023-03-28T14:38:31",
            "upload_time_iso_8601": "2023-03-28T14:38:31.776729Z",
            "url": "https://files.pythonhosted.org/packages/c7/20/54fa1cf650b6de850ba002a2035182f14d2cc8b870645417540f43301ea2/spacy-utils-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-28 14:38:31",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "spacy-utils"
}
        
Elapsed time: 0.08403s