Name | spacy-utils |
Version | 0.0.1 |
home_page | |
Summary | Some basic spaCy utility functions. |
upload_time | 2023-03-28 14:38:31 |
maintainer | |
docs_url | None |
author | WJB Mattingly |
requires_python | |
license | |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# About
This repository houses a series of utility functions for spaCy. I designed these during the course of my work at the Smithsonian Institution and United States Holocaust Memorial Museum.
# Functions
This will serve as the basic docs for the functions found in this repository.
## `count_spans`
spaCy ships with a `count_by()` method on the `Doc` container. It can count anything that sits in the `nlp` object's vocab. This works for NER because the ID of `ENT_TYPE` is 78 in `nlp.vocab`. Span labels, however, are large numbers that cannot be processed via this approach because of how spaCy uses NumPy. `count_spans` is a naive approach to resolve this issue.
This function takes a spaCy `Doc` object and a string `span_key` as input. The `span_key` argument specifies the type of span to count, such as "named_entities" or "noun_chunks".
The function initializes two empty lists, `spans` and `labels`. It loops through all the spans of the specified type in the `Doc` object and, for each span, appends the tuple `(span.text, span.label_)` to `spans` and `span.label_` to `labels`, where `span.text` is the text of the span and `span.label_` is its label.
The function then creates two `Counter` objects: `span_counts`, which counts each unique `(span.text, span.label_)` tuple in `spans`, and `label_counts`, which counts each unique `span.label_` in `labels`.
Finally, the function returns the tuple `(span_counts, label_counts)`, so that each unique tuple and each unique label maps to its frequency in the document.
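The steps described above can be sketched roughly as follows. This is a minimal reconstruction from the description, not the package's actual source: it assumes the spans are stored in `doc.spans[span_key]` (spaCy's span group mapping), and the key `"sc"` and the example sentence in the usage comments are purely illustrative.

```python
from collections import Counter


def count_spans(doc, span_key):
    """Count spans and span labels in one span group.

    Assumption: the spans to count live in doc.spans[span_key].
    """
    spans = []
    labels = []
    for span in doc.spans[span_key]:
        # Record the (text, label) pair and the label on its own.
        spans.append((span.text, span.label_))
        labels.append(span.label_)

    span_counts = Counter(spans)
    label_counts = Counter(labels)
    return span_counts, label_counts


# Illustrative usage (requires spaCy; no trained model is needed):
#
#   import spacy
#   from spacy.tokens import Span
#
#   nlp = spacy.blank("en")
#   doc = nlp("Paris is lovely. Paris is busy.")
#   doc.spans["sc"] = [
#       Span(doc, 0, 1, label="GPE"),
#       Span(doc, 4, 5, label="GPE"),
#   ]
#   span_counts, label_counts = count_spans(doc, "sc")
```

Because the sketch only relies on string labels (`span.label_`) and Python's `Counter`, it never passes the large integer label IDs through `count_by()`.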
Raw data
{
"_id": null,
"home_page": "",
"name": "spacy-utils",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "WJB Mattingly",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/c7/20/54fa1cf650b6de850ba002a2035182f14d2cc8b870645417540f43301ea2/spacy-utils-0.0.1.tar.gz",
"platform": null,
"description": "# About\n\nThis repository houses a series of utility functions for spaCy. I designed these during the course of my work at the Smithsonian Institution and United States Holocaust Memorial Museum.\n\n\n# Functions\n\nThis will serve as the basic docs for the functions found in this repository.\n\n## `count_spans`\n\nspaCy comes built with a `count_by()` method for the Doc container. It can count anything that sits in a the nlp object's vocab. This works for NER because the position of ENT_TYPE is 78 in the nlp.vocab. Span labels, however, are large numbers that cannot be processed via this approach because of how spaCy uses NumPy. This is a naive approach to resolve this issue.\n\nThis function takes a Spacy Doc object and a string span_key as input. The span_key argument is used to specify the type of span to count such as \"named_entities\" or \"noun_chunks\".\n\nThe function then initializes two empty lists, spans and labels. It loops through all the spans of the specified type in the doc object, and for each span, it appends a tuple of (span.text, span.label_) to the spans list, where span.text is the actual text of the span and span.label_ is the label of the span. The function also appends the span.label_ to the labels list.\n\nThe function then creates two Counter objects: span_counts and label_counts. The Counter object span_counts counts the frequency of each unique (span.text, span.label_) tuple in the spans list, while the Counter object label_counts counts the frequency of each unique span.label_ in the labels list.\n\nFinally, the function returns a tuple containing the two Counter objects span_counts and label_counts. The span_counts object maps each unique (span.text, span.label_) tuple to its frequency in the document, while the label_counts object maps each unique span.label_ to its frequency in the document.\n",
"bugtrack_url": null,
"license": "",
"summary": "Some basic spaCy utility functions.",
"version": "0.0.1",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bb9520622623b9186d49dffd72f15eedf88f92b9f255bcd833407fb25af05be3",
"md5": "dd0655bc71e8fab7ad329f6765efd17f",
"sha256": "f7646de8e61dd31a2b683e30f0afcd7076f36d20ab30b4c21f5b4b4fa40ce2c4"
},
"downloads": -1,
"filename": "spacy_utils-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dd0655bc71e8fab7ad329f6765efd17f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 1817,
"upload_time": "2023-03-28T14:38:29",
"upload_time_iso_8601": "2023-03-28T14:38:29.414072Z",
"url": "https://files.pythonhosted.org/packages/bb/95/20622623b9186d49dffd72f15eedf88f92b9f255bcd833407fb25af05be3/spacy_utils-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c72054fa1cf650b6de850ba002a2035182f14d2cc8b870645417540f43301ea2",
"md5": "bc754baed3e5fbcdab5b2528256e60c2",
"sha256": "daa4778aab3a04d474f5ab1ca4f801c409178ffeb1e0091b1f2d9c41e405d1d3"
},
"downloads": -1,
"filename": "spacy-utils-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "bc754baed3e5fbcdab5b2528256e60c2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 1853,
"upload_time": "2023-03-28T14:38:31",
"upload_time_iso_8601": "2023-03-28T14:38:31.776729Z",
"url": "https://files.pythonhosted.org/packages/c7/20/54fa1cf650b6de850ba002a2035182f14d2cc8b870645417540f43301ea2/spacy-utils-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-28 14:38:31",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "spacy-utils"
}