elucidoc

Name	elucidoc
Version	2024.7.24
home_page	None
Summary	Screens legal and other texts for sentences and clauses containing user defined search phrases
upload_time	2024-07-24 10:42:45
maintainer	None
docs_url	None
author	John Blake
requires_python	None
license	Copyright 2023 John R. Blake, Jr. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. See also README.md for external dependencies' licenses.
keywords	legal text analysis
requirements	annotated-types backports.tarfile blis build cachetools catalogue certifi cffi charset-normalizer click cloudpathlib colorama confection cryptography cymem cytoolz docutils docx2python en-core-web-lg et-xmlfile filelock floret fsspec huggingface-hub idna importlib_metadata jaraco.classes jaraco.context jaraco.functools jellyfish Jinja2 joblib keyring langcodes lxml markdown-it-py MarkupSafe mdurl more-itertools mpmath murmurhash networkx nh3 numpy openpyxl packaging pandas paragraphs pathlib_abc pathy pdfminer.six pkginfo preshed pyarrow pycparser pydantic pydantic_core Pygments PyInputPlus pyparsing pyphen pyproject_hooks PySimpleValidate python-dateutil python-docx python-utils pytz pywin32-ctypes PyYAML readme_renderer regex requests requests-toolbelt rfc3986 rich safetensors scikit-learn scipy shellingham six smart-open spacy spacy-alignments spacy-legacy spacy-loggers spacy-transformers srsly stdiomask sympy textacy thinc threadpoolctl tokenizers toolz torch tqdm transformers twine typer typing_extensions tzdata urllib3 wasabi weasel wrapt zipp

![eluciDoc_header](https://github.com/jblake1965/eluciDoc/assets/100727736/e7f94b7f-fb1b-4f55-8665-4dc11c6b93af)

[![CodeQL](https://github.com/jblake1965/eluciDoc/actions/workflows/github-code-scanning/codeql/badge.svg)](https://github.com/jblake1965/eluciDoc/actions/workflows/github-code-scanning/codeql) [![GitHub Discussions](https://img.shields.io/github/discussions/jblake1965/eluciDoc?labelColor=blue&color=orange)](https://github.com/jblake1965/eluciDoc/discussions/3) [![PYPI Version](https://img.shields.io/pypi/v/elucidoc?logoColor=blue&labelColor=green)](https://img.shields.io/pypi/v/elucidoc?logoColor=blue&labelColor=green)

# What this is:
This CLI Python project, written for the Windows™ environment, filters a single document for sentences and clauses
containing specific user-supplied terms. It was originally created as a tool to aid in the review of legal contracts,
but can be used with any text. Documents can be in .docx, .pdf or .txt file formats.
The general principle behind its function is subject-predicate sentence analysis. Searches are based on user-selected parties
in the document, followed by a user-selected phrase.  It is used in conjunction with Microsoft™ Office 365™
Word and Excel™ apps.
# How it works:
The path to a .docx, .pdf or .txt file is entered (drag and drop works in the Windows terminal):

![file_input](https://github.com/jblake1965/eluciDoc/assets/100727736/c08d59a4-a019-4a42-b895-427a1815b474)

The file is then processed as UTF-8 text, with MS Word smart quotes converted to straight quotes and non-ASCII and
non-breaking spaces removed. The term for the party being searched in the document is entered next:

![enter_party_name](https://github.com/jblake1965/eluciDoc/assets/100727736/bd1e9603-137c-4475-aa1d-d09ed157738a)

and then passed with the processed text to textacy's Keyword in Context (KWIC) function.  The result is saved as an Excel
file with the same name in the same location as the searched document, with "..._[name of the party]_search_result.xlsx" 
appended. The Excel file automatically opens with a subprocess call, and the results can be converted to a table for
further sorting:

![textacy_rendering](https://github.com/jblake1965/eluciDoc/assets/100727736/a9bfd1a8-8477-4401-8e96-bd83801d5488)
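
As a rough illustration of this step, the sketch below imitates a keyword-in-context pass with plain `re` and builds the result file name described above. It is not textacy's implementation and not the project's own code: `simple_kwic`, `result_path`, the window size and the sample text are all hypothetical, and the file name pattern is one reading of the naming rule.

```python
import re
from pathlib import Path


def simple_kwic(text, keyword, window=30, ignore_case=False):
    """Return (left context, keyword, right context) for each hit."""
    flags = re.IGNORECASE if ignore_case else 0
    rows = []
    for m in re.finditer(re.escape(keyword), text, flags):
        left = text[max(0, m.start() - window):m.start()]
        right = text[m.end():m.end() + window]
        rows.append((left, m.group(), right))
    return rows


def result_path(document, party):
    """Build '<document stem>_<party>_search_result.xlsx' beside the document."""
    doc = Path(document)
    return doc.with_name(f"{doc.stem}_{party}_search_result.xlsx")


# Hypothetical sample text; a real run would use the processed document.
sample = "The Buyer shall pay the price. The Seller shall deliver the goods."
rows = simple_kwic(sample, "Buyer")
print(rows[0][0], "|", rows[0][1], "|", rows[0][2])
print(result_path("contract.docx", "Buyer").name)
```

Rows shaped like these map naturally onto three spreadsheet columns (left context, keyword, right context), which is why the right-hand column is a convenient place to pick out predicate phrases.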

Note: the subprocess call below uses the default Office install location:

```python
subprocess.Popen([r'C:\Program Files\Microsoft Office\root\Office16\EXCEL.EXE', result_file])
```

If the user has Office installed in a different location, then the code must be changed to reflect that directory.
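
One way to make a wrong path easier to spot is to check that the executable exists before launching it. The sketch below is an assumption about how such a check could look, not the project's code; `excel_command` and `open_in_excel` are hypothetical helper names, and only the default path comes from the note above.

```python
import subprocess
from pathlib import Path

# Default location quoted in the note above; other installs will differ.
DEFAULT_EXCEL = r"C:\Program Files\Microsoft Office\root\Office16\EXCEL.EXE"


def excel_command(result_file, excel_path=DEFAULT_EXCEL):
    """Return the argument list for Popen, or None if the executable is missing."""
    if Path(excel_path).is_file():
        return [str(excel_path), str(result_file)]
    return None


def open_in_excel(result_file, excel_path=DEFAULT_EXCEL):
    """Launch Excel if it is found; otherwise tell the user what to do."""
    command = excel_command(result_file, excel_path)
    if command is None:
        print(f"Excel not found at {excel_path}; open {result_file} manually.")
        return False
    subprocess.Popen(command)
    return True
```

Calling `open_in_excel(result_file)` behaves like the original call when Excel is at the default location and prints a hint instead of raising an error when it is not.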

The document is chunked into sentences (or clauses, depending on the formatting) with the spaCy module.
The user is prompted to enter predicate search phrases culled from the Excel search file; these phrases are stored in a list.  
Once the user has finished entering phrases, the script iterates through the list looking for
a match in each sentence. Sentences and clauses containing a match are added to a result list. The user has the option of 
having the search phrases appear in ALL CAPS in the Word document containing the results, as shown:

![all_caps_rendering](https://github.com/user-attachments/assets/5d07f8a5-ed26-4341-9bf8-0445de315245)
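
The matching step described above can be sketched roughly as follows. This is an illustration under assumptions rather than the project's code: the sentences are supplied as a ready-made list (in the project, spaCy does the splitting), and `find_matches` and the sample phrases are hypothetical.

```python
def find_matches(sentences, phrases, all_caps=False):
    """Collect sentences containing any phrase; optionally upper-case the phrase."""
    results = []
    for sentence in sentences:
        marked = sentence
        hit = False
        for phrase in phrases:
            # Search the untouched sentence so earlier upper-casing cannot hide a hit.
            if phrase in sentence:
                hit = True
                if all_caps:
                    marked = marked.replace(phrase, phrase.upper())
        if hit:
            results.append(marked)
    return results


# Hypothetical sentences standing in for the spaCy output.
sentences = [
    "The Buyer shall pay the price within thirty days.",
    "The Seller shall deliver the goods.",
    "The Buyer may terminate this Agreement.",
]
matches = find_matches(sentences, ["Buyer shall pay", "Buyer may"], all_caps=True)
for line in matches:
    print(line)
```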


The result list is then saved as a Word file that is opened automatically at the end of the run (as with Excel, 
note the location of the Word executable and adjust the path if it is not in the standard install location). 
# External Dependencies and Licenses

| Name:        | License:                                                              |
|--------------|-----------------------------------------------------------------------|
| docx2python  | [MIT](https://pypi.org/project/docx2python/)                          |
| openpyxl     | [MIT](https://pypi.org/project/openpyxl/)                             |
| pandas       | [BSD](https://pypi.org/project/pandas/)                               |
| pdfminer.six | [MIT/X](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE) |
| python-docx  | [MIT](https://github.com/atriumlts/python-docx/blob/master/LICENSE)   |
| rich         | [MIT](https://pypi.org/project/rich/)                                 |
| spacy        | [MIT](https://pypi.org/project/spacy/)                                |
| textacy      | [Apache 2.0](https://pypi.org/project/textacy/)                       |

# Installation
It is strongly recommended that this package be installed in a virtual environment.  The package is available at https://pypi.org/project/elucidoc/ 
and can be installed with ```pip install elucidoc``` .

***THE SPACY PIPELINE `en_core_web_lg` MUST ALSO BE INSTALLED INTO THE VIRTUAL
ENVIRONMENT FOR THE SCRIPT TO WORK***.

The pipeline can be installed as follows:
```
python -m spacy download en_core_web_lg
```
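
A quick way to confirm that the pipeline landed in the active environment is to check whether its package can be found. The snippet below is a small sketch using only the standard library; `pipeline_installed` is a hypothetical helper name, and it relies on spaCy pipelines being installed as ordinary Python packages.

```python
import importlib.util


def pipeline_installed(name="en_core_web_lg"):
    """Return True if the named pipeline package can be found for import."""
    return importlib.util.find_spec(name) is not None


if not pipeline_installed():
    print("en_core_web_lg is missing; run: python -m spacy download en_core_web_lg")
```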
Also verify that the Office install directory matches the one noted above.  If it does not, the code must be 
changed to point to the directory where the Excel and Word apps are located.
# Running the Script
The project is run as a script.  It can be launched from a .bat file that calls the virtual environment's Python interpreter and the
script file, as in the example below:
```commandline
@"C:\Users\..\venv\Scripts\python.exe" "C:\Users\..\venv\lib\elucidoc\eluciDoc.py"

@pause
```
Additionally, the location of the ```eluciDoc.py``` script can be included in the Windows ```PATH``` environment variable.
# Case Sensitive Searches
General convention in legal texts is to capitalize defined terms.  For that reason, the user may want to make the search
case-sensitive to target the appropriate instances of the term.  For searches where the specific use of the subject term
is not important but broader capture is, the case-sensitive feature can be turned off.  Once a selection is made, it applies
for all subsequent searches until the script is restarted.
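
The effect of the setting can be seen with a small sketch. It assumes nothing about how the project implements the option; `count_hits` and the sample clause are hypothetical.

```python
import re


def count_hits(text, term, case_sensitive=True):
    """Count occurrences of term, honouring the case-sensitivity choice."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return len(re.findall(re.escape(term), text, flags))


clause = "The Company shall notify each company in which the Company holds shares."
print(count_hits(clause, "Company"))                        # defined term only
print(count_hits(clause, "Company", case_sensitive=False))  # broader capture
```

With case sensitivity on, only the capitalized defined term is counted; with it off, the generic lower-case use is included as well.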
# Possessive Case and Other Punctuation
textacy separates the party search term from the words and punctuation that follow it, including the possessive ending, as shown below:

![textacy_rendering](https://github.com/jblake1965/eluciDoc/assets/100727736/1fd67f92-57bd-402a-b99f-95d5847f49f7)

To capture a possessive form of the party being searched, the predicate search phrase must begin with `'s` or `'` (for the
plural possessive), as illustrated by the prompt below:

![enter_predicate_phrase](https://github.com/jblake1965/eluciDoc/assets/100727736/edc9f616-97d7-4bdc-8553-f89292e43332)

The same principle applies to a comma, colon, semicolon or closing parenthesis immediately following the party name.
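
To see why the leading punctuation matters, consider what is left once the party name is split off. The sketch below is only an illustration of the idea and makes no claim about the project's internals; `right_context` and `matches_predicate` are hypothetical helpers.

```python
def right_context(sentence, party):
    """Return the text that follows the first occurrence of party, or None."""
    _, found, after = sentence.partition(party)
    return after if found else None


def matches_predicate(sentence, party, predicate):
    """True if predicate directly follows party, allowing one separating space."""
    after = right_context(sentence, party)
    if after is None:
        return False
    # Punctuation such as 's or a comma follows the party with no space between.
    return after.startswith(predicate) or after.startswith(" " + predicate)


sentence = "The Buyer's obligations survive termination."
print(matches_predicate(sentence, "Buyer", "'s obligations"))
print(matches_predicate(sentence, "Buyer", "obligations"))
```

In this sketch the phrase without the leading `'s` does not match, because the text after the party name starts with the apostrophe rather than with the word.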

# Smart Quotes
Microsoft Word's default settings use smart quotes, the
curly quotation marks. These are problematic when searching
documents converted to text (they are rendered as slanted quotes in UTF-8), so they
are replaced with straight quotes via the following code:

```python
import re

text = re.sub(r'”', '"', text)  # replace double smart quote (closing)
text = re.sub(r'“', '"', text)  # replace double smart quote (opening)
text = re.sub(r'’', "'", text)  # replace single smart quote (closing)
text = re.sub(r'‘', "'", text)  # replace single smart quote (opening)
```

# PDFs
Due to the nature of .pdf files and the sometimes-inconsistent results
that occur when converting pdf documents to text format, additional
processing is done. Some characters and extra spaces between word boundaries are removed as part of the
text processing:
```python
text = re.sub(u'[^\u0020-\uD7FF\u0009\u000A\u000D\uE000-\uFFFD\U00010000-\U0010FFFF]+', '', text)
text = re.sub(r'(\b)(\s{2,4})(\b)', r'\g<1> ', text)
```
The above solution is not a comprehensive fix for pdf issues. The accuracy of the results with searches of .pdf files
may be negatively impacted by the quality or formatting of the underlying document, particularly with
scanned documents.
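
To make the effect of the two patterns above concrete, the sketch below wraps them in a helper and runs them on a short string. The patterns are the ones shown above; `clean_pdf_text`, the constant names and the sample input are hypothetical.

```python
import re

# Same two patterns as above, compiled once for reuse.
DISALLOWED = re.compile(
    '[^\u0020-\uD7FF\u0009\u000A\u000D\uE000-\uFFFD\U00010000-\U0010FFFF]+'
)
EXTRA_SPACES = re.compile(r'(\b)(\s{2,4})(\b)')


def clean_pdf_text(text):
    """Drop disallowed control characters, then collapse short whitespace runs."""
    text = DISALLOWED.sub('', text)
    return EXTRA_SPACES.sub(r'\g<1> ', text)


raw = "The Buyer\x0c shall   pay"  # form feed plus a triple space
print(clean_pdf_text(raw))  # The Buyer shall pay
```

Note that the second pattern only collapses runs of two to four whitespace characters between words; a longer run is left unchanged, which is one reason the cleanup is not comprehensive.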

# Open Files
If a consecutive search is run for the same party and the Excel file with the prior search results is still open,
the script will notify the user and will not overwrite the existing Excel file.  With Word files, the user will 
be prompted to save the existing file under
another name and close it before proceeding with a second search for the same party.
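
A check of this kind can be sketched with a simple write test. This is an assumption about one possible approach, not the project's code: on Windows, a workbook held open by Excel typically cannot be opened for writing and raises `PermissionError`. `can_overwrite` is a hypothetical helper name.

```python
from pathlib import Path


def can_overwrite(path):
    """Return False if the file exists but cannot be opened for writing."""
    target = Path(path)
    if not target.exists():
        return True  # nothing to overwrite yet
    try:
        # Append mode tests write access without truncating the file.
        with open(target, "a"):
            pass
    except PermissionError:
        return False
    return True
```

A caller could run `can_overwrite(result_file)` before saving and, when it returns `False`, ask the user to close the open file first.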

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "elucidoc",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "legal, text, analysis",
    "author": "John Blake",
    "author_email": "elucidoc535@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/18/a3/d35e57d43c051a02357c6a2b877c01ea9c80071114c84a42a186da491b2e/elucidoc-2024.7.24.tar.gz",
    "platform": null,
    "description": "![eluciDoc_header](https://github.com/jblake1965/eluciDoc/assets/100727736/e7f94b7f-fb1b-4f55-8665-4dc11c6b93af)\r\n\r\n[![CodeQL](https://github.com/jblake1965/eluciDoc/actions/workflows/github-code-scanning/codeql/badge.svg)](https://github.com/jblake1965/eluciDoc/actions/workflows/github-code-scanning/codeql) [![GitHub Discussions](https://img.shields.io/github/discussions/jblake1965/eluciDoc?labelColor=blue&color=orange)](https://github.com/jblake1965/eluciDoc/discussions/3) [![PYPI Version](https://img.shields.io/pypi/v/elucidoc?logoColor=blue&labelColor=green)](https://img.shields.io/pypi/v/elucidoc?logoColor=blue&labelColor=green)\r\n\r\n# What this is:\r\nThis CLI Python project, written for the Windows\u2122 environment, filters sentences and clauses containing specific user input\r\nterms from a single document. This project was originally created as a tool to aid in the review of legal contracts, \r\nbut can be used with any text. Documents can be in docx, .pdf or .txt file formats.\r\nThe general principle behind its function is subject-predicate sentence analysis. Searches are based on user-selected parties\r\nin the document, followed by a user-selected phrase.  It is used in conjunction with Microsoft\u2122 Office 365\u2122\r\nWord and Excel\u2122 apps.\r\n# How it works:\r\nA .docx, .pdf or .txt file and path is entered (drag and drop work in the Windows terminal):\r\n\r\n![file_input](https://github.com/jblake1965/eluciDoc/assets/100727736/c08d59a4-a019-4a42-b895-427a1815b474)\r\n\r\nThe file is then processed as utf8 text, with MS Word Smart Quotes being converted to straight quotes and non-ASCII and\r\nnon-breaking spaces removed. The term for the party being searched in the document is entered next:\r\n\r\n![enter_party_name](https://github.com/jblake1965/eluciDoc/assets/100727736/bd1e9603-137c-4475-aa1d-d09ed157738a)\r\n\r\nand then passed with the processed text to textacy's Keyword in Context (KWIC) function.  
The result is saved as an Excel\r\nfile with the same name in the same location as the searched document, with \"..._[name of the party]_search_result.xlsx\" \r\nappended. The Excel file automatically opens with a subprocess call, and the results can be converted to a table for\r\nfurther sorting:\r\n\r\n![textacy_rendering](https://github.com/jblake1965/eluciDoc/assets/100727736/a9bfd1a8-8477-4401-8e96-bd83801d5488)\r\n\r\nNote: the subprocess call below uses the default Office install location:\r\n\r\n```python\r\nsubprocess.Popen([r'C:\\Program Files\\Microsoft Office\\root\\Office16\\EXCEL.EXE', result_file])\r\n```\r\n\r\nIf the user has Office installed in a different location, then the code must be changed to reflect that directory.\r\n\r\nThe document is chunked into sentences (or clauses, depending on the formatting) with the spaCy module.\r\nThe user is prompted to enter predicate search phrases culled from the Excel search file which phrases are stored in a list.  \r\nOnce finished entering the predicate search phrases, the script iterates through the list of search phrases looking for\r\na match in each sentence. Sentences and clauses containing a match are added to a result list. The user has the option of \r\nhaving the search phrases appear in ALL CAPS in the Word document containing the results, as shown:\r\n\r\n![all_caps_rendering](https://github.com/user-attachments/assets/5d07f8a5-ed26-4341-9bf8-0445de315245)\r\n\r\n\r\nThe result list is then saved as a Word file that is opened automatically at the end of the run (as with Excel, \r\nnote the location of the Word executable and adjust the path if it is not in the standard install location). 
\r\n# External Dependencies and Licenses\r\n\r\n| Name:        | License:                                                              |\r\n|--------------|-----------------------------------------------------------------------|\r\n| docx2python  | [MIT](https://pypi.org/project/docx2python/)                          |\r\n| openpyxl     | [MIT](https://pypi.org/project/openpyxl/)                             |\r\n| pandas       | [BSD](https://pypi.org/project/pandas/)                               |\r\n| pdfminer.six | [MIT/X](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE) |\r\n| python-docx  | [MIT](https://github.com/atriumlts/python-docx/blob/master/LICENSE)   |\r\n| rich         | [MIT](https://pypi.org/project/rich/)                                 |\r\n| spacy        | [MIT](https://pypi.org/project/spacy/)                                |\r\n| textacy      | [Apache 2.0](https://pypi.org/project/textacy/)                       |\r\n\r\n# Installation\r\nIt is strongly recommended that this package be installed in a virtual environment.  The package is available at https://pypi.org/project/elucidoc/ \r\nand can be installed with ```pip install elucidoc``` .\r\n\r\n***THE SPACY PIPELINE  `  en_core_web_lg  `  MUST ALSO BE INSTALLED INTO THE VIRTUAL\r\nENVIRONMENT FOR THE SCRIPT TO WORK***.\r\n\r\nThe pipeline can be installed as follows:\r\n```\r\npython -m spacy download en_core_web_lg\r\n```\r\nYou must also be sure to verify the directory for the Office install is the same as noted above.  If not, the code must be \r\nchanged to the directory where the Excel and Word apps are located.\r\n# Running the Script\r\nThe project is run as a script.  
It can be run with a .bat file calling the virtual environment and the executable\r\nfile per the below example:\r\n```commandline\r\n@\"C:\\Users\\..\\venv\\Scripts\\python.exe\" \"C:\\Users\\..\\venv\\lib\\elucidoc\\eluciDoc.py\"\r\n\r\n@pause\r\n```\r\nAdditionally, the location of the ```elucidoc.py``` executable can be included in the Windows ```PATH``` environment variable.\r\n# Case Sensitive Searches\r\nGeneral convention in legal texts is to capitalize defined terms.  For that reason, the user may want to make the search\r\ncase-sensitive to target the appropriate instances of the term.  For searches where the specific use of the subject term\r\nis not important but broader capture is, the case-sensitive feature can be turned off.  Once a selection is made, it applies\r\nfor all subsequent searches until the script is restarted.\r\n# Possessive Case and Other Punctuation\r\nTextacy divides the party search term from both following words and punctuation including the possessive case, as shown below:\r\n\r\n![textacy_rendering](https://github.com/jblake1965/eluciDoc/assets/100727736/1fd67f92-57bd-402a-b99f-95d5847f49f7)\r\n\r\nTo capture an instance of a possessive case of the party being searched, a 's or ' (for the plural possessive) must be\r\nthe first character in the predicate search phrase, as illustrated by the prompt below:\r\n\r\n![enter_predicate_phrase](https://github.com/jblake1965/eluciDoc/assets/100727736/edc9f616-97d7-4bdc-8553-f89292e43332)\r\n\r\nThe same principal applies to the comma, colon, semicolon and closed parentheses immediately following the party name.\r\n\r\n# Smart Quotes\r\nMicrosoft Word's default settings utilize smart quotes, which are the\r\ncurly type fonts. 
Those are problematic when searching\r\ndocuments converted to text (rendered as slanted quotes in Utf8), and\r\nare replaced with straight quotes via the following code:\r\n\r\n```python\r\ntext = re.sub(r'\u201d', '\\\"', text)  # replace double smartquote open quote\r\ntext = re.sub(r'\u201c', '\\\"', text)  # replace double smartquote close quote\r\ntext = re.sub(r'\u2019', '\\'', text)  # replace single smartquote close quote\r\ntext = re.sub(r'\u2018', '\\'', text)  # replace single smartquote open quote\r\n```\r\n\r\n# PDFs\r\nDue to the nature of .pdf files and the sometimes-inconsistent results\r\nthat occur when converting pdf documents to text format, additional\r\nprocessing is done. Some characters and extra spaces between word boundaries are removed as part of the\r\ntext processing:\r\n```python\r\ntext = re.sub(u'[^\\u0020-\\uD7FF\\u0009\\u000A\\u000D\\uE000-\\uFFFD\\U00010000-\\U0010FFFF]+', '', text)\r\ntext = re.sub(r'(\\b)(\\s{2,4})(\\b)', r'\\g<1> ', text)\r\n```\r\nThe above solution is not a comprehensive fix for pdf issues. The accuracy of the results with searches of .pdf files\r\nmay be negatively impacted by the quality or formatting of the underlying document, particularly with\r\nscanned documents.\r\n\r\n# Open Files\r\nIf a consecutive search is run for the same party and the Excel file with the prior search results is still open,\r\nthe script will notify the user of such and not overwrite the existing Excel file.  With the Word files, the user will \r\nbe prompted to save the existing file with\r\nanother name and close it before proceeding with a second search for the same party.\r\n",
    "bugtrack_url": null,
    "license": "Copyright 2023 John R. Blake, Jr.  Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \u201cAS IS\u201d AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.  See also README.md for external dependencies' licenses.  ",
    "summary": "Screens legal and other texts for sentences and clauses containing user defined search phrases",
    "version": "2024.7.24",
    "project_urls": {
        "License": "https://github.com/jblake1965/eluciDoc/blob/developer/LICENSE.txt",
        "Source Code": "https://github.com/jblake1965/eluciDoc"
    },
    "split_keywords": [
        "legal",
        " text",
        " analysis"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "801b8f08d687bb6e548757ecf408f18010db144e7e90545f0bf1c05f1bec3dbd",
                "md5": "956d65346563c21e99723e95d0995325",
                "sha256": "91fc004bba051cfd7b397b5e26114ab66aa03604e2cfac0fe5f3a9f42c1584a2"
            },
            "downloads": -1,
            "filename": "elucidoc-2024.7.24-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "956d65346563c21e99723e95d0995325",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 9943,
            "upload_time": "2024-07-24T10:42:44",
            "upload_time_iso_8601": "2024-07-24T10:42:44.179295Z",
            "url": "https://files.pythonhosted.org/packages/80/1b/8f08d687bb6e548757ecf408f18010db144e7e90545f0bf1c05f1bec3dbd/elucidoc-2024.7.24-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "18a3d35e57d43c051a02357c6a2b877c01ea9c80071114c84a42a186da491b2e",
                "md5": "e7c4889dd840b9023f485bc1474cfe5a",
                "sha256": "eae6cc6c34dbb8cc52b3685a607d590179193c4abf673c0ec91adea77a7c02f5"
            },
            "downloads": -1,
            "filename": "elucidoc-2024.7.24.tar.gz",
            "has_sig": false,
            "md5_digest": "e7c4889dd840b9023f485bc1474cfe5a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 12588,
            "upload_time": "2024-07-24T10:42:45",
            "upload_time_iso_8601": "2024-07-24T10:42:45.440714Z",
            "url": "https://files.pythonhosted.org/packages/18/a3/d35e57d43c051a02357c6a2b877c01ea9c80071114c84a42a186da491b2e/elucidoc-2024.7.24.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-24 10:42:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jblake1965",
    "github_project": "eluciDoc",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "annotated-types",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "backports.tarfile",
            "specs": [
                [
                    "==",
                    "1.2.0"
                ]
            ]
        },
        {
            "name": "blis",
            "specs": [
                [
                    "==",
                    "0.9.1"
                ]
            ]
        },
        {
            "name": "build",
            "specs": [
                [
                    "==",
                    "1.2.1"
                ]
            ]
        },
        {
            "name": "cachetools",
            "specs": [
                [
                    "==",
                    "5.4.0"
                ]
            ]
        },
        {
            "name": "catalogue",
            "specs": [
                [
                    "==",
                    "2.0.10"
                ]
            ]
        },
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2024.7.4"
                ]
            ]
        },
        {
            "name": "cffi",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.3.2"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "cloudpathlib",
            "specs": [
                [
                    "==",
                    "0.18.1"
                ]
            ]
        },
        {
            "name": "colorama",
            "specs": [
                [
                    "==",
                    "0.4.6"
                ]
            ]
        },
        {
            "name": "confection",
            "specs": [
                [
                    "==",
                    "0.1.5"
                ]
            ]
        },
        {
            "name": "cryptography",
            "specs": [
                [
                    "==",
                    "43.0.0"
                ]
            ]
        },
        {
            "name": "cymem",
            "specs": [
                [
                    "==",
                    "2.0.8"
                ]
            ]
        },
        {
            "name": "cytoolz",
            "specs": [
                [
                    "==",
                    "0.12.3"
                ]
            ]
        },
        {
            "name": "docutils",
            "specs": [
                [
                    "==",
                    "0.21.2"
                ]
            ]
        },
        {
            "name": "docx2python",
            "specs": [
                [
                    "==",
                    "2.10.1"
                ]
            ]
        },
        {
            "name": "en-core-web-lg",
            "specs": []
        },
        {
            "name": "et-xmlfile",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "filelock",
            "specs": [
                [
                    "==",
                    "3.15.4"
                ]
            ]
        },
        {
            "name": "floret",
            "specs": [
                [
                    "==",
                    "0.10.5"
                ]
            ]
        },
        {
            "name": "fsspec",
            "specs": [
                [
                    "==",
                    "2024.6.1"
                ]
            ]
        },
        {
            "name": "huggingface-hub",
            "specs": [
                [
                    "==",
                    "0.24.1"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.7"
                ]
            ]
        },
        {
            "name": "importlib_metadata",
            "specs": [
                [
                    "==",
                    "8.1.0"
                ]
            ]
        },
        {
            "name": "jaraco.classes",
            "specs": [
                [
                    "==",
                    "3.4.0"
                ]
            ]
        },
        {
            "name": "jaraco.context",
            "specs": [
                [
                    "==",
                    "5.3.0"
                ]
            ]
        },
        {
            "name": "jaraco.functools",
            "specs": [
                [
                    "==",
                    "4.0.1"
                ]
            ]
        },
        {
            "name": "jellyfish",
            "specs": [
                [
                    "==",
                    "1.0.4"
                ]
            ]
        },
        {
            "name": "Jinja2",
            "specs": [
                [
                    "==",
                    "3.1.4"
                ]
            ]
        },
        {
            "name": "joblib",
            "specs": [
                [
                    "==",
                    "1.4.2"
                ]
            ]
        },
        {
            "name": "keyring",
            "specs": [
                [
                    "==",
                    "25.2.1"
                ]
            ]
        },
        {
            "name": "langcodes",
            "specs": [
                [
                    "==",
                    "3.4.0"
                ]
            ]
        },
        {
            "name": "lxml",
            "specs": [
                [
                    "==",
                    "5.2.2"
                ]
            ]
        },
        {
            "name": "markdown-it-py",
            "specs": [
                [
                    "==",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "MarkupSafe",
            "specs": [
                [
                    "==",
                    "2.1.5"
                ]
            ]
        },
        {
            "name": "mdurl",
            "specs": [
                [
                    "==",
                    "0.1.2"
                ]
            ]
        },
        {
            "name": "more-itertools",
            "specs": [
                [
                    "==",
                    "10.3.0"
                ]
            ]
        },
        {
            "name": "mpmath",
            "specs": [
                [
                    "==",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "murmurhash",
            "specs": [
                [
                    "==",
                    "1.0.10"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    "==",
                    "3.3"
                ]
            ]
        },
        {
            "name": "nh3",
            "specs": [
                [
                    "==",
                    "0.2.18"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.26.4"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    "==",
                    "3.1.5"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "24.1"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.2"
                ]
            ]
        },
        {
            "name": "paragraphs",
            "specs": [
                [
                    "==",
                    "0.2.1"
                ]
            ]
        },
        {
            "name": "pathlib_abc",
            "specs": [
                [
                    "==",
                    "0.3.1"
                ]
            ]
        },
        {
            "name": "pathy",
            "specs": [
                [
                    "==",
                    "0.11.0"
                ]
            ]
        },
        {
            "name": "pdfminer.six",
            "specs": [
                [
                    "==",
                    "20240706"
                ]
            ]
        },
        {
            "name": "pkginfo",
            "specs": [
                [
                    "==",
                    "1.11.1"
                ]
            ]
        },
        {
            "name": "preshed",
            "specs": [
                [
                    "==",
                    "3.0.9"
                ]
            ]
        },
        {
            "name": "pyarrow",
            "specs": [
                [
                    "==",
                    "17.0.0"
                ]
            ]
        },
        {
            "name": "pycparser",
            "specs": [
                [
                    "==",
                    "2.22"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    "==",
                    "2.8.2"
                ]
            ]
        },
        {
            "name": "pydantic_core",
            "specs": [
                [
                    "==",
                    "2.20.1"
                ]
            ]
        },
        {
            "name": "Pygments",
            "specs": [
                [
                    "==",
                    "2.18.0"
                ]
            ]
        },
        {
            "name": "PyInputPlus",
            "specs": [
                [
                    "==",
                    "0.2.12"
                ]
            ]
        },
        {
            "name": "pyparsing",
            "specs": [
                [
                    "==",
                    "3.1.2"
                ]
            ]
        },
        {
            "name": "pyphen",
            "specs": [
                [
                    "==",
                    "0.15.0"
                ]
            ]
        },
        {
            "name": "pyproject_hooks",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "PySimpleValidate",
            "specs": [
                [
                    "==",
                    "0.2.12"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.9.0.post0"
                ]
            ]
        },
        {
            "name": "python-docx",
            "specs": [
                [
                    "==",
                    "1.1.2"
                ]
            ]
        },
        {
            "name": "python-utils",
            "specs": [
                [
                    "==",
                    "3.8.2"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2024.1"
                ]
            ]
        },
        {
            "name": "pywin32-ctypes",
            "specs": [
                [
                    "==",
                    "0.2.2"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": [
                [
                    "==",
                    "6.0.1"
                ]
            ]
        },
        {
            "name": "readme_renderer",
            "specs": [
                [
                    "==",
                    "44.0"
                ]
            ]
        },
        {
            "name": "regex",
            "specs": [
                [
                    "==",
                    "2024.5.15"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.32.3"
                ]
            ]
        },
        {
            "name": "requests-toolbelt",
            "specs": [
                [
                    "==",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "rfc3986",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    "==",
                    "13.7.1"
                ]
            ]
        },
        {
            "name": "safetensors",
            "specs": [
                [
                    "==",
                    "0.4.3"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "==",
                    "1.5.1"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.14.0"
                ]
            ]
        },
        {
            "name": "shellingham",
            "specs": [
                [
                    "==",
                    "1.5.4"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "smart-open",
            "specs": [
                [
                    "==",
                    "7.0.4"
                ]
            ]
        },
        {
            "name": "spacy",
            "specs": [
                [
                    "==",
                    "3.7.5"
                ]
            ]
        },
        {
            "name": "spacy-alignments",
            "specs": [
                [
                    "==",
                    "0.9.1"
                ]
            ]
        },
        {
            "name": "spacy-legacy",
            "specs": [
                [
                    "==",
                    "3.0.12"
                ]
            ]
        },
        {
            "name": "spacy-loggers",
            "specs": [
                [
                    "==",
                    "1.0.5"
                ]
            ]
        },
        {
            "name": "spacy-transformers",
            "specs": [
                [
                    "==",
                    "1.3.5"
                ]
            ]
        },
        {
            "name": "srsly",
            "specs": [
                [
                    "==",
                    "2.4.8"
                ]
            ]
        },
        {
            "name": "stdiomask",
            "specs": [
                [
                    "==",
                    "0.0.6"
                ]
            ]
        },
        {
            "name": "sympy",
            "specs": [
                [
                    "==",
                    "1.13.1"
                ]
            ]
        },
        {
            "name": "textacy",
            "specs": [
                [
                    "==",
                    "0.13.0"
                ]
            ]
        },
        {
            "name": "thinc",
            "specs": [
                [
                    "==",
                    "8.2.3"
                ]
            ]
        },
        {
            "name": "threadpoolctl",
            "specs": [
                [
                    "==",
                    "3.5.0"
                ]
            ]
        },
        {
            "name": "tokenizers",
            "specs": [
                [
                    "==",
                    "0.19.1"
                ]
            ]
        },
        {
            "name": "toolz",
            "specs": [
                [
                    "==",
                    "0.12.1"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    "==",
                    "2.3.1"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.66.4"
                ]
            ]
        },
        {
            "name": "transformers",
            "specs": [
                [
                    "==",
                    "4.43.1"
                ]
            ]
        },
        {
            "name": "twine",
            "specs": [
                [
                    "==",
                    "5.1.1"
                ]
            ]
        },
        {
            "name": "typer",
            "specs": [
                [
                    "==",
                    "0.12.3"
                ]
            ]
        },
        {
            "name": "typing_extensions",
            "specs": [
                [
                    "==",
                    "4.12.2"
                ]
            ]
        },
        {
            "name": "tzdata",
            "specs": [
                [
                    "==",
                    "2024.1"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.2.2"
                ]
            ]
        },
        {
            "name": "wasabi",
            "specs": [
                [
                    "==",
                    "1.1.3"
                ]
            ]
        },
        {
            "name": "weasel",
            "specs": [
                [
                    "==",
                    "0.4.1"
                ]
            ]
        },
        {
            "name": "wrapt",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "zipp",
            "specs": [
                [
                    "==",
                    "3.19.2"
                ]
            ]
        }
    ],
    "lcname": "elucidoc"
}
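
Each entry in the list above pairs a package `name` with `specs`, a list of `[operator, version]` pairs. As a minimal sketch, the entries could be flattened into pip-style requirement lines as shown below. The function name `to_requirement_lines` and the two-entry `sample` are illustrative assumptions, not part of elucidoc or of the page that produced this JSON.

```python
import json


def to_requirement_lines(requirements):
    """Turn [{"name": ..., "specs": [[op, version], ...]}, ...] into pip-style lines.

    Hypothetical helper for illustration; it is not part of elucidoc.
    """
    lines = []
    for entry in requirements:
        # Join every [operator, version] pair, e.g. ["==", "3.19.2"] -> "==3.19.2".
        specs = ",".join(op + version for op, version in entry.get("specs", []))
        lines.append(entry["name"] + specs)
    return lines


# Two entries copied from the list above, used as a small sample input.
sample = json.loads(
    '[{"name": "wrapt", "specs": [["==", "1.16.0"]]},'
    ' {"name": "zipp", "specs": [["==", "3.19.2"]]}]'
)

print("\n".join(to_requirement_lines(sample)))
```

Every entry in this listing uses a single `==` pin, so each output line has the form `name==version`; the comma join only matters for entries that carry more than one specifier.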

John Blake