docanalysis

Name	docanalysis JSON
Version	0.3.0 JSON
	download
home_page	https://github.com/petermr/docanalysis
Summary	extract structured information from ethics paragraphs
upload_time	2023-11-04 11:07:47
maintainer
docs_url	None
author	Ayush Garg, Shweata N. Hegde
requires_python
license	Apache License
keywords	research automation
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            ## docanalysis 
`docanalysis` is a Command Line Tool that ingests corpora [(CProjects)](https://github.com/petermr/tigr2ess/blob/master/getpapers/TUTORIAL.md#cproject-and-ctrees) and carries out text-analysis of documents, including
- sectioning
- NLP/text-mining
- dictionary generation 

Besides the bespoke code, it uses [NLTK](https://www.nltk.org/) and other Python tools for many operations, and [spaCy](https://spacy.io/) or [scispaCy](https://allenai.github.io/scispacy/) for extraction and annotation of entities. Outputs summary data and word-dictionaries. 

### Set up `venv`
We recommend you create a virtual environment (`venv`) before installing `docanalysis` and that you activate the `venv` before each time you run `docanalysis`.

#### Windows
Creating a `venv`
```
>> mkdir docanalysis_demo
>> cd docanalysis_demo
>> python -m venv venv
```

Activating `venv`
```
>> venv\Scripts\activate.bat
```

#### MacOS
Creating a `venv`
```
>> mkdir docanalysis_demo
>> cd docanalysis_demo
>> python3 -m venv venv
```

Activating `venv`
```
>> source venv/bin/activate
```

Refer the [official documentation](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) for more help. 

### Install `docanalysis`
You can download `docanalysis` from PYPI. 
```
  pip install docanalysis
```
If you are on a Mac
```
pip3 install docanalysis
```

Download python from: [https://www.python.org/downloads/](https://www.python.org/downloads/) and select the option `Add Python to Path while installing`. Make sure `pip` is installed along with python. Check out [https://pip.pypa.io/en/stable/installation/](https://pip.pypa.io/en/stable/installation/) if you have difficulties installing pip.

### Run `docanalysis`
`docanalysis --help` should list the flags we support and their use.

```
usage: docanalysis.py [-h] [--run_pygetpapers] [--make_section] [-q QUERY] [-k HITS] [--project_name PROJECT_NAME] [-d DICTIONARY] [-o OUTPUT]
                      [--make_ami_dict MAKE_AMI_DICT] [--search_section [SEARCH_SECTION [SEARCH_SECTION ...]]] [--entities [ENTITIES [ENTITIES ...]]]
                      [--spacy_model SPACY_MODEL] [--html HTML] [--synonyms SYNONYMS] [--make_json MAKE_JSON] [--search_html] [--extract_abb EXTRACT_ABB]
                      [-l LOGLEVEL] [-f LOGFILE]

Welcome to docanalysis version 0.1.3. -h or --help for help

optional arguments:
  -h, --help            show this help message and exit
  --run_pygetpapers     [Command] downloads papers from EuropePMC via pygetpapers
  --make_section        [Command] makes sections; requires a fulltext.xml in CTree directories
  -q QUERY, --query QUERY
                        [pygetpapers] query string
  -k HITS, --hits HITS  [pygetpapers] number of papers to download
  --project_name PROJECT_NAME
                        CProject directory name
  -d DICTIONARY, --dictionary DICTIONARY
                        [file name/url] existing ami dictionary to annotate sentences or support supervised entity extraction
  -o OUTPUT, --output OUTPUT
                        outputs csv with sentences/terms
  --make_ami_dict MAKE_AMI_DICT
                        [Command] title for ami-dict. Makes ami-dict of all extracted entities; works only with spacy
  --search_section [SEARCH_SECTION [SEARCH_SECTION ...]]
                        [NER/dictionary search] section(s) to annotate. Choose from: ALL, ACK, AFF, AUT, CON, DIS, ETH, FIG, INT, KEY, MET, RES, TAB, TIL. Defaults to
                        ALL
  --entities [ENTITIES [ENTITIES ...]]
                        [NER] entities to extract. Default (ALL). Common entities SpaCy: GPE, LANGUAGE, ORG, PERSON (for additional ones check: ); SciSpaCy: CHEMICAL,
                        DISEASE
  --spacy_model SPACY_MODEL
                        [NER] optional. Choose between spacy or scispacy models. Defaults to spacy
  --html HTML           outputs html with sentences/terms
  --synonyms SYNONYMS   annotate the corpus/sections with synonyms from ami-dict
  --make_json MAKE_JSON
                        outputs json with sentences/terms
  --search_html         searches html documents (mainly IPCC)
  --extract_abb EXTRACT_ABB
                        [Command] title for abb-ami-dict. Extracts abbreviations and expansions; makes ami-dict of all extracted entities
  -l LOGLEVEL, --loglevel LOGLEVEL
                        provide logging level. Example --log warning <<info,warning,debug,error,critical>>, default='info'
  -f LOGFILE, --logfile LOGFILE
                        saves log to specified file in output directory as well as printing to terminal
```

#### Download papers from [EPMC](https://europepmc.org/) via `pygetpapers`
COMMAND
```
docanalysis --run_pygetpapers -q "terpene" -k 10 --project_name terpene_10
```
LOGS
```
INFO: making project/searching terpene for 10 hits into C:\Users\shweata\docanalysis\terpene_10
INFO: Total Hits are 13935
1it [00:00, 936.44it/s]
INFO: Saving XML files to C:\Users\shweata\docanalysis\terpene_10\*\fulltext.xml
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:30<00:00,  3.10s/it]
```

CPROJ
```
C:\USERS\SHWEATA\DOCANALYSIS\TERPENE_10
│   eupmc_results.json
│
├───PMC8625850
│       eupmc_result.json
│       fulltext.xml
│
├───PMC8727598
│       eupmc_result.json
│       fulltext.xml
│
├───PMC8747377
│       eupmc_result.json
│       fulltext.xml
│
├───PMC8771452
│       eupmc_result.json
│       fulltext.xml
│
├───PMC8775117
│       eupmc_result.json
│       fulltext.xml
│
├───PMC8801761
│       eupmc_result.json
│       fulltext.xml
│
├───PMC8831285
│       eupmc_result.json
│       fulltext.xml
│
├───PMC8839294
│       eupmc_result.json
│       fulltext.xml
│
├───PMC8840323
│       eupmc_result.json
│       fulltext.xml
│
└───PMC8879232
        eupmc_result.json
        fulltext.xml
```

#### Section the papers
COMMAND
```
docanalysis --project_name terpene_10 --make_section
```
LOGS
```
WARNING: Making sections in /content/terpene_10/PMC9095633/fulltext.xml
INFO: dict_keys: dict_keys(['abstract', 'acknowledge', 'affiliation', 'author', 'conclusion', 'discussion', 'ethics', 'fig_caption', 'front', 'introduction', 'jrnl_title', 'keyword', 'method', 'octree', 'pdfimage', 'pub_date', 'publisher', 'reference', 'results_discuss', 'search_results', 'sections', 'svg', 'table', 'title'])
WARNING: loading templates.json
INFO: wrote XML sections for /content/terpene_10/PMC9095633/fulltext.xml /content/terpene_10/PMC9095633/sections
WARNING: Making sections in /content/terpene_10/PMC9120863/fulltext.xml
INFO: wrote XML sections for /content/terpene_10/PMC9120863/fulltext.xml /content/terpene_10/PMC9120863/sections
WARNING: Making sections in /content/terpene_10/PMC8982386/fulltext.xml
INFO: wrote XML sections for /content/terpene_10/PMC8982386/fulltext.xml /content/terpene_10/PMC8982386/sections
WARNING: Making sections in /content/terpene_10/PMC9069239/fulltext.xml
INFO: wrote XML sections for /content/terpene_10/PMC9069239/fulltext.xml /content/terpene_10/PMC9069239/sections
WARNING: Making sections in /content/terpene_10/PMC9165828/fulltext.xml
INFO: wrote XML sections for /content/terpene_10/PMC9165828/fulltext.xml /content/terpene_10/PMC9165828/sections
WARNING: Making sections in /content/terpene_10/PMC9119530/fulltext.xml
INFO: wrote XML sections for /content/terpene_10/PMC9119530/fulltext.xml /content/terpene_10/PMC9119530/sections
WARNING: Making sections in /content/terpene_10/PMC8982077/fulltext.xml
INFO: wrote XML sections for /content/terpene_10/PMC8982077/fulltext.xml /content/terpene_10/PMC8982077/sections
WARNING: Making sections in /content/terpene_10/PMC9067962/fulltext.xml
INFO: wrote XML sections for /content/terpene_10/PMC9067962/fulltext.xml /content/terpene_10/PMC9067962/sections
WARNING: Making sections in /content/terpene_10/PMC9154778/fulltext.xml
INFO: wrote XML sections for /content/terpene_10/PMC9154778/fulltext.xml /content/terpene_10/PMC9154778/sections
WARNING: Making sections in /content/terpene_10/PMC9164016/fulltext.xml
INFO: wrote XML sections for /content/terpene_10/PMC9164016/fulltext.xml /content/terpene_10/PMC9164016/sections
 47% 1056/2258 [00:01<00:01, 1003.31it/s]ERROR: cannot parse /content/terpene_10/PMC9165828/sections/1_front/1_article-meta/26_custom-meta-group/0_custom-meta/1_meta-value/0_xref.xml
 67% 1516/2258 [00:01<00:00, 1047.68it/s]ERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/7_xref.xml
ERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/14_email.xml
ERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/3_xref.xml
ERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/6_xref.xml
ERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/9_email.xml
ERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/10_email.xml
ERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/4_xref.xml
...
100% 2258/2258 [00:02<00:00, 949.43it/s] 
```

CTREE
```
├───PMC8625850
│   └───sections
│       ├───0_processing-meta
│       ├───1_front
│       │   ├───0_journal-meta
│       │   └───1_article-meta
│       ├───2_body
│       │   ├───0_1._introduction
│       │   ├───1_2._materials_and_methods
│       │   │   ├───1_2.1._materials
│       │   │   ├───2_2.2._bacterial_strains
│       │   │   ├───3_2.3._preparation_and_character
│       │   │   ├───4_2.4._evaluation_of_the_effect_
│       │   │   ├───5_2.5._time-kill_studies
│       │   │   ├───6_2.6._propidium_iodide_uptake-e
│       │   │   └───7_2.7._hemolysis_test_from_human
│       │   ├───2_3._results
│       │   │   ├───1_3.1._encapsulation_of_terpene_
│       │   │   ├───2_3.2._both_terpene_alcohol-load
│       │   │   ├───3_3.3._farnesol_and_geraniol-loa
│       │   │   └───4_3.4._farnesol_and_geraniol-loa
│       │   ├───3_4._discussion
│       │   ├───4_5._conclusions
│       │   └───5_6._patents
│       ├───3_back
│       │   ├───0_ack
│       │   ├───1_fn-group
│       │   │   └───0_fn
│       │   ├───2_app-group
│       │   │   └───0_app
│       │   │       └───2_supplementary-material
│       │   │           └───0_media
│       │   └───9_ref-list
│       └───4_floats-group
│           ├───4_table-wrap
│           ├───5_table-wrap
│           ├───6_table-wrap
│           │   └───4_table-wrap-foot
│           │       └───0_fn
│           ├───7_table-wrap
│           └───8_table-wrap
...
```
##### Search sections using dictionary
COMMAND
```
docanalysis --project_name terpene_10 --output entities.csv --make_ami_dict entities.xml
```
LOGS
```
INFO: Found 7134 sentences in the section(s).
INFO: getting terms from /content/activity.xml
100% 7134/7134 [00:02<00:00, 3172.14it/s]
/usr/local/lib/python3.7/dist-packages/docanalysis/entity_extraction.py:352: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
  "[", "").str.replace("]", "")
INFO: wrote output to /content/terpene_10/activity.csv
```

#### Extract entities
We use `spacy` to extract Named Entites. Here's the list of Entities it supports:CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW,LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, WORK_OF_ART 
INPUT
```
docanalysis --project_name terpene_10 --make_section --spacy_model spacy --entities ORG --output org.csv
```
LOGS
```
INFO: Found 7134 sentences in the section(s).
INFO: Loading spacy
100% 7134/7134 [01:08<00:00, 104.16it/s]
/usr/local/lib/python3.7/dist-packages/docanalysis/entity_extraction.py:352: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
  "[", "").str.replace("]", "")
INFO: wrote output to /content/terpene_10/org.csv
```
##### Extract information from specific section(s)
You can choose to extract entities from specific sections

COMMAND
```
docanalysis --project_name terpene_10 --make_section --spacy_model spacy --search_section AUT, AFF --entities ORG --output org_aut_aff.csv
```
LOG
```
INFO: Found 28 sentences in the section(s).
INFO: Loading spacy
100% 28/28 [00:00<00:00, 106.66it/s]
/usr/local/lib/python3.7/dist-packages/docanalysis/entity_extraction.py:352: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
  "[", "").str.replace("]", "")
INFO: wrote output to /content/terpene_10/org_aut_aff.csv
```
#### Create dictionary of extracted entities
COMMAND
```
docanalysis --project_name terpene_10 --make_section --spacy_model spacy --search_section AUT, AFF --entities ORG --output org_aut_aff.csvv --make_ami_dict org
```
LOG
```
INFO: Found 28 sentences in the section(s).
INFO: Loading spacy
100% 28/28 [00:00<00:00, 96.56it/s] 
/usr/local/lib/python3.7/dist-packages/docanalysis/entity_extraction.py:352: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
  "[", "").str.replace("]", "")
INFO: wrote output to /content/terpene_10/org_aut_aff.csvv
INFO: Wrote all the entities extracted to ami dict
```

Snippet of the dictionary
```
<?xml version="1.0"?>
- dictionary title="/content/terpene_10/org.xml">
<entry count="2" term="Department of Biochemistry"/>
<entry count="2" term="Chinese Academy of Agricultural Sciences"/>
<entry count="2" term="Tianjin University"/>
<entry count="2" term="Desert Research Center"/>
<entry count="2" term="Chinese Academy of Sciences"/>
<entry count="2" term="University of Colorado Boulder"/>
<entry count="2" term="Department of Neurology"/>
<entry count="1" term="Max Planck Institute for Chemical Ecology"/>
<entry count="1" term="College of Forest Resources and Environmental Science"/>
<entry count="1" term="Michigan Technological University"/>
```

### Extract Abbreviations

```
docanalysis --project_name corpus\ethics_10 --output dict_search_5.csv  --make_json dict_search_5.json --make_ami_dict entities --extract_abb ethics_abb
```

`--extract_abb` extracts all abbreviations and make an ami-dictionary of abbreviations and its expansion. 

EXAMPLE DICTIONARY: 
```
<dictionary title="ethics_abb">
  <entry name="ASD" term="Atrial septal defect"/>
  <entry name="SPSS" term="Statistical Package for Social Sciences"/>
  <entry name="ACGME" term="Accreditation Council of Graduate Medical Education"/>
  <entry name="ABP" term="American Board of Paediatrics"/>
  <entry name="TBL" term="Team Based Learning"/>
  <entry name="TBL" term="Team-Based Learning"/>
  <entry name="UNTH" term="University of Nigeria Teaching Hospital"/>
  <entry name="PAH" term="pulmonary hypertension"/>
  <entry name="HREC" term="Human Sciences Research Council, Research Ethics Committee"/>
  <entry name="HREC" term="Human Sciences Research Council, Research Ethics Committee"/>
  <entry name="CDC" term="Center for Disease Control and Prevention"/>
  <entry name="ASD" term="Atrial septal defect"/>
  <entry name="PAH" term="pulmonary arterial hypertension"/>
  <entry name="CVDs" term="cardiovascular diseases"/>
  <entry name="BNs" term="Bayesian networks"/>
  <entry name="GI" term="gastrointestinal cancer"/>
  <entry name="ART" term="antiretroviral therapy"/>
  <entry name="HIV" term="human immunodeficiency virus"/>
  <entry name="GATE" term="Global Cooperation on Assistive Technology"/>
</dictionary>
```

### Search HTML
If you working with HTML files (IPCC Reports, for example) and not XMLs in CProjects, you can use `--search_html` flag.

```
docanalysis --project_name corpus\ipcc_sectioned  --extract_abb ethics_abb --search_html
```

 Make sure that your `html` sections is in `sections` folder. Here's an example structure: 

```
C:.
|   dict_search_2.csv
|   dict_search_2.json
|
\---chap4
    |   chapter_4
    |
    \---sections
            4.1.html
            4.2.1.html
            4.2.2.html
            4.2.3.html
            4.2.4.html
            4.2.5.html
            4.2.7.html
            4.2.html
            4.3.1.html
            4.3.2.html
            4.3.html
            4.4.1.html
            4.4.2.html
            4.4.html
            4.5.html
            executive_summary.html
            frequently_asked_questions.html
            table_of_contents.html
```
If you haven't sectioned your `html`, please use `py4ami` to section it.  
#### What is a dictionary
Dictionary, in `ami`'s terminology, a set of terms/phrases in XML format. 
Dictionaries related to ethics and acknowledgments are available in [Ethics Dictionary](https://github.com/petermr/docanalysis/tree/main/ethics_dictionary) folder

If you'd like to create a custom dictionary, you can find the steps, [here](https://github.com/petermr/tigr2ess/blob/master/dictionaries/TUTORIAL.md)

```
### Python tools used
- [`pygetpapers`](https://github.com/petermr/pygetpapers) - scrape open repositories to download papers of interest
- [nltk](https://www.nltk.org/) - splits sentences
- [spaCy](https://spacy.io/) and  [SciSpaCy](https://allenai.github.io/scispacy/)
 - recognize Named-Entities and label them
     - Here's the list of NER labels [SpaCy's English model](https://spacy.io/models/en) provides:  
     `CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW, LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, WORK_OF_ART`


### Credits: 
-  [Ayush Garg](https://github.com/ayush4921)
-  [Shweata N. Hegde](https://github.com/ShweataNHegde/)
-  [Daniel Mietchen](https://github.com/Daniel-Mietchen)
-  [Peter Murray-Rust](https://github.com/petermr)

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/petermr/docanalysis",
    "name": "docanalysis",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "research automation",
    "author": "Ayush Garg, Shweata N. Hegde",
    "author_email": "ayush@science.org.in, shweata.hegde@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/c4/2f/18fc6398e837c46ef9615334c1d413f4c9a5b0ca515a394a343a7388fee5/docanalysis-0.3.0.tar.gz",
    "platform": null,
    "description": "## docanalysis \r\n`docanalysis` is a Command Line Tool that ingests corpora [(CProjects)](https://github.com/petermr/tigr2ess/blob/master/getpapers/TUTORIAL.md#cproject-and-ctrees) and carries out text-analysis of documents, including\r\n- sectioning\r\n- NLP/text-mining\r\n- dictionary generation \r\n\r\nBesides the bespoke code, it uses [NLTK](https://www.nltk.org/) and other Python tools for many operations, and [spaCy](https://spacy.io/) or [scispaCy](https://allenai.github.io/scispacy/) for extraction and annotation of entities. Outputs summary data and word-dictionaries. \r\n\r\n### Set up `venv`\r\nWe recommend you create a virtual environment (`venv`) before installing `docanalysis` and that you activate the `venv` before each time you run `docanalysis`.\r\n\r\n#### Windows\r\nCreating a `venv`\r\n```\r\n>> mkdir docanalysis_demo\r\n>> cd docanalysis_demo\r\n>> python -m venv venv\r\n```\r\n\r\nActivating `venv`\r\n```\r\n>> venv\\Scripts\\activate.bat\r\n```\r\n\r\n#### MacOS\r\nCreating a `venv`\r\n```\r\n>> mkdir docanalysis_demo\r\n>> cd docanalysis_demo\r\n>> python3 -m venv venv\r\n```\r\n\r\nActivating `venv`\r\n```\r\n>> source venv/bin/activate\r\n```\r\n\r\nRefer the [official documentation](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) for more help. \r\n\r\n### Install `docanalysis`\r\nYou can download `docanalysis` from PYPI. \r\n```\r\n  pip install docanalysis\r\n```\r\nIf you are on a Mac\r\n```\r\npip3 install docanalysis\r\n```\r\n\r\nDownload python from: [https://www.python.org/downloads/](https://www.python.org/downloads/) and select the option `Add Python to Path while installing`. Make sure `pip` is installed along with python. Check out [https://pip.pypa.io/en/stable/installation/](https://pip.pypa.io/en/stable/installation/) if you have difficulties installing pip.\r\n\r\n### Run `docanalysis`\r\n`docanalysis --help` should list the flags we support and their use.\r\n\r\n```\r\nusage: docanalysis.py [-h] [--run_pygetpapers] [--make_section] [-q QUERY] [-k HITS] [--project_name PROJECT_NAME] [-d DICTIONARY] [-o OUTPUT]\r\n                      [--make_ami_dict MAKE_AMI_DICT] [--search_section [SEARCH_SECTION [SEARCH_SECTION ...]]] [--entities [ENTITIES [ENTITIES ...]]]\r\n                      [--spacy_model SPACY_MODEL] [--html HTML] [--synonyms SYNONYMS] [--make_json MAKE_JSON] [--search_html] [--extract_abb EXTRACT_ABB]\r\n                      [-l LOGLEVEL] [-f LOGFILE]\r\n\r\nWelcome to docanalysis version 0.1.3. -h or --help for help\r\n\r\noptional arguments:\r\n  -h, --help            show this help message and exit\r\n  --run_pygetpapers     [Command] downloads papers from EuropePMC via pygetpapers\r\n  --make_section        [Command] makes sections; requires a fulltext.xml in CTree directories\r\n  -q QUERY, --query QUERY\r\n                        [pygetpapers] query string\r\n  -k HITS, --hits HITS  [pygetpapers] number of papers to download\r\n  --project_name PROJECT_NAME\r\n                        CProject directory name\r\n  -d DICTIONARY, --dictionary DICTIONARY\r\n                        [file name/url] existing ami dictionary to annotate sentences or support supervised entity extraction\r\n  -o OUTPUT, --output OUTPUT\r\n                        outputs csv with sentences/terms\r\n  --make_ami_dict MAKE_AMI_DICT\r\n                        [Command] title for ami-dict. Makes ami-dict of all extracted entities; works only with spacy\r\n  --search_section [SEARCH_SECTION [SEARCH_SECTION ...]]\r\n                        [NER/dictionary search] section(s) to annotate. Choose from: ALL, ACK, AFF, AUT, CON, DIS, ETH, FIG, INT, KEY, MET, RES, TAB, TIL. Defaults to\r\n                        ALL\r\n  --entities [ENTITIES [ENTITIES ...]]\r\n                        [NER] entities to extract. Default (ALL). Common entities SpaCy: GPE, LANGUAGE, ORG, PERSON (for additional ones check: ); SciSpaCy: CHEMICAL,\r\n                        DISEASE\r\n  --spacy_model SPACY_MODEL\r\n                        [NER] optional. Choose between spacy or scispacy models. Defaults to spacy\r\n  --html HTML           outputs html with sentences/terms\r\n  --synonyms SYNONYMS   annotate the corpus/sections with synonyms from ami-dict\r\n  --make_json MAKE_JSON\r\n                        outputs json with sentences/terms\r\n  --search_html         searches html documents (mainly IPCC)\r\n  --extract_abb EXTRACT_ABB\r\n                        [Command] title for abb-ami-dict. Extracts abbreviations and expansions; makes ami-dict of all extracted entities\r\n  -l LOGLEVEL, --loglevel LOGLEVEL\r\n                        provide logging level. Example --log warning <<info,warning,debug,error,critical>>, default='info'\r\n  -f LOGFILE, --logfile LOGFILE\r\n                        saves log to specified file in output directory as well as printing to terminal\r\n```\r\n\r\n#### Download papers from [EPMC](https://europepmc.org/) via `pygetpapers`\r\nCOMMAND\r\n```\r\ndocanalysis --run_pygetpapers -q \"terpene\" -k 10 --project_name terpene_10\r\n```\r\nLOGS\r\n```\r\nINFO: making project/searching terpene for 10 hits into C:\\Users\\shweata\\docanalysis\\terpene_10\r\nINFO: Total Hits are 13935\r\n1it [00:00, 936.44it/s]\r\nINFO: Saving XML files to C:\\Users\\shweata\\docanalysis\\terpene_10\\*\\fulltext.xml\r\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 10/10 [00:30<00:00,  3.10s/it]\r\n```\r\n\r\nCPROJ\r\n```\r\nC:\\USERS\\SHWEATA\\DOCANALYSIS\\TERPENE_10\r\n\u2502   eupmc_results.json\r\n\u2502\r\n\u251c\u2500\u2500\u2500PMC8625850\r\n\u2502       eupmc_result.json\r\n\u2502       fulltext.xml\r\n\u2502\r\n\u251c\u2500\u2500\u2500PMC8727598\r\n\u2502       eupmc_result.json\r\n\u2502       fulltext.xml\r\n\u2502\r\n\u251c\u2500\u2500\u2500PMC8747377\r\n\u2502       eupmc_result.json\r\n\u2502       fulltext.xml\r\n\u2502\r\n\u251c\u2500\u2500\u2500PMC8771452\r\n\u2502       eupmc_result.json\r\n\u2502       fulltext.xml\r\n\u2502\r\n\u251c\u2500\u2500\u2500PMC8775117\r\n\u2502       eupmc_result.json\r\n\u2502       fulltext.xml\r\n\u2502\r\n\u251c\u2500\u2500\u2500PMC8801761\r\n\u2502       eupmc_result.json\r\n\u2502       fulltext.xml\r\n\u2502\r\n\u251c\u2500\u2500\u2500PMC8831285\r\n\u2502       eupmc_result.json\r\n\u2502       fulltext.xml\r\n\u2502\r\n\u251c\u2500\u2500\u2500PMC8839294\r\n\u2502       eupmc_result.json\r\n\u2502       fulltext.xml\r\n\u2502\r\n\u251c\u2500\u2500\u2500PMC8840323\r\n\u2502       eupmc_result.json\r\n\u2502       fulltext.xml\r\n\u2502\r\n\u2514\u2500\u2500\u2500PMC8879232\r\n        eupmc_result.json\r\n        fulltext.xml\r\n```\r\n\r\n#### Section the papers\r\nCOMMAND\r\n```\r\ndocanalysis --project_name terpene_10 --make_section\r\n```\r\nLOGS\r\n```\r\nWARNING: Making sections in /content/terpene_10/PMC9095633/fulltext.xml\r\nINFO: dict_keys: dict_keys(['abstract', 'acknowledge', 'affiliation', 'author', 'conclusion', 'discussion', 'ethics', 'fig_caption', 'front', 'introduction', 'jrnl_title', 'keyword', 'method', 'octree', 'pdfimage', 'pub_date', 'publisher', 'reference', 'results_discuss', 'search_results', 'sections', 'svg', 'table', 'title'])\r\nWARNING: loading templates.json\r\nINFO: wrote XML sections for /content/terpene_10/PMC9095633/fulltext.xml /content/terpene_10/PMC9095633/sections\r\nWARNING: Making sections in /content/terpene_10/PMC9120863/fulltext.xml\r\nINFO: wrote XML sections for /content/terpene_10/PMC9120863/fulltext.xml /content/terpene_10/PMC9120863/sections\r\nWARNING: Making sections in /content/terpene_10/PMC8982386/fulltext.xml\r\nINFO: wrote XML sections for /content/terpene_10/PMC8982386/fulltext.xml /content/terpene_10/PMC8982386/sections\r\nWARNING: Making sections in /content/terpene_10/PMC9069239/fulltext.xml\r\nINFO: wrote XML sections for /content/terpene_10/PMC9069239/fulltext.xml /content/terpene_10/PMC9069239/sections\r\nWARNING: Making sections in /content/terpene_10/PMC9165828/fulltext.xml\r\nINFO: wrote XML sections for /content/terpene_10/PMC9165828/fulltext.xml /content/terpene_10/PMC9165828/sections\r\nWARNING: Making sections in /content/terpene_10/PMC9119530/fulltext.xml\r\nINFO: wrote XML sections for /content/terpene_10/PMC9119530/fulltext.xml /content/terpene_10/PMC9119530/sections\r\nWARNING: Making sections in /content/terpene_10/PMC8982077/fulltext.xml\r\nINFO: wrote XML sections for /content/terpene_10/PMC8982077/fulltext.xml /content/terpene_10/PMC8982077/sections\r\nWARNING: Making sections in /content/terpene_10/PMC9067962/fulltext.xml\r\nINFO: wrote XML sections for /content/terpene_10/PMC9067962/fulltext.xml /content/terpene_10/PMC9067962/sections\r\nWARNING: Making sections in /content/terpene_10/PMC9154778/fulltext.xml\r\nINFO: wrote XML sections for /content/terpene_10/PMC9154778/fulltext.xml /content/terpene_10/PMC9154778/sections\r\nWARNING: Making sections in /content/terpene_10/PMC9164016/fulltext.xml\r\nINFO: wrote XML sections for /content/terpene_10/PMC9164016/fulltext.xml /content/terpene_10/PMC9164016/sections\r\n 47% 1056/2258 [00:01<00:01, 1003.31it/s]ERROR: cannot parse /content/terpene_10/PMC9165828/sections/1_front/1_article-meta/26_custom-meta-group/0_custom-meta/1_meta-value/0_xref.xml\r\n 67% 1516/2258 [00:01<00:00, 1047.68it/s]ERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/7_xref.xml\r\nERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/14_email.xml\r\nERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/3_xref.xml\r\nERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/6_xref.xml\r\nERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/9_email.xml\r\nERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/10_email.xml\r\nERROR: cannot parse /content/terpene_10/PMC9119530/sections/1_front/1_article-meta/24_custom-meta-group/0_custom-meta/1_meta-value/4_xref.xml\r\n...\r\n100% 2258/2258 [00:02<00:00, 949.43it/s] \r\n```\r\n\r\nCTREE\r\n```\r\n\u251c\u2500\u2500\u2500PMC8625850\r\n\u2502   \u2514\u2500\u2500\u2500sections\r\n\u2502       \u251c\u2500\u2500\u25000_processing-meta\r\n\u2502       \u251c\u2500\u2500\u25001_front\r\n\u2502       \u2502   \u251c\u2500\u2500\u25000_journal-meta\r\n\u2502       \u2502   \u2514\u2500\u2500\u25001_article-meta\r\n\u2502       \u251c\u2500\u2500\u25002_body\r\n\u2502       \u2502   \u251c\u2500\u2500\u25000_1._introduction\r\n\u2502       \u2502   \u251c\u2500\u2500\u25001_2._materials_and_methods\r\n\u2502       \u2502   \u2502   \u251c\u2500\u2500\u25001_2.1._materials\r\n\u2502       \u2502   \u2502   \u251c\u2500\u2500\u25002_2.2._bacterial_strains\r\n\u2502       \u2502   \u2502   \u251c\u2500\u2500\u25003_2.3._preparation_and_character\r\n\u2502       \u2502   \u2502   \u251c\u2500\u2500\u25004_2.4._evaluation_of_the_effect_\r\n\u2502       \u2502   \u2502   \u251c\u2500\u2500\u25005_2.5._time-kill_studies\r\n\u2502       \u2502   \u2502   \u251c\u2500\u2500\u25006_2.6._propidium_iodide_uptake-e\r\n\u2502       \u2502   \u2502   \u2514\u2500\u2500\u25007_2.7._hemolysis_test_from_human\r\n\u2502       \u2502   \u251c\u2500\u2500\u25002_3._results\r\n\u2502       \u2502   \u2502   \u251c\u2500\u2500\u25001_3.1._encapsulation_of_terpene_\r\n\u2502       \u2502   \u2502   \u251c\u2500\u2500\u25002_3.2._both_terpene_alcohol-load\r\n\u2502       \u2502   \u2502   \u251c\u2500\u2500\u25003_3.3._farnesol_and_geraniol-loa\r\n\u2502       \u2502   \u2502   \u2514\u2500\u2500\u25004_3.4._farnesol_and_geraniol-loa\r\n\u2502       \u2502   \u251c\u2500\u2500\u25003_4._discussion\r\n\u2502       \u2502   \u251c\u2500\u2500\u25004_5._conclusions\r\n\u2502       \u2502   \u2514\u2500\u2500\u25005_6._patents\r\n\u2502       \u251c\u2500\u2500\u25003_back\r\n\u2502       \u2502   \u251c\u2500\u2500\u25000_ack\r\n\u2502       \u2502   \u251c\u2500\u2500\u25001_fn-group\r\n\u2502       \u2502   \u2502   \u2514\u2500\u2500\u25000_fn\r\n\u2502       \u2502   \u251c\u2500\u2500\u25002_app-group\r\n\u2502       \u2502   \u2502   \u2514\u2500\u2500\u25000_app\r\n\u2502       \u2502   \u2502       \u2514\u2500\u2500\u25002_supplementary-material\r\n\u2502       \u2502   \u2502           \u2514\u2500\u2500\u25000_media\r\n\u2502       \u2502   \u2514\u2500\u2500\u25009_ref-list\r\n\u2502       \u2514\u2500\u2500\u25004_floats-group\r\n\u2502           \u251c\u2500\u2500\u25004_table-wrap\r\n\u2502           \u251c\u2500\u2500\u25005_table-wrap\r\n\u2502           \u251c\u2500\u2500\u25006_table-wrap\r\n\u2502           \u2502   \u2514\u2500\u2500\u25004_table-wrap-foot\r\n\u2502           \u2502       \u2514\u2500\u2500\u25000_fn\r\n\u2502           \u251c\u2500\u2500\u25007_table-wrap\r\n\u2502           \u2514\u2500\u2500\u25008_table-wrap\r\n...\r\n```\r\n##### Search sections using dictionary\r\nCOMMAND\r\n```\r\ndocanalysis --project_name terpene_10 --output entities.csv --make_ami_dict entities.xml\r\n```\r\nLOGS\r\n```\r\nINFO: Found 7134 sentences in the section(s).\r\nINFO: getting terms from /content/activity.xml\r\n100% 7134/7134 [00:02<00:00, 3172.14it/s]\r\n/usr/local/lib/python3.7/dist-packages/docanalysis/entity_extraction.py:352: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.\r\n  \"[\", \"\").str.replace(\"]\", \"\")\r\nINFO: wrote output to /content/terpene_10/activity.csv\r\n```\r\n\r\n#### Extract entities\r\nWe use `spacy` to extract Named Entites. Here's the list of Entities it supports:CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW,LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, WORK_OF_ART \r\nINPUT\r\n```\r\ndocanalysis --project_name terpene_10 --make_section --spacy_model spacy --entities ORG --output org.csv\r\n```\r\nLOGS\r\n```\r\nINFO: Found 7134 sentences in the section(s).\r\nINFO: Loading spacy\r\n100% 7134/7134 [01:08<00:00, 104.16it/s]\r\n/usr/local/lib/python3.7/dist-packages/docanalysis/entity_extraction.py:352: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.\r\n  \"[\", \"\").str.replace(\"]\", \"\")\r\nINFO: wrote output to /content/terpene_10/org.csv\r\n```\r\n##### Extract information from specific section(s)\r\nYou can choose to extract entities from specific sections\r\n\r\nCOMMAND\r\n```\r\ndocanalysis --project_name terpene_10 --make_section --spacy_model spacy --search_section AUT, AFF --entities ORG --output org_aut_aff.csv\r\n```\r\nLOG\r\n```\r\nINFO: Found 28 sentences in the section(s).\r\nINFO: Loading spacy\r\n100% 28/28 [00:00<00:00, 106.66it/s]\r\n/usr/local/lib/python3.7/dist-packages/docanalysis/entity_extraction.py:352: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.\r\n  \"[\", \"\").str.replace(\"]\", \"\")\r\nINFO: wrote output to /content/terpene_10/org_aut_aff.csv\r\n```\r\n#### Create dictionary of extracted entities\r\nCOMMAND\r\n```\r\ndocanalysis --project_name terpene_10 --make_section --spacy_model spacy --search_section AUT, AFF --entities ORG --output org_aut_aff.csvv --make_ami_dict org\r\n```\r\nLOG\r\n```\r\nINFO: Found 28 sentences in the section(s).\r\nINFO: Loading spacy\r\n100% 28/28 [00:00<00:00, 96.56it/s] \r\n/usr/local/lib/python3.7/dist-packages/docanalysis/entity_extraction.py:352: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.\r\n  \"[\", \"\").str.replace(\"]\", \"\")\r\nINFO: wrote output to /content/terpene_10/org_aut_aff.csvv\r\nINFO: Wrote all the entities extracted to ami dict\r\n```\r\n\r\nSnippet of the dictionary\r\n```\r\n<?xml version=\"1.0\"?>\r\n- dictionary title=\"/content/terpene_10/org.xml\">\r\n<entry count=\"2\" term=\"Department of Biochemistry\"/>\r\n<entry count=\"2\" term=\"Chinese Academy of Agricultural Sciences\"/>\r\n<entry count=\"2\" term=\"Tianjin University\"/>\r\n<entry count=\"2\" term=\"Desert Research Center\"/>\r\n<entry count=\"2\" term=\"Chinese Academy of Sciences\"/>\r\n<entry count=\"2\" term=\"University of Colorado Boulder\"/>\r\n<entry count=\"2\" term=\"Department of Neurology\"/>\r\n<entry count=\"1\" term=\"Max Planck Institute for Chemical Ecology\"/>\r\n<entry count=\"1\" term=\"College of Forest Resources and Environmental Science\"/>\r\n<entry count=\"1\" term=\"Michigan Technological University\"/>\r\n```\r\n\r\n### Extract Abbreviations\r\n\r\n```\r\ndocanalysis --project_name corpus\\ethics_10 --output dict_search_5.csv  --make_json dict_search_5.json --make_ami_dict entities --extract_abb ethics_abb\r\n```\r\n\r\n`--extract_abb` extracts all abbreviations and make an ami-dictionary of abbreviations and its expansion. \r\n\r\nEXAMPLE DICTIONARY: \r\n```\r\n<dictionary title=\"ethics_abb\">\r\n  <entry name=\"ASD\" term=\"Atrial septal defect\"/>\r\n  <entry name=\"SPSS\" term=\"Statistical Package for Social Sciences\"/>\r\n  <entry name=\"ACGME\" term=\"Accreditation Council of Graduate Medical Education\"/>\r\n  <entry name=\"ABP\" term=\"American Board of Paediatrics\"/>\r\n  <entry name=\"TBL\" term=\"Team Based Learning\"/>\r\n  <entry name=\"TBL\" term=\"Team-Based Learning\"/>\r\n  <entry name=\"UNTH\" term=\"University of Nigeria Teaching Hospital\"/>\r\n  <entry name=\"PAH\" term=\"pulmonary hypertension\"/>\r\n  <entry name=\"HREC\" term=\"Human Sciences Research Council, Research Ethics Committee\"/>\r\n  <entry name=\"HREC\" term=\"Human Sciences Research Council, Research Ethics Committee\"/>\r\n  <entry name=\"CDC\" term=\"Center for Disease Control and Prevention\"/>\r\n  <entry name=\"ASD\" term=\"Atrial septal defect\"/>\r\n  <entry name=\"PAH\" term=\"pulmonary arterial hypertension\"/>\r\n  <entry name=\"CVDs\" term=\"cardiovascular diseases\"/>\r\n  <entry name=\"BNs\" term=\"Bayesian networks\"/>\r\n  <entry name=\"GI\" term=\"gastrointestinal cancer\"/>\r\n  <entry name=\"ART\" term=\"antiretroviral therapy\"/>\r\n  <entry name=\"HIV\" term=\"human immunodeficiency virus\"/>\r\n  <entry name=\"GATE\" term=\"Global Cooperation on Assistive Technology\"/>\r\n</dictionary>\r\n```\r\n\r\n### Search HTML\r\nIf you working with HTML files (IPCC Reports, for example) and not XMLs in CProjects, you can use `--search_html` flag.\r\n\r\n```\r\ndocanalysis --project_name corpus\\ipcc_sectioned  --extract_abb ethics_abb --search_html\r\n```\r\n\r\n Make sure that your `html` sections is in `sections` folder. Here's an example structure: \r\n\r\n```\r\nC:.\r\n|   dict_search_2.csv\r\n|   dict_search_2.json\r\n|\r\n\\---chap4\r\n    |   chapter_4\r\n    |\r\n    \\---sections\r\n            4.1.html\r\n            4.2.1.html\r\n            4.2.2.html\r\n            4.2.3.html\r\n            4.2.4.html\r\n            4.2.5.html\r\n            4.2.7.html\r\n            4.2.html\r\n            4.3.1.html\r\n            4.3.2.html\r\n            4.3.html\r\n            4.4.1.html\r\n            4.4.2.html\r\n            4.4.html\r\n            4.5.html\r\n            executive_summary.html\r\n            frequently_asked_questions.html\r\n            table_of_contents.html\r\n```\r\nIf you haven't sectioned your `html`, please use `py4ami` to section it.  \r\n#### What is a dictionary\r\nDictionary, in `ami`'s terminology, a set of terms/phrases in XML format. \r\nDictionaries related to ethics and acknowledgments are available in [Ethics Dictionary](https://github.com/petermr/docanalysis/tree/main/ethics_dictionary) folder\r\n\r\nIf you'd like to create a custom dictionary, you can find the steps, [here](https://github.com/petermr/tigr2ess/blob/master/dictionaries/TUTORIAL.md)\r\n\r\n```\r\n### Python tools used\r\n- [`pygetpapers`](https://github.com/petermr/pygetpapers) - scrape open repositories to download papers of interest\r\n- [nltk](https://www.nltk.org/) - splits sentences\r\n- [spaCy](https://spacy.io/) and  [SciSpaCy](https://allenai.github.io/scispacy/)\r\n - recognize Named-Entities and label them\r\n     - Here's the list of NER labels [SpaCy's English model](https://spacy.io/models/en) provides:  \r\n     `CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW, LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, WORK_OF_ART`\r\n\r\n\r\n### Credits: \r\n-  [Ayush Garg](https://github.com/ayush4921)\r\n-  [Shweata N. Hegde](https://github.com/ShweataNHegde/)\r\n-  [Daniel Mietchen](https://github.com/Daniel-Mietchen)\r\n-  [Peter Murray-Rust](https://github.com/petermr)\r\n\r\n\r\n\r\n\r\n",
    "bugtrack_url": null,
    "license": "Apache License",
    "summary": "extract structured information from ethics paragraphs",
    "version": "0.3.0",
    "project_urls": {
        "Homepage": "https://github.com/petermr/docanalysis"
    },
    "split_keywords": [
        "research",
        "automation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c42f18fc6398e837c46ef9615334c1d413f4c9a5b0ca515a394a343a7388fee5",
                "md5": "d7d2e760daf86a093b18d6ecdfc9e8ff",
                "sha256": "99d8b8a25aaa6719dd27335d4bee972bb46243585d0c49c3a0d1fef8af00b3d3"
            },
            "downloads": -1,
            "filename": "docanalysis-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "d7d2e760daf86a093b18d6ecdfc9e8ff",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 37269,
            "upload_time": "2023-11-04T11:07:47",
            "upload_time_iso_8601": "2023-11-04T11:07:47.017644Z",
            "url": "https://files.pythonhosted.org/packages/c4/2f/18fc6398e837c46ef9615334c1d413f4c9a5b0ca515a394a343a7388fee5/docanalysis-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-04 11:07:47",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "petermr",
    "github_project": "docanalysis",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "docanalysis"
}

Ayush Garg, Shweata N. Hegde