tuw-nlp

- Name: tuw-nlp
- Version: 0.1.0
- Home page: http://github.com/recski/tuw-nlp
- Summary: NLP tools at TUW Informatics
- Upload time: 2023-04-17 09:18:09
- Author: Gabor Recski, Adam Kovacs
- License: MIT
- Keywords: NLP, graph transformation, explainable AI, XAI, semantic graphs
# TUW-NLP

NLP utilities developed at TUW Informatics.

The main goal of the library is to provide a unified interface for working with different semantic graph representations. To represent 
graphs we use the [networkx](https://networkx.org/) library.
Currently you can use the following semantic graph representations integrated in the library:
- [4lang](#4lang)
- [UD](#ud) (Universal Dependencies)
- [AMR](#amr) (Abstract Meaning Representation)
- [SDP](#sdp) (Semantic Dependency Parsing)
- [UCCA](#ucca) (Universal Conceptual Cognitive Annotation)
- [DRS](#drs) (Discourse Representation Structure)

## Setup and Usage
Install the tuw-nlp package from pip:

```
pip install tuw-nlp
```

Or install from source:
```
git clone https://github.com/recski/tuw-nlp.git
cd tuw-nlp
pip install -e .
```

On Windows and Mac, you might also need to install [Graphviz](https://graphviz.org/download/) manually.

A few additional setup steps are needed before using the library:

Download nltk resources:

```python
import nltk
nltk.download('stopwords')
nltk.download('propbank')
```

Download stanza models for UD parsing:

```python
import stanza

stanza.download("en")
stanza.download("de")
```

### 4lang

The [4lang](https://github.com/kornai/4lang) semantic graph representation is implemented in the repository. We use Interpreted Regular Tree Grammars (IRTGs) to build the graphs from UD trees. The grammar can be found in the [lexicon](tuw_nlp/grammar/lexicon.py). It supports English and German.

To use the parser, download the [alto](https://github.com/coli-saar/alto) parser and the tuw_nlp dictionaries:

```python
import tuw_nlp

tuw_nlp.download_alto()
tuw_nlp.download_definitions()
```

__Please also make sure Java is installed on your system, as it is required to run the parser!__

Then you can parse a sentence as simply as:

```python
from tuw_nlp.grammar.text_to_4lang import TextTo4lang

tfl = TextTo4lang("en", "en_nlp_cache")

fl_graphs = list(tfl("brown dog", depth=1, substitute=False))

# Each returned object wraps a networkx graph, accessible via its G attribute
fl_graphs[0].G.nodes(data=True)

# Visualize the graph
fl_graphs[0].to_dot()
```
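
If you have the `graphviz` Python package installed, you can render the DOT output to an image. This is a minimal sketch, not part of the tuw-nlp API itself, and it assumes `to_dot()` returns a DOT-format string:

```python
from graphviz import Source

# Assumption: to_dot() returns a DOT-format string
dot = fl_graphs[0].to_dot()

# Render to brown_dog.png, removing the intermediate DOT file
Source(dot).render("brown_dog", format="png", cleanup=True)
```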

### UD
To parse [Universal Dependencies](https://universaldependencies.org/)
into networkx format, we use the [stanza](https://stanfordnlp.github.io/stanza/) library. You can use all the languages supported by stanza:
https://stanfordnlp.github.io/stanza/models.html

For parsing you can reuse the snippet above with the `TextToUD` class, as in the sketch below.
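
A minimal sketch, assuming `TextToUD` mirrors the `TextTo4lang` interface shown above (the import path and constructor arguments are assumptions; check the package source for the exact signature):

```python
# Import path and call signature assumed by analogy with TextTo4lang
from tuw_nlp.grammar.text_to_ud import TextToUD

ud = TextToUD("en", "en_nlp_cache")
ud_graphs = list(ud("The quick brown fox jumps over the lazy dog."))

# Each object wraps a networkx graph, as with the 4lang parser
ud_graphs[0].G.nodes(data=True)
```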

### AMR
For parsing [Abstract Meaning Representation](https://amr.isi.edu/) graphs we use the [amrlib](https://amrlib.readthedocs.io/en/latest/) library. Models are only available for English.

If you want to use AMR parsing, install the [amrlib](https://amrlib.readthedocs.io/en/latest/) package (this is also included in the __setup__ file) and download the models:

```bash
pip install amrlib
```

Go to the [amrlib](https://amrlib.readthedocs.io/en/latest/) repository and follow the instructions to download the models. 

Then also download the spacy model for AMR parsing:

```bash
python -m spacy download en_core_web_sm
```

To parse AMR, see the `TextToAMR` class.
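
A hedged sketch by analogy with the snippets above (the import path and arguments are assumptions; AMR models are English-only):

```python
# Import path and call signature assumed by analogy with TextTo4lang
from tuw_nlp.grammar.text_to_amr import TextToAMR

amr = TextToAMR("en")
amr_graphs = list(amr("The dog did not bark."))
amr_graphs[0].G.nodes(data=True)
```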

### SDP
For [Semantic Dependency Parsing](https://aclanthology.org/S15-2153/) we integrated the semantic dependency parser from the [SuPar](https://github.com/yzhangcs/parser) library. Models are only available for English.

See the `TextToSDP` class for more information.
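
As with the other parsers, a hedged sketch (import path and signature assumed):

```python
# Import path and call signature assumed by analogy with the parsers above
from tuw_nlp.grammar.text_to_sdp import TextToSDP

sdp = TextToSDP("en")
sdp_graphs = list(sdp("The dog barked."))

# Inspect the labeled edges of the semantic dependency graph
sdp_graphs[0].G.edges(data=True)
```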

### UCCA
For parsing [UCCA](https://github.com/UniversalConceptualCognitiveAnnotation/tutorial) graphs we integrated the __tupa__ parser; see our fork of the parser [here](https://github.com/adaamko/tupa). Because of the complexity of the parser, we include a docker image that contains the parser and all the necessary dependencies, which you can use to parse UCCA graphs. For details, go to the [services](services/ucca_service/) folder and follow the instructions there.

UCCA parsing currently supports English, French, German and Hebrew. The docker service is a REST API that you can use to parse UCCA graphs. To convert the output to networkx graphs, see the `TextToUCCA` class.
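
A sketch of calling the dockerized REST service with `requests`; the host, port, route and payload shape are hypothetical, so consult [services/ucca_service](services/ucca_service/) for the actual interface:

```python
import requests

# Hypothetical endpoint and payload; see services/ucca_service for the real API
resp = requests.post(
    "http://localhost:5000/parse",
    json={"text": "The dog barked.", "lang": "en"},
)
ucca_output = resp.json()
# Feed the raw service output to the TextToUCCA class to obtain networkx graphs
```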

### DRS
The task of __Discourse Representation Structure (DRS) parsing__ is to convert text into formal meaning representations in the style of Discourse Representation Theory (DRT; Kamp and Reyle 1993). 

To make DRS compatible with our library, we build on the paper [Transparent Semantic Parsing with Universal Dependencies Using Graph Transformations](https://aclanthology.org/2022.coling-1.367/), which first transforms DRS structures into graphs (DRGs) using a rule-based method developed with the [GREW](https://grew.fr) library.

Because of the complexity of the parser, we include a docker image that contains the parser and all the necessary dependencies, which you can use to parse DRS graphs. For details, go to the [services](services/boxer_service/) folder and follow the instructions there. For parsing we use our own fork of the [ud-boxer](https://github.com/adaamko/ud-boxer) repository. It currently supports English, Italian, German and Dutch.

To convert the output of the REST API (from the docker service) to networkx graphs, see the `TextToDRS` class.
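
The DRS service follows the same request-then-convert pattern as the UCCA service above; in this hypothetical sketch the endpoint details are again assumptions:

```python
import requests

# Hypothetical endpoint; see services/boxer_service for the real API
drs_output = requests.post(
    "http://localhost:5001/parse",
    json={"text": "It rained.", "lang": "en"},
).json()
# Convert drs_output to networkx graphs with the TextToDRS class
```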

For more examples, check our [experiments](notebooks/experiments.ipynb) Jupyter notebook.

### Command line interface

We provide a simple script, `scripts/semparse.py`, to parse text into any of the supported formats. Usage:

```bash
usage: semparse.py [-h] [-f FORMAT] [-cd CACHE_DIR] [-cn NLP_CACHE] -l LANG [-d DEPTH] [-s SUBSTITUTE] [-p PREPROCESSOR] [-o OUT_DIR]

optional arguments:
  -h, --help            show this help message and exit
  -f FORMAT, --format FORMAT
  -cd CACHE_DIR, --cache-dir CACHE_DIR
  -cn NLP_CACHE, --nlp-cache NLP_CACHE
  -l LANG, --lang LANG
  -d DEPTH, --depth DEPTH
  -s SUBSTITUTE, --substitute SUBSTITUTE
  -p PREPROCESSOR, --preprocessor PREPROCESSOR
  -o OUT_DIR, --out-dir OUT_DIR
```

For example, to parse a sentence into a UCCA graph, run:

```bash
echo "A police statement did not name the man in the boot, but in effect indicated the traveler was State Secretary Samuli Virtanen, who is also the deputy to Foreign Minister Timo Soini." | python scripts/semparse.py -f ucca -l en -cn cache/nlp_cache_en.json
```
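
Analogously, a 4lang parse might look like the following; here `4lang` as a value for `-f` is an assumption (the accepted format names are not listed in the usage string), while `-d` and `-o` are taken from the options above:

```bash
# Assumes -f accepts "4lang"; -d (depth) and -o (output dir) are listed options
echo "brown dog" | python scripts/semparse.py -f 4lang -l en -d 1 -o out
```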

## Services

We also provide services built on our package. To learn more, visit [services](services).

### Text_to_4lang service

To run a browser-based demo (also available [online](https://ir-group.ec.tuwien.ac.at/fourlang)) for building graphs from raw texts, first start the graph building service:

```
python services/text_to_4lang/backend/service.py
```

Then run the frontend with this command:

```
streamlit run services/text_to_4lang/frontend/demo.py
```

In the demo you can parse English and German sentences and also try out several algorithms implemented on our graphs, such as `expand`, `substitute` and `append_zero_paths`; see the sketch below.
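
Programmatically, the `depth` and `substitute` options map onto the `TextTo4lang` call shown earlier; a one-line sketch (parameter semantics inferred from that snippet):

```python
# Expand definitions to depth 2 and substitute defined words with their definitions
fl_graphs = list(tfl("brown dog", depth=2, substitute=True))
```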

## Modules

### text 

General text processing utilities, including:
- segmentation: stanza-based processors for word- and sentence-level segmentation
- patterns: various patterns for text processing tasks

### graph
Tools for working with graphs, including:
- utils: misc utilities for working with graphs

### grammar
Tools for generating and using grammars, including:
- alto: tools for interfacing with the [alto](https://github.com/coli-saar/alto) tool
- irtg: class for representing Interpreted Regular Tree Grammars
- lexicon: rule lexica for building lexicalized grammars
- ud_fl: grammar-based mapping of [Universal Dependencies](https://universaldependencies.org/) to [4lang](https://github.com/kornai/4lang) semantic graphs
- utils: misc utilities for working with grammars

## Contributing

We welcome all contributions! Please fork this repository and create a branch for your modifications. We suggest getting in touch with us first, by opening an issue or by writing an email to Gabor Recski or Adam Kovacs at firstname.lastname@tuwien.ac.at

## Citing

If you use the library, please cite our [paper](http://ceur-ws.org/Vol-2888/paper3.pdf):

```bib
@inproceedings{Recski:2021,
  title     = {Explainable Rule Extraction via Semantic Graphs},
  author    = {Recski, Gabor and Lellmann, Bj{\"o}rn and Kovacs, Adam and Hanbury, Allan},
  booktitle = {Proceedings of the Fifth Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2021)},
  publisher = {CEUR Workshop Proceedings},
  address   = {São Paulo, Brazil},
  pages     = {24--35},
  url       = {http://ceur-ws.org/Vol-2888/paper3.pdf},
  year      = {2021}
}
```

## License 

MIT license



            
