# AMR annotation and feature generation
[![PyPI][pypi-badge]][pypi-link]
[![Python 3.10][python310-badge]][python310-link]
[![Python 3.11][python311-badge]][python311-link]
Provides support for AMR graph manipulation, annotations and feature
generation.
Features:
* Annotation in AMR metadata. For example, sentence types found in the Proxy
report AMR corpus.
* AMR token alignment as [spaCy] components.
* Integrates natural language parsing and features with Zensols
[zensols.nlparse] library.
* A scoring API that includes [Smatch] and [WLK], which extends a more general
[NLP scoring module].
* AMR parsing ([amrlib]) and AMR co-reference ([amr_coref]).
* Command line and API utilities for AMR Penman graphs, debugging, and file
  management.
* Tools for [training and evaluating](training) new AMR parse (text to graph)
and generation (graph to text) models.
## Documentation
* [Full documentation](https://plandes.github.io/amr/index.html).
* [API reference](https://plandes.github.io/amr/api.html)
## Obtaining
The easiest way to install the command line program is via the `pip` installer:
```bash
pip3 install zensols.amr
```
Binaries are also available on [pypi].
## Usage
```python
from penman.graph import Graph
from zensols.nlp import FeatureDocument, FeatureDocumentParser
from zensols.amr import AmrDocument, AmrSentence, Dumper, ApplicationFactory
sent: str = """
He was George Washington and first president of the United States.
He was born on February 22, 1732.
""".replace('\n', ' ').strip()
# get the AMR document parser
doc_parser: FeatureDocumentParser = ApplicationFactory.get_doc_parser()
# the parser creates a NLP centric feature document as provided in the
# zensols.nlp package
doc: FeatureDocument = doc_parser(sent)
# the AMR object graph data structure is provided in the feature document
amr_doc: AmrDocument = doc.amr
# dump a human readable output of the AMR document
amr_doc.write()
# get the first AMR sentence instance
amr_sent: AmrSentence = amr_doc.sents[0]
print('sentence:')
print(' ', amr_sent.text)
print('tuples:')
# show the Penman graph representation
pgraph: Graph = amr_sent.graph
print(f'variables: {", ".join(pgraph.variables())}')
for t in pgraph.triples:
print(' ', t)
print('edges:')
for e in pgraph.edges():
print(' ', e)
# visualize the graph as a PDF
dumper: Dumper = ApplicationFactory.get_dumper()
dumper(amr_doc)
```
Per the example, the [t5.conf](test-resources/t5.conf) and
[gsii.conf](test-resources/gsii.conf) configuration files show how to include
the configuration needed for each AMR model. These files can also be used
directly with the `amr` command using the `--config` option.
However, the other resources in the example must be imported unless you
redefine them yourself.
### Library
When adding the `amr` spaCy pipeline component, the `doc._.amr` attribute is
set on the `Doc` instance. You can either configure spaCy yourself, or you can
use the configuration files in [test-resources](test-resources) as an example
using the [zensols.util configuration framework]. The command line application
provides an example of how to do this, along with the [test
case](test/python/test_amr.py).
### Command Line
This library is written mostly to be used by other programs, but the command
line utility `amr` is also available to demonstrate its usage and to generate
AMR graphs on the command line.
To parse:
```lisp
$ amr parse -c test-resources/t5.conf 'This is a test of the AMR command line utility.'
# ::snt This is a test of the AMR command line utility.
(t / test-01
:ARG1 (u / utility
:mod (c / command-line)
:name (n / name
:op1 "AMR"
:toki1 "6")
:toki1 "9")
:domain (t2 / this
:toki1 "0")
:toki1 "3")
```
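The `toki1` metadata in the output above appears to record token-index
alignments back into the source sentence. Assuming a simple whitespace
tokenization with the period split off (a sketch, not the library's own
tokenizer), the indices line up with the surface tokens like this:

```python
# sentence from the parse example above
sent = 'This is a test of the AMR command line utility.'
# naive tokenization: split on whitespace, keep the period as its own token
tokens = sent[:-1].split() + ['.']
# toki1 values taken from the graph output, keyed by concept
alignments = {'this': 0, 'test-01': 3, 'name ("AMR")': 6, 'utility': 9}
for concept, idx in sorted(alignments.items(), key=lambda kv: kv[1]):
    print(f'token {idx}: {tokens[idx]:8} <- {concept}')
```

For instance, index 3 resolves to the token `test`, which aligns with the
`test-01` concept at the root of the graph.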
To generate graphs in PDF format:
```bash
$ amr plot -c test-resources/t5.conf 'This is a test of the AMR command line utility.'
wrote: amr-graph/this-is-a-test-of-the-amr-comm.pdf
```
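The dumper names the PDF after the sentence text, truncated to a fixed length.
A minimal sketch of that kind of slug generation (a hypothetical helper, not
the library's actual implementation):

```python
import re


def slugify(text: str, max_len: int = 30) -> str:
    """Build a file-name slug from sentence text (hypothetical helper)."""
    # lower-case, collapse runs of non-alphanumerics to a single dash
    slug = re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')
    # truncate to keep file names short
    return slug[:max_len]


print(slugify('This is a test of the AMR command line utility.'))
# → this-is-a-test-of-the-amr-comm
```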
## Training
This package uses the [amrlib] training, but adds a command line and
downloadable corpus aggregation / API. To train:
1. Choose a model (e.g., SPRING or T5).
1. Optionally edit the [train configuration](train-config) directory of the
model you choose.
1. Optionally edit the `resources/train.yml` to select/add more corpora (see
[Adding Corpora](adding-corpora)).
1. Train the model: `./amr --config train-config/<model>.conf`
### Pretrained Models
This library was used to train all of the [amrlib] models (from the same
checkpoints as [amrlib]), except the T5 Base v1 model, with additional
examples from publicly available human-annotated corpora. These trained
models differ from the [amrlib] originals as follows:
* None of the models were evaluated on a held-out test set; only development
  SMATCH scores are available. This was intentional, to provide more training
  examples.
* The AMR Release 3.0 ([LDC2020T02]) test set was added to the training set.
* The [Little Prince and Bio AMR](https://amr.isi.edu/download.html) corpora
  were used to train the models. The first 85% of the AMR sentences were
  added to the training set and the remaining 15% were added to the
  development set.
* The mini-batch size was changed for `generate-t5wtense-base` due to memory
  constraints.
* The number of training epochs was increased to account for the additional
  training examples.
* Models have the same naming conventions but are prefixed with `zsl`.
* Generative models were trained on graphs whose metadata was annotated with
  the scispaCy `en_core_sci_md` model.
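The 85%/15% split described above amounts to a simple index cut over the
corpus sentences; a sketch, assuming the sentences are already loaded into a
list:

```python
def split_corpus(sents: list, train_portion: float = 0.85) -> tuple:
    """Split AMR sentences into train/dev sets per the 85%/15% scheme."""
    cut = int(len(sents) * train_portion)
    return sents[:cut], sents[cut:]


# with 100 sentences, the first 85 go to train and the last 15 to dev
sents = [f'sent-{i}' for i in range(100)]
train, dev = split_corpus(sents)
print(len(train), len(dev))  # → 85 15
```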
The performance of these models:
| Model Name | Model Type | Checkpoint | Performance |
|----------------------|------------|------------------------|---------------|
| `zsl_spring` | parse | [facebook/bart-large] | SMATCH: 81.26 |
| `zsl_xfm_bart_base` | parse | [facebook/bart-base] | SMATCH: 80.5 |
| `zsl_xfm_bart_large` | parse | [facebook/bart-large] | SMATCH: 82.7 |
| `zsl_t5wtense_base` | generative | [t5-base] | BLEU: 42.20 |
| `zsl_t5wtense_large` | generative | [google/flan-t5-large] | BLEU: 44.01 |
These models are available upon request.
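For context on the parse-model scores, SMATCH is an F1 score over matched
graph triples (Cai and Knight, 2013). A toy computation under that
definition, using hand-made triple sets rather than real library output:

```python
def f1(gold: set, pred: set) -> float:
    """Compute F1 over matched triples, the core of a SMATCH-style score."""
    matched = len(gold & pred)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {('t', ':instance', 'test-01'), ('u', ':instance', 'utility'),
        ('t', ':ARG1', 'u')}
pred = {('t', ':instance', 'test-01'), ('u', ':instance', 'utility'),
        ('t', ':mod', 'u')}
print(round(f1(gold, pred), 3))  # → 0.667
```

Real SMATCH additionally searches over variable mappings to maximize the
match; the scoring API wraps that via [Smatch].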
### Adding Corpora
You can train your own model on additional corpora by modifying the list of
`${amr_prep_manager:preppers}` in `resources/train.yml`. This file defines
the downloaded corpora for the Little Prince and Bio AMR corpora. To use the
AMR 3.0 release, add the downloaded LDC file to a (new) `download` directory.
## Attribution
This project, or reference model code, uses:
* Python 3.11
* [amrlib] for AMR parsing.
* [amr_coref] for AMR co-reference.
* [spaCy] for natural language parsing.
* [zensols.nlparse] for natural language features.
* [Smatch] (Cai and Knight, 2013) and [WLK] (Opitz et al., 2021) for scoring.
## Citation
If you use this project in your research, please use the following BibTeX entry:
```bibtex
@inproceedings{landes-etal-2023-deepzensols,
title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
author = "Landes, Paul and
Di Eugenio, Barbara and
Caragea, Cornelia",
editor = "Tan, Liling and
Milajevs, Dmitrijs and
Chauhan, Geeticka and
Gwinnup, Jeremy and
Rippeth, Elijah",
booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
month = dec,
year = "2023",
address = "Singapore, Singapore",
publisher = "Empirical Methods in Natural Language Processing",
url = "https://aclanthology.org/2023.nlposs-1.16",
pages = "141--146"
}
```
## Changelog
An extensive changelog is available [here](CHANGELOG.md).
## Community
Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback, and any other input are welcome.
## License
[MIT License](LICENSE.md)
Copyright (c) 2021 - 2024 Paul Landes
<!-- links -->
[pypi]: https://pypi.org/project/zensols.amr/
[pypi-link]: https://pypi.python.org/pypi/zensols.amr
[pypi-badge]: https://img.shields.io/pypi/v/zensols.amr.svg
[python310-badge]: https://img.shields.io/badge/python-3.10-blue.svg
[python310-link]: https://www.python.org/downloads/release/python-3100
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[spaCy]: https://spacy.io
[amrlib]: https://github.com/bjascob/amrlib
[amr_coref]: https://github.com/bjascob/amr_coref
[Smatch]: https://github.com/snowblink14/smatch
[WLK]: https://github.com/flipz357/weisfeiler-leman-amr-metrics
[zensols.nlparse]: https://github.com/plandes/nlparse
[zensols.util configuration framework]: https://plandes.github.io/util/doc/config.html
[NLP scoring module]: https://plandes.github.io/nlparse/api/zensols.nlp.html#zensols-nlp-score
[LDC2020T02]: https://catalog.ldc.upenn.edu/LDC2020T02
[facebook/bart-large]: https://huggingface.co/facebook/bart-large
[facebook/bart-base]: https://huggingface.co/facebook/bart-base
[t5-base]: https://huggingface.co/google-t5/t5-base
[google/flan-t5-large]: https://huggingface.co/google/flan-t5-large