zensols.amr

* Name: zensols.amr
* Version: 0.1.2
* Home page: https://github.com/plandes/amr
* Summary: Adapts amrlib in the Zensols framework.
* Author: Paul Landes
* Keywords: tooling
* Upload time: 2024-04-15 16:54:38

# AMR annotation and feature generation

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.10][python310-badge]][python310-link]
[![Python 3.11][python311-badge]][python311-link]

Provides support for AMR graph manipulation, annotations and feature
generation.

Features:
* Annotation in AMR metadata.  For example, sentence types found in the Proxy
  report AMR corpus.
* AMR token alignment as [spaCy] components.
* Integrates natural language parsing and features with Zensols
  [zensols.nlparse] library.
* A scoring API that includes [Smatch] and [WLK], which extends a more general
  [NLP scoring module].
* AMR parsing ([amrlib]) and AMR co-reference ([amr_coref]).
* Command line and API utilities for AMR Penman graphs, debugging, and file
  management.
* Tools for [training and evaluating](training) new AMR parse (text to graph)
  and generation (graph to text) models.


## Documentation

* [Full documentation](https://plandes.github.io/amr/index.html).
* [API reference](https://plandes.github.io/amr/api.html)


## Obtaining

The easiest way to install the command line program is via the `pip` installer:
```bash
pip3 install zensols.amr
```

Binaries are also available on [pypi].


## Usage

```python
from penman.graph import Graph
from zensols.nlp import FeatureDocument, FeatureDocumentParser
from zensols.amr import AmrDocument, AmrSentence, Dumper, ApplicationFactory

sent: str = """

He was George Washington, the first president of the United States.
He was born on February 22, 1732.

""".replace('\n', ' ').strip()

# get the AMR document parser
doc_parser: FeatureDocumentParser = ApplicationFactory.get_doc_parser()

# the parser creates a NLP centric feature document as provided in the
# zensols.nlp package
doc: FeatureDocument = doc_parser(sent)

# the AMR object graph data structure is provided in the feature document
amr_doc: AmrDocument = doc.amr

# dump a human readable output of the AMR document
amr_doc.write()

# get the first AMR sentence instance
amr_sent: AmrSentence = amr_doc.sents[0]
print('sentence:')
print(' ', amr_sent.text)
print('tuples:')

# show the Penman graph representation
pgraph: Graph = amr_sent.graph
print(f'variables: {", ".join(pgraph.variables())}')
for t in pgraph.triples:
    print(' ', t)
print('edges:')
for e in pgraph.edges():
    print(' ', e)

# visualize the graph as a PDF
dumper: Dumper = ApplicationFactory.get_dumper()
dumper(amr_doc)
```
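A Penman `Graph` is essentially a list of `(source, role, target)` triples;
`edges()` returns only the relations whose target is another variable.  The
distinction among instance, edge, and attribute triples can be illustrated
with plain tuples (a stdlib-only sketch over hand-written triples, not output
produced by the library):

```python
# Hand-written triples for a small AMR graph.  Three kinds appear:
# instance triples (role ':instance'), edge triples (target is another
# variable), and attribute triples (target is a constant).
triples = [
    ('t', ':instance', 'test-01'),
    ('t', ':ARG1', 'u'),
    ('u', ':instance', 'utility'),
    ('u', ':mod', 'c'),
    ('c', ':instance', 'command-line'),
    ('n', ':instance', 'name'),
    ('u', ':name', 'n'),
    ('n', ':op1', '"AMR"'),
]

# variables are the sources of instance triples
variables = {src for src, role, _ in triples if role == ':instance'}
instances = [t for t in triples if t[1] == ':instance']
edges = [t for t in triples if t[1] != ':instance' and t[2] in variables]
attributes = [t for t in triples if t[1] != ':instance' and t[2] not in variables]

print('variables:', sorted(variables))
print('edges:', edges)
print('attributes:', attributes)
```

The real `penman.graph.Graph` exposes a similar partition through its
`instances()`, `edges()`, and `attributes()` methods.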

Per the example, the [t5.conf](test-resources/t5.conf) and
[gsii.conf](test-resources/gsii.conf) configuration files show how to include
the configuration needed for each AMR model.  These files can also be used
directly with the `amr` command via the `--config` option.

However, the other resources referenced in the example must be imported unless
you redefine them yourself.


### Library

When adding the `amr` spaCy pipeline component, the `doc._.amr` attribute is
set on the `Doc` instance.  You can either configure spaCy yourself, or you can
use the configuration files in [test-resources](test-resources) as an example
using the [zensols.util configuration framework].  The command line application
provides an example of how to do this, along with the [test
case](test/python/test_amr.py).


### Command Line

This library is written mostly to be used by other programs, but the command
line utility `amr` is also available to demonstrate its usage and to generate
AMR graphs on the command line.

To parse:
```lisp
$ amr parse -c test-resources/t5.conf 'This is a test of the AMR command line utility.'
# ::snt This is a test of the AMR command line utility.
(t / test-01
   :ARG1 (u / utility
            :mod (c / command-line)
            :name (n / name
                     :op1 "AMR"
                     :toki1 "6")
            :toki1 "9")
   :domain (t2 / this
               :toki1 "0")
   :toki1 "3")
```
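The `:toki1` attributes in the graph above are token alignments: zero-based
indices into the tokenized sentence (so `this` aligns to token 0 and `test-01`
to token 3).  A stdlib-only sketch of reading them back out, with the
alignments hand-copied from the graph:

```python
# variable -> token index, hand-copied from the ':toki1' attributes above
alignments = {'t': 3, 'u': 9, 'n': 6, 't2': 0}
tokens = 'This is a test of the AMR command line utility .'.split()

# map each aligned variable to its surface token
aligned = {var: tokens[i] for var, i in alignments.items()}
print(aligned)  # → {'t': 'test', 'u': 'utility', 'n': 'AMR', 't2': 'This'}
```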

To generate graphs in PDF format:
```bash
$ amr plot -c test-resources/t5.conf 'This is a test of the AMR command line utility.'
wrote: amr-graph/this-is-a-test-of-the-amr-comm.pdf
```
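The output file name looks like a lowercased slug of the input sentence
truncated to 30 characters.  One plausible derivation (the exact rule used by
the library's `Dumper` is an assumption):

```python
import re

def slugify(sent: str, limit: int = 30) -> str:
    """Lowercase, collapse non-alphanumeric runs to hyphens, truncate."""
    slug = re.sub(r'[^a-z0-9]+', '-', sent.lower()).strip('-')
    return slug[:limit]

print(slugify('This is a test of the AMR command line utility.'))
# → this-is-a-test-of-the-amr-comm
```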


## Training

This package uses the [amrlib] training framework, but adds a command line and
a downloadable corpus aggregation API.  To train:

1. Choose a model (e.g., SPRING or T5).
1. Optionally edit the [train configuration](train-config) directory of the
   model you choose.
1. Optionally edit the `resources/train.yml` to select/add more corpora (see
   [Adding Corpora](adding-corpora)).
1. Train the model: `./amr --config train-config/<model>.conf`


### Pretrained Models

This library was used to train all of the [amrlib] models (using the same
checkpoints as [amrlib]), except the T5 Base v1 model, with additional
examples from publicly available human-annotated corpora.  These trained
models differ from their [amrlib] counterparts as follows:

* None of the models were evaluated on a held-out test set; only the
  development SMATCH scores are available.  This was intentional, to provide
  more training examples.
* The AMR Release 3.0 ([LDC2020T02]) test set was added to the training set.
* The [Little Prince and Bio AMR](https://amr.isi.edu/download.html) corpora
  were used to train the models.  The first 85% of the AMR sentences were
  added to the training set and the remaining 15% were added to the
  development set.
* The mini-batch size was changed for `generate-t5wtense-base` due to memory
  constraints.
* The number of training epochs was increased to account for the additional
  training examples.
* Models have the same naming conventions but are prefixed with `zsl`.
* Generative models were trained on graphs whose metadata was annotated by
  the scispaCy `en_core_sci_md` model.
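The 85%/15% corpus split described above can be sketched with the standard
library; the cut-point arithmetic follows the text, while any shuffling or
sentence ordering is unspecified and assumed absent:

```python
def split_corpus(sentences: list, train_frac: float = 0.85) -> tuple:
    """Split AMR sentences: the first 85% for training, the rest for dev."""
    cut = int(len(sentences) * train_frac)
    return sentences[:cut], sentences[cut:]

# with 100 sentences, 85 go to training and 15 to development
train, dev = split_corpus([f'sent-{i}' for i in range(100)])
print(len(train), len(dev))  # → 85 15
```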

The performance of these models:

| Model Name           | Model Type | Checkpoint             | Performance   |
|----------------------|------------|------------------------|---------------|
| `zsl_spring`         | parse      | [facebook/bart-large]  | SMATCH: 81.26 |
| `zsl_xfm_bart_base`  | parse      | [facebook/bart-base]   | SMATCH: 80.5  |
| `zsl_xfm_bart_large` | parse      | [facebook/bart-large]  | SMATCH: 82.7  |
| `zsl_t5wtense_base`  | generative | [t5-base]              | BLEU: 42.20   |
| `zsl_t5wtense_large` | generative | [google/flan-t5-large] | BLEU: 44.01   |

These models are available upon request.


### Adding Corpora

You can retrain your own model and add training corpora by modifying the list
of `${amr_prep_manager:preppers}` in `resources/train.yml`.  This file defines
the downloaded Little Prince and Bio AMR corpora.  To use the AMR 3.0 release,
add the LDC-downloaded file to a (new) `download` directory.
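A new prepper entry might look like the fragment below; only the
`${amr_prep_manager:preppers}` reference is documented, so the section layout
and entry names here are hypothetical and should be checked against the
shipped `resources/train.yml`:

```yaml
# Hypothetical sketch: extend the prepper list with a new corpus.
# All names other than the documented `amr_prep_manager:preppers`
# reference are assumptions.
amr_prep_manager:
  preppers: 'instance: little_prince_prepper, bio_amr_prepper, my_corpus_prepper'
```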


## Attribution

This project, or reference model code, uses:

* Python 3.11
* [amrlib] for AMR parsing.
* [amr_coref] for AMR co-reference.
* [spaCy] for natural language parsing.
* [zensols.nlparse] for natural language features.
* [Smatch] (Cai and Knight, 2013) and [WLK] (Opitz et al., 2021) for scoring.


## Citation

If you use this project in your research, please use the following BibTeX entry:

```bibtex
@inproceedings{landes-etal-2023-deepzensols,
	title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
	author = "Landes, Paul  and
	  Di Eugenio, Barbara  and
	  Caragea, Cornelia",
	editor = "Tan, Liling  and
	  Milajevs, Dmitrijs  and
	  Chauhan, Geeticka  and
	  Gwinnup, Jeremy  and
	  Rippeth, Elijah",
	booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
	month = dec,
	year = "2023",
	address = "Singapore, Singapore",
	publisher = "Empirical Methods in Natural Language Processing",
	url = "https://aclanthology.org/2023.nlposs-1.16",
	pages = "141--146"
}
```


## Changelog

An extensive changelog is available [here](CHANGELOG.md).


## Community

Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback, and any other input are welcome.


## License

[MIT License](LICENSE.md)

Copyright (c) 2021 - 2024 Paul Landes


<!-- links -->
[pypi]: https://pypi.org/project/zensols.amr/
[pypi-link]: https://pypi.python.org/pypi/zensols.amr
[pypi-badge]: https://img.shields.io/pypi/v/zensols.amr.svg
[python310-badge]: https://img.shields.io/badge/python-3.10-blue.svg
[python310-link]: https://www.python.org/downloads/release/python-3100
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110

[spaCy]: https://spacy.io
[amrlib]: https://github.com/bjascob/amrlib
[amr_coref]: https://github.com/bjascob/amr_coref
[Smatch]: https://github.com/snowblink14/smatch
[WLK]: https://github.com/flipz357/weisfeiler-leman-amr-metrics
[zensols.nlparse]: https://github.com/plandes/nlparse
[zensols.util configuration framework]: https://plandes.github.io/util/doc/config.html
[NLP scoring module]: https://plandes.github.io/nlparse/api/zensols.nlp.html#zensols-nlp-score
[LDC2020T02]: https://catalog.ldc.upenn.edu/LDC2020T02

[facebook/bart-large]: https://huggingface.co/facebook/bart-large
[facebook/bart-base]: https://huggingface.co/facebook/bart-base
[t5-base]: https://huggingface.co/google-t5/t5-base
[google/flan-t5-large]: https://huggingface.co/google/flan-t5-large

            
