zensols.propbankdb


* Name: zensols.propbankdb
* Version: 0.0.2
* Home page: https://github.com/plandes/propbankdb
* Summary: An API to access the PropBank database and generate embeddings from them.
* Upload time: 2024-04-14 20:45:46
* Author: Paul Landes
* Keywords: tooling

# PropBank Database and Embeddings

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.10][python310-badge]][python310-link]
[![Python 3.11][python311-badge]][python311-link]
[![Build Status][build-badge]][build-link]

An API to access [PropBank] data and generate the embeddings from the paper
[CALAMR: Component ALignment for Abstract Meaning Representation], as used by the
[zensols.calamr] repository.  It creates a database and generates embeddings
from [PropBank frameset files] and makes them available as an API that attempts
to reduce the data complexity of [PropBank] using an object-oriented, Pythonic
approach.  It automatically downloads a [distribution file][Zenodo] that
contains:

* An SQLite relational normalized database,
* [Sentence-BERT] [embeddings] for role sets, roles and functions,
* A CSV file with the corresponding extracted sentences used for the
  embeddings,
* A metadata file containing version information and bindings for the
  embeddings used by the [Zensols framework].

The API binds the relational data from the SQLite database to simple but
performant object mappings in Python, while still allowing direct
row/cursor-based access to the data using the [Zensols Dbutil] API.
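
To illustrate the two access styles, here is a minimal sketch that uses only
the standard library `sqlite3` module rather than this package.  The `roleset`
table, its columns, the `example.01` row and the `RolesetRow` class are
hypothetical stand-ins; the real schema and the [Zensols Dbutil] API may differ.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class RolesetRow:
    """Hypothetical object mapping for one row (not the library's class)."""
    label: str
    name: str


# in-memory database with an assumed, simplified schema
conn = sqlite3.connect(':memory:')
conn.execute('create table roleset (label text primary key, name text)')
conn.executemany(
    'insert into roleset values (?, ?)',
    [('see.01', 'view'), ('example.01', 'placeholder name')])

# direct row/cursor-based access: rows come back as plain tuples
rows = conn.execute(
    'select label, name from roleset order by label').fetchall()

# object mapping: each row is bound to a typed Python object
rolesets = [RolesetRow(*row) for row in rows]
print(rows[-1], rolesets[-1])
```

The tuple form is convenient for bulk queries, while the mapped objects give
named, typed attributes to application code.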

If you use this library or the [zensols.calamr] API, please [cite](#citation)
our paper.


## Documentation

See the [full documentation](https://plandes.github.io/propbankdb/index.html).
The [API reference](https://plandes.github.io/propbankdb/api.html) is also
available.


## Obtaining

The library can be installed with pip from [PyPI][pypi]:
```bash
pip3 install zensols.propbankdb
```


## Embeddings and Database

The distribution is a [PropBank] database with [Sentence-BERT] embeddings for
the paper CALAMR: Component ALignment for Abstract Meaning Representation.  It
is used by the zensols.propbankdb Python API but can also be used on its own.
The database contains roles, role sets and other PropBank data along with their
examples, descriptions and functions, together with the embeddings.  See the
API repository for more information.

[Sentence-BERT] embeddings are available for the following [PropBank frameset
files] XML fields:

* Role set names (`name` attribute)
* Role descriptions (`descr` attribute)
* Function description (defined in the `.dtd` file from [PropBank frameset
  files] repository)

The models and the SQLite [PropBank] database are automatically downloaded on
the first use of the command-line tool or API.  However, they can also be
[downloaded](https://zenodo.org/records/10806450) directly.
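
One common use of such embeddings is to compare two entries by cosine
similarity.  The sketch below is an assumption about usage rather than part of
this package: it uses short hand-written lists in place of real embedding
vectors, and `cosine_similarity` is a hypothetical helper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)


# toy stand-ins for embeddings; the values are made up for illustration
emb_a = [1.0, 0.0, 1.0]
emb_b = [1.0, 0.0, 1.0]
emb_c = [0.0, 1.0, 0.0]
print(cosine_similarity(emb_a, emb_b), cosine_similarity(emb_a, emb_c))
```

Identical vectors score close to 1 and orthogonal vectors score 0.  With the
real API the embeddings are tensors, so a tensor library's own similarity
function would normally be used instead of this pure-Python helper.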


## Usage

The installed software can be used to look up data from the command line, but
was designed to be used as an API for data access and embeddings.


### Command Line

Details of the command-line interface are available from its built-in help:

```bash
$ propbankdb --help
```

For example, to get the `see.01` role set in JSON format use:
```bash
$ propbankdb roleset -f json see.01
```
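
Because the command can emit JSON, it can also be driven from a script.  The
following is a sketch under assumptions: it presumes `propbankdb` is on the
`PATH` and writes only JSON to standard output, and the helper names
`roleset_command` and `fetch_roleset` are hypothetical.

```python
import json
import subprocess
from typing import Any, List


def roleset_command(roleset_id: str) -> List[str]:
    """Build the argument list for the JSON role set lookup shown above."""
    return ['propbankdb', 'roleset', '-f', 'json', roleset_id]


def fetch_roleset(roleset_id: str) -> Any:
    """Run the command line tool and parse its standard output as JSON.

    Assumes the tool is installed and prints nothing but JSON to stdout;
    a non-zero exit status raises ``subprocess.CalledProcessError``.
    """
    proc = subprocess.run(
        roleset_command(roleset_id),
        capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


# show the command that would be executed, without running it
print(' '.join(roleset_command('see.01')))
```

Once the package is installed, `fetch_roleset('see.01')` would return the
parsed JSON; its structure is not assumed here.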

### API

Access a role set and its embedding from the database:
```python
from zensols.propbankdb import Roleset, Database, ApplicationFactory

# get the database access object
db: Database = ApplicationFactory.get_database()
# look up a role set by its ID
rs: Roleset = db.roleset_stash['see.01']
# print out the role set, the number of roles it has, and embedding shape
print(rs, len(rs.roles), rs.embedding.shape)
# output: see.01: view 3 torch.Size([768])

# print the role set information
rs.write()
# output:
# id:
#     label: see.01
#     lemma: see
#     index: 1
# name: view
# aliases:
#     part_of_speech: PartOfSpeech.verb
#     word: see
#     part_of_speech: PartOfSpeech.noun
#     word: seeing
#     part_of_speech: PartOfSpeech.verb
#     word: sight
#     part_of_speech: PartOfSpeech.noun
#     word: sight
# roles:
#     description: viewer
#     function:
#         label: PAG
#         description: prototypical agent
#         group: default
# ...
```
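
The `id` block in the output suggests that a role set label such as `see.01`
is composed of a lemma and a numeric index.  The following sketch splits a
label the same way; `RolesetId` and `parse_roleset_id` are hypothetical names
for illustration and are not part of the `zensols.propbankdb` API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolesetId:
    """Hypothetical container mirroring the ``id`` fields printed above."""
    label: str
    lemma: str
    index: int


def parse_roleset_id(label: str) -> RolesetId:
    """Split ``<lemma>.<index>`` on the last period."""
    lemma, sep, index = label.rpartition('.')
    if not sep or not lemma or not index.isdecimal():
        raise ValueError(f'not a role set label: {label!r}')
    return RolesetId(label=label, lemma=lemma, index=int(index))


print(parse_roleset_id('see.01'))
```

Splitting on the last period keeps any periods inside the lemma itself intact.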

The [roleshow.py](example/roleshow.py) example is a minimal example that shows
how to use your own application context when only data access is needed.  The
[role-with-embedding.py](example/role-with-embedding.py) example adds the
resource libraries necessary to fetch embeddings.


### Training

Use the `dist.py` script to train new embeddings and recreate the database:
1. To use different embeddings, edit the `model_id` option in the
   `transformer_sent_fixed_resource` section of the
   [configuration file](deploy-resources/obj.yml).
1. Start with a clean environment: `./dist.py clean`
1. Create the distribution: `./dist.py package`


## Citation

If you use this project in your research, please use the following BibTeX entry:

```bibtex
@inproceedings{landesCALAMRComponentALignment2024,
  title = {{{CALAMR}}: {{Component ALignment}} for {{Abstract Meaning Representation}}},
  booktitle = {The 2024 {{Joint International Conference}} on {{Computational Linguistics}}, {{Language Resources}} and {{Evaluation}}},
  author = {Landes, Paul and Di Eugenio, Barbara},
  date = {2024-05-20},
  publisher = {International Committee on Computational Linguistics},
  location = {Turin, Italy},
  eventtitle = {{{LREC-COLING}} 2024}
}
```


## Changelog

An extensive changelog is available [here](CHANGELOG.md).


## Community

Please star this repository and let me know how and where you use this API.
Contributions in the form of pull requests, feedback and any other input are
welcome.


## License

[MIT License](LICENSE.md)

Copyright (c) 2023 - 2024 Paul Landes


<!-- links -->
[pypi]: https://pypi.org/project/zensols.propbankdb/
[pypi-link]: https://pypi.python.org/pypi/zensols.propbankdb
[pypi-badge]: https://img.shields.io/pypi/v/zensols.propbankdb.svg
[python310-badge]: https://img.shields.io/badge/python-3.10-blue.svg
[python310-link]: https://www.python.org/downloads/release/python-3100
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[build-badge]: https://github.com/plandes/propbankdb/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/propbankdb/actions

[PropBank]: https://propbank.github.io
[propbank frameset files]: https://github.com/propbank/propbank-frames
[Zensols framework]: https://github.com/plandes/deepnlp
[Zensols Dbutil]: https://github.com/plandes/dbutil
[configuration]: https://plandes.github.io/util/doc/config.html
[embeddings]: #embeddings-and-database
[Sentence-BERT]: https://arxiv.org/abs/1908.10084
[CALAMR: Component ALignment for Abstract Meaning Representation]: https://example.com
[zensols.calamr]: https://github.com/plandes/calamr
[Zenodo]: https://zenodo.org/records/10806450

            
