## Installation
`pip install feng-hirst-rst-parser`
## README.md and README_original.md
This project is a fork of an update of the original Feng-Hirst RST parser repository. The original README.md is included as
README_original.md. It is recommended to install this project as a Python package and use it that way, which should
simplify usage significantly and make it more accessible.
This README.md focuses on the current version and does not include instructions for running the original version.
## General Information
* This RST-style discourse parser produces a discourse tree structure at the full-text level, given raw text. No prior
  sentence splitting or any other preprocessing is expected. The program runs on Linux systems.
* The overall software workflow is similar to the one described by Feng and Hirst (ACL 2014). Compared to that paper, this
  version removes the post-editing component from the workflow, as well as the set of entity-based transition features from
  the feature set. Moreover, both the structure and relation classification models are implemented using CRFSuite.
## Usage
### Example
See [`example.py`](feng_hirst_parser/example.py) for a very simple example. Note that running it requires both `matplotlib` and `pydot`. The plotting functionality is not required to use the parser, so these packages are not listed as requirements; you will have to install them yourself.
### More detailed usage
First instantiate the parser:
```python
parser = DiscourseParser(
    verbose,
    skip_parsing,
    global_features,
    save_preprocessed,
    output_dir=output_dir
)
```
Then parse your file to get a `ParseTree`:
```python
import os

# Directory containing the input file; here, the directory of the current script.
current_file_dir = os.path.dirname(os.path.abspath(__file__))
pt = parser.parse(os.path.join(current_file_dir, 'example.txt'))
```
You can convert this `ParseTree` to a `networkx` graph:
```python
G = pt.to_networkx()
```
This should make it much easier to work with or to analyze the tree structure.
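As an illustration of what working with the tree as a `networkx` graph can look like, here is a small sketch on a hand-built toy tree. The node names and the `relation` edge attribute are assumptions for this example; the actual attributes produced by `to_networkx` may differ.

```python
import networkx as nx

# Toy RST-like tree: a root whose children are connected by labeled relations.
G = nx.DiGraph()
G.add_edge("root", "n1", relation="elaboration")
G.add_edge("root", "n2", relation="elaboration")
G.add_edge("n1", "leaf1", relation="joint")
G.add_edge("n1", "leaf2", relation="joint")

# Depth of the tree = number of edges on the longest root-to-leaf path.
leaves = [n for n in G.nodes if G.out_degree(n) == 0]
depth = max(nx.shortest_path_length(G, "root", leaf) for leaf in leaves)
print(depth)  # 2
```

The same pattern (find leaves via `out_degree`, walk root-to-leaf paths) also works on the graph returned by `to_networkx`.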
Additionally, metrics can be extracted:
```python
metrics = extract_metrics(G, relation_ngrams=[(1, 2), (3, 4)])
```
At the moment this returns the depth of the tree and counts how often each relation occurs; the `relation_ngrams` argument additionally counts how often each relation n-gram appears on the paths from the root node to the leaf nodes.
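To make the n-gram idea concrete, here is a self-contained sketch of counting relation bigrams along root-to-leaf paths on a toy tree. The helper name `relation_ngram_counts` and the `relation` edge attribute are hypothetical; the real `extract_metrics` may compute this differently.

```python
from collections import Counter
import networkx as nx

# Toy tree with one relation label per edge.
G = nx.DiGraph()
G.add_edge("root", "a", relation="elaboration")
G.add_edge("a", "leaf1", relation="joint")
G.add_edge("a", "leaf2", relation="contrast")
G.add_edge("root", "leaf3", relation="background")

def relation_ngram_counts(G, root, n):
    """Count n-grams of edge relations along each root-to-leaf path."""
    counts = Counter()
    leaves = [v for v in G.nodes if G.out_degree(v) == 0]
    for leaf in leaves:
        path = nx.shortest_path(G, root, leaf)
        rels = [G.edges[u, v]["relation"] for u, v in zip(path, path[1:])]
        for i in range(len(rels) - n + 1):
            counts[tuple(rels[i : i + n])] += 1
    return counts

bigrams = relation_ngram_counts(G, "root", 2)
```

Here the path `root → a → leaf1` contributes the bigram `('elaboration', 'joint')`, while the single-edge path to `leaf3` is too short to contain any bigram.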
### Command-line usage and more
Refer to README_original.md for more information.
## Bugs and comments
If you encounter any bugs using the program, please open an issue on
the [GitHub repo](https://github.com/ThHuberSG/feng-hirst-rst-parser).
## Developers
* Original author: [Vanessa Wei Feng](mailto:weifeng@cs.toronto.edu), Department of Computer Science, University of
Toronto, Canada
* [Arne Neumann](mailto:github+spam.or.ham@arne.cl) updated it to use NLTK 3.4
  in [this GitHub repo](https://github.com/arne-cl/feng-hirst-rst-parser) and created a Dockerfile.
* [Zining Zhu](mailto:zining@cs.toronto.edu) updated the scripts to use Python 3.
* Thomas Huber, Chair of Data Science and Natural Language Processing, University of St. Gallen, updated the scripts
further and added the `networkx` functionality.
## References
* Vanessa Wei Feng and Graeme Hirst, 2014. Two-pass Discourse Segmentation with Pairing and Global Features.
  arXiv:1407.8215v1. http://arxiv.org/abs/1407.8215
* Vanessa Wei Feng and Graeme Hirst, 2014. A Linear-Time Bottom-Up Discourse Parser with Constraints and Post-Editing.
  In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014),
  Baltimore, USA. http://aclweb.org/anthology/P14-1048