# Easy Natural Language Processing
Overparameterized neural networks are lazy (Chizat et al., 2019), so we design structures and objectives that can be easily optimized.
`eznlp` is a `PyTorch`-based package for neural natural language processing, currently supporting the following tasks:
* Text Classification ([Experimental Results](docs/text-classification.md))
* Named Entity Recognition ([Experimental Results](docs/entity-recognition.md))
* Relation Extraction ([Experimental Results](docs/relation-extraction.md))
* Attribute Extraction
* Machine Translation
* Image Captioning
This repository also hosts the code for our papers:
* ["Deep Span Representations for Named Entity Recognition"](docs/deep-span.md), accepted to Findings of ACL 2023.
* ["Boundary Smoothing for Named Entity Recognition"](docs/boundary-smoothing.md), published in ACL 2022.
* The [annotation scheme](publications/framework/scheme.pdf) described in "A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text".
## Installation
### Create an environment
```bash
$ conda create --name eznlp python=3.8
$ conda activate eznlp
```
### Install dependencies
```bash
$ conda install numpy=1.18.5 pandas=1.0.5 xlrd=1.2.0 matplotlib=3.2.2
# Replace the braces with exactly one of: cpuonly, cudatoolkit=10.2, cudatoolkit=11.0
$ conda install pytorch=1.7.1 torchvision=0.8.2 torchtext=0.8.1 {cpuonly|cudatoolkit=10.2|cudatoolkit=11.0} -c pytorch
$ pip install -r requirements.txt
```
### Install `eznlp`
* From source (recommended)
```bash
$ python setup.py sdist
$ pip install dist/eznlp-<version>.tar.gz --no-deps
```
* With `pip`
```bash
$ pip install eznlp --no-deps
```
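Because `eznlp` is installed with `--no-deps`, nothing checks that the pinned dependencies are actually present. The following is a minimal sketch (not part of `eznlp`) that compares installed versions against the pins from the commands above. The helper names `installed_version` and `check_pins` are hypothetical, and the sketch assumes the conda `pytorch` package is visible to `importlib.metadata` under the distribution name `torch`.

```python
from importlib import metadata  # standard library since Python 3.8

# Pins copied from the install commands above; extend as needed.
PINNED = {
    "numpy": "1.18.5",
    "pandas": "1.0.5",
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "torchtext": "0.8.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned, lookup=installed_version):
    """Map each mismatching package name to an (expected, found) pair."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Ignore local build tags such as "1.7.1+cu110".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run inside the activated `eznlp` environment, the script prints nothing when every pin matches.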
## Running the Code
### Text classification
```bash
$ python scripts/text_classification.py --dataset <dataset> [options]
```
### Entity recognition
```bash
$ python scripts/entity_recognition.py --dataset <dataset> [options]
```
### Relation extraction
```bash
$ python scripts/relation_extraction.py --dataset <dataset> [options]
```
### Attribute extraction
```bash
$ python scripts/attribute_extraction.py --dataset <dataset> [options]
```
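The four scripts share the same calling convention, so they can also be driven from Python. The sketch below is an illustration only, not an `eznlp` API: `build_command` and `run` are hypothetical helpers, and it assumes the working directory is the repository root so that the `scripts/` paths resolve.

```python
import subprocess
import sys

# Script paths as listed in the sections above.
SCRIPTS = {
    "text_classification": "scripts/text_classification.py",
    "entity_recognition": "scripts/entity_recognition.py",
    "relation_extraction": "scripts/relation_extraction.py",
    "attribute_extraction": "scripts/attribute_extraction.py",
}


def build_command(task, dataset, options=()):
    """Build the argument list for one script invocation."""
    if task not in SCRIPTS:
        raise ValueError(f"unknown task: {task!r}")
    # sys.executable keeps the run inside the current environment.
    return [sys.executable, SCRIPTS[task], "--dataset", dataset, *options]


def run(task, dataset, options=()):
    """Run the script; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(task, dataset, options), check=True)
```

For example, `run("entity_recognition", "<dataset>")` is equivalent to the entity recognition command above, with `<dataset>` replaced by a dataset name the script accepts.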
## Citation
If you find our code useful, please cite the following papers:
```
@article{zhu2022deep-span,
title={Deep Span Representations for Named Entity Recognition},
author={Zhu, Enwei and Liu, Yiyang and Li, Jinpeng},
journal={arXiv preprint arXiv:2210.04182},
year={2022}
}
```
```
@inproceedings{zhu2022boundary,
title={Boundary Smoothing for Named Entity Recognition},
author={Zhu, Enwei and Li, Jinpeng},
booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month={may},
year={2022},
address={Dublin, Ireland},
publisher={Association for Computational Linguistics},
url={https://aclanthology.org/2022.acl-long.490},
pages={7096--7108}
}
```
```
@article{zhu2021framework,
title={A Unified Framework of Medical Information Annotation and Extraction for {C}hinese Clinical Text},
author={Zhu, Enwei and Sheng, Qilin and Yang, Huanwan and Li, Jinpeng},
journal={arXiv preprint arXiv:2203.03823},
year={2021}
}
```
## References
* Chizat, L., Oyallon, E., and Bach, F. On lazy training in differentiable programming. In *NeurIPS 2019*.
Raw data
{
"_id": null,
"home_page": "https://github.com/syuoni/eznlp",
"name": "eznlp",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4",
"maintainer_email": "",
"keywords": "torch",
"author": "Enwei Zhu",
"author_email": "enwei.zhu@outlook.com",
"download_url": "https://files.pythonhosted.org/packages/44/a9/91cfb130fb24709d8818989a80522e533b8f0e6a9dd03edd51315121f2c5/eznlp-0.2.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache",
"summary": "Easy Natural Language Processing",
"version": "0.2.4",
"project_urls": {
"Homepage": "https://github.com/syuoni/eznlp"
},
"split_keywords": [
"torch"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "44a991cfb130fb24709d8818989a80522e533b8f0e6a9dd03edd51315121f2c5",
"md5": "fb5780fce5eb622c77eb474974d38a49",
"sha256": "e8e5fbeae70115b4ed951b6b42946ec18dcaa6f9372bb5b50e922fb9f095f3b3"
},
"downloads": -1,
"filename": "eznlp-0.2.4.tar.gz",
"has_sig": false,
"md5_digest": "fb5780fce5eb622c77eb474974d38a49",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4",
"size": 121865,
"upload_time": "2023-05-11T01:53:49",
"upload_time_iso_8601": "2023-05-11T01:53:49.301669Z",
"url": "https://files.pythonhosted.org/packages/44/a9/91cfb130fb24709d8818989a80522e533b8f0e6a9dd03edd51315121f2c5/eznlp-0.2.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-11 01:53:49",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "syuoni",
"github_project": "eznlp",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "flair",
"specs": [
[
"==",
"0.8"
]
]
},
{
"name": "allennlp",
"specs": [
[
"==",
"2.1.0"
]
]
},
{
"name": "transformers",
"specs": [
[
"==",
"4.3.2"
]
]
},
{
"name": "tokenizers",
"specs": [
[
"==",
"0.10.1"
]
]
},
{
"name": "nltk",
"specs": [
[
"==",
"3.5"
]
]
},
{
"name": "truecase",
"specs": [
[
"==",
"0.0.12"
]
]
},
{
"name": "hanziconv",
"specs": [
[
"==",
"0.3.2"
]
]
},
{
"name": "spacy",
"specs": [
[
"==",
"2.3.2"
]
]
},
{
"name": "en_core_web_sm",
"specs": []
},
{
"name": "de_core_news_sm",
"specs": []
},
{
"name": "jieba",
"specs": [
[
"==",
"0.42.1"
]
]
},
{
"name": "pytorch-crf",
"specs": [
[
"==",
"0.7.2"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"20.4"
]
]
}
],
"lcname": "eznlp"
}