Name | edu-segmentation |
Version | 0.0.115 |
home_page | |
Summary | Improve EDU (elementary discourse unit) segmentation performance, building on Segbot. Because Segbot has an encoder-decoder architecture, its bidirectional GRU encoder can be replaced with a generative pretrained model such as BART or T5. The new model is evaluated on the RST dataset in a few-shot setting (e.g. training on 100 examples) instead of on the full dataset. |
upload_time | 2023-08-13 09:31:17 |
maintainer | |
docs_url | None |
author | Your Name |
requires_python | >=3.9,<4.0 |
license | |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Final Year Project on EDU Segmentation:
The goal is to improve EDU (elementary discourse unit) segmentation performance, building on Segbot. Because Segbot has an encoder-decoder architecture, its bidirectional GRU encoder can be replaced with a generative pretrained model such as BART or T5. The new model is evaluated on the RST dataset in a few-shot setting (e.g. training on 100 examples) instead of on the full dataset.
Segbot: <br>
http://138.197.118.157:8000/segbot/ <br>
https://www.ijcai.org/proceedings/2018/0579.pdf
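
The few-shot setting described above needs a small, fixed training subset. The project does not say how its 100 examples were chosen, so the following is only a minimal sketch of one common approach: a seeded random sample, so that repeated runs train on the same subset. The function name `sample_few_shot` and its parameters are hypothetical and are not part of the `edu_segmentation` package.

```python
import random


def sample_few_shot(examples, k=100, seed=0):
    """Return a reproducible random subset of at most k training examples.

    Hypothetical helper for illustration; not part of edu_segmentation.
    """
    examples = list(examples)
    if k >= len(examples):
        # Fewer examples than requested: use everything that is available.
        return examples
    # A dedicated Random instance keeps the draw independent of global state.
    return random.Random(seed).sample(examples, k)


if __name__ == "__main__":
    dataset = [f"sentence {i}" for i in range(1000)]
    subset = sample_few_shot(dataset, k=100, seed=42)
    print(len(subset))
```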
----
### Installation
Install the package from PyPI (`pip install edu-segmentation`), then set up the `edu_segmentation` module as follows:
1. Import `download_models` from the `download` module and call it to download all models:<br>
```
from edu_segmentation.download import download_models
download_models()
```
2. Import the `EDUSegmentation` class and the related model classes from `edu_segmentation.main`:<br>
```
from edu_segmentation.main import EDUSegmentation, ModelFactory, BERTUncasedModel, BERTCasedModel, BARTModel
```
### Usage
The `edu_segmentation` module provides a simple interface for EDU segmentation with different strategies and models. Follow these steps to use it:
1. Choose a segmentation strategy:<br><br>
You can choose between the default segmentation strategy and a conjunction-based segmentation strategy. <br><br>
<strong>Conjunction-based segmentation strategy:</strong> After the text has been EDU-segmented, any conjunction at the start or end of a segment is split off into its own segment.<br><br>
<strong>Default segmentation strategy:</strong> No post-processing occurs after the text has been EDU-segmented.<br><br>
```
from edu_segmentation.main import DefaultSegmentation, ConjunctionSegmentation
```
2. Create a model using the `ModelFactory`. <br><br>
Choose from BERT Uncased, BERT Cased, or BART models.
```
model_type = "bert_uncased" # or "bert_cased", "bart"
model = ModelFactory.create_model(model_type)
```
3. Create an instance of `EDUSegmentation` using the chosen model: <br>
```
edu_segmenter = EDUSegmentation(model)
```
4. Segment the text using the chosen strategy: <br>
```
text = "Your input text here."
granularity = "conjunction_words" # or "default"
conjunctions = ["and", "but", "however"] # Customize conjunctions if needed
device = 'cpu' # Choose your device, e.g., 'cuda:0'
segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
```
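
To make the conjunction-based strategy from step 1 more concrete, here is a rough sketch of the post-processing it describes: conjunctions at the start or end of a segment are split off into segments of their own. This is not the package's implementation. It assumes segments are plain strings and that surrounding punctuation should be ignored when matching, and the function name `isolate_conjunctions` is hypothetical.

```python
def isolate_conjunctions(segments, conjunctions):
    """Split leading/trailing conjunctions of each segment into own segments.

    Illustrative only: the real ConjunctionSegmentation class may differ.
    """
    conj = {c.lower() for c in conjunctions}
    result = []
    for segment in segments:
        words = segment.split()
        leading, trailing = [], []
        # Peel conjunctions off the front of the segment.
        while words and words[0].strip(",.;:").lower() in conj:
            leading.append(words.pop(0))
        # Peel conjunctions off the end of the segment, preserving order.
        while words and words[-1].strip(",.;:").lower() in conj:
            trailing.insert(0, words.pop())
        result.extend(leading)
        if words:
            result.append(" ".join(words))
        result.extend(trailing)
    return result


if __name__ == "__main__":
    edus = ["The food is good,", "but the service is bad."]
    print(isolate_conjunctions(edus, ["and", "but", "however"]))
    # → ['The food is good,', 'but', 'the service is bad.']
```

With the default strategy this step is simply skipped and the segments are returned as produced by the model.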
### Example
Here's a simple example demonstrating how to use the edu_segmentation module:
```
from edu_segmentation.download import download_models
from edu_segmentation.main import ModelFactory, EDUSegmentation
download_models()
# Create a BART model
model = ModelFactory.create_model("bart") # or "bert_cased" or "bert_uncased"
# Create an instance of EDUSegmentation using the model
edu_segmenter = EDUSegmentation(model)
# Segment the text using the conjunction-based segmentation strategy
text = "The food is good, but the service is bad."
granularity = "conjunction_words" # or "default"
conjunctions = ["and", "but", "however"] # customise as needed
device = 'cpu' # or a CUDA device, e.g. 'cuda:0'
segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
print(segmented_output)
```
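
When segmenting several texts, it can help to create the segmenter once and reuse it. The sketch below assumes only the `run(text, granularity, conjunctions, device)` call shown in the example above. The helpers `pick_device` and `segment_many` are hypothetical names, not part of the package. The assumption that PyTorch is the backend is also unverified, so the import is guarded, and it is likewise unverified whether `run` ignores `conjunctions` when `granularity` is `"default"`.

```python
def pick_device():
    """Return 'cuda:0' if PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch  # assumed backend; fall back to CPU if it is missing
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def segment_many(segmenter, texts, granularity="default",
                 conjunctions=None, device="cpu"):
    """Run one segmenter over many texts and collect the outputs in order.

    `segmenter` is any object exposing run(text, granularity, conjunctions,
    device), such as the EDUSegmentation instance created in the example.
    """
    if conjunctions is None:
        conjunctions = ["and", "but", "however"]
    return [segmenter.run(text, granularity, list(conjunctions), device)
            for text in texts]
```

For instance, `segment_many(edu_segmenter, reviews, "conjunction_words", device=pick_device())` would segment every string in a list `reviews` with the segmenter from the example.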
Raw data
{
"_id": null,
"home_page": "",
"name": "edu-segmentation",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9,<4.0",
"maintainer_email": "",
"keywords": "",
"author": "Your Name",
"author_email": "you@example.com",
"download_url": "https://files.pythonhosted.org/packages/15/bd/38d3ce7563f5282abbd66e9824975c0a84fa32f9b38bc57127b7e4cfe67c/edu_segmentation-0.0.115.tar.gz",
"platform": null,
"description": "Final Year Project on EDU Segmentation:\n\nTo improve EDU segmentation performance using Segbot. As Segbot has an encoder-decoder model architecture, we can replace bidirectional GRU encoder with generative pretraining models such as BART and T5. Evaluate the new model using the RST dataset by using few-shot based settings (e.g. 100 examples) to train the model, instead of using the full dataset.\n\nSegbot: <br>\nhttp://138.197.118.157:8000/segbot/ <br>\nhttps://www.ijcai.org/proceedings/2018/0579.pdf\n\n----\n### Installation\n\nTo use the EDUSegmentation module, follow these steps:\n\n1. Import the `download` module to download all models:<br>\n```\nfrom edu_segmentation.download import download_models\ndownload_models()\n```\n\n2. Import the `edu_segmentation` module and its related classes<br>\n```\nfrom edu_segmentation.main import EDUSegmentation, ModelFactory, BERTUncasedModel, BERTCasedModel, BARTModel\n```\n\n### Usage\nThe edu_segmentation module provides an easy-to-use interface to perform EDU segmentation using different strategies and models. Follow these steps to use it:\n\n1. Create a segmentation strategy:<br><br>\nYou can choose between the default segmentation strategy or a conjunction-based segmentation strategy. <br><br>\n<strong>Conjunction-based segmentation strategy:</strong> After the text has been EDU-segmented, if there are conjunctions at the start or end of each segment, the conjunctions will be isolated as its own segment.<br><br>\n<strong>Default segmentation strategy: </strong> No post-processing occurs after the text has been EDU-segmented <br><br>\n```\nfrom edu_segmentation.main import DefaultSegmentation, ConjunctionSegmentation\n```\n\n2. Create a model using the `ModelFactory`. <br><br>\nChoose from BERT Uncased, BERT Cased, or BART models.\n\n```\nmodel_type = \"bert_uncased\" # or \"bert_cased\", \"bart\"\nmodel = ModelFactory.create_model(model_type)\n```\n\n3. 
create an instance of `EDUSegmentation` using the chosen model: <br>\n```\nedu_segmenter = EDUSegmentation(model)\n```\n\n4. Segment the text using the chosen strategy: <br>\n```\ntext = \"Your input text here.\"\ngranularity = \"conjunction_words\" # or \"default\"\nconjunctions = [\"and\", \"but\", \"however\"] # Customize conjunctions if needed\ndevice = 'cpu' # Choose your device, e.g., 'cuda:0'\n\nsegmented_output = edu_segmenter.run(text, granularity, conjunctions, device)\n```\n\n\n### Example\n\nHere's a simple example demonstrating how to use the edu_segmentation module:\n\n```\nfrom edu_segmentation.download import download_models\nfrom edu_segmentation.main import ModelFactory, EDUSegmentation\n\ndownload_models()\n\n# Create a BERT Uncased model\nmodel = ModelFactory.create_model(\"bart\") # or bert_cased or bert_uncased\n\n# Create an instance of EDUSegmentation using the model\nedu_segmenter = EDUSegmentation(model)\n\n# Segment the text using the conjunction-based segmentation strategy\ntext = \"The food is good, but the service is bad.\"\ngranularity = \"conjunction_words\" # or default\nconjunctions = [\"and\", \"but\", \"however\"] # customise as needed\ndevice = 'cpu' # or cuda\n\nsegmented_output = edu_segmenter.run(text, granularity, conjunctions, device)\nprint(segmented_output)\n```\n",
"bugtrack_url": null,
"license": "",
"summary": "To improve EDU segmentation performance using Segbot. As Segbot has an encoder-decoder model architecture, we can replace bidirectional GRU encoder with generative pretraining models such as BART and T5. Evaluate the new model using the RST dataset by using few-shot based settings (e.g. 100 examples) to train the model, instead of using the full dataset.",
"version": "0.0.115",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b3d2ec9a838336c10f40da19183e85dcf5c8a45df8a7218cb6765c6a422bbfd1",
"md5": "5e33cf11400aa2388296611fd2cce805",
"sha256": "4d36694d8f38b62cbd80ae97067d281bfe7d1897fb702cf1dbd639e9dc2fd3a7"
},
"downloads": -1,
"filename": "edu_segmentation-0.0.115-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5e33cf11400aa2388296611fd2cce805",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9,<4.0",
"size": 327174,
"upload_time": "2023-08-13T09:31:14",
"upload_time_iso_8601": "2023-08-13T09:31:14.889427Z",
"url": "https://files.pythonhosted.org/packages/b3/d2/ec9a838336c10f40da19183e85dcf5c8a45df8a7218cb6765c6a422bbfd1/edu_segmentation-0.0.115-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "15bd38d3ce7563f5282abbd66e9824975c0a84fa32f9b38bc57127b7e4cfe67c",
"md5": "eea702e2157258dcae8a731d51ab2c4d",
"sha256": "7ed7151461a2ffb21f3dfafae0f2262b9d3e7fce13b93c75582e1bd8f81d827a"
},
"downloads": -1,
"filename": "edu_segmentation-0.0.115.tar.gz",
"has_sig": false,
"md5_digest": "eea702e2157258dcae8a731d51ab2c4d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9,<4.0",
"size": 317046,
"upload_time": "2023-08-13T09:31:17",
"upload_time_iso_8601": "2023-08-13T09:31:17.190257Z",
"url": "https://files.pythonhosted.org/packages/15/bd/38d3ce7563f5282abbd66e9824975c0a84fa32f9b38bc57127b7e4cfe67c/edu_segmentation-0.0.115.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-13 09:31:17",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "edu-segmentation"
}