edu-segmentation


- Version: 0.0.115
- Uploaded: 2023-08-13 09:31:17
- Requires Python: >=3.9,<4.0
- Author: Your Name

Final Year Project on EDU Segmentation:

This project aims to improve EDU (elementary discourse unit) segmentation performance using Segbot. Because Segbot has an encoder-decoder architecture, its bidirectional GRU encoder can be replaced with generative pretrained models such as BART and T5. The new model is evaluated on the RST dataset in a few-shot setting, training on a small sample (e.g. 100 examples) instead of the full dataset.

Segbot: <br>
http://138.197.118.157:8000/segbot/ <br>
https://www.ijcai.org/proceedings/2018/0579.pdf
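Segbot treats segmentation as a pointing problem: starting from the first token of the current segment, the decoder points to the token that ends it, and decoding resumes from the token after that boundary. The minimal sketch below is plain Python, independent of this package, and shows how a list of such boundary indices maps back to EDUs. The function name `boundaries_to_segments` and the whitespace tokenisation are illustrative assumptions, not part of `edu_segmentation`.

```python
from typing import List


def boundaries_to_segments(tokens: List[str], boundaries: List[int]) -> List[str]:
    """Split tokens into segments; each boundary index marks the LAST token of a
    segment (the position a Segbot-style pointer would select)."""
    segments = []
    start = 0
    for end in boundaries:
        if not start <= end < len(tokens):
            raise ValueError(f"boundary {end} out of range for segment starting at {start}")
        # A segment spans from the current start to the pointed-at token, inclusive.
        segments.append(" ".join(tokens[start:end + 1]))
        start = end + 1  # decoding resumes at the token after the boundary
    if start != len(tokens):
        raise ValueError("the last boundary must point at the final token")
    return segments


tokens = "The food is good , but the service is bad .".split()
print(boundaries_to_segments(tokens, [4, 10]))
# ['The food is good ,', 'but the service is bad .']
```

Requiring the final boundary to land on the last token mirrors the idea that every token belongs to exactly one segment.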

----
### Installation

To set up the `edu_segmentation` package (Python >=3.9,<4.0), install it from PyPI with `pip install edu-segmentation`, then follow these steps:

1. Import `download_models` from `edu_segmentation.download` and call it to download all models:<br>
```
from edu_segmentation.download import download_models
download_models()
```

2. Import `EDUSegmentation`, `ModelFactory`, and the model classes from `edu_segmentation.main`:<br>
```
from edu_segmentation.main import EDUSegmentation, ModelFactory, BERTUncasedModel, BERTCasedModel, BARTModel
```
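Step 2 imports `ModelFactory` alongside the three model classes, and the usage section below selects a model with a string such as `"bert_uncased"`. This README does not show how the factory is implemented, so the following is only a hypothetical sketch of a string-keyed factory of that shape. It uses stand-in classes (`BertUncasedStub`, `BertCasedStub`, `BartStub`, `ModelFactorySketch`) rather than the package's real `BERTUncasedModel`, `BERTCasedModel`, `BARTModel`, and `ModelFactory`.

```python
from typing import Dict, Type


class SegmentationModel:
    """Hypothetical base class; the real package's model interface is not documented here."""
    name = "base"


class BertUncasedStub(SegmentationModel):
    name = "bert_uncased"


class BertCasedStub(SegmentationModel):
    name = "bert_cased"


class BartStub(SegmentationModel):
    name = "bart"


class ModelFactorySketch:
    # Maps the model_type strings used in this README to (stand-in) model classes.
    _registry: Dict[str, Type[SegmentationModel]] = {
        "bert_uncased": BertUncasedStub,
        "bert_cased": BertCasedStub,
        "bart": BartStub,
    }

    @classmethod
    def create_model(cls, model_type: str) -> SegmentationModel:
        try:
            model_cls = cls._registry[model_type]
        except KeyError:
            valid = ", ".join(sorted(cls._registry))
            raise ValueError(f"unknown model_type {model_type!r}; expected one of: {valid}") from None
        return model_cls()


model = ModelFactorySketch.create_model("bart")
print(type(model).__name__)  # BartStub
```

Keeping the name-to-class mapping in a single table means an unsupported name fails immediately with a message listing the valid options.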

### Usage
The `edu_segmentation` package provides an interface for performing EDU segmentation with different models and segmentation strategies. Follow these steps to use it:

1. Choose a segmentation strategy:<br><br>
You can choose between the default segmentation strategy and a conjunction-based segmentation strategy. The strategy is selected through the `granularity` argument in step 4.<br><br>
<strong>Conjunction-based segmentation strategy (`"conjunction_words"`):</strong> After the text has been EDU-segmented, any conjunction at the start or end of a segment is split off into its own segment.<br><br>
<strong>Default segmentation strategy (`"default"`):</strong> No post-processing is applied after the text has been EDU-segmented.<br><br>
```
from edu_segmentation.main import DefaultSegmentation, ConjunctionSegmentation
```

2. Create a model using `ModelFactory`.<br><br>
Choose from BERT Uncased (`"bert_uncased"`), BERT Cased (`"bert_cased"`), or BART (`"bart"`).

```
model_type = "bert_uncased"  # or "bert_cased", "bart"
model = ModelFactory.create_model(model_type)
```

3. Create an instance of `EDUSegmentation` using the chosen model:<br>
```
edu_segmenter = EDUSegmentation(model)
```

4. Segment the text with `run`, passing the chosen strategy as `granularity`:<br>
```
text = "Your input text here."
granularity = "conjunction_words"  # or "default"
conjunctions = ["and", "but", "however"]  # Customize conjunctions if needed
device = 'cpu'  # Choose your device, e.g., 'cuda:0'

segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
```
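The conjunction-based strategy from step 1 is a post-processing pass over the model's segments. The package's own implementation is not shown in this README; the following is a minimal sketch under stated assumptions: segments are split on whitespace (so internal spacing is normalised), conjunctions match case-insensitively with attached punctuation ignored, and a one-word segment is left alone. `isolate_conjunctions` and `post_process` are hypothetical names, and the input segments in the usage line are made up for illustration rather than taken from a real model run.

```python
import string
from typing import Iterable, List, Optional, Set


def _is_conjunction(word: str, conjunctions: Set[str]) -> bool:
    # Case-insensitive match that ignores punctuation attached to the word, e.g. "However,".
    return word.strip(string.punctuation).lower() in conjunctions


def isolate_conjunctions(segments: Iterable[str], conjunctions: Iterable[str]) -> List[str]:
    """Split a leading and/or trailing conjunction off each segment into its own segment."""
    conj = {c.lower() for c in conjunctions}
    result: List[str] = []
    for segment in segments:
        words = segment.split()
        lead: Optional[str] = None
        trail: Optional[str] = None
        # Only split when at least one other word remains, so "and" alone stays as-is.
        if len(words) > 1 and _is_conjunction(words[0], conj):
            lead = words.pop(0)
        if len(words) > 1 and _is_conjunction(words[-1], conj):
            trail = words.pop()
        if lead is not None:
            result.append(lead)
        if words:
            result.append(" ".join(words))
        if trail is not None:
            result.append(trail)
    return result


def post_process(segments: List[str], granularity: str, conjunctions: Iterable[str]) -> List[str]:
    """Hypothetical dispatch on the README's two granularity values."""
    if granularity == "default":
        return list(segments)  # default strategy: no post-processing
    if granularity == "conjunction_words":
        return isolate_conjunctions(segments, conjunctions)
    raise ValueError(f"unknown granularity {granularity!r}")


segments = ["The food is good,", "but the service is bad."]  # made-up model output
print(post_process(segments, "conjunction_words", ["and", "but", "however"]))
# ['The food is good,', 'but', 'the service is bad.']
```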


### Example

The following example segments a single sentence using the BART model and the conjunction-based strategy:

```
from edu_segmentation.download import download_models
from edu_segmentation.main import ModelFactory, EDUSegmentation

download_models()

# Create a BART model
model = ModelFactory.create_model("bart")  # or "bert_cased", "bert_uncased"

# Create an instance of EDUSegmentation using the model
edu_segmenter = EDUSegmentation(model)

# Segment the text using the conjunction-based segmentation strategy
text = "The food is good, but the service is bad."
granularity = "conjunction_words"  # or "default"
conjunctions = ["and", "but", "however"]  # customize as needed
device = 'cpu'  # or a GPU device, e.g. 'cuda:0'

segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
print(segmented_output)
```
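To segment several texts, it is reasonable to create the model and the `EDUSegmentation` instance once and call `run` repeatedly, since loading a pretrained BERT or BART model is typically the slow step. The helper below is a hypothetical sketch: `segment_many` is not part of the package, and it relies only on the `run(text, granularity, conjunctions, device)` call shown above. `EchoSegmenter` is a stand-in so the snippet runs without downloaded models.

```python
from typing import Any, List, Protocol, Sequence


class Segmenter(Protocol):
    # Structural type matching the run() call shown in this README.
    def run(self, text: str, granularity: str, conjunctions: List[str], device: str) -> Any: ...


def segment_many(segmenter: Segmenter, texts: Sequence[str], granularity: str = "default",
                 conjunctions: Sequence[str] = ("and", "but", "however"),
                 device: str = "cpu") -> List[Any]:
    """Run one already-constructed segmenter over many texts, reusing the loaded model."""
    conj = list(conjunctions)
    return [segmenter.run(text, granularity, conj, device) for text in texts]


class EchoSegmenter:
    """Stand-in for EDUSegmentation: returns each text as a single segment."""
    def run(self, text, granularity, conjunctions, device):
        return [text]


outputs = segment_many(EchoSegmenter(), ["The food is good.", "The service is bad."])
print(outputs)  # [['The food is good.'], ['The service is bad.']]
```

With the real package installed, `EDUSegmentation(ModelFactory.create_model("bart"))` would take the place of the stub.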

            
