# SC4001 NLarge

- **Package:** [NLarge 1.0.0](https://pypi.org/project/NLarge/)
- **Summary:** Data augmentation for NLP
- **Author:** Ng Tze Kean
- **Requires Python:** <4.0,>=3.12
- **Uploaded:** 2024-11-15 06:43:53

## Purpose of Project

NLarge is a project focused on exploring and implementing various data
augmentation techniques for Natural Language Processing (NLP) tasks. The primary
goal is to enhance the diversity and robustness of training datasets, thereby
improving the performance and generalization capabilities of NLP models. This
project includes traditional data augmentation methods such as synonym
replacement and random substitution, as well as advanced techniques using Large
Language Models (LLMs).
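
To give a concrete flavor of what a traditional technique does, here is a
minimal, self-contained sketch of synonym replacement. It is illustrative only
and does not use the NLarge API; the toy synonym table is a stand-in for the
richer synonym source that `NLarge/synonym.py` presumably draws on.

```python
import random

# Toy synonym table, purely for illustration; NLarge's synonym.py is
# assumed to use a richer synonym source than this hardcoded dict.
SYNONYMS = {
    "good": ["great", "fine", "decent"],
    "movie": ["film", "picture"],
    "boring": ["dull", "tedious"],
}

def synonym_replace(text: str, p: float = 0.3, seed: int | None = None) -> str:
    """Replace each word that has a known synonym with probability p."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        candidates = SYNONYMS.get(word.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)

print(synonym_replace("the movie was good but a bit boring", p=0.5, seed=42))
```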

## Initializing Virtual Environment

We use Poetry in this project for dependency management. To get started, you
will need to install Poetry.

```shell
pip install poetry
```

Afterwards, you can install the project's dependencies with Poetry using the
command below:

```shell
poetry install
```
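
Once the install completes, a quick sanity check is to import the package from
inside the Poetry environment (e.g. via `poetry run python`). We assume here
that the top-level module name matches the `NLarge/` package directory:

```python
# Sanity check: this import should succeed after `poetry install`.
import NLarge
print("NLarge imported from:", NLarge.__file__)
```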

## Repository Contents

- [`report.tex`](report/report.tex): The LaTeX document containing the detailed
  report of the project, including methodology, experiments, results, and
  analysis.
- [`example/`](example): Contains example scripts for data augmentation and
  model training.
  - [`demo.ipynb`](example/demo.ipynb)
  - [Test results](example/test/)
- [`NLarge/`](NLarge): The main package containing the data augmentation and
  model implementation.
  - [`dataset_concat.py`](NLarge/dataset_concat.py)
  - [`llm.py`](NLarge/llm.py)
  - [`pipeline.py`](NLarge/pipeline.py)
  - [`random.py`](NLarge/random.py)
  - [`synonym.py`](NLarge/synonym.py)
  - [`utils/`](NLarge/utils)

## Usage

To run the models and experiments, you can use the Python notebooks in the
`example/` directory. The notebooks contain detailed explanations and code
snippets for data augmentation and model training. For the results of the
experiments, refer to the `example/test/` directory.

We also refer the reader to [`demo_attention.ipynb`](example/demo_attention.ipynb)
for a more detailed example of how to use the `pipeline.py` module. The notebook
contains the code for training a model with an attention mechanism, using the
NLarge library as a toolkit for data augmentation. The sketch below outlines
the general workflow such notebooks follow.
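
All names in this sketch are hypothetical placeholders rather than NLarge's
actual API; `enlarge_dataset` merely mirrors the idea we assume
`dataset_concat.py` implements, namely concatenating the original samples with
augmented copies before training.

```python
from typing import Callable

# Hypothetical stand-in for an NLarge augmenter (synonym replacement,
# random substitution, or an LLM-based rewriter).
def augment(text: str) -> str:
    return text  # identity placeholder

def enlarge_dataset(
    samples: list[tuple[str, int]],
    augmenter: Callable[[str], str],
    copies: int = 1,
) -> list[tuple[str, int]]:
    """Concatenate the original samples with `copies` augmented variants,
    keeping each sample's label unchanged."""
    enlarged = list(samples)
    for _ in range(copies):
        enlarged.extend((augmenter(text), label) for text, label in samples)
    return enlarged

train = [("the movie was good", 1), ("the plot was boring", 0)]
print(len(enlarge_dataset(train, augment, copies=2)))  # 6 samples
# The enlarged list would then feed a normal training loop, e.g. via
# pipeline.py in the actual library.
```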

### Compute limitations

Should you face computational limitations, you can use the datasets that we
have preprocessed and saved in the `example/llm-dataset/` directory. Because
inference with Large Language Models (LLMs) can take a long time, we have run
the augmentation in advance so that end users can use the preprocessed
datasets directly for training and testing.
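
How you load these files depends on the format they were saved in, which this
README does not specify. As one example, if the datasets were stored as CSV
files with `text` and `label` columns, loading could look like the following
(the filename and columns are assumptions; check `example/llm-dataset/` for
the actual layout):

```python
import pandas as pd

# Hypothetical filename and schema; inspect example/llm-dataset/ first.
df = pd.read_csv("example/llm-dataset/train_augmented.csv")
print(df.shape)
print(df.head())
```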

## Development

The library has been developed and tested, and it can be easily extended with
additional data augmentation techniques or new models to support testing and
researching the performance of different augmentation techniques. New
techniques can be added by creating new modules or files in the `NLarge`
package.
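
As a rough sketch of what such an extension might look like, a new module
could expose a small augmenter class. The interface below is an assumption
made for illustration, not a contract defined by NLarge; match the convention
that the existing modules (`random.py`, `synonym.py`) actually use before
contributing.

```python
# Hypothetical new module, e.g. NLarge/swap.py
import random

class RandomSwapAugmenter:
    """Toy augmenter that swaps two randomly chosen words.

    The class name and method signature are illustrative assumptions.
    """

    def __init__(self, seed: int | None = None) -> None:
        self.rng = random.Random(seed)

    def augment(self, text: str) -> str:
        words = text.split()
        if len(words) < 2:
            return text
        i, j = self.rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
        return " ".join(words)

print(RandomSwapAugmenter(seed=0).augment("data augmentation helps NLP models"))
```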

## Website

You can access the PyPI page of the project here:
[PyPI page](https://pypi.org/project/NLarge/)

Our GitHub repository can be found here:
[GitHub page](https://github.com/HiIAmTzeKean/SC4001-NLarge)

## Contributing

Contributions to this project are welcome. If you have any suggestions or
improvements, please create a pull request or open an issue.

            
