creolenltk


- **Name:** creolenltk
- **Version:** 1.0.3
- **Home page:** https://github.com/jcblanc2/CreoleNLTK.git
- **Summary:** A Python library for Creole text preprocessing
- **Upload time:** 2024-02-01 03:42:06
- **Author:** John Clayton
- **Keywords:** python, nlp, creole, natural language processing, text preprocessing
- **Requirements:** No requirements were recorded.

# CreoleNLTK: Creole Natural Language Toolkit

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python Version](https://img.shields.io/badge/python-3.6%2B-blue)](https://www.python.org/downloads/)
[![Build Status](https://travis-ci.org/jcblanc2/CreoleNLTK.svg?branch=main)](https://travis-ci.org/jcblanc2/CreoleNLTK)

CreoleNLTK is a Python library for preprocessing Creole text. It provides functions and tools that prepare text data for natural language processing (NLP) tasks: cleaning, tokenization, lowercasing, stopword removal, contraction expansion, and spell checking.

## Features

- **Spelling Check:** Identify and correct spelling errors.
- **Contraction to Expansion:** Expand contractions in the text.
- **Stopword Removal:** Remove common words that do not contribute much to the meaning.
- **Tokenization:** Break the text into words or tokens.
- **Text Cleaning:** Remove unwanted characters and clean the text.
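
Text cleaning has no usage example in this README, so the following is only a standalone sketch of the idea using the Python standard library. `clean_text` is a hypothetical helper and is not part of CreoleNLTK's API; the set of characters it keeps is an assumption chosen for illustration.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Hypothetical helper (not CreoleNLTK's API): basic text cleaning."""
    # Normalize to NFC so accented letters such as "ò" are single code points.
    text = unicodedata.normalize("NFC", text)
    # Drop URLs.
    text = re.sub(r"https?://\S+", " ", text)
    # Keep word characters (Unicode-aware in Python 3), whitespace,
    # apostrophes, and basic punctuation; replace everything else.
    text = re.sub(r"[^\w\s'.,!?-]", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("Bonjou!!!   Vizite https://example.com  #Kreyòl @zanmi"))
# Bonjou!!! Vizite Kreyòl zanmi
```

Apostrophes are kept on purpose so that contracted forms such as `m'ap` survive cleaning and can still be expanded afterwards.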

## Installation

You can install CreoleNLTK using pip:

```bash
pip install creolenltk
```

## Usage

### Spelling Checker

````python
# -*- coding: utf-8 -*-

from creolenltk.spelling_checker import SpellingChecker

# Initialize the spelling checker
spell_checker = SpellingChecker()

# Correct spelling errors in a word
corrected_word = spell_checker.correction('òtgraf')

print(f"Original Word: òtgraf, Corrected Word: {corrected_word}") # òtograf
````
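
To illustrate what a correction step does conceptually, here is a minimal sketch that picks the closest entry from a word list using `difflib` from the standard library. It is not how `SpellingChecker` is implemented; the `VOCABULARY` list, the `suggest` function, and the `0.8` cutoff are assumptions made for this example.

```python
import difflib

# Hypothetical mini-vocabulary; a real checker would load a full Creole word list.
VOCABULARY = ["òtograf", "fraz", "manje", "lakay"]


def suggest(word: str, vocabulary=VOCABULARY) -> str:
    """Return the closest vocabulary entry, or the word itself if none is close."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else word


print(suggest("òtgraf"))  # òtograf
print(suggest("xyz"))     # xyz (no close match, returned unchanged)
```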

### Contraction to Expansion

````python
from creolenltk.contraction_expansion import ContractionToExpansion

# Initialize the contraction expander
contraction_expander = ContractionToExpansion()

# Expand contractions in a sentence
original_sentence = "L'ap manje. m'ap rete lakay mw."
expanded_sentence = contraction_expander.expand_contractions(original_sentence)

print(f"Original Sentence: {original_sentence}\nExpanded Sentence: {expanded_sentence}") # li ap manje. mwen ap rete lakay mwen.
````
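
The idea behind contraction expansion can be sketched with a lookup table and a regular expression. This is an illustration only, not the library's implementation: the `CONTRACTIONS` mapping below covers just the three forms from the example sentence and is an assumption, and `expand` is a hypothetical function name.

```python
import re

# Hypothetical mapping for illustration; CreoleNLTK's actual table may differ.
CONTRACTIONS = {
    "l'": "li ",
    "m'": "mwen ",
    "mw": "mwen",
}

# Match a contracted form only at the start of a word; "mw" must also
# end at a word boundary so that "mwen" itself is left alone.
PATTERN = re.compile(r"\b(l'|m'|mw\b)", flags=re.IGNORECASE)


def expand(text: str) -> str:
    """Replace each matched contraction with its expanded, lowercased form."""
    return PATTERN.sub(lambda m: CONTRACTIONS[m.group(1).lower()], text)


print(expand("L'ap manje. m'ap rete lakay mw."))
# li ap manje. mwen ap rete lakay mwen.
```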

### Stopword Removal

````python
# -*- coding: utf-8 -*-

from creolenltk.stopword import Stopword

# Initialize the stopword handler
stopword_handler = Stopword()

# Remove stopwords from a sentence
sentence_with_stopwords = "Sa se yon fraz tès ak kèk stopwords nan Kreyòl Ayisyen."
sentence_without_stopwords = stopword_handler.remove_stopwords(sentence_with_stopwords)

print(f"Sentence with Stopwords: {sentence_with_stopwords}\nWithout Stopwords: {sentence_without_stopwords}") # fraz tès stopwords Kreyòl Ayisyen.
````
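
Conceptually, stopword removal is a set-membership filter over the words of a sentence. The sketch below shows that idea in plain Python; it does not reproduce the `Stopword` class, and the `STOPWORDS` set is a hypothetical subset chosen to match the example sentence.

```python
# Hypothetical stopword set for illustration; the library ships its own list.
STOPWORDS = {"sa", "se", "yon", "ak", "kèk", "nan"}


def remove_stopwords(sentence: str) -> str:
    """Drop words whose lowercased, punctuation-stripped form is a stopword."""
    kept = [
        word
        for word in sentence.split()
        if word.lower().strip(".,!?") not in STOPWORDS
    ]
    return " ".join(kept)


print(remove_stopwords("Sa se yon fraz tès ak kèk stopwords nan Kreyòl Ayisyen."))
# fraz tès stopwords Kreyòl Ayisyen.
```

Comparison is done on a lowercased copy of each word, so the original casing of the words that remain is preserved.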

### Tokenizer

````python
from creolenltk.tokenizer import Tokenizer

# Initialize the tokenizer
tokenizer = Tokenizer()

# Tokenize a sentence
sentence = "Sa se yon fraz senp"
tokens = tokenizer.word_tokenize(sentence, expand_contractions=True, lowercase=True)

print(f"Sentence: {sentence}\nTokens: {tokens}") # ["sa", "se", "yon", "fraz", "senp"]
````
For more detailed usage and examples, refer to the [documentation](https://pypi.org/project/creolenltk/).

## License

MIT licensed. See the bundled [LICENSE](LICENSE) file for more details.


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/jcblanc2/CreoleNLTK.git",
    "name": "creolenltk",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "python,nlp,creole,natural language processing,text preprocessing",
    "author": "John Clayton",
    "author_email": "jclaytonblanc@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/ad/7b/6b85187674b46740a880e7e28f5b8e3894eb5664fd79cace7842f368625a/creolenltk-1.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A Python library for Creole text preprocessing",
    "version": "1.0.3",
    "project_urls": {
        "Homepage": "https://github.com/jcblanc2/CreoleNLTK.git"
    },
    "split_keywords": [
        "python",
        "nlp",
        "creole",
        "natural language processing",
        "text preprocessing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "34c600bae18a12f9414890f02e19b298864cf891f46a87dbc37468b9322af2c5",
                "md5": "1def5931efc52c6bc375da340d60d895",
                "sha256": "5e140a8a00a53af9c31249faa430bbda8e10103ec28453312253c7990699eaa3"
            },
            "downloads": -1,
            "filename": "creolenltk-1.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1def5931efc52c6bc375da340d60d895",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 7950,
            "upload_time": "2024-02-01T03:42:01",
            "upload_time_iso_8601": "2024-02-01T03:42:01.601407Z",
            "url": "https://files.pythonhosted.org/packages/34/c6/00bae18a12f9414890f02e19b298864cf891f46a87dbc37468b9322af2c5/creolenltk-1.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ad7b6b85187674b46740a880e7e28f5b8e3894eb5664fd79cace7842f368625a",
                "md5": "76b47fa2e3c1803f3a54e5350283f3e2",
                "sha256": "3d33a074c8dd8d7fd29d07c329d04b38ef12d80afe2cfe9c821e4bf45b73aba7"
            },
            "downloads": -1,
            "filename": "creolenltk-1.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "76b47fa2e3c1803f3a54e5350283f3e2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 6762,
            "upload_time": "2024-02-01T03:42:06",
            "upload_time_iso_8601": "2024-02-01T03:42:06.856818Z",
            "url": "https://files.pythonhosted.org/packages/ad/7b/6b85187674b46740a880e7e28f5b8e3894eb5664fd79cace7842f368625a/creolenltk-1.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-01 03:42:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jcblanc2",
    "github_project": "CreoleNLTK",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "creolenltk"
}