Name: nahiarhdNLP
Version: 1.0.7
Summary: Advanced Indonesian Natural Language Processing Library
Author email: Raihan Hidayatullah Djunaedi <raihanhd.dev@gmail.com>
Repository: https://github.com/raihanhd12/nahiarhdNLP
Issues: https://github.com/raihanhd12/nahiarhdNLP/issues
Upload time: 2025-07-18 09:04:08
Requires Python: >=3.8
License: MIT
Keywords: nlp, indonesian, natural-language-processing, text-processing, bahasa-indonesia
# nahiarhdNLP - Advanced Indonesian Natural Language Processing Library

An advanced Indonesian natural language processing library featuring text preprocessing, slang normalization, emoji conversion, spelling correction, and more.

## 🚀 Installation

```bash
pip install nahiarhdNLP
```

## 📦 Importing the Library

```python
# Import the main package
import nahiarhdNLP

# Import the preprocessing module
from nahiarhdNLP import preprocessing

# Import the datasets module
from nahiarhdNLP import datasets

# Or import specific functions
from nahiarhdNLP.preprocessing import preprocess, remove_html, replace_slang
```

## Usage Examples

### 1. 🎯 All-in-One Preprocess Function

```python
from nahiarhdNLP import preprocessing

# Full preprocessing in a single call
teks = "Halooo emg siapa yg nanya? 😀"
hasil = preprocessing.preprocess(teks)
print(hasil)
# Output: "halo wajah_gembira"
```

### 2. 🧹 TextCleaner - Cleaning Text

```python
from nahiarhdNLP.preprocessing import TextCleaner

cleaner = TextCleaner()

# Clean URLs
url_text = "kunjungi https://google.com sekarang!"
clean_result = cleaner.clean_urls(url_text)
print(clean_result)
# Output: "kunjungi  sekarang!"

# Clean mentions
mention_text = "Halo @user123 apa kabar?"
clean_result = cleaner.clean_mentions(mention_text)
print(clean_result)
# Output: "Halo  apa kabar?"
```
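For intuition, cleaning of this kind is typically regex-based. Below is a minimal stand-alone sketch of the idea using only Python's standard library — illustrative only, not the library's actual implementation:

```python
import re

def clean_urls(text: str) -> str:
    # Drop anything that looks like an http(s) URL
    return re.sub(r"https?://\S+", "", text)

def clean_mentions(text: str) -> str:
    # Drop @username-style mentions
    return re.sub(r"@\w+", "", text)

print(clean_urls("kunjungi https://google.com sekarang!"))  # kunjungi  sekarang!
print(clean_mentions("Halo @user123 apa kabar?"))           # Halo  apa kabar?
```

Note that removing a token leaves its surrounding spaces behind, which is why the example outputs above contain double spaces.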

### 3. ✏️ SpellCorrector - Spelling Correction

```python
from nahiarhdNLP.preprocessing import SpellCorrector

spell = SpellCorrector()

# Correct a single word
word = "mencri"
corrected = spell.correct(word)
print(corrected)
# Output: "mencuri"

# Correct a sentence
sentence = "saya mencri informsi"
corrected = spell.correct_sentence(sentence)
print(corrected)
# Output: "saya mencuri informasi"
```
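Spelling correctors of this kind are commonly built on edit distance: generate every string one edit away from the input and keep those found in a known vocabulary (the Norvig approach). A self-contained sketch with a toy vocabulary — the word list and the alphabetical tie-break here are illustrative assumptions, not the library's behavior:

```python
def edits1(word: str) -> set:
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

VOCAB = {"saya", "mencari", "mencuri", "informasi"}  # toy vocabulary

def correct(word: str) -> str:
    if word in VOCAB:
        return word
    candidates = edits1(word) & VOCAB
    # A real corrector ranks candidates by corpus frequency;
    # here ties are broken alphabetically just to stay deterministic.
    return min(candidates) if candidates else word

print(correct("informsi"))  # informasi
```

Ranking by frequency is what lets a real corrector choose between equally close candidates such as "mencari" and "mencuri" for the typo "mencri".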

### 4. 🚫 StopwordRemover - Removing Stopwords

```python
from nahiarhdNLP.preprocessing import StopwordRemover

stopword = StopwordRemover()

# Remove stopwords
text = "saya suka makan nasi goreng"
result = stopword.remove_stopwords(text)
print(result)
# Output: "suka makan nasi goreng"
```
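Conceptually, stopword removal is a set-membership filter over tokens. A minimal sketch with a tiny hand-picked stopword set (the actual library loads its full list from a CSV dataset):

```python
STOPWORDS = {"saya", "yang", "di", "ke", "dari", "dan"}  # toy subset

def remove_stopwords(text: str) -> str:
    # Keep only tokens that are not in the stopword set
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

print(remove_stopwords("saya suka makan nasi goreng"))  # suka makan nasi goreng
```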

### 5. 🔄 SlangNormalizer - Slang Normalization

```python
from nahiarhdNLP.preprocessing import SlangNormalizer

slang = SlangNormalizer()

# Normalize slang words
text = "gw lg di rmh"
result = slang.normalize(text)
print(result)
# Output: "saya lagi di rumah"
```
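Slang normalization reduces to a dictionary lookup per token. A sketch with a few hard-coded entries (the library loads the full mapping from its CSV dataset):

```python
SLANG = {"gw": "saya", "lg": "lagi", "rmh": "rumah"}  # toy entries

def normalize(text: str) -> str:
    # Replace each token by its formal form when a mapping exists
    return " ".join(SLANG.get(w.lower(), w) for w in text.split())

print(normalize("gw lg di rmh"))  # saya lagi di rumah
```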

### 6. 😀 EmojiConverter - Emoji Conversion

```python
from nahiarhdNLP.preprocessing import EmojiConverter

emoji = EmojiConverter()

# Emoji to text
emoji_text = "😀 😂 😍"
text_result = emoji.emoji_to_text_convert(emoji_text)
print(text_result)
# Output: "wajah_gembira wajah_tertawa wajah_bercinta"

# Text to emoji
text = "wajah_gembira"
emoji_result = emoji.text_to_emoji_convert(text)
print(emoji_result)
# Output: "😀"
```
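Both directions of the conversion can be driven by one mapping and its inverse. An illustrative sketch with two entries — the names follow the outputs shown above, while the full table lives in the library's CSV dataset:

```python
EMOJI_TO_TEXT = {"😀": "wajah_gembira", "😂": "wajah_tertawa"}
TEXT_TO_EMOJI = {name: emo for emo, name in EMOJI_TO_TEXT.items()}  # inverse map

def emoji_to_text(s: str) -> str:
    return " ".join(EMOJI_TO_TEXT.get(tok, tok) for tok in s.split())

def text_to_emoji(s: str) -> str:
    return " ".join(TEXT_TO_EMOJI.get(tok, tok) for tok in s.split())

print(emoji_to_text("😀 😂"))          # wajah_gembira wajah_tertawa
print(text_to_emoji("wajah_gembira"))  # 😀
```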

### 7. 🔪 Tokenizer - Tokenization

```python
from nahiarhdNLP.preprocessing import Tokenizer

tokenizer = Tokenizer()

# Tokenize text
text = "ini contoh tokenisasi"
tokens = tokenizer.tokenize(text)
print(tokens)
# Output: ['ini', 'contoh', 'tokenisasi']
```
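A word-level tokenizer can be as simple as a regex that separates words from punctuation. A sketch (the library's `Tokenizer` may well use different rules):

```python
import re

def tokenize(text: str) -> list:
    # \w+ grabs runs of word characters; [^\w\s] grabs each punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("ini contoh tokenisasi"))  # ['ini', 'contoh', 'tokenisasi']
print(tokenize("apa kabar?"))             # ['apa', 'kabar', '?']
```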

### 8. 🛠️ Individual Functions

```python
from nahiarhdNLP.preprocessing import (
    remove_html, remove_url, remove_mentions,
    replace_slang, emoji_to_words, correct_spelling
)

# Remove HTML
html_text = "website <a href='https://google.com'>google</a>"
clean_text = remove_html(html_text)
print(clean_text)
# Output: "website google"

# Remove URLs
url_text = "kunjungi https://google.com sekarang!"
clean_text = remove_url(url_text)
print(clean_text)
# Output: "kunjungi  sekarang!"

# Remove mentions
mention_text = "Halo @user123 apa kabar?"
clean_text = remove_mentions(mention_text)
print(clean_text)
# Output: "Halo  apa kabar?"

# Normalize slang
slang_text = "emg siapa yg nanya?"
normal_text = replace_slang(slang_text)
print(normal_text)
# Output: "memang siapa yang bertanya?"

# Convert emoji
emoji_text = "😀 😂 😍"
text_result = emoji_to_words(emoji_text)
print(text_result)
# Output: "wajah_gembira wajah_tertawa wajah_bercinta"

# Correct spelling
spell_text = "saya mencri informsi"
corrected = correct_spelling(spell_text)
print(corrected)
# Output: "saya mencuri informasi"
```

### 9. 📊 Dataset Loader

```python
from nahiarhdNLP.datasets import DatasetLoader

loader = DatasetLoader()

# Load stopwords (from the local CSV file)
stopwords = loader.load_stopwords_dataset()
print(f"Number of stopwords: {len(stopwords)}")

# Load the slang dictionary (from the local CSV file)
slang_dict = loader.load_slang_dataset()
print(f"Number of slang entries: {len(slang_dict)}")

# Load the emoji dictionary (from the local CSV file)
emoji_dict = loader.load_emoji_dataset()
print(f"Number of emoji: {len(emoji_dict)}")
```

> **Note:** All datasets (stopwords, slang, emoji) are loaded directly from CSV files in the `nahiarhdNLP/datasets/` folder. There is no caching and nothing is downloaded from HuggingFace.
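To show the shape of such a loader, here is a sketch that reads a two-column slang CSV with the standard library's `csv` module. The column headers (`slang`, `formal`) are assumptions for illustration; the actual library uses `pandas` and its own file schema:

```python
import csv
import io

def load_slang_mapping(fileobj) -> dict:
    # Expects a header row followed by slang,formal pairs
    return {row["slang"]: row["formal"] for row in csv.DictReader(fileobj)}

# In-memory CSV standing in for a file on disk
sample = io.StringIO("slang,formal\ngw,saya\nlg,lagi\n")
mapping = load_slang_mapping(sample)
print(mapping)  # {'gw': 'saya', 'lg': 'lagi'}
```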

### 10. 🔄 Custom Pipeline

```python
from nahiarhdNLP.preprocessing import pipeline, replace_word_elongation, replace_slang

# Build a custom pipeline
custom_pipeline = pipeline([
    replace_word_elongation,
    replace_slang
])

# Run the pipeline
text = "Knp emg gk mw makan kenapaaa???"
result = custom_pipeline(text)
print(result)
# Output: "mengapa memang tidak mau makan mengapa???"
```
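The pipeline idea itself is just left-to-right function composition, and word elongation is commonly collapsed with a backreference regex. A self-contained sketch of both — the slang table and the three-repeat threshold are illustrative assumptions, not the library's values:

```python
import re

def pipeline(steps):
    # Compose steps left to right: each step's output feeds the next
    def run(text):
        for step in steps:
            text = step(text)
        return text
    return run

def replace_word_elongation(text):
    # Collapse a letter repeated 3+ times down to one: "kenapaaa" -> "kenapa"
    return re.sub(r"(\w)\1{2,}", r"\1", text)

SLANG = {"knp": "kenapa", "gk": "tidak", "mw": "mau"}  # toy entries

def replace_slang(text):
    return " ".join(SLANG.get(w.lower(), w) for w in text.split())

custom = pipeline([replace_word_elongation, replace_slang])
print(custom("Knp gk mw makan kenapaaa"))  # kenapa tidak mau makan kenapa
```

Order matters: elongation runs first so that a stretched word is reduced to a form the slang table can recognize.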

## ⚙️ Preprocess Parameters

The `preprocess()` function accepts optional parameters:

```python
result = nahiarhdNLP.preprocessing.preprocess(
    text="Halooo emg siapa yg nanya? 😀",
    remove_html_tags=True,      # Remove HTML tags
    remove_urls=True,           # Remove URLs
    remove_stopwords_flag=True, # Remove stopwords
    replace_slang_flag=True,    # Normalize slang
    replace_elongation=True,    # Fix word elongation
    convert_emoji=True,         # Convert emoji
    correct_spelling_flag=False,# Correct spelling (slow)
    stem_text_flag=False,       # Stemming
    to_lowercase=True           # Lowercase
)
```

## 🚨 Error Handling

```python
try:
    from nahiarhdNLP import preprocessing
    result = preprocessing.preprocess("test")
except ImportError:
    print("The nahiarhdNLP package is not installed")
    print("Install it with: pip install nahiarhdNLP")
except Exception as e:
    print(f"Error: {e}")
```

## 💡 Usage Tips

1. **For quick preprocessing**: use `preprocess()` with its default parameters
2. **For full control**: use the individual classes (`TextCleaner`, `SpellCorrector`, etc.)
3. **For customization**: build a `pipeline()` from the functions you need
4. **For spelling correction**: enable `correct_spelling_flag=True` (noticeably slower)
5. **For stemming**: enable `stem_text_flag=True` (requires installing Sastrawi)
6. **Datasets**: all dictionaries ship as local CSV files, so no network access is needed

## ⚡ Performance & Caching

As of the latest version, nahiarhdNLP **no longer caches datasets or downloads them from HuggingFace**. All datasets are loaded directly from the local CSV files bundled in the `nahiarhdNLP/datasets/` folder.

- No automatic caching
- No fallback data
- No HuggingFace dependency for datasets

## 📦 Dependencies

This package requires:

- `pandas` - to load and process the CSV datasets
- `sastrawi` - for stemming (optional)
- `rich` - for output formatting

            
