# 🇮🇩 Javanese Stemmer
A Javanese language stemmer that uses morphophonological rules to reduce affixed words to their root forms.
## ✨ Features
- ✅ Comprehensive morphological analysis
- ✅ Nasal assimilation handling (ny, ng, m, n)
- ✅ Prefix, suffix, infix, and confix support
- ✅ Passive and active voice transformation
- ✅ Person marking (1st, 2nd person)
- ✅ Phonological rule application
- ✅ High accuracy stemming
- ✅ Sastrawi-compatible interface
## 📦 Installation
```bash
pip install javanese-stemmer
```
## 🚀 Quick Start
### Basic Stemming
```python
from javanese_stemmer import stem_word
# Stem a single word
result = stem_word("mangan")
print(result) # Output: "pangan"
```
### Using the Stemmer Class
```python
from javanese_stemmer import JavaneseStemmerLibrary
# Initialize stemmer
stemmer = JavaneseStemmerLibrary()
# Stem words
print(stemmer.stem("mangan")) # "pangan"
print(stemmer.stem("dipangan")) # "pangan"
print(stemmer.stem("tetuku")) # "tuku"
print(stemmer.stem("nyapu")) # "sapu"
print(stemmer.stem("nggawa")) # "gawa"
```
### Stem Sentences
```python
from javanese_stemmer import JavaneseStemmerLibrary
stemmer = JavaneseStemmerLibrary()
sentence = "Aku mangan sega ing warung"
stemmed = stemmer.stem_kalimat(sentence)
print(stemmed)
# Output: ['aku', 'pangan', 'sega', 'ing', 'warung']
```
### Stem Text
```python
from javanese_stemmer import stem_text
text = "Dheweke lagi mangan panganan enak"
stemmed_text = stem_text(text)
print(stemmed_text)
```
### Detailed Morphological Analysis
```python
from javanese_stemmer import JavaneseStemmerLibrary
stemmer = JavaneseStemmerLibrary()
# Get detailed analysis
analysis = stemmer.stem_detailed("dipanganiake")
print(f"Original: {analysis['original']}")
print(f"Stem: {analysis['stem']}")
print(f"Morphemes: {analysis['morphemes']}")
print(f"Confidence: {analysis['confidence']}")
```
## 📖 Complete Examples
### Example 1: Basic Word Stemming
```python
from javanese_stemmer import stem_word
words = ["mangan", "dipangan", "panganan", "takpanganiake"]
for word in words:
    print(f"{word:20} → {stem_word(word)}")
# Output:
# mangan               → pangan
# dipangan             → pangan
# panganan             → pangan
# takpanganiake        → pangan
```
### Example 2: Batch Processing
```python
from javanese_stemmer import JavaneseStemmerLibrary
stemmer = JavaneseStemmerLibrary()
sentences = [
    "Aku mangan sega",
    "Dheweke lagi turu",
    "Bocah-bocah dolanan ing alun-alun"
]
for sentence in sentences:
    stemmed = stemmer.stem_kalimat(sentence)
    print(f"{sentence:40} → {' '.join(stemmed)}")
```
### Example 3: Document Processing
```python
from javanese_stemmer import JavaneseStemmerLibrary
stemmer = JavaneseStemmerLibrary()
document = """
Jawa iku basa sing sugih. Akeh wong nganggo basa Jawa
kanggo komunikasi saben dina. Basa Jawa nduweni tataran
krama lan ngoko.
"""
result = stemmer.process_document(document, detailed=True)
print(f"Total words: {result['document_stats']['total_words']}")
print(f"Unique stems: {result['document_stats']['unique_stems']}")
```
## 🔧 API Reference
### Main Functions
#### `stem_word(word: str) -> str`
Stem a single word without creating a stemmer instance.
**Parameters:**
- `word` (str): Word to stem
**Returns:**
- str: Stemmed word
#### `stem_sentence(sentence: str) -> list`
Stem all words in a sentence.
**Parameters:**
- `sentence` (str): Sentence to stem
**Returns:**
- list: List of stemmed words
#### `stem_text(text: str) -> str`
Stem an entire text while preserving its structure.
**Parameters:**
- `text` (str): Text to stem
**Returns:**
- str: Text with all words stemmed
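"Preserving structure" is not defined further here. One plausible reading is that only word tokens are replaced while spacing and punctuation stay in place. The sketch below illustrates that idea with Python's `re` module. It is not this library's implementation; `stem_preserving_structure` and the toy lookup table are hypothetical names introduced for illustration.

```python
import re

# Matches runs of letters only, so punctuation and whitespace are left untouched.
WORD = re.compile(r"[^\W\d_]+")


def stem_preserving_structure(text, stem):
    """Replace each word in `text` with stem(word), keeping everything else as-is."""
    return WORD.sub(lambda match: stem(match.group(0)), text)


# Toy stand-in for a real stemmer (an assumption, for demonstration only).
TOY_TABLE = {"mangan": "pangan", "panganan": "pangan"}


def toy_stem(word):
    return TOY_TABLE.get(word.lower(), word)


print(stem_preserving_structure("Dheweke lagi mangan, panganan enak!", toy_stem))
```

Any callable that maps a word to its stem can be passed as `stem`, for example the `stem_word` function documented above.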
### Classes
#### `JavaneseStemmerLibrary`
Main stemmer class with Sastrawi-compatible interface.
**Methods:**
##### `stem(word: str) -> str`
Stem a single word.
##### `stem_kalimat(sentence: str) -> list`
Stem a sentence and return a list of stemmed words.
##### `stem_text(text: str) -> str`
Stem a text while preserving its structure.
##### `stem_detailed(word: str) -> dict`
Get detailed morphological analysis.
**Returns:**
```python
{
    'original': str,
    'stem': str,
    'morphemes': list,
    'transformations': list,
    'confidence': float
}
```
##### `process_document(text: str, detailed: bool = False) -> dict`
Process an entire document and return the processed sentences together with document-level statistics.
**Returns:**
```python
{
    'processed_sentences': list,
    'document_stats': {
        'total_words': int,
        'unique_stems': int,
        'total_sentences': int
    }
}
```
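To make the three `document_stats` fields concrete, here is a minimal sketch of how such counts could be derived from raw text. It is an illustration under assumptions, not the library's code: `document_stats` as a function, its sentence-splitting rule, and its tokenizer are all hypothetical, and the real `process_document` may split and count differently.

```python
import re

# A word is a run of letters, optionally joined by hyphens (e.g. "alun-alun").
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def document_stats(text, stem=str.lower):
    """Count sentences, words, and distinct stems in `text`.

    `stem` is any callable mapping a word to its stem; lowercasing is
    used as a placeholder so the sketch runs without a stemmer.
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    stems = [stem(word) for s in sentences for word in WORD.findall(s)]
    return {
        "total_words": len(stems),
        "unique_stems": len(set(stems)),
        "total_sentences": len(sentences),
    }


print(document_stats("Aku mangan sega. Aku turu."))
```

Passing a real stemming function as `stem` would make `unique_stems` reflect merged word forms rather than distinct lowercase tokens.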
#### `StemmerFactory`
Factory class for creating stemmer instances.
```python
from javanese_stemmer import StemmerFactory
stemmer = StemmerFactory.create_stemmer()
```
## 🎯 Supported Features
### Nasal Prefixes
- `ny-` (nyapu → sapu, nyuci → suci)
- `ng-` (nggawa → gawa, nggambar → gambar)
- `m-` (mangan → pangan, mbanyu → banyu)
- `n-` (nulis → tulis, nduweni → duweni)
- `nge-` (ngelak → elak)
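Nasal prefixes either replace the root's first consonant (mangan → pangan) or sit in front of it (nggawa → gawa), so reversing them yields several candidate roots that have to be checked against a lexicon. The sketch below shows that candidate-and-lookup idea. It is a simplified illustration, not the library's rule set: `ROOTS`, `NASAL_RULES`, and both function names are hypothetical, and the rule table covers only commonly described alternations.

```python
# Hypothetical mini-lexicon; a real stemmer would use a full root dictionary.
ROOTS = {"sapu", "gawa", "pangan", "tulis", "banyu", "gambar"}

# Each nasal onset maps to the strings that may replace it to restore a root.
# Rules listed first produce higher-priority candidates.
NASAL_RULES = [
    ("ny", ["s", "c"]),   # nyapu -> sapu
    ("ngg", ["g"]),       # nggawa -> gawa
    ("ng", ["k", ""]),    # dropped k, or a vowel-initial root
    ("mb", ["b"]),        # mbanyu -> banyu
    ("m", ["p", "w"]),    # mangan -> pangan
    ("nd", ["d"]),
    ("n", ["t"]),         # nulis -> tulis
]


def denasalize_candidates(word):
    """Return every root candidate obtained by undoing a matching nasal onset."""
    candidates = []
    for onset, replacements in NASAL_RULES:
        if word.startswith(onset):
            rest = word[len(onset):]
            candidates.extend(rep + rest for rep in replacements)
    return candidates


def denasalize(word, roots=ROOTS):
    """Return the first candidate found in `roots`, or the word unchanged."""
    for candidate in denasalize_candidates(word):
        if candidate in roots:
            return candidate
    return word


for w in ["nyapu", "nggawa", "mangan", "mbanyu", "nulis", "sega"]:
    print(w, "->", denasalize(w))
```

The lexicon lookup is what resolves the ambiguity: without it, `nyapu` could equally be restored to `sapu` or `capu`.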
### Passive Prefixes
- `di-` (dipangan → pangan)
- `dipun-` (dipunpangan → pangan, formal)
- `ka-` (kapangan → pangan, archaic)
- `ke-` (kepangan → pangan, accidental)
### Active Prefixes (Person Marking)
- `tak-` (1st person informal: takpangan → pangan)
- `dak-` (1st person very informal: dakpangan → pangan)
- `kok-` (2nd person informal: kokpangan → pangan)
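Because `dipun-` begins with `di-`, a prefix stripper has to try longer prefixes first, and it needs a guard so that short words such as `dina` are not cut down to a fragment. The following sketch applies both ideas to the passive and person-marking prefixes listed above. The function name `strip_prefix` and the `min_root_len` threshold are assumptions made for illustration; the library may order and guard its rules differently.

```python
PASSIVE_PREFIXES = ("dipun", "di", "ka", "ke")
PERSON_PREFIXES = ("tak", "dak", "kok")


def strip_prefix(word, prefixes, min_root_len=3):
    """Return (prefix, remainder) for the longest matching prefix.

    Returns (None, word) when no prefix matches or the remainder would be
    shorter than `min_root_len` characters.
    """
    for prefix in sorted(prefixes, key=len, reverse=True):
        remainder = word[len(prefix):]
        if word.startswith(prefix) and len(remainder) >= min_root_len:
            return prefix, remainder
    return None, word


print(strip_prefix("dipunpangan", PASSIVE_PREFIXES))
print(strip_prefix("takpangan", PERSON_PREFIXES))
print(strip_prefix("dina", PASSIVE_PREFIXES))
```

A length guard alone cannot tell a true prefix from a root that merely starts with the same letters, so a production stemmer would also confirm the remainder against a root dictionary.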
### Other Prefixes
- `pa-` (pasugihan → sugih)
- `pi-` (piandel → andel)
- `pra-` (pramugari → mugari)
- `sa-` (saperlu → perlu)
- `tar-` (taruban → uban)
### Suffixes
- `-ake` (panganiake → pangan)
- `-e` (pangane → pangan)
- `-i` (pangani → pangan)
- `-an` (panganan → pangan)
- `-en` (panganen → pangan)
- `-na` (panganna → pangan)
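Suffix removal has the same ordering concern as prefixes: `-ake` must be tested before `-e`, otherwise only the final vowel is removed. The sketch below strips a single suffix, longest first, and accepts a set of protected words so that a root like `pangan` does not lose its own final `an`. The names `strip_suffix` and `protected` are hypothetical and the code is not taken from the library.

```python
SUFFIXES = ("ake", "an", "en", "na", "e", "i")


def strip_suffix(word, protected=frozenset(), min_root_len=3):
    """Return (remainder, suffix) for the longest matching suffix.

    Words listed in `protected` (for example known roots) are returned
    unchanged as (word, None).
    """
    if word in protected:
        return word, None
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        remainder = word[:-len(suffix)]
        if word.endswith(suffix) and len(remainder) >= min_root_len:
            return remainder, suffix
    return word, None


for w in ["pangane", "panganan", "panganen", "panganna", "pangani"]:
    print(w, "->", strip_suffix(w))

# Without protection the root itself is over-stemmed:
print(strip_suffix("pangan"))
print(strip_suffix("pangan", protected={"pangan"}))
```

A single pass turns `panganiake` into `pangani`, not `pangan`; reaching the root requires applying the step again, which is where recursive affix removal comes in.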
### Confixes (Prefix + Suffix combinations)
- `pa-...-an` (pasugihan → sugih)
- `ka-...-an` (kabahagiaan → bahagia)
- `di-...-ake` (dipanganiake → pangan)
- `tak-...-ake` (takpanganiake → pangan)
- `ke-...-an` (kepanasan → panas)
- `sa-...-e` (sagedhene → gedhe)
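A confix only applies when its prefix and suffix are both present, so the two ends are tested together and removed in one step. The sketch below shows this pairing for the confixes listed above. `CONFIXES` and `strip_confix` are hypothetical names, and the code deliberately leaves out suffix stacking and allomorphs, so forms such as `dipanganiake` or `sagedhene` would need further processing to reach the roots shown in the list.

```python
# (prefix, suffix) pairs taken from the list above.
CONFIXES = [
    ("pa", "an"),
    ("ka", "an"),
    ("di", "ake"),
    ("tak", "ake"),
    ("ke", "an"),
    ("sa", "e"),
]


def strip_confix(word, min_root_len=3):
    """Return (prefix, middle, suffix) for the first confix matching both ends.

    Returns None when no pair matches or the middle part would be shorter
    than `min_root_len` characters.
    """
    for prefix, suffix in CONFIXES:
        if word.startswith(prefix) and word.endswith(suffix):
            middle = word[len(prefix):len(word) - len(suffix)]
            if len(middle) >= min_root_len:
                return prefix, middle, suffix
    return None


for w in ["pasugihan", "kabahagiaan", "kepanasan", "sega"]:
    print(w, "->", strip_confix(w))
```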
### Infixes
- `-in-` (tinulis → tulis)
- `-um-` (kumaput → kaput)
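In both examples the infix sits directly after the root's first consonant and is followed by a vowel. A minimal sketch of removing it under exactly that assumption follows. `strip_infix` is a hypothetical helper, not the library's function, and the pattern test still produces false positives on ordinary words that happen to fit the shape, so a lexicon check would be needed in practice.

```python
INFIXES = ("in", "um")
VOWELS = set("aeiou")


def strip_infix(word):
    """Return (infix, word_without_infix), or (None, word) if nothing matches.

    The infix must follow a single initial consonant and precede a vowel,
    as in t-in-ulis and k-um-aput.
    """
    if len(word) < 5 or word[0] in VOWELS:
        return None, word
    for infix in INFIXES:
        if word[1:3] == infix and word[3] in VOWELS:
            return infix, word[0] + word[3:]
    return None, word


for w in ["tinulis", "kumaput", "pinter", "tuku"]:
    print(w, "->", strip_infix(w))
```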
## 🧪 Technical Details
This stemmer implements:
- **Comprehensive morphophonological rules** for Javanese
- **Nasal assimilation patterns** (m, n, ny, ng)
- **Vowel harmony** considerations
- **Consonant cluster** handling
- **Etymology tracking**
- **Confidence scoring** for stem accuracy
- **Recursive affix** removal
- **Irregular word** handling
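Recursive affix removal can be pictured as a small search: strip one affix, and if the result is not a known root, try again on the remainder until a root is found or a depth limit is reached. The sketch below is one possible shape for that search, written independently of this library. `ROOTS`, `stem_recursive`, `stem`, and `max_depth` are all hypothetical, and the sketch omits nasal rules, infixes, and confidence scoring.

```python
# Hypothetical mini-lexicon; a real stemmer would load a full root dictionary.
ROOTS = {"pangan", "sugih", "panas"}
PREFIXES = ("dipun", "tak", "dak", "kok", "di", "ka", "ke", "pa", "sa")
SUFFIXES = ("ake", "an", "en", "na", "e", "i")


def stem_recursive(word, roots=ROOTS, depth=0, max_depth=4):
    """Depth-first search for a known root; returns None if none is reachable."""
    if word in roots:
        return word
    if depth >= max_depth:
        return None
    candidates = []
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            candidates.append(word[len(prefix):])
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            candidates.append(word[:-len(suffix)])
    for candidate in candidates:
        result = stem_recursive(candidate, roots, depth + 1, max_depth)
        if result is not None:
            return result
    return None


def stem(word, roots=ROOTS):
    """Return the root reached by recursive stripping, or the word unchanged."""
    return stem_recursive(word, roots) or word


for w in ["takpanganiake", "dipangan", "pasugihan", "kepanasan", "warung"]:
    print(w, "->", stem(w))
```

Requiring the search to end on a dictionary entry is what stops it from over-stemming: `takpanganiake` passes through `panganiake` and `pangani` before `pangan` is accepted, while `warung` has no strippable affix and is returned as-is.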
## 💻 Requirements
- Python >= 3.7
- pandas >= 1.0.0
## 📝 License
MIT License
Copyright (c) 2025 Stevia Anlena Putri
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## 📧 Contact
For questions and feedback:
- Email: your.email@example.com
- GitHub: https://github.com/yourusername/javanese-stemmer
## 🙏 Acknowledgments
Special thanks to the Javanese linguistics community and all contributors to Javanese NLP research.
## 📚 Citation
If you use this stemmer in your research, please cite:
```bibtex
@software{javanese_stemmer,
  title={Javanese Stemmer: A Comprehensive Morphological Analyzer},
  author={Stevia Anlena Putri},
  year={2025},
  url={https://github.com/yourusername/javanese-stemmer}
}
```
Raw data
{
"_id": null,
"home_page": "https://github.com/yourusername/javanese-stemmer",
"name": "javanese-stemmer",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "javanese, stemmer, nlp, natural language processing, indonesian, morphology",
"author": "Stevia Anlena Putri",
"author_email": "stevia.ap@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/dc/ab/12583ab6e04edc2b8e25eb57a33198bf0982413326b63848147b2d5c0c68/javanese_stemmer-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A comprehensive Javanese language stemmer with morphophonological rules",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/yourusername/javanese-stemmer"
},
"split_keywords": [
"javanese",
" stemmer",
" nlp",
" natural language processing",
" indonesian",
" morphology"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "4ff7aeb7303dc687763ba736a269ac5d5b07517baed7cd5d8c235e8742352f6c",
"md5": "2db72dc589cc71f365ba6e6a9cf4f79b",
"sha256": "a14e4bbba23651f001ec7cc2f1a67c0d35d1fbcbd4ad09f8e956f2e1e19207ef"
},
"downloads": -1,
"filename": "javanese_stemmer-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2db72dc589cc71f365ba6e6a9cf4f79b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 19412,
"upload_time": "2025-10-24T22:23:54",
"upload_time_iso_8601": "2025-10-24T22:23:54.233618Z",
"url": "https://files.pythonhosted.org/packages/4f/f7/aeb7303dc687763ba736a269ac5d5b07517baed7cd5d8c235e8742352f6c/javanese_stemmer-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "dcab12583ab6e04edc2b8e25eb57a33198bf0982413326b63848147b2d5c0c68",
"md5": "582fe80c533009db0db886f4f0153602",
"sha256": "bb722b4d4f4a9d9532c25af016d4d7a2aaf2e5d99fbb6920f967fb32ab8ab6a7"
},
"downloads": -1,
"filename": "javanese_stemmer-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "582fe80c533009db0db886f4f0153602",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 22247,
"upload_time": "2025-10-24T22:23:55",
"upload_time_iso_8601": "2025-10-24T22:23:55.851213Z",
"url": "https://files.pythonhosted.org/packages/dc/ab/12583ab6e04edc2b8e25eb57a33198bf0982413326b63848147b2d5c0c68/javanese_stemmer-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-24 22:23:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yourusername",
"github_project": "javanese-stemmer",
"github_not_found": true,
"lcname": "javanese-stemmer"
}