# A language processing tool for Sinhalese (සිංහල).
`Update 2020.11.01: Fixed the PyPI package. Use 'pip install sinling' to install sinling directly from the PyPI repository.`
`Update 2020.08.16: Added a PyPI package at https://pypi.org/project/sinling/.`
`Update 2020.08.16: Integrated the part-of-speech tagger and stemmer tools.`
`Update 2019.07.21: This tool no longer requires Java to run the Sinhala tokenizer. All Java code has been ported to Python for convenience.`
[Run the example notebook on Binder](https://mybinder.org/v2/gh/ysenarath/sinling.git/master?filepath=notebooks%2Fexamples.ipynb)
[PyPI version badge](https://badge.fury.io/py/sinling)
## Installation
Run the following command in your virtualenv to install this package.
`pip install sinling`
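
To confirm that the install landed in the active environment, a small check like the one below may help. It is a sketch that relies only on the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is made up for this example and is not part of sinling.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package="sinling"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"sinling {version}" if version else "sinling is not installed")
```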
## How to use
### Sinhala Tokenizer
```python
from sinling import SinhalaTokenizer
tokenizer = SinhalaTokenizer()
sentence = '...' # your sentence
tokenizer.tokenize(sentence)
```
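
One common follow-up is counting token frequencies over several sentences. The sketch below assumes that `tokenize` returns an iterable of string tokens; `token_frequencies` is a hypothetical helper, not a sinling function, and it accepts any object with a `tokenize` method.

```python
from collections import Counter


def token_frequencies(sentences, tokenizer):
    """Count tokens across sentences using tokenizer.tokenize().

    Assumption: tokenize() returns an iterable of hashable tokens (strings).
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenizer.tokenize(sentence))
    return counts


# Intended use with sinling (not executed here):
# from sinling import SinhalaTokenizer
# frequencies = token_frequencies(['...', '...'], SinhalaTokenizer())
```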
### Sinhala Stemmer (Experimental)
```python
from sinling import SinhalaStemmer
stemmer = SinhalaStemmer()
word = '...' # your word
stemmer.stem(word)
```
Please cite [sinhala-stemmer](https://github.com/rksk/sinhala-news-analysis/tree/master/sinhala-stemmer) if you are using this implementation.
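
When stemming a whole token list, repeated words can be stemmed once and reused. The following is a minimal sketch under that idea; `stem_all` is a hypothetical helper, and it makes no assumption about what `stemmer.stem` returns beyond storing the value as-is.

```python
def stem_all(words, stemmer):
    """Map each distinct word to the result of stemmer.stem(word).

    Each distinct word is passed to the stemmer only once; the return
    value of stem() is stored unchanged.
    """
    cache = {}
    for word in words:
        if word not in cache:
            cache[word] = stemmer.stem(word)
    return cache


# Intended use with sinling (not executed here):
# from sinling import SinhalaStemmer
# stems = stem_all(['...', '...'], SinhalaStemmer())
```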
### Part-of-Speech Tagger
```python
from sinling import SinhalaTokenizer, POSTagger
tokenizer = SinhalaTokenizer()
document = '...' # may contain multiple sentences
tokenized_sentences = [tokenizer.tokenize(f'{ss}.') for ss in tokenizer.split_sentences(document)]
tagger = POSTagger()
pos_tags = tagger.predict(tokenized_sentences)
```
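
The steps in the snippet above can be wrapped in a single function for reuse. This is a sketch, not part of the sinling API: `tag_document` is a hypothetical name, it uses only the `split_sentences`, `tokenize`, and `predict` calls shown above, and the early return for an empty document is an added assumption about the desired behavior.

```python
def tag_document(document, tokenizer, tagger):
    """Split a document into sentences, tokenize each one, and tag them.

    `tokenizer` must provide split_sentences() and tokenize(), and `tagger`
    must provide predict(), exactly as used in the snippet above.
    """
    tokenized_sentences = [
        tokenizer.tokenize(f"{sentence}.")  # same f'{ss}.' form as above
        for sentence in tokenizer.split_sentences(document)
    ]
    if not tokenized_sentences:
        # Assumption: skip the tagger entirely when there is nothing to tag.
        return []
    return tagger.predict(tokenized_sentences)


# Intended use with sinling (not executed here):
# from sinling import SinhalaTokenizer, POSTagger
# pos_tags = tag_document('...', SinhalaTokenizer(), POSTagger())
```

Passing the tokenizer and tagger in as arguments lets both objects be created once and reused across many documents.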
### Word Joiner (Morphological Joiner)
```python
from sinling import preprocess, word_joiner
w1 = preprocess('මුනි')
w2 = preprocess('උතුමා')
results = word_joiner.join(w1, w2)
# Returns a list of possible results after applying join rules ['මුනිතුමා', ...]
```
### Word Splitter (Morphological Splitter, Corpus-Based, Experimental)
```python
from sinling import word_splitter
word = '...'
results = word_splitter.split(word)
# Returns a dict containing debug information, base word and affix
```
See the [splitter notebook](https://github.com/ysenarath/sinling/blob/master/scripts/splitter.ipynb) for some sample splits.
## Contributions
- Contact `wayasas.13@cse.mrt.ac.lk` if you would like to contribute to this project.
## License
Apache License, Version 2.0, January 2004: http://www.apache.org/licenses/
Raw data
{
"_id": null,
"home_page": "https://github.com/ysenarath/sinling",
"name": "sinling",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "",
"author": "Yasas Senarath",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/fc/42/b512f34da021decfed3b04fc21b806f96b71d6b76e5e211d492dccd38fe6/sinling-0.3.6.tar.gz",
"platform": "",
"bugtrack_url": null,
"license": "",
"summary": "A language processing tool for Sinhalese (\u0dc3\u0dd2\u0d82\u0dc4\u0dbd)",
"version": "0.3.6",
"project_urls": {
"Homepage": "https://github.com/ysenarath/sinling"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "23bf43a39e626dfd56002d74c646cc4ab88b396c17e889bad03e4a27fbc10265",
"md5": "b2ab74213d99462634f0bbba226b9379",
"sha256": "eb3a58ede6531edd9865c8ae4f39ab74cb044bdc896848a0c088c836a91bb3cc"
},
"downloads": -1,
"filename": "sinling-0.3.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b2ab74213d99462634f0bbba226b9379",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 20018406,
"upload_time": "2020-11-08T00:02:45",
"upload_time_iso_8601": "2020-11-08T00:02:45.364387Z",
"url": "https://files.pythonhosted.org/packages/23/bf/43a39e626dfd56002d74c646cc4ab88b396c17e889bad03e4a27fbc10265/sinling-0.3.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "fc42b512f34da021decfed3b04fc21b806f96b71d6b76e5e211d492dccd38fe6",
"md5": "6584b3ed1312b1e31da7d65aacebad4b",
"sha256": "a0c9cbd49823aab972b5ad059bb02bd315eff6dea480fc8025324b648919af93"
},
"downloads": -1,
"filename": "sinling-0.3.6.tar.gz",
"has_sig": false,
"md5_digest": "6584b3ed1312b1e31da7d65aacebad4b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 19630223,
"upload_time": "2020-11-08T00:02:47",
"upload_time_iso_8601": "2020-11-08T00:02:47.798836Z",
"url": "https://files.pythonhosted.org/packages/fc/42/b512f34da021decfed3b04fc21b806f96b71d6b76e5e211d492dccd38fe6/sinling-0.3.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2020-11-08 00:02:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ysenarath",
"github_project": "sinling",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "sinling"
}