Name | jaziw |
Version | 0.0.2 |
home_page | None |
Summary | Special opportunities for the Karakalpak language |
upload_time | 2024-08-02 14:43:40 |
maintainer | None |
docs_url | None |
author | Turdıbek Jumabaev |
requires_python | None |
license | None |
keywords | karakalpak, language, python |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Tokens Example
```python
from jaziw import tokenize
text = "Asslawma áleykum! Aqılı pútin, ziyalı qáwim."
tokens = tokenize(text)
print(tokens)
# Result
# [Substring(start=0, stop=8, text='Asslawma'), Substring(start=9, stop=16, text='áleykum'), Substring(start=16, stop=17, text='!'), Substring(start=18, stop=23, text='Aqılı'), Substring(start=24, stop=29, text='pútin'), Substring(start=31, stop=37, text='ziyalı'), Substring(start=38, stop=43, text='qáwim'), Substring(start=43, stop=44, text='.')]
```
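The printed result suggests that each token carries `start`, `stop` and `text` fields, with `start` and `stop` being character offsets into the input string. The sketch below checks that reading on the first three tokens from the result above. `Substring` here is a stand-in `namedtuple` defined only for this example (the real class comes from `jaziw`), and `offsets_match` is a hypothetical helper, not part of the package.

```python
from collections import namedtuple

# Stand-in for the objects printed above; field names are taken from the repr.
# The real class is provided by jaziw, this one exists only for the example.
Substring = namedtuple("Substring", ["start", "stop", "text"])

text = "Asslawma áleykum! Aqılı pútin, ziyalı qáwim."

# First three tokens, copied from the result above.
tokens = [
    Substring(0, 8, "Asslawma"),
    Substring(9, 16, "áleykum"),
    Substring(16, 17, "!"),
]


def offsets_match(source, spans):
    """Return True if every span's text equals source[start:stop]."""
    return all(source[s.start:s.stop] == s.text for s in spans)


print(offsets_match(text, tokens))

# Keep only tokens that contain at least one letter or digit.
words = [t.text for t in tokens if any(ch.isalnum() for ch in t.text)]
print(words)
```

If the offsets behave as assumed, slicing the original string is enough to recover any token, so the offsets can be stored instead of the text.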
# Sentences Example
```python
from jaziw import sentenize
text = "Asslawma áleykum! Aqılı pútin, ziyalı qáwim."
sentences = sentenize(text)
print(sentences)
# Result
# [Substring(start=0, stop=17, text='Asslawma áleykum!'), Substring(start=18, stop=44, text='Aqılı pútin, ziyalı qáwim.')]
```
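Because both results use offsets into the same input string, tokens can be grouped by sentence with a simple range check. This is a minimal sketch under that assumption: the spans are copied from the two results above, `Substring` is a stand-in `namedtuple`, and `group_tokens` is a hypothetical helper, not a `jaziw` function.

```python
from collections import namedtuple

# Stand-in for jaziw's result objects; field names follow the printed repr.
Substring = namedtuple("Substring", ["start", "stop", "text"])

# Spans copied from the sentenize result above.
sentences = [
    Substring(0, 17, "Asslawma áleykum!"),
    Substring(18, 44, "Aqılı pútin, ziyalı qáwim."),
]

# Spans copied from the tokenize result above.
tokens = [
    Substring(0, 8, "Asslawma"),
    Substring(9, 16, "áleykum"),
    Substring(16, 17, "!"),
    Substring(18, 23, "Aqılı"),
    Substring(24, 29, "pútin"),
    Substring(31, 37, "ziyalı"),
    Substring(38, 43, "qáwim"),
    Substring(43, 44, "."),
]


def group_tokens(sentence_spans, token_spans):
    """For each sentence, list the texts of tokens lying fully inside it."""
    return [
        [t.text for t in token_spans if s.start <= t.start and t.stop <= s.stop]
        for s in sentence_spans
    ]


grouped = group_tokens(sentences, tokens)
for sentence, words in zip(sentences, grouped):
    print(sentence.text, "->", words)
```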
# Normalize Example
```python
from jaziw import normalize
bad_text = """– Há, júwermek qatqır-aw, tursań bolmay ma,
mollańa keshiktiń ǵoy! – degen anamnıń sesi tatlı uyqımdı bóldi.
Ol bunday sózdi tek ashıwı kelgende ǵana aytatuǵın edi.
Anamnıń ǵarǵısınan beter «molla» degen sózin esitkennen-aq, quyqam juwlap, uyqım shayday ashıldı. Qorqınısh denemdi qaplap turdı. Bunnan 7-8 kún burın pallaqqa asılǵandaǵı tut shıbıqtan tilingen eki ayaǵım sol qálpinde matalıp jatqanday bolıp sezildi. Haqıyqatında da, ayaǵımnıń tilikleri pite qoyǵan joq edi. Aqsap zorǵa júretuǵın edim.
"""
print(normalize(bad_text))
# Result
# – Há, júwermek qatqır-aw, tursań bolmay ma, mollańa keshiktiń ǵoy! – degen anamnıń sesi tatlı uyqımdı bóldi. Ol bunday sózdi tek ashıwı kelgende ǵana aytatuǵın edi. Anamnıń ǵarǵısınan beter «molla» degen sózin esitkennen-aq, quyqam juwlap, uyqım shayday ashıldı. Qorqınısh denemdi qaplap turdı. Bunnan 7-8 kún burın pallaqqa asılǵandaǵı tut shıbıqtan tilingen eki ayaǵım sol qálpinde matalıp jatqanday bolıp sezildi. Haqıyqatında da, ayaǵımnıń tilikleri pite qoyǵan joq edi. Aqsap zorǵa júretuǵın edim.
```
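Judging only from the input and result above, the visible effect of `normalize` is that line breaks and runs of spaces become single spaces. The sketch below reproduces just that effect with the standard library. It is not the `jaziw` implementation, which may do more, and `collapse_whitespace` is a hypothetical name.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace with one space and trim both ends."""
    # str.split() with no arguments splits on any whitespace run
    # and drops leading and trailing whitespace.
    return " ".join(text.split())


sample = "Ol bunday sózdi tek  ashıwı kelgende\n\n ǵana aytatuǵın edi. \n"
print(collapse_whitespace(sample))
```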
# Recommended
```python
from jaziw import normalize, tokenize, sentenize
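filename = "a-shamuratov-lat.txt"

# Read the source text
with open(filename, "r", encoding="utf-8") as file:
    text = file.read()

# Normalize first, then tokenize and split the normalized text into sentences
normalized_text = normalize(text)
tokenized_text = tokenize(normalized_text)
sentenized_text = sentenize(normalized_text)

# Save the normalized text (not the original) to a new file
with open("normalized-" + filename, "w", encoding="utf-8") as new_file:
    new_file.write(normalized_text)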
```
Raw data
{
"_id": null,
"home_page": null,
"name": "jaziw",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "karakalpak, language, python",
"author": "Turd\u0131bek Jumabaev",
"author_email": "<turdibekjumabaev05@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/28/5c/c32f48f916872d5db0a0dec0ea821b4490f9e9b9f67f2eeaab9e78a43eb0/jaziw-0.0.2.tar.gz",
"platform": null,
"description": "\r\n# Tokens Example\r\n\r\n\r\n\r\n```python\r\n\r\nfrom jaziw import tokenize\r\n\r\n\r\n\r\ntext = \"Asslawma \u00e1leykum! Aq\u0131l\u0131 p\u00fatin, ziyal\u0131 q\u00e1wim.\"\r\n\r\ntokens = tokenize(text)\r\n\r\nprint(tokens)\r\n\r\n\r\n\r\n# Result\r\n\r\n# [Substring(start=0, stop=8, text='Asslawma'), Substring(start=9, stop=16, text='\u00e1leykum'), Substring(start=16, stop=17, text='!'), Substring(start=18, stop=23, text='Aq\u0131l\u0131'), Substring(start=24, stop=29, text='p\u00fatin'), Substring(start=31, stop=37, text='ziyal\u0131'), Substring(start=38, stop=43, text='q\u00e1wim'), Substring(start=43, stop=44, text='.')]\r\n\r\n```\r\n\r\n\r\n\r\n# Sentences Example\r\n\r\n\r\n\r\n```python\r\n\r\nfrom jaziw import sentenize\r\n\r\n\r\n\r\ntext = \"Asslawma \u00e1leykum! Aq\u0131l\u0131 p\u00fatin, ziyal\u0131 q\u00e1wim.\"\r\n\r\nsentences = sentenize(text)\r\n\r\nprint(sentences)\r\n\r\n\r\n\r\n# Result\r\n\r\n# [Substring(start=0, stop=17, text='Asslawma \u00e1leykum!'), Substring(start=18, stop=44, text='Aq\u0131l\u0131 p\u00fatin, ziyal\u0131 q\u00e1wim.')]\r\n\r\n```\r\n\r\n\r\n\r\n# Normalize Example\r\n\r\n\r\n\r\n```python\r\n\r\nfrom jaziw import normalize\r\n\r\n\r\n\r\nbad_text = \"\"\"\u2013 H\u00e1, j\u00fawermek qatq\u0131r-aw, tursa\u0144 bolmay ma, \r\n\r\nmolla\u0144a keshikti\u0144 \u01f5oy! \u2013 degen anamn\u0131\u0144 sesi tatl\u0131 uyq\u0131md\u0131 b\u00f3ldi. \r\n\r\n\r\n\r\nOl bunday s\u00f3zdi tek ash\u0131w\u0131 kelgende \u01f5ana aytatu\u01f5\u0131n edi. \r\n\r\n Anamn\u0131\u0144 \u01f5ar\u01f5\u0131s\u0131nan beter \u00abmolla\u00bb degen s\u00f3zin esitkennen-aq, quyqam juwlap, uyq\u0131m shayday ash\u0131ld\u0131. Qorq\u0131n\u0131sh denemdi qaplap turd\u0131. Bunnan 7-8 k\u00fan bur\u0131n pallaqqa as\u0131l\u01f5anda\u01f5\u0131 tut sh\u0131b\u0131qtan tilingen eki aya\u01f5\u0131m sol q\u00e1lpinde matal\u0131p jatqanday bol\u0131p sezildi. 
Haq\u0131yqat\u0131nda da, aya\u01f5\u0131mn\u0131\u0144 tilikleri pite qoy\u01f5an joq edi. Aqsap zor\u01f5a j\u00faretu\u01f5\u0131n edim.\r\n\r\n\"\"\"\r\n\r\n\r\n\r\nprint(normalize(bad_text))\r\n\r\n\r\n\r\n# Result\r\n\r\n# \u2013 H\u00e1, j\u00fawermek qatq\u0131r-aw, tursa\u0144 bolmay ma, molla\u0144a keshikti\u0144 \u01f5oy! \u2013 degen anamn\u0131\u0144 sesi tatl\u0131 uyq\u0131md\u0131 b\u00f3ldi. Ol bunday s\u00f3zdi tek ash\u0131w\u0131 kelgende \u01f5ana aytatu\u01f5\u0131n edi. Anamn\u0131\u0144 \u01f5ar\u01f5\u0131s\u0131nan beter \u00abmolla\u00bb degen s\u00f3zin esitkennen-aq, quyqam juwlap, uyq\u0131m shayday ash\u0131ld\u0131. Qorq\u0131n\u0131sh denemdi qaplap turd\u0131. Bunnan 7-8 k\u00fan bur\u0131n pallaqqa as\u0131l\u01f5anda\u01f5\u0131 tut sh\u0131b\u0131qtan tilingen eki aya\u01f5\u0131m sol q\u00e1lpinde matal\u0131p jatqanday bol\u0131p sezildi. Haq\u0131yqat\u0131nda da, aya\u01f5\u0131mn\u0131\u0144 tilikleri pite qoy\u01f5an joq edi. Aqsap zor\u01f5a j\u00faretu\u01f5\u0131n edim.\r\n\r\n```\r\n\r\n\r\n\r\n# Recommended\r\n\r\n```python\r\n\r\nfrom jaziw import normalize, tokenize, sentenize\r\n\r\n\r\n\r\n\r\n\r\nfilename = \"a-shamuratov-lat.txt\"\r\n\r\nwith open(filename, \"r\", encoding=\"utf-8\") as file:\r\n\r\n text = file.read()\r\n\r\n \r\n\r\n normalized_text = normalize(text)\r\n\r\n tokenized_text = tokenize(normalized_text)\r\n\r\n sentenized_text = sentenize(normalized_text)\r\n\r\n\r\n\r\n with open(\"normalized-\" + filename, \"w\", encoding=\"utf-8\") as new_file:\r\n\r\n new_file.write(text)\r\n\r\n```\r\n",
"bugtrack_url": null,
"license": null,
"summary": "Special opportunities for the Karakalpak language",
"version": "0.0.2",
"project_urls": null,
"split_keywords": [
"karakalpak",
" language",
" python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "58b47133290cb1863c5805d7fc27ccc6ae3a0372e7bf9c642c690f58fecede58",
"md5": "edfea055c69d2e9f9de9b87e9525e045",
"sha256": "dc5ab092a6e49df4e1e8f60a9c4a147cc9dc9d7dc24a08d8ef371842f0770e2e"
},
"downloads": -1,
"filename": "jaziw-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "edfea055c69d2e9f9de9b87e9525e045",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 4366,
"upload_time": "2024-08-02T14:43:38",
"upload_time_iso_8601": "2024-08-02T14:43:38.860373Z",
"url": "https://files.pythonhosted.org/packages/58/b4/7133290cb1863c5805d7fc27ccc6ae3a0372e7bf9c642c690f58fecede58/jaziw-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "285cc32f48f916872d5db0a0dec0ea821b4490f9e9b9f67f2eeaab9e78a43eb0",
"md5": "8555dcf612af25642a6bfe3c71eea6ca",
"sha256": "7228f877622a5f215f0a9da3de076a65f1850ad4431d14983bace06bef487e33"
},
"downloads": -1,
"filename": "jaziw-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "8555dcf612af25642a6bfe3c71eea6ca",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 3729,
"upload_time": "2024-08-02T14:43:40",
"upload_time_iso_8601": "2024-08-02T14:43:40.114335Z",
"url": "https://files.pythonhosted.org/packages/28/5c/c32f48f916872d5db0a0dec0ea821b4490f9e9b9f67f2eeaab9e78a43eb0/jaziw-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-02 14:43:40",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "jaziw"
}
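The `sha256` values under `digests` can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library; `sha256_of` and `matches_digest` are hypothetical helper names, and the file path is assumed to point at a local copy of one of the files listed under `urls`.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare the file's SHA-256 with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, assuming the sdist was saved locally under its listed filename:
# matches_digest(
#     "jaziw-0.0.2.tar.gz",
#     "7228f877622a5f215f0a9da3de076a65f1850ad4431d14983bace06bef487e33",
# )
```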