jaziw


Name: jaziw
Version: 0.0.2
Summary: Special opportunities for the Karakalpak language
Upload time: 2024-08-02 14:43:40
Author: Turdıbek Jumabaev
Keywords: karakalpak, language, python
Requirements: No requirements were recorded.

# Tokens Example

```python
from jaziw import tokenize

text = "Asslawma áleykum! Aqılı pútin, ziyalı qáwim."
tokens = tokenize(text)
print(tokens)

# Result
# [Substring(start=0, stop=8, text='Asslawma'), Substring(start=9, stop=16, text='áleykum'), Substring(start=16, stop=17, text='!'), Substring(start=18, stop=23, text='Aqılı'), Substring(start=24, stop=29, text='pútin'), Substring(start=31, stop=37, text='ziyalı'), Substring(start=38, stop=43, text='qáwim'), Substring(start=43, stop=44, text='.')]
```
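
The printed result suggests that each token is a `Substring` record with `start`, `stop`, and `text` fields, where the offsets index into the original string. The sketch below shows one way such records might be used: checking that the offsets slice back to the token text, and keeping only word tokens. The `Substring` class here is a local stand-in defined for illustration (an assumption, not the class shipped with `jaziw`), filled with the values from the result above so the snippet runs without the package installed.

```python
from typing import NamedTuple


class Substring(NamedTuple):
    """Local stand-in mirroring the fields in the printed result (assumption)."""
    start: int
    stop: int
    text: str


text = "Asslawma áleykum! Aqılı pútin, ziyalı qáwim."

# Values copied from the tokenizer result shown above.
tokens = [
    Substring(0, 8, "Asslawma"),
    Substring(9, 16, "áleykum"),
    Substring(16, 17, "!"),
    Substring(18, 23, "Aqılı"),
    Substring(24, 29, "pútin"),
    Substring(31, 37, "ziyalı"),
    Substring(38, 43, "qáwim"),
    Substring(43, 44, "."),
]

# Each (start, stop) pair should slice the token back out of the source string.
offsets_ok = all(text[t.start:t.stop] == t.text for t in tokens)

# Keep tokens that contain at least one letter or digit, dropping punctuation.
words = [t.text for t in tokens if any(ch.isalnum() for ch in t.text)]

print(offsets_ok)
print(words)
```

With real output from `tokenize`, the same attribute access should apply if the returned objects expose these three fields as their printed form indicates.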



# Sentences Example

```python
from jaziw import sentenize

text = "Asslawma áleykum! Aqılı pútin, ziyalı qáwim."
sentences = sentenize(text)
print(sentences)

# Result
# [Substring(start=0, stop=17, text='Asslawma áleykum!'), Substring(start=18, stop=44, text='Aqılı pútin, ziyalı qáwim.')]
```
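
Because the sentence spans and the token spans from the tokenizer example refer to the same string, the offsets can be used to assign tokens to sentences. The following is a minimal sketch under that assumption; `Substring` is a local stand-in and `tokens_in` is a hypothetical helper, neither of which comes from `jaziw`. The data is copied from the two results shown in this README.

```python
from typing import List, NamedTuple


class Substring(NamedTuple):
    """Local stand-in mirroring the fields in the printed results (assumption)."""
    start: int
    stop: int
    text: str


# Values copied from the sentence and token results in this README.
sentences = [
    Substring(0, 17, "Asslawma áleykum!"),
    Substring(18, 44, "Aqılı pútin, ziyalı qáwim."),
]
tokens = [
    Substring(0, 8, "Asslawma"),
    Substring(9, 16, "áleykum"),
    Substring(16, 17, "!"),
    Substring(18, 23, "Aqılı"),
    Substring(24, 29, "pútin"),
    Substring(31, 37, "ziyalı"),
    Substring(38, 43, "qáwim"),
    Substring(43, 44, "."),
]


def tokens_in(sentence: Substring, all_tokens: List[Substring]) -> List[str]:
    """Return the texts of tokens whose span lies inside the sentence span."""
    return [
        t.text
        for t in all_tokens
        if sentence.start <= t.start and t.stop <= sentence.stop
    ]


grouped = [tokens_in(s, tokens) for s in sentences]

for sentence, group in zip(sentences, grouped):
    print(sentence.text, "->", group)
```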



# Normalize Example

```python
from jaziw import normalize

bad_text = """– Há, júwermek qatqır-aw, tursań bolmay ma, 

mollańa keshiktiń ǵoy! – degen anamnıń sesi tatlı uyqımdı bóldi. 



Ol bunday sózdi tek ashıwı kelgende ǵana aytatuǵın edi. 

                Anamnıń ǵarǵısınan beter «molla» degen sózin esitkennen-aq, quyqam juwlap, uyqım shayday ashıldı. Qorqınısh denemdi qaplap turdı. Bunnan 7-8 kún burın pallaqqa asılǵandaǵı tut shıbıqtan tilingen eki ayaǵım sol qálpinde matalıp jatqanday bolıp sezildi. Haqıyqatında da, ayaǵımnıń tilikleri pite qoyǵan joq edi. Aqsap zorǵa júretuǵın edim.

"""

print(normalize(bad_text))

# Result
# – Há, júwermek qatqır-aw, tursań bolmay ma, mollańa keshiktiń ǵoy! – degen anamnıń sesi tatlı uyqımdı bóldi. Ol bunday sózdi tek ashıwı kelgende ǵana aytatuǵın edi. Anamnıń ǵarǵısınan beter «molla» degen sózin esitkennen-aq, quyqam juwlap, uyqım shayday ashıldı. Qorqınısh denemdi qaplap turdı. Bunnan 7-8 kún burın pallaqqa asılǵandaǵı tut shıbıqtan tilingen eki ayaǵım sol qálpinde matalıp jatqanday bolıp sezildi. Haqıyqatında da, ayaǵımnıń tilikleri pite qoyǵan joq edi. Aqsap zorǵa júretuǵın edim.
```
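
In the example above, the visible effect of `normalize` is that stray line breaks, indentation, and repeated spaces become single spaces. A rough standard-library approximation of that one effect is sketched below. This is not the implementation used by `jaziw`, and the real function may do more than this; `collapse_whitespace` is a hypothetical name used only for illustration.

```python
def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space.

    str.split() with no arguments splits on any whitespace run and discards
    leading and trailing whitespace, so joining the pieces with a single
    space yields one clean line.
    """
    return " ".join(text.split())


# A fragment of the sample text above, with its trailing space and blank line.
messy = "– Há, júwermek qatqır-aw, tursań bolmay ma, \n\nmollańa keshiktiń ǵoy!"

cleaned = collapse_whitespace(messy)
print(cleaned)
```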



# Recommended

```python
from jaziw import normalize, tokenize, sentenize

filename = "a-shamuratov-lat.txt"

# Read the source text.
with open(filename, "r", encoding="utf-8") as file:
    text = file.read()

# Normalize first, then tokenize and split into sentences on the cleaned text.
normalized_text = normalize(text)
tokenized_text = tokenize(normalized_text)
sentenized_text = sentenize(normalized_text)

# Save the normalized text under a new file name.
with open("normalized-" + filename, "w", encoding="utf-8") as new_file:
    new_file.write(normalized_text)
```

            
