amseg


:Name: amseg
:Version: 2.3
:Home page: https://github.com/uhh-lt/amharicprocessor
:Summary: This is an Amharic document segmentation and normalization tool
:Upload time: 2023-05-03 19:29:00
:Author: Seid Muhie Yimam
:License: MIT
:Keywords: Amharic, Amharic sentence splitter, Amharic document normalizer, Amharic Latin to Fidel transliteration

Amharic Segmenter and Tokenizer
-------------------------------

This is a simple script that splits an Amharic document into sentences
and tokens. If you find an issue, please let us know in the
GitHub `Issues <https://github.com/uhh-lt/amharicprocessor/issues>`__.

The Segmenter is part of the ``Semantic Models for Amharic`` Project
|image0|

Usage 
-------
Install the segmenter: ``pip install amseg``

Tokenization and Segmentation
-------------------------------
Use the following code for sentence segmentation and word tokenization:

::

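    from amseg.amharicSegmenter import AmharicSegmenter

    # Sentence-level and word-level punctuation lists, left empty in this example
    sent_punct = []
    word_punct = []
    segmenter = AmharicSegmenter(sent_punct, word_punct)

    # Split a sentence into word and punctuation tokens
    words = segmenter.amharic_tokenizer("እአበበ በሶ በላ።")
    # Split a longer text into sentences
    sentences = segmenter.tokenize_sentence("እአበበ በሶ በላ። ከበደ ጆንያ፤ ተሸከመ፡!ለምን?")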

Outputs

::

    words = ['እአበበ', 'በሶ', 'በላ', '።']
    sentences = ['እአበበ በሶ በላ።', 'ከበደ ጆንያ፤ ተሸከመ፡!', 'ለምን?']
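
As a rough illustration of what these two steps involve, the sketch below splits text with plain regular expressions. It is not amseg's implementation: the punctuation sets and the helper names ``split_sentences`` and ``split_words`` are assumptions made for this example, and the library's own defaults may differ.

```python
import re

# Assumption: sentences end at the Ethiopic full stop (U+1362), "!", "?",
# or the Ethiopic question mark (U+1367).
SENTENCE_RE = re.compile(r"[^\u1362\u1367!?]+[\u1362\u1367!?]*")

# Assumption: every mark in the Ethiopic punctuation range U+1360..U+1368,
# plus "!" and "?", becomes a token of its own.
WORD_RE = re.compile(r"[^\s\u1360-\u1368!?]+|[\u1360-\u1368!?]")


def split_sentences(text):
    """Return chunks of text, each ending at a run of sentence-final marks."""
    chunks = (match.group().strip() for match in SENTENCE_RE.finditer(text))
    return [chunk for chunk in chunks if chunk]


def split_words(text):
    """Return words and punctuation marks as separate tokens."""
    return WORD_RE.findall(text)


print(split_words("እአበበ በሶ በላ።"))
print(split_sentences("እአበበ በሶ በላ። ከበደ ጆንያ፤ ተሸከመ፡!ለምን?"))
```

On the sample text used above, this sketch yields the same words and sentences as the listed output, but it makes no attempt to handle abbreviations, numbers, or mixed-script text.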

Romanization and Normalization
-------------------------------
The following code shows how to normalize and romanize a given Amharic text:

::

    from amseg.amharicNormalizer import AmharicNormalizer as normalizer
    from amseg.amharicRomanizer import AmharicRomanizer as romanizer
    normalized = normalizer.normalize('ሑለት ሦስት')
    romanized = romanizer.romanize('ሑለት ሦስት')

Outputs

::

    normalized = 'ሁለት ሶስት'
    romanized = 'ḥulat śosət'
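
To make the normalization step more concrete, here is a minimal sketch that replaces homophone characters through a translation table. The table holds only the two substitutions visible in the example above; the name ``normalize_homophones`` is hypothetical, and amseg's actual rule set is presumably larger.

```python
# Assumption: only the two mappings shown in the example are included.
HOMOPHONE_MAP = str.maketrans({
    "\u1211": "\u1201",  # ሑ -> ሁ
    "\u1226": "\u1236",  # ሦ -> ሶ
})


def normalize_homophones(text):
    """Replace each mapped character; all other characters pass through."""
    return text.translate(HOMOPHONE_MAP)


print(normalize_homophones("ሑለት ሦስት"))
```

A translation table keeps the rules in one place as data, so further character pairs can be added without touching the function.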

Transliteration to Amharic Fidel
---------------------------------
The following code shows how to transliterate Latin-script text into Amharic Fidel script:

::

    from amseg.amharicTranslitrator import AmharicTranslitrator as transliterator
    transliterated = transliterator.transliterate('misa belah')

Outputs

::

    transliterated = 'ሚሳ በላህ'
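
One common way to approach this kind of transliteration is greedy longest-match lookup against a table of Latin syllables. The sketch below shows that idea only; it is not taken from amseg. The table is a tiny hypothetical one covering just the syllables in the example above, and ``transliterate_sketch`` is an illustrative name.

```python
# Assumption: a toy table with only the syllables needed for 'misa belah'.
SYLLABLES = {
    "mi": "\u121a",  # ሚ
    "sa": "\u1233",  # ሳ
    "be": "\u1260",  # በ
    "la": "\u120b",  # ላ
    "h": "\u1205",   # ህ
}
MAX_KEY_LEN = max(len(key) for key in SYLLABLES)


def transliterate_sketch(text):
    """Greedy longest-match replacement; unknown characters are kept as-is."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so 'la' wins over a shorter key.
        for size in range(MAX_KEY_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in SYLLABLES:
                out.append(SYLLABLES[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate_sketch("misa belah"))
```

With this toy table the call reproduces the output listed above; any real table would need the full Fidel syllabary and rules for ambiguous spellings.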

Publications
------------

To cite the Amharic segmenter/tokenizer tool, use the following
`paper <https://www.mdpi.com/1999-5903/13/11/275>`__

::

    @Article{fi13110275,
    AUTHOR = {Yimam, Seid Muhie and Ayele, Abinew Ali and Venkatesh, Gopalakrishnan and Gashaw, Ibrahim and Biemann, Chris},
    TITLE = {Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets},
    JOURNAL = {Future Internet},
    VOLUME = {13},
    YEAR = {2021},
    NUMBER = {11},
    ARTICLE-NUMBER = {275},
    URL = {https://www.mdpi.com/1999-5903/13/11/275},
    ISSN = {1999-5903},
    DOI = {10.3390/fi13110275}
    }

.. |image0| image:: https://github.com/uhh-lt/amharicmodels/raw/master/logo.png
   :target: https://github.com/uhh-lt/amharicmodels/



            
