rasm-arch

- Name: rasm-arch
- Version: 1.2.5
- Home page: https://github.com/kabikaj/rasm_arch
- Summary: text utility for converting Arabic-scripted text to a completely dediacritised skeleton
- Upload time: 2023-06-01 11:24:15
- Author: Alicia González Martínez and Thomas Milo
- License: MIT
- Keywords: arabic, persian, urdu, quran, manuscript, rasm, unicode, nlp, digital humanities

# Rasm Arch

> Rasm_arch is a text processing utility for converting Arabic-scripted text to a completely dediacritised skeleton.

<img src="https://raw.githubusercontent.com/kabikaj/rasm_arch/main/rasm_arch.png" align="middle">

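Rasm_arch is a text processing utility for converting Arabic-scripted text to a completely dediacritised skeleton (rasm), with letter-pointing dots, vowel signs and other marks removed. If Quranic indexes are given instead of text, it can also retrieve the corresponding Quranic text.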
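
To make "completely dediacritised skeleton" concrete, here is a toy Python sketch that reproduces the skeletons shown in the usage examples of this README. It is an illustration under stated assumptions, not rasm_arch's algorithm: the letter table `TOY_TABLE` covers only letters that occur in those examples, `toy_rasm` is a hypothetical name, ي and ق are mapped only as they appear there (non-final), and the only positional rule modelled is that nun becomes ں at the end of a word and ٮ elsewhere.

```python
import unicodedata

# Toy table built only from letters in this README's examples (an assumption,
# not rasm_arch's own data): base letter -> (dotless skeleton, Latin label).
TOY_TABLE = {
    "\u0627": ("\u0627", "A"),  # ا alif
    "\u0671": ("\u0627", "A"),  # ٱ alif wasla -> ا
    "\u0628": ("\u066e", "B"),  # ب -> ٮ dotless beh
    "\u062a": ("\u066e", "B"),  # ت -> ٮ
    "\u064a": ("\u066e", "B"),  # ي -> ٮ (non-final only, as in the examples)
    "\u062d": ("\u062d", "G"),  # ح
    "\u062f": ("\u062f", "D"),  # د
    "\u0631": ("\u0631", "R"),  # ر
    "\u0632": ("\u0631", "R"),  # ز -> ر
    "\u0633": ("\u0633", "S"),  # س
    "\u0637": ("\u0637", "T"),  # ط
    "\u0641": ("\u06a1", "F"),  # ف -> ڡ dotless feh
    "\u0642": ("\u06a1", "F"),  # ق -> ڡ (non-final only, as in the examples)
    "\u0643": ("\u0643", "K"),  # ك kaf
    "\u06a9": ("\u0643", "K"),  # ک keheh -> ك
    "\u0644": ("\u0644", "L"),  # ل
    "\u0645": ("\u0645", "M"),  # م
    "\u0647": ("\u0647", "H"),  # ه
    "\u0629": ("\u0647", "H"),  # ة -> ه
    "\u0648": ("\u0648", "W"),  # و
}
NUN = "\u0646"    # ن: dotless ٮ inside a word, ں at the end of the word
HAMZA = "\u0621"  # standalone ء: dropped from the skeleton


def toy_rasm(word: str) -> tuple[str, str]:
    """Return (Arabic skeleton, Latin skeleton) for a single word."""
    # NFD splits precomposed letters such as إ or ئ into base letter + hamza mark.
    decomposed = unicodedata.normalize("NFD", word)
    # Keep base letters only: drop combining marks (category Mn) and hamza.
    letters = [c for c in decomposed
               if unicodedata.category(c) != "Mn" and c != HAMZA]
    arabic, latin = [], []
    for i, c in enumerate(letters):
        if c == NUN:
            is_final = i == len(letters) - 1
            ar, la = ("\u06ba", "N") if is_final else ("\u066e", "B")
        else:
            ar, la = TOY_TABLE.get(c, (c, "?"))  # "?" flags unmapped letters
        arabic.append(ar)
        latin.append(la)
    return "".join(arabic), "".join(latin)
```

For instance, `toy_rasm("کُتِب")` gives `("كٮٮ", "KBB")`, the same `rar` and `rlt` values as the first Python example in the usage section. Decomposing with NFD first means hamza-carrying letters need no table entries of their own.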

A minimal Rust implementation can be found in the `src` directory.

## Requirements

- importlib-metadata>=6.6.0
- ujson>=5.1.0 (optional)

If ujson is not installed, the standard-library json module is used instead.
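
An optional ujson dependency with a json fallback is usually written as a guarded import. The sketch below shows that common idiom; it is an assumption about how such a fallback can be written, not code taken from rasm_arch, and `dump_records` is a hypothetical helper.

```python
# Common optional-dependency idiom (illustrative; not rasm_arch's source).
try:
    import ujson as json  # faster C implementation, used when installed
except ImportError:
    import json  # standard-library fallback; dumps/loads are called the same way here


def dump_records(records):
    """Serialise records to JSON, keeping Arabic characters unescaped."""
    return json.dumps(records, ensure_ascii=False)


print(dump_records([{"rlt": "LA", "rar": "لا", "ind": [2, 286, 35]}]))
```

The two backends may differ in whitespace (ujson omits spaces after separators), so code consuming the output should parse it rather than compare strings.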

## Installation

Install the rasm_arch package and the rasm_arch command-line utility with pip:

```sh
$ python -m pip install rasm_arch
```

To install the rasm_arch package and command-line utility locally, together with the man page, use the Makefile:

```sh
$ make
```

Alternatively, install only the man page manually:

```sh
$ sudo cp man/rasm_arch.1 /usr/share/man/man1/rasm_arch.1
$ sudo gzip -f /usr/share/man/man1/rasm_arch.1
$ sudo mandb
```

To uninstall it, run:

```sh
$ pip uninstall rasm_arch
```

or, from a source checkout, record the installed files and delete them:

```sh
$ python setup.py install --record files.txt
$ xargs rm -rf < files.txt
```

## Examples of usage

In Python:

```python
>>> import io
>>> from rasm_arch import rasm_arch as rasm
>>> for ori, rlt, rar, pal in rasm(io.StringIO('کُتِب'), paleo=True):
...   print(ori, rlt, rar, pal)
کُتِب KBB كٮٮ KᵘB²ᵢB₁

```

```python
>>> import io
>>> from rasm_arch import rasm_arch as rasm
>>> for word, blocks in rasm(io.StringIO("فنار الإسكندرية"), blocks=True, paleo=True):
...   print(f"word = {word}")
...   for ori, rlt, rar, pal in blocks:
...     print("--", ori, rlt, rar, pal)
word = فنار
-- فنا FBA ڡٮا F¹B¹A
-- ر R ر R
word = الإسكندرية
-- ا A ا A
-- لإ LA لا LAɂ
-- سكند SKBD سكٮد SKB¹D
-- ر R ر R
-- ية BH ٮه B₂H²
```
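
With `blocks=True`, each word is split into connected letter groups. The outputs above are consistent with a simple rule: a new block starts after a letter that does not join to the following letter (such as ا, د, ر or و), while vowel marks and a standalone hamza stay with the block they follow. The sketch below applies that rule; it is an inference from the examples, not rasm_arch's code, `split_blocks` and `NON_LEFT_JOINING` are hypothetical names, and the letter set is only a subset of the right-joining letters defined by Unicode.

```python
import unicodedata

# Letters that connect only to the preceding letter (Unicode joining type R);
# an illustrative subset: آ أ ؤ إ ا ة د ذ ر ز و ٱ
NON_LEFT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062f\u0630"
                       "\u0631\u0632\u0648\u0671")
HAMZA = "\u0621"  # standalone ء never joins; it stays with the previous block


def split_blocks(word: str) -> list[str]:
    """Split one word into connected blocks, keeping marks with their letter."""
    blocks: list[str] = []
    break_pending = False  # True right after a letter that does not join leftwards
    for c in word:
        floating = unicodedata.category(c) == "Mn" or c == HAMZA
        if blocks and (floating or not break_pending):
            blocks[-1] += c  # joins, or attaches to, the current block
        else:
            blocks.append(c)  # first letter, or a letter after a break
        if not floating:
            break_pending = c in NON_LEFT_JOINING
    return blocks
```

Applied to "الإسكندرية", the sketch yields `["ا", "لإ", "سكند", "ر", "ية"]`, the same blocks as in the example above.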

In Python, with Quranic indexes as input:

```python
>>> from rasm_arch import rasm_arch as rasm
>>> for word, blocks in rasm(((2, 14, 15, None), (2, 15, 1, 1)), paleo=True, blocks=True):
...   print(word, *blocks[0], sep='\t')
...   for block in blocks[1:]:
...     print('-', *block, sep='\t')
نَحۡنُ      نَحۡنُ    BGN   ٮحں   B¹ᵃGᵒN¹ᵘ       (2, 14, 15, 1)
مُسۡتَهۡزِءُونَ مُسۡتَهۡزِءُ MSBHR مسٮهر MᵘSᵒB²ᵃHᵒR¹ᵢʔᵘ (2, 14, 16, 1)
-        و      W     و     W              (2, 14, 16, 2)
-        نَ      N     ں     N¹ᵃ            (2, 14, 16, 3)
ٱ        ٱ      A     ا     Aᵟ              (2, 15, 1, 1)
```

As a command-line utility:

```sh
$ aspell -d ar dump master | rasm-arch | tail -2
يين  BBN  ٮٮں
ييئس BBBS ٮٮٮس
```

```sh
$ rasm-arch -q 2:286:35-2:286:36 --json | jq .
[
  {
    "ori": "لَا",
    "rlt": "LA",
    "rar": "لا",
    "ind": [
      2,
      286,
      35
    ]
  },
  {
    "ori": "طَاقَةَ",
    "rlt": "TAFH",
    "rar": "طاڡه",
    "ind": [
      2,
      286,
      36
    ]
  }
]
```
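
The `--json` output is a list of records with `ori`, `rlt`, `rar` and `ind` keys, so it can be post-processed with Python's standard json module. In this minimal sketch, `SAMPLE` is a copy of the output shown above and `format_records` is a hypothetical helper that prints each word's index in the same colon-separated notation that `-q` accepts, followed by its Latin and Arabic skeletons.

```python
import json

# Copy of the `rasm-arch -q 2:286:35-2:286:36 --json` output shown above.
SAMPLE = """[
  {"ori": "لَا", "rlt": "LA", "rar": "لا", "ind": [2, 286, 35]},
  {"ori": "طَاقَةَ", "rlt": "TAFH", "rar": "طاڡه", "ind": [2, 286, 36]}
]"""


def format_records(records):
    """Yield one 'index<TAB>rlt<TAB>rar' line per word record."""
    for rec in records:
        ref = ":".join(str(n) for n in rec["ind"])  # [2, 286, 35] -> "2:286:35"
        yield f"{ref}\t{rec['rlt']}\t{rec['rar']}"


for line in format_records(json.loads(SAMPLE)):
    print(line)
```

For live output, the string could instead come from `subprocess.run(["rasm-arch", "-q", "2:286:35-2:286:36", "--json"], capture_output=True, text=True, check=True).stdout`, assuming `rasm-arch` is on the `PATH`.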

## Version

1.2.5

## License for the code
 

The code for this project is licensed under the MIT License.

Be aware that the Quranic texts included in rasm_arch are covered by two different licences, which you can find below.

MIT License

Copyright (c) 2023 Alicia González Martínez and Thomas Milo

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

## License for the data

Rasm_arch includes two digitised Quranic texts: the Decotype Quran and the Tanzil Quran.

### Decotype Quran

The Decotype Quran is the best-encoded Quran, with complete orthography, and the only digital Quran that has been used to render a printed Quran approved by Al-Azhar.

The Decotype Quran is private.

### Tanzil Quran

The Tanzil Quran is the most widely used digital Quran. It was made available by the Tanzil project, https://tanzil.net.

We have included two versions of the Tanzil Quran text:
- A complete Uthmanic Quran
- A simplified Quran

Tanzil Quran Text 
  Copyright (C) 2007-2021 Tanzil Project
  License: Creative Commons Attribution 3.0 

  This copy of the Quran text is carefully produced, highly 
  verified and continuously monitored by a group of specialists 
  in Tanzil Project.

  TERMS OF USE:
 
  - Permission is granted to copy and distribute verbatim copies 
    of this text, but CHANGING IT IS NOT ALLOWED.

  - This Quran text can be used in any website or application, 
    provided that its source (Tanzil Project) is clearly indicated, 
    and a link is made to tanzil.net to enable users to keep
    track of changes.

  - This copyright notice shall be included in all verbatim copies 
    of the text, and shall be reproduced appropriately in all files 
    derived from or containing substantial portion of this text.

  Please check updates at: http://tanzil.net/updates/

## Contact

Alicia González Martínez, *aliciagm85+kabikaj at gmail dot com*


            
