* Name: greekroom
* Version: 0.0.13
* Home page: https://github.com/BibleNLP/greek-room
* Summary: The Greek Room will be a suite of tools supporting Biblical natural language processing.
* Upload time: 2025-08-27 03:21:27
* Author: Ulf Hermjakob
* Requires Python: >=3.11
* License: none recorded
* Keywords: machine translation, datasets, nlp, natural language processing, computational linguistics
* Requirements: none recorded

# greekroom

_greekroom_ is a suite of tools to support Biblical natural language processing (in progress).

<!--
[![image alt >](http://img.shields.io/pypi/v/greekroom.svg)](https://pypi.python.org/pypi/greekroom/)

### Installation (stubs only, in early development, not ready for regular users yet)

```bash
pip install greekroom
```
or
```bash
git clone https://github.com/BibleNLP/greek-room.git
```
-->


## gr_utilities
_gr_utilities_ is a set of Greek Room utilities.

<details>
<summary> <b>wb_file_props.py</b>
A CLI Python script to analyze file properties such as script direction and quotations.</summary>

```
usage: wb_file_props.py [-h]
           [-i INPUT_FILENAME]
           [-s INPUT_STRING]
           [-j JSON_OUT_FILENAME]
           [-o HTML_OUT_FILENAME]
           [--lang_code LANG_CODE]
           [--lang_name LANG_NAME]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILENAME, --input_filename INPUT_FILENAME
  -s INPUT_STRING, --input_string INPUT_STRING
  -j JSON_OUT_FILENAME, --json_out_filename JSON_OUT_FILENAME
  -o HTML_OUT_FILENAME, --html_out_filename HTML_OUT_FILENAME
  --lang_code LANG_CODE
  --lang_name LANG_NAME
```
Notes:
* Typically, either an INPUT_FILENAME or an INPUT_STRING is provided (but not both).
* Typically, a JSON_OUT_FILENAME or an HTML_OUT_FILENAME is provided (or both).

Sample calls
```
wb_file_props.py -h
wb_file_props.py -s """She asked: “Whatʼs a ‘PyPi’?”
He replied: “I don't know.”""" -j test.json
cat test.json

```
</details>

<details>
<summary> <b>gr_utilities.wb_file_props.script_punct</b>
A Python function to analyze file properties such as script direction and quotations.</summary>

```python 
import json
try:
    from gr_utilities import wb_file_props
except ImportError:
    from greekroom.gr_utilities import wb_file_props

## Apply script to string
text = """She asked: “Whatʼs a ‘PyPi’?”
He replied: “I don't know.”"""
result_dict = wb_file_props.script_punct(None, text, "eng", "English")
print(result_dict)

## Apply script to file content
# Write text to file
filename = "test.txt"
with open(filename, "w", encoding="utf-8") as f_out:  # text contains non-ASCII quotes
    f_out.write(text)

# Apply script
result_dict2 = wb_file_props.script_punct(filename)
# Print result as JSON string
print(json.dumps(result_dict2))
# Write result to HTML file
html_output = "test.html"
with open(html_output, "w", encoding="utf-8") as f_html:
    wb_file_props.print_to_html(result_dict2, f_html)

```
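
For readers unfamiliar with what "script direction" and "quotations" mean as file properties, the following stdlib-only sketch gives a rough idea. It is an illustration only and makes no claim about how `wb_file_props.script_punct` works internally; the function name `sketch_file_props`, the `QUOTE_CHARS` set, and the returned keys are all hypothetical.

```python
import unicodedata
from collections import Counter

# Hypothetical set of quotation characters to count (an assumption for this sketch).
QUOTE_CHARS = "\"'“”‘’«»‹›„‚"


def sketch_file_props(text):
    """Rough illustration only; not greekroom's implementation."""
    # Unicode bidirectional classes: "L" is left-to-right, "R" and "AL" are right-to-left.
    bidi = Counter(unicodedata.bidirectional(ch) for ch in text)
    ltr = bidi["L"]
    rtl = bidi["R"] + bidi["AL"]
    direction = "right-to-left" if rtl > ltr else "left-to-right"
    # Count how often each quotation character occurs.
    quotes = Counter(ch for ch in text if ch in QUOTE_CHARS)
    return {"direction": direction, "quotes": dict(quotes)}


props = sketch_file_props("She asked: “Whatʼs a ‘PyPi’?”")
print(props)
```

The real function returns a richer result; use it rather than this sketch for actual analysis.
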
</details>

## owl 
_owl_ is a battery of smaller Bible Translation checks.

<details>
<summary> <b>repeated_words.py</b>
A CLI Python script to check a file for repeated words, e.g. "the the".</summary>

```
usage: repeated_words.py [-h] 
                         [-j JSON] 
                         [-i IN_FILENAME] 
                         [-r REF_FILENAME] 
                         [-o OUT_FILENAME] 
                         [--html HTML] 
                         [--project_name PROJECT_NAME] 
                         [--lang_code LANGUAGE-CODE] 
                         [--lang_name LANG_NAME] 
                         [--message_id MESSAGE_ID]
                         [-d DATA_FILENAMES] 
                         [--verbose]

options:
  -h, --help            show this help message and exit
  -j JSON, --json JSON  input (alternative 1)
  -i IN_FILENAME, --in_filename IN_FILENAME
                        text file (alternative 2)
  -r REF_FILENAME, --ref_filename REF_FILENAME
                        ref file (alt. 2)
  -o OUT_FILENAME, --out_filename OUT_FILENAME
                        output JSON filename
  --html HTML           output HTML filename
  --project_name PROJECT_NAME
                        full name of Bible translation project
  --lang_code LANGUAGE-CODE
                        ISO 639-3, e.g. 'fas' for Persian
  --lang_name LANG_NAME
  --message_id MESSAGE_ID
  -d DATA_FILENAMES, --data_filenames DATA_FILENAMES
  --verbose
```
Notes:
* Typically, either a JSON input (-j JSON) or a text file (-i IN_FILENAME, with -r REF_FILENAME) is provided (but not both).
* Typically, an OUT_FILENAME (-o, JSON output) or an HTML output filename (--html) is provided (or both).


Sample calls
```
repeated_words.py -h
repeated_words.py -j '{"jsonrpc": "2.0",
 "id": "eng-sample-01",
 "method": "BibleTranslationCheck",
 "params": [{"lang-code": "eng", "lang-name": "English", 
             "project-id": "eng-sample", 
             "project-name": "English Bible",
             "selectors": [{"tool": "GreekRoom", "checks": ["RepeatedWords"]}],
             "check-corpus": [{"snt-id": "GEN 1:1", "text": "In in the beginning ..."},
                              {"snt-id": "JHN 12:24", "text": "Truly truly, I say to you ..."}]}]}' -o test.json
cat test.json
```
</details>

<details>
<summary> <b>owl.repeated_words.check_mcp</b>
A Python function to check a file for repeated words, e.g. "the the".</summary>

```python 
import json
try:
    from owl import repeated_words
except ImportError:
    from greekroom.owl import repeated_words

task_s = '''{"jsonrpc": "2.0",
 "id": "eng-sample-01",
 "method": "BibleTranslationCheck",
 "params": [{"lang-code": "eng", "lang-name": "English",
             "project-id": "eng-sample",
             "project-name": "English Bible",
             "selectors": [{"tool": "GreekRoom", "checks": ["RepeatedWords"]}],
             "check-corpus": [{"snt-id": "GEN 1:1", "text": "In in the beginning ..."},
                              {"snt-id": "JHN 12:24", "text": "Truly truly, I say to you ..."}]}]}'''

# load_data_filename() loads legitimate_duplicates.jsonl (described below).
# Call this function only once, even for multiple checks.
data_filename_dict = repeated_words.load_data_filename()
corpus = repeated_words.new_corpus("eng-sample-01")
mcp_d, misc_data_dict, check_corpus_list = repeated_words.check_mcp(task_s, data_filename_dict, corpus)
print(json.dumps(mcp_d))
print(misc_data_dict)
print(check_corpus_list)

# print to HTML file
feedback = repeated_words.get_feedback(mcp_d, 'GreekRoom', 'RepeatedWords')
corpus = repeated_words.update_corpus_if_empty(corpus, check_corpus_list)
repeated_words.write_to_html(feedback, misc_data_dict, corpus, "test.html", "eng", "English", "English Bible")
# result will be in test.html

```
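
The request string in the sample above is written by hand. When the verses come from a program, it may be easier to build the same structure as a Python dict and serialize it with `json.dumps`. The sketch below assumes only the field names shown in the sample; the helper name `build_task` and its parameters are hypothetical and not part of greekroom.

```python
import json


def build_task(message_id, lang_code, lang_name, project_id, project_name, verses):
    """Build a request string like the sample above (hypothetical helper)."""
    # Each verse is a (snt-id, text) pair.
    check_corpus = [{"snt-id": snt_id, "text": text} for snt_id, text in verses]
    task = {
        "jsonrpc": "2.0",
        "id": message_id,
        "method": "BibleTranslationCheck",
        "params": [{
            "lang-code": lang_code,
            "lang-name": lang_name,
            "project-id": project_id,
            "project-name": project_name,
            "selectors": [{"tool": "GreekRoom", "checks": ["RepeatedWords"]}],
            "check-corpus": check_corpus,
        }],
    }
    # ensure_ascii=False keeps non-Latin verse text readable in the output.
    return json.dumps(task, ensure_ascii=False)


task_s = build_task("eng-sample-01", "eng", "English", "eng-sample", "English Bible",
                    [("GEN 1:1", "In in the beginning ..."),
                     ("JHN 12:24", "Truly truly, I say to you ...")])
print(task_s)
```

The resulting `task_s` can then be passed to `repeated_words.check_mcp` in place of the hand-written string.
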
</details>

<details>
<summary> <b>legitimate_duplicates.jsonl</b> 
Data files describing legitimate repeated words.</summary>

Samples:

```
{"lang-code": "eng", "text": "truly, truly"}
{"lang-code": "eng", "text": "her her", "snt-ids": ["HOS 2:17", "EST 2:9", "JDT 10:4"], "context-examples": ["give her her vineyards", "gave her her things for purification"]}
{"lang-code": "grc", "text": "ἀμὴν ἀμὴν", "rom": "amen amen", "gloss": {"eng": "truly truly [I say to you]"}}

{"lang-code": "hin", "text": "जब जब", "rom": "jab jab", "gloss": {"eng": "whenever"}}
{"lang-code": "hin", "text": "कुछ कुछ", "rom": "kuch kuch", "gloss": {"eng": "something, somewhat, some of, part of"}}
{"lang-code": "eng", "text": "they they", "delete": true}
```
Notes: 
* The loader searches for files <i>owl/data/legitimate_duplicates.jsonl</i> in the directories "greekroom", "$XDG_DATA_HOME", "/usr/share", and "$HOME/.local/share".
* Later entries overwrite prior entries.
* Entries with <i>"delete": true</i> delete prior entries.
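
The overwrite and delete rules in the notes above can be sketched in a few lines. This is a minimal illustration, not greekroom's loader: the helper name `merge_duplicate_entries` is hypothetical, and keying entries by the pair (lang-code, text) is an assumption made for this sketch.

```python
import json


def merge_duplicate_entries(jsonl_texts):
    """Merge JSONL texts in order (hypothetical helper, not part of greekroom)."""
    merged = {}
    for jsonl_text in jsonl_texts:
        for line in jsonl_text.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines between entries
            entry = json.loads(line)
            key = (entry["lang-code"], entry["text"])  # assumed entry identity
            if entry.get("delete"):
                merged.pop(key, None)  # a delete entry removes any prior entry
            else:
                merged[key] = entry  # a later entry overwrites a prior one
    return merged


packaged = '''{"lang-code": "eng", "text": "truly, truly"}
{"lang-code": "eng", "text": "they they"}
'''
user = '''{"lang-code": "eng", "text": "they they", "delete": true}

{"lang-code": "hin", "text": "जब जब", "rom": "jab jab"}
'''
merged = merge_duplicate_entries([packaged, user])
print(sorted(merged))
```

Here the second text stands in for a file found later in the search, so its delete entry removes "they they" while the other entries survive.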

</details>


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/BibleNLP/greek-room",
    "name": "greekroom",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "machine translation, datasets, NLP, natural language processing, computational linguistics",
    "author": "Ulf Hermjakob",
    "author_email": "ulfhermjakob@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e2/90/37f929f9cdb864359133e265ab0e5170bf0c1b500b21580699bb68a2be79/greekroom-0.0.13.tar.gz",
    "platform": "any",
    "bugtrack_url": null,
    "license": null,
    "summary": "The Greek Room will be a suite of tools supporting Biblical natural language processing.",
    "version": "0.0.13",
    "project_urls": {
        "Download": "https://github.com/BibleNLP/greek-room",
        "Homepage": "https://github.com/BibleNLP/greek-room"
    },
    "split_keywords": [
        "machine translation",
        " datasets",
        " nlp",
        " natural language processing",
        " computational linguistics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3040822f944b38172dd041269f179d2b6fbf392a54a38c8e32f733b891506c40",
                "md5": "90415dd2d82f0cfe748a2305e408cc83",
                "sha256": "2fe8673bd6cfa258f6808eed2ab296a059f9c84a1b7cb980e99bc680165c8197"
            },
            "downloads": -1,
            "filename": "greekroom-0.0.13-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "90415dd2d82f0cfe748a2305e408cc83",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 156297,
            "upload_time": "2025-08-27T03:21:25",
            "upload_time_iso_8601": "2025-08-27T03:21:25.939451Z",
            "url": "https://files.pythonhosted.org/packages/30/40/822f944b38172dd041269f179d2b6fbf392a54a38c8e32f733b891506c40/greekroom-0.0.13-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e29037f929f9cdb864359133e265ab0e5170bf0c1b500b21580699bb68a2be79",
                "md5": "3bda17f7d6e2f4f02e5b1f72a541ad6a",
                "sha256": "91afe7b191b783795f404a66b6cb6d04027955eaa88d261ba70a14ee492dad79"
            },
            "downloads": -1,
            "filename": "greekroom-0.0.13.tar.gz",
            "has_sig": false,
            "md5_digest": "3bda17f7d6e2f4f02e5b1f72a541ad6a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 132596,
            "upload_time": "2025-08-27T03:21:27",
            "upload_time_iso_8601": "2025-08-27T03:21:27.100642Z",
            "url": "https://files.pythonhosted.org/packages/e2/90/37f929f9cdb864359133e265ab0e5170bf0c1b500b21580699bb68a2be79/greekroom-0.0.13.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-27 03:21:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "BibleNLP",
    "github_project": "greek-room",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "greekroom"
}
        