# greekroom
_greekroom_ is a suite of tools to support Biblical natural language processing (in progress)
<!--
[](https://pypi.python.org/pypi/greekroom/)
### Installation (stubs only, in early development, not ready for regular users yet)
```bash
pip install greekroom
```
or
```bash
git clone https://github.com/BibleNLP/greek-room.git
```
-->
When using the GitHub version, we recommend that your `PYTHONPATH` include the outer *greekroom* directory, i.e. the one that contains this README.md.
You may also want to add the Greek Room's executable directories to `PATH`, e.g. `greekroom/greekroom/gr_utilities:greekroom/greekroom/owl`.
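Where setting `PYTHONPATH` is inconvenient (e.g. in a notebook), a similar effect can be had at runtime. The following is a minimal sketch, not part of the package: `GREEKROOM_ROOT` is a hypothetical location and `add_to_sys_path` is an illustrative helper; point the path at your own checkout.

```python
import sys
from pathlib import Path

# Hypothetical location of the outer greekroom directory
# (the one that contains README.md); adjust to your checkout.
GREEKROOM_ROOT = Path.home() / "src" / "greekroom"


def add_to_sys_path(root: Path) -> str:
    """Prepend root to sys.path unless it is already listed; return it as a string."""
    root_s = str(root.expanduser().resolve())
    if root_s not in sys.path:
        sys.path.insert(0, root_s)
    return root_s


added = add_to_sys_path(GREEKROOM_ROOT)
# If GREEKROOM_ROOT really points at the checkout, imports such as
# `from greekroom.gr_utilities import wb_file_props` should now resolve.
```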
## gr_utilities
_gr_utilities_ is a set of Greek Room utilities.
<details>
<summary> <b>gr-wb-file-props</b>
A CLI Python script to analyze file properties such as script direction, quotations.</summary>
```
usage: gr-wb-file-props [-h]
                        [-i INPUT_FILENAME]
                        [-s INPUT_STRING]
                        [-j JSON_OUT_FILENAME]
                        [-o HTML_OUT_FILENAME]
                        [--lang_code LANG_CODE]
                        [--lang_name LANG_NAME]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILENAME, --input_filename INPUT_FILENAME
  -s INPUT_STRING, --input_string INPUT_STRING
  -j JSON_OUT_FILENAME, --json_out_filename JSON_OUT_FILENAME
  -o HTML_OUT_FILENAME, --html_out_filename HTML_OUT_FILENAME
  --lang_code LANG_CODE
  --lang_name LANG_NAME
```
Notes:
* Typically, either an INPUT_FILENAME or an INPUT_STRING is provided (but not both).
* Typically, a JSON_OUT_FILENAME or an HTML_OUT_FILENAME is provided (or both).
Sample calls
```
gr-wb-file-props -h
gr-wb-file-props -s """She asked: “Whatʼs a ‘PyPi’?”
He replied: “I don't know.”""" -j test.json
cat test.json
```
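The same call can be assembled from Python. The sketch below only builds the argument list from the options documented above; `build_file_props_command` and `run_file_props` are hypothetical helpers, not part of greekroom. Note that the helper enforces "exactly one input", which is stricter than the "typically" in the notes.

```python
import subprocess


def build_file_props_command(input_filename=None, input_string=None,
                             json_out=None, html_out=None,
                             lang_code=None, lang_name=None):
    """Build an argv list for gr-wb-file-props (hypothetical helper)."""
    # Exactly one of the two input options must be given.
    if (input_filename is None) == (input_string is None):
        raise ValueError("provide exactly one of input_filename or input_string")
    cmd = ["gr-wb-file-props"]
    if input_filename is not None:
        cmd += ["-i", input_filename]
    else:
        cmd += ["-s", input_string]
    # Output and language options are all optional.
    if json_out:
        cmd += ["-j", json_out]
    if html_out:
        cmd += ["-o", html_out]
    if lang_code:
        cmd += ["--lang_code", lang_code]
    if lang_name:
        cmd += ["--lang_name", lang_name]
    return cmd


def run_file_props(cmd):
    """Run the command; requires gr-wb-file-props to be on PATH."""
    return subprocess.run(cmd, check=True)


cmd = build_file_props_command(input_string="She asked: “Whatʼs a ‘PyPi’?”",
                               json_out="test.json")
print(cmd)
```

Passing the arguments as a list avoids shell quoting issues with the curly quotes in the sample text.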
</details>
<details>
<summary> <b>gr_utilities.wb_file_props.script_punct</b>
A Python function to analyze file properties such as script direction, quotations.</summary>
```python
import json
from greekroom.gr_utilities import wb_file_props
## Apply script to string
text = """She asked: “Whatʼs a ‘PyPi’?”
He replied: “I don't know.”"""
result_dict = wb_file_props.script_punct(None, text, "eng", "English")
print(result_dict)
## Apply script to file content
# Write text to file
filename = "test.txt"
with open(filename, "w") as f_out:
    f_out.write(text)
# Apply script
result_dict2 = wb_file_props.script_punct(filename)
# Print result as JSON string
print(json.dumps(result_dict2))
# Write result to HTML file
html_output = "test.html"
with open(html_output, "w") as f_html:
    wb_file_props.print_to_html(result_dict2, f_html)
```
</details>
## owl
_owl_ is a battery of smaller Bible Translation checks.
<details>
<summary> <b>gr-repeated-words</b>
A CLI Python script to check a file for repeated words, e.g. "the the".</summary>
```
usage: gr-repeated-words [-h]
                         [-j JSON]
                         [-i IN_FILENAME]
                         [-r REF_FILENAME]
                         [-o OUT_FILENAME]
                         [--html HTML]
                         [--project_name PROJECT_NAME]
                         [--lang_code LANGUAGE-CODE]
                         [--lang_name LANG_NAME]
                         [--message_id MESSAGE_ID]
                         [-d DATA_FILENAMES]
                         [--verbose]

options:
  -h, --help            show this help message and exit
  -j JSON, --json JSON  input (alternative 1)
  -i IN_FILENAME, --in_filename IN_FILENAME
                        text file (alternative 2)
  -r REF_FILENAME, --ref_filename REF_FILENAME
                        ref file (alt. 2)
  -o OUT_FILENAME, --out_filename OUT_FILENAME
                        output JSON filename
  --html HTML           output HTML filename
  --project_name PROJECT_NAME
                        full name of Bible translation project
  --lang_code LANGUAGE-CODE
                        ISO 639-3, e.g. 'fas' for Persian
  --lang_name LANG_NAME
  --message_id MESSAGE_ID
  -d DATA_FILENAMES, --data_filenames DATA_FILENAMES
  --verbose
```
Notes:
* Typically, the input is provided either as JSON (`-j`, alternative 1) or as a text file (`-i`) with a ref file (`-r`, alternative 2), but not both.
* Typically, an output JSON filename (`-o`) or an output HTML filename (`--html`) is provided (or both).
Sample calls
```
gr-repeated-words -h
gr-repeated-words -j '{"jsonrpc": "2.0",
"id": "eng-sample-01",
"method": "BibleTranslationCheck",
"params": [{"lang-code": "eng", "lang-name": "English",
"project-id": "eng-sample",
"project-name": "English Bible",
"selectors": [{"tool": "GreekRoom", "checks": ["RepeatedWords"]}],
"check-corpus": [{"snt-id": "GEN 1:1", "text": "In in the beginning ..."},
{"snt-id": "JHN 12:24", "text": "Truly truly, I say to you ..."}]}]}' -o test.json
cat test.json
```
</details>
<details>
<summary> <b>owl.repeated_words.check_mcp</b>
A Python function to check a file for repeated words, e.g. "the the".</summary>
```python
import json
from greekroom.owl import repeated_words
task_s = '''{"jsonrpc": "2.0",
"id": "eng-sample-01",
"method": "BibleTranslationCheck",
"params": [{"lang-code": "eng", "lang-name": "English",
"project-id": "eng-sample",
"project-name": "English Bible",
"selectors": [{"tool": "GreekRoom", "checks": ["RepeatedWords"]}],
"check-corpus": [{"snt-id": "GEN 1:1", "text": "In in the beginning ..."},
{"snt-id": "JHN 12:24", "text": "Truly truly, I say to you ..."}]}]}'''
# load_data_filename() loads legitimate_duplicates.jsonl (see below); call this function only once, even for multiple checks.
data_filename_dict = repeated_words.load_data_filename()
corpus = repeated_words.new_corpus("eng-sample-01")
mcp_d, misc_data_dict, check_corpus_list = repeated_words.check_mcp(task_s, data_filename_dict, corpus)
print(json.dumps(mcp_d))
print(misc_data_dict)
print(check_corpus_list)
# print to HTML file
feedback = repeated_words.get_feedback(mcp_d, 'GreekRoom', 'RepeatedWords')
corpus = repeated_words.update_corpus_if_empty(corpus, check_corpus_list)
repeated_words.write_to_html(feedback, misc_data_dict, corpus, "test.html", "eng", "English", "English Bible")
# result will be in test.html
```
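Instead of writing the task string by hand, it can be generated from Python data. This is a minimal sketch that mirrors the field names of the sample task above; `build_repeated_words_task` is a hypothetical helper, not part of greekroom, and whether additional fields are accepted is not covered here.

```python
import json


def build_repeated_words_task(task_id, lang_code, lang_name,
                              project_id, project_name, sentences):
    """Return a JSON task string shaped like the sample above.

    sentences is an iterable of (snt_id, text) pairs.
    """
    task = {
        "jsonrpc": "2.0",
        "id": task_id,
        "method": "BibleTranslationCheck",
        "params": [{
            "lang-code": lang_code,
            "lang-name": lang_name,
            "project-id": project_id,
            "project-name": project_name,
            "selectors": [{"tool": "GreekRoom", "checks": ["RepeatedWords"]}],
            "check-corpus": [{"snt-id": snt_id, "text": text}
                             for snt_id, text in sentences],
        }],
    }
    # ensure_ascii=False keeps non-Latin scripts readable in the output.
    return json.dumps(task, ensure_ascii=False)


built_task_s = build_repeated_words_task(
    "eng-sample-01", "eng", "English", "eng-sample", "English Bible",
    [("GEN 1:1", "In in the beginning ..."),
     ("JHN 12:24", "Truly truly, I say to you ...")])
print(built_task_s)
```

The resulting string can be passed wherever `task_s` is used in the example above; building it with `json.dumps` avoids quoting mistakes in sentence text.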
</details>
<details>
<summary> <b>legitimate_duplicates.jsonl</b>
Data files describing legitimate repeated words.</summary>
Samples:
```
{"lang-code": "eng", "text": "truly, truly"}
{"lang-code": "eng", "text": "her her", "snt-ids": ["HOS 2:17", "EST 2:9", "JDT 10:4"], "context-examples": ["give her her vineyards", "gave her her things for purification"]}
{"lang-code": "grc", "text": "ἀμὴν ἀμὴν", "rom": "amen amen", "gloss": {"eng": "truly truly [I say to you]"}}
{"lang-code": "hin", "text": "जब जब", "rom": "jab jab", "gloss": {"eng": "whenever"}}
{"lang-code": "hin", "text": "कुछ कुछ", "rom": "kuch kuch", "gloss": {"eng": "something, somewhat, some of, part of"}}
{"lang-code": "eng", "text": "they they", "delete": true}
```
Notes:
* The data files <i>owl/data/legitimate_duplicates.jsonl</i> are searched for in the directories "greekroom", "$XDG_DATA_HOME", "/usr/share", "$HOME/.local/share".
* Later entries overwrite prior entries.
* Entries with <i>"delete": true</i> delete prior entries.
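The overwrite and delete rules in the notes above can be illustrated as follows. This is a sketch of the described behavior, not the package's actual loader: `merge_legitimate_duplicates` is a hypothetical function, and identifying an entry by its (`lang-code`, `text`) pair is an assumption.

```python
import json


def merge_legitimate_duplicates(jsonl_texts):
    """Merge JSONL contents in order; later entries win, "delete": true removes."""
    merged = {}
    for jsonl_text in jsonl_texts:
        for line in jsonl_text.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines between entries
            entry = json.loads(line)
            # Assumption: an entry is identified by language code and text.
            key = (entry["lang-code"], entry["text"])
            if entry.get("delete"):
                merged.pop(key, None)  # remove a prior entry, if any
            else:
                merged[key] = entry  # add, or overwrite a prior entry
    return merged


# Two made-up file contents, in the order in which they would be read.
first_file = ('{"lang-code": "eng", "text": "they they"}\n'
              '{"lang-code": "eng", "text": "truly, truly"}\n')
second_file = ('{"lang-code": "eng", "text": "they they", "delete": true}\n'
               '\n'
               '{"lang-code": "hin", "text": "जब जब", "rom": "jab jab"}\n')

merged = merge_legitimate_duplicates([first_file, second_file])
print(sorted(merged))
```

Here the "they they" entry from the first file is removed by the delete entry in the second, while "truly, truly" and "जब जब" remain.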
</details>
Raw data
{
"_id": null,
"home_page": null,
"name": "greekroom",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "machine translation, datasets, NLP, natural language processing, computational linguistics",
"author": "Ulf Hermjakob",
"author_email": "Ulf Hermjakob <ulfhermjakob@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/21/51/0c36dce1765edb8875adbe837e39a747ba99a4127ebb177637d46f3b1711/greekroom-0.0.20.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "The Greek Room will be a suite of tools supporting Biblical natural language processing.",
"version": "0.0.20",
"project_urls": {
"Download": "https://github.com/BibleNLP/greek-room",
"Homepage": "https://greekroom.org"
},
"split_keywords": [
"machine translation",
" datasets",
" nlp",
" natural language processing",
" computational linguistics"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "9b6f673381dfa7d381ebf0ca034531b11ea3fd8cdb8447db111b9e327b32c128",
"md5": "5cf43d5d4a3412277b767ba476c38588",
"sha256": "d2d59ff8824249d7ef21e6cc5f616c09af67e6d24d171a1c13cea2be67d1094b"
},
"downloads": -1,
"filename": "greekroom-0.0.20-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5cf43d5d4a3412277b767ba476c38588",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 25880,
"upload_time": "2025-09-11T04:41:19",
"upload_time_iso_8601": "2025-09-11T04:41:19.237318Z",
"url": "https://files.pythonhosted.org/packages/9b/6f/673381dfa7d381ebf0ca034531b11ea3fd8cdb8447db111b9e327b32c128/greekroom-0.0.20-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "21510c36dce1765edb8875adbe837e39a747ba99a4127ebb177637d46f3b1711",
"md5": "411d24415e382988316f65f4b4619a90",
"sha256": "7d22881f98e595f1cf72f1d1c845abbb51d3f019e956167756d4ccced88df95a"
},
"downloads": -1,
"filename": "greekroom-0.0.20.tar.gz",
"has_sig": false,
"md5_digest": "411d24415e382988316f65f4b4619a90",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 20512,
"upload_time": "2025-09-11T04:41:20",
"upload_time_iso_8601": "2025-09-11T04:41:20.555417Z",
"url": "https://files.pythonhosted.org/packages/21/51/0c36dce1765edb8875adbe837e39a747ba99a4127ebb177637d46f3b1711/greekroom-0.0.20.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-11 04:41:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "BibleNLP",
"github_project": "greek-room",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "greekroom"
}