greekroom


Name: greekroom
Version: 0.0.20
Summary: The Greek Room will be a suite of tools supporting Biblical natural language processing.
Upload time: 2025-09-11 04:41:20
Author: Ulf Hermjakob
Requires Python: >=3.11
License: None
Keywords: machine translation, datasets, NLP, natural language processing, computational linguistics
Requirements: none recorded
# greekroom

_greekroom_ is a suite of tools to support Biblical natural language processing (work in progress).

<!--
[![image alt >](http://img.shields.io/pypi/v/greekroom.svg)](https://pypi.python.org/pypi/greekroom/)

### Installation (stubs only, in early development, not ready for regular users yet)

```bash
pip install greekroom
```
or
```bash
git clone https://github.com/BibleNLP/greek-room.git
```
-->

When using the GitHub version, we recommend adding the outer *greekroom* directory (the one that contains this README.md) to your PYTHONPATH.
Additionally, you might want to add the Greek Room's executable directories, such as greekroom/greekroom/gr_utilities and greekroom/greekroom/owl, to your PATH.
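If you would rather not edit your shell profile, the environment described in the paragraph above can also be built from Python when launching the scripts. This is a minimal sketch: the helper name `greekroom_env` and the checkout location are assumptions, and only the directory layout comes from the paragraph above.

```python
import os
from pathlib import Path


def greekroom_env(repo_root, base_env=None):
    """Return a copy of base_env (default: os.environ) with PYTHONPATH and PATH
    extended for a GitHub checkout of the Greek Room (hypothetical helper).

    repo_root is the outer greekroom directory, i.e. the one containing README.md.
    """
    env = dict(os.environ if base_env is None else base_env)
    root = Path(repo_root).resolve()
    # Executable directories named in the installation note.
    exec_dirs = [root / "greekroom" / "gr_utilities", root / "greekroom" / "owl"]

    def prepend(var, dirs):
        # Put the Greek Room directories first, keeping any existing entries.
        parts = [str(d) for d in dirs]
        if env.get(var):
            parts.append(env[var])
        env[var] = os.pathsep.join(parts)

    prepend("PYTHONPATH", [root])
    prepend("PATH", exec_dirs)
    return env
```

For example, `subprocess.run(["gr-wb-file-props", "-h"], env=greekroom_env("greekroom"), check=True)` would run the help command from a checkout in `./greekroom`, assuming a POSIX-style system where the scripts in those directories are executable.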


## gr_utilities
_gr_utilities_ is a set of Greek Room utilities.

<details>
<summary> <b>gr-wb-file-props</b>
A CLI Python script to analyze file properties such as script direction and quotation marks.</summary>

```
usage: gr-wb-file-props [-h]
           [-i INPUT_FILENAME]
           [-s INPUT_STRING]
           [-j JSON_OUT_FILENAME]
           [-o HTML_OUT_FILENAME]
           [--lang_code LANG_CODE]
           [--lang_name LANG_NAME]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILENAME, --input_filename INPUT_FILENAME
  -s INPUT_STRING, --input_string INPUT_STRING
  -j JSON_OUT_FILENAME, --json_out_filename JSON_OUT_FILENAME
  -o HTML_OUT_FILENAME, --html_out_filename HTML_OUT_FILENAME
  --lang_code LANG_CODE
  --lang_name LANG_NAME
```
Notes:
* Typically, either an INPUT_FILENAME or an INPUT_STRING is provided (but not both).
* Typically, a JSON_OUT_FILENAME or an HTML_OUT_FILENAME is provided (or both).
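When calling the script from Python, the notes above can be checked before launching it. A minimal sketch, assuming the script is on PATH: `build_file_props_argv` is a hypothetical helper that uses only the options listed in the usage message, and it treats the "typical" usage as a hard rule, which the CLI itself may not.

```python
def build_file_props_argv(input_filename=None, input_string=None,
                          json_out=None, html_out=None,
                          lang_code=None, lang_name=None):
    """Assemble a gr-wb-file-props command line (hypothetical helper).

    Requires exactly one input source and at least one output file.
    """
    if (input_filename is None) == (input_string is None):
        raise ValueError("provide exactly one of input_filename or input_string")
    if json_out is None and html_out is None:
        raise ValueError("provide json_out, html_out, or both")

    argv = ["gr-wb-file-props"]
    # Only options from the usage message; unset options are omitted.
    for flag, value in (("-i", input_filename), ("-s", input_string),
                        ("-j", json_out), ("-o", html_out),
                        ("--lang_code", lang_code), ("--lang_name", lang_name)):
        if value is not None:
            argv += [flag, value]
    return argv
```

Passing the resulting list to `subprocess.run(argv, check=True)` avoids shell quoting problems with multi-line input strings and curly quotes such as those in the sample text.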

Sample calls
```
gr-wb-file-props -h
gr-wb-file-props -s """She asked: “Whatʼs a ‘PyPi’?”
He replied: “I don't know.”""" -j test.json
cat test.json

```
</details>

<details>
<summary> <b>gr_utilities.wb_file_props.script_punct</b>
A Python function to analyze file properties such as script direction and quotation marks.</summary>

```python
import json
from greekroom.gr_utilities import wb_file_props

## Analyze a string
text = """She asked: “Whatʼs a ‘PyPi’?”
He replied: “I don't know.”"""
# The first argument is the filename (None here, so the string is analyzed)
result_dict = wb_file_props.script_punct(None, text, "eng", "English")
print(result_dict)

## Analyze file content
# Write the text to a file (UTF-8, since it contains curly quotes)
filename = "test.txt"
with open(filename, "w", encoding="utf-8") as f_out:
    f_out.write(text)

# Analyze the file
result_dict2 = wb_file_props.script_punct(filename)
# Print the result as a JSON string
print(json.dumps(result_dict2))
# Write the result to an HTML file
html_output = "test.html"
with open(html_output, "w", encoding="utf-8") as f_html:
    wb_file_props.print_to_html(result_dict2, f_html)

```
</details>

## owl
_owl_ is a battery of smaller Bible translation checks.

<details>
<summary> <b>gr-repeated-words</b>
A CLI Python script to check a file for repeated words, e.g. "the the".</summary>

```
usage: gr-repeated-words [-h]
                         [-j JSON]
                         [-i IN_FILENAME]
                         [-r REF_FILENAME]
                         [-o OUT_FILENAME]
                         [--html HTML]
                         [--project_name PROJECT_NAME]
                         [--lang_code LANGUAGE-CODE]
                         [--lang_name LANG_NAME]
                         [--message_id MESSAGE_ID]
                         [-d DATA_FILENAMES]
                         [--verbose]

options:
  -h, --help            show this help message and exit
  -j JSON, --json JSON  input (alternative 1)
  -i IN_FILENAME, --in_filename IN_FILENAME
                        text file (alternative 2)
  -r REF_FILENAME, --ref_filename REF_FILENAME
                        ref file (alt. 2)
  -o OUT_FILENAME, --out_filename OUT_FILENAME
                        output JSON filename
  --html HTML           output HTML filename
  --project_name PROJECT_NAME
                        full name of Bible translation project
  --lang_code LANGUAGE-CODE
                        ISO 639-3, e.g. 'fas' for Persian
  --lang_name LANG_NAME
  --message_id MESSAGE_ID
  -d DATA_FILENAMES, --data_filenames DATA_FILENAMES
  --verbose
```
Notes:
* Typically, input is provided either as a JSON string (-j) or as a text file (-i) together with a reference file (-r), but not both.
* Typically, an output JSON filename (-o) or an output HTML filename (--html) is provided (or both).
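To make the idea of a repeated word concrete, here is a deliberately simple detector. It is a hypothetical illustration, not the owl implementation: it compares neighboring words case-insensitively, tolerates only whitespace and a single comma between them, and knows nothing about legitimate duplicates.

```python
import re

# A word is a run of letters/digits, optionally joined by apostrophes (don't, whatʼs).
WORD_RE = re.compile(r"[^\W_]+(?:['’][^\W_]+)*")
# Between two repeated words, allow only whitespace and at most one comma.
GAP_RE = re.compile(r"\s*,?\s*")


def find_repeated_words(text):
    """Return (word, start, end) for each word that repeats its predecessor.

    start/end span both occurrences; comparison is case-insensitive.
    """
    hits = []
    prev = None
    for m in WORD_RE.finditer(text):
        if (prev is not None
                and m.group().casefold() == prev.group().casefold()
                and GAP_RE.fullmatch(text, prev.end(), m.start())):
            hits.append((m.group(), prev.start(), m.end()))
        prev = m
    return hits
```

Because the gap rule rejects sentence punctuation, "the. The" is not reported, while "truly, truly" is; a real check would then consult a list of legitimate duplicates.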


Sample calls
```
gr-repeated-words -h
gr-repeated-words -j '{"jsonrpc": "2.0",
 "id": "eng-sample-01",
 "method": "BibleTranslationCheck",
 "params": [{"lang-code": "eng", "lang-name": "English",
             "project-id": "eng-sample",
             "project-name": "English Bible",
             "selectors": [{"tool": "GreekRoom", "checks": ["RepeatedWords"]}],
             "check-corpus": [{"snt-id": "GEN 1:1", "text": "In in the beginning ..."},
                              {"snt-id": "JHN 12:24", "text": "Truly truly, I say to you ..."}]}]}' -o test.json
cat test.json
```
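Writing the JSON request by hand is error-prone; a small builder keeps the structure consistent. This is a sketch: `make_check_task` is a hypothetical helper, and its field names are copied from the sample call above rather than from a published schema.

```python
import json


def make_check_task(task_id, lang_code, lang_name, project_id, project_name,
                    verses, checks=("RepeatedWords",)):
    """Build a BibleTranslationCheck request shaped like the sample call.

    verses is an iterable of (snt_id, text) pairs (hypothetical helper).
    """
    return {
        "jsonrpc": "2.0",
        "id": task_id,
        "method": "BibleTranslationCheck",
        "params": [{
            "lang-code": lang_code,
            "lang-name": lang_name,
            "project-id": project_id,
            "project-name": project_name,
            "selectors": [{"tool": "GreekRoom", "checks": list(checks)}],
            "check-corpus": [{"snt-id": snt_id, "text": text} for snt_id, text in verses],
        }],
    }


task = make_check_task("eng-sample-01", "eng", "English", "eng-sample", "English Bible",
                       [("GEN 1:1", "In in the beginning ..."),
                        ("JHN 12:24", "Truly truly, I say to you ...")])
# ensure_ascii=False keeps non-Latin scripts readable in the JSON string.
task_s = json.dumps(task, ensure_ascii=False)
```

The resulting `task_s` can be passed to `gr-repeated-words -j` (for example as one element of a `subprocess.run` argument list, which avoids shell quoting) or to `repeated_words.check_mcp`.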
</details>

<details>
<summary> <b>owl.repeated_words.check_mcp</b>
A Python function to check a file for repeated words, e.g. "the the".</summary>

```python
import json
from greekroom.owl import repeated_words

task_s = '''{"jsonrpc": "2.0",
 "id": "eng-sample-01",
 "method": "BibleTranslationCheck",
 "params": [{"lang-code": "eng", "lang-name": "English",
             "project-id": "eng-sample",
             "project-name": "English Bible",
             "selectors": [{"tool": "GreekRoom", "checks": ["RepeatedWords"]}],
             "check-corpus": [{"snt-id": "GEN 1:1", "text": "In in the beginning ..."},
                              {"snt-id": "JHN 12:24", "text": "Truly truly, I say to you ..."}]}]}'''

# load_data_filename() loads legitimate_duplicates.jsonl (see below); call this function only once, even for multiple checks.
data_filename_dict = repeated_words.load_data_filename()
corpus = repeated_words.new_corpus("eng-sample-01")
mcp_d, misc_data_dict, check_corpus_list = repeated_words.check_mcp(task_s, data_filename_dict, corpus)
print(json.dumps(mcp_d))
print(misc_data_dict)
print(check_corpus_list)

# Write the feedback to an HTML file
feedback = repeated_words.get_feedback(mcp_d, 'GreekRoom', 'RepeatedWords')
corpus = repeated_words.update_corpus_if_empty(corpus, check_corpus_list)
repeated_words.write_to_html(feedback, misc_data_dict, corpus, "test.html", "eng", "English", "English Bible")
# result will be in test.html

```
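Following the advice above to load the data only once, several tasks can share one `data_filename_dict`. In this sketch, `run_repeated_words_checks` is hypothetical, the `repeated_words` module is passed in as a parameter, and naming each corpus after its task's "id" is an assumption that mirrors the example.

```python
import json


def run_repeated_words_checks(task_strings, repeated_words):
    """Run check_mcp on several JSON task strings, loading the data files once.

    repeated_words is the greekroom.owl.repeated_words module (passed in).
    Returns a list of (task id, mcp_d) pairs.
    """
    data_filename_dict = repeated_words.load_data_filename()  # load once for all tasks
    results = []
    for task_s in task_strings:
        task_id = json.loads(task_s)["id"]
        # Assumption: a fresh corpus per task, named after the task id as in the example.
        corpus = repeated_words.new_corpus(task_id)
        mcp_d, _misc_data_dict, _check_corpus_list = repeated_words.check_mcp(
            task_s, data_filename_dict, corpus)
        results.append((task_id, mcp_d))
    return results
```

With the imports from the example, `run_repeated_words_checks([task_s], repeated_words)` returns the same `mcp_d` as a single `check_mcp` call, paired with its task id.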
</details>

<details>
<summary> <b>legitimate_duplicates.jsonl</b>
Data files describing legitimate repeated words.</summary>

Samples:

```
{"lang-code": "eng", "text": "truly, truly"}
{"lang-code": "eng", "text": "her her", "snt-ids": ["HOS 2:17", "EST 2:9", "JDT 10:4"], "context-examples": ["give her her vineyards", "gave her her things for purification"]}
{"lang-code": "grc", "text": "ἀμὴν ἀμὴν", "rom": "amen amen", "gloss": {"eng": "truly truly [I say to you]"}}

{"lang-code": "hin", "text": "जब जब", "rom": "jab jab", "gloss": {"eng": "whenever"}}
{"lang-code": "hin", "text": "कुछ कुछ", "rom": "kuch kuch", "gloss": {"eng": "something, somewhat, some of, part of"}}
{"lang-code": "eng", "text": "they they", "delete": true}
```
Notes:
* The tool searches for files named <i>owl/data/legitimate_duplicates.jsonl</i> in the directories "greekroom", "$XDG_DATA_HOME", "/usr/share", and "$HOME/.local/share".
* Later entries overwrite earlier entries.
* Entries marked <i>"delete": true</i> delete prior entries.
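The override rules above can be sketched as a merge over JSONL lines. Assumptions not stated in the documentation: entries are identified by their "lang-code" and "text" fields, files are processed in the order given, and blank lines are skipped; `load_legitimate_duplicates` is a hypothetical name.

```python
import json


def load_legitimate_duplicates(paths):
    """Merge legitimate_duplicates.jsonl files into {(lang_code, text): entry}."""
    entries = {}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # the samples contain blank lines
                entry = json.loads(line)
                key = (entry["lang-code"], entry["text"])
                if entry.get("delete"):
                    entries.pop(key, None)  # "delete": true removes a prior entry
                else:
                    entries[key] = entry  # later entries overwrite earlier ones
    return entries
```

Keeping the entries in a dictionary makes both rules a single operation each: assignment for overwrite, `pop` for delete.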

</details>

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "greekroom",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "machine translation, datasets, NLP, natural language processing, computational linguistics",
    "author": "Ulf Hermjakob",
    "author_email": "Ulf Hermjakob <ulfhermjakob@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/21/51/0c36dce1765edb8875adbe837e39a747ba99a4127ebb177637d46f3b1711/greekroom-0.0.20.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "The Greek Room will be a suite of tools supporting Biblical natural language processing.",
    "version": "0.0.20",
    "project_urls": {
        "Download": "https://github.com/BibleNLP/greek-room",
        "Homepage": "https://greekroom.org"
    },
    "split_keywords": [
        "machine translation",
        " datasets",
        " nlp",
        " natural language processing",
        " computational linguistics"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9b6f673381dfa7d381ebf0ca034531b11ea3fd8cdb8447db111b9e327b32c128",
                "md5": "5cf43d5d4a3412277b767ba476c38588",
                "sha256": "d2d59ff8824249d7ef21e6cc5f616c09af67e6d24d171a1c13cea2be67d1094b"
            },
            "downloads": -1,
            "filename": "greekroom-0.0.20-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5cf43d5d4a3412277b767ba476c38588",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 25880,
            "upload_time": "2025-09-11T04:41:19",
            "upload_time_iso_8601": "2025-09-11T04:41:19.237318Z",
            "url": "https://files.pythonhosted.org/packages/9b/6f/673381dfa7d381ebf0ca034531b11ea3fd8cdb8447db111b9e327b32c128/greekroom-0.0.20-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "21510c36dce1765edb8875adbe837e39a747ba99a4127ebb177637d46f3b1711",
                "md5": "411d24415e382988316f65f4b4619a90",
                "sha256": "7d22881f98e595f1cf72f1d1c845abbb51d3f019e956167756d4ccced88df95a"
            },
            "downloads": -1,
            "filename": "greekroom-0.0.20.tar.gz",
            "has_sig": false,
            "md5_digest": "411d24415e382988316f65f4b4619a90",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 20512,
            "upload_time": "2025-09-11T04:41:20",
            "upload_time_iso_8601": "2025-09-11T04:41:20.555417Z",
            "url": "https://files.pythonhosted.org/packages/21/51/0c36dce1765edb8875adbe837e39a747ba99a4127ebb177637d46f3b1711/greekroom-0.0.20.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-11 04:41:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "BibleNLP",
    "github_project": "greek-room",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "greekroom"
}
        