:Name: rdgai
:Version: 0.1.1
:Home page: https://github.com/rbturnbull/rdgai/
:Upload time: 2025-01-03 01:36:36
:Author: Robert Turnbull
:Requires Python: <3.12,>=3.10
:License: Apache-2.0
:Keywords: command-line interface

================================================================
rdgai
================================================================

.. start-badges

.. image:: https://raw.githubusercontent.com/rbturnbull/rdgai/refs/heads/main/docs/img/rdgai-banner.svg
    :alt: rdgai

|pypi badge| |testing badge| |coverage badge| |docs badge| |black badge|

.. |pypi badge| image:: https://img.shields.io/pypi/v/rdgai
    :target: https://pypi.org/project/rdgai/

.. |testing badge| image:: https://github.com/rbturnbull/rdgai/actions/workflows/testing.yml/badge.svg
    :target: https://github.com/rbturnbull/rdgai/actions

.. |docs badge| image:: https://github.com/rbturnbull/rdgai/actions/workflows/docs.yml/badge.svg
    :target: https://rbturnbull.github.io/rdgai
    
.. |black badge| image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/psf/black
    
.. |coverage badge| image:: https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/rbturnbull/1cf1aae1e72f85de97c7f79bb41f3d76/raw/coverage-badge.json
    :target: https://rbturnbull.github.io/rdgai/coverage/
    
Rdgai facilitates the use of LLMs for classifying transitions between variant readings in a Text Encoding Initiative (TEI) XML file containing a critical apparatus. 
It enables users to define classification categories, manually annotate changes, and use an LLM to automate the classification process.
The TEI XML can then be used for phylogenetic analysis of textual traditions using `teiphy <https://github.com/jjmccollum/teiphy>`_.

Background on why classifying variants in this way is useful can be found in the `Why use Rdgai? <https://rbturnbull.github.io/rdgai/docs/why>`_ documentation.

.. end-badges

Documentation is available at `https://rbturnbull.github.io/rdgai <https://rbturnbull.github.io/rdgai>`_.

.. start-quickstart

Installation
==================================

Install using pip:

.. code-block:: bash

    pip install rdgai

Or install directly from the repository:

.. code-block:: bash

    pip install git+https://github.com/rbturnbull/rdgai.git


Usage
==================================

See all the options with the command:

.. code-block:: bash

    rdgai --help


Preparation
==================================

You first need to prepare a `TEI XML <https://teibyexample.org/exist/tutorials/>`_ file with a `critical apparatus <https://tei-c.org/release/doc/tei-p5-doc/en/html/TC.html>`_.

Define categories in the TEI XML header under ``<interpGrp type="transcriptional">``. For example:

.. code-block:: xml

    <interpGrp type="transcriptional">
        <interp xml:id="Addition" corresp="#Omission">An addition of a word or words.</interp>
        <interp xml:id="Omission" corresp="#Addition">An omission of a word or words.</interp>
        <interp xml:id="Substitution">A substitution of a word or words.</interp>
    </interpGrp>
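
As a rough illustration of how such category definitions can be read back programmatically, the sketch below uses Python's standard ``xml.etree.ElementTree`` module to list each category with its description and ``corresp`` target. It is not part of Rdgai: the helper name ``read_categories`` and the layout of the returned dictionary are assumptions made for this example.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Attributes written as xml:id are reported by ElementTree under this namespace.
    XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

    SNIPPET = """
    <interpGrp type="transcriptional">
        <interp xml:id="Addition" corresp="#Omission">An addition of a word or words.</interp>
        <interp xml:id="Omission" corresp="#Addition">An omission of a word or words.</interp>
        <interp xml:id="Substitution">A substitution of a word or words.</interp>
    </interpGrp>
    """


    def local_name(element):
        """Return the tag without a namespace prefix (e.g. the TEI namespace)."""
        return element.tag.rsplit("}", 1)[-1]


    def read_categories(xml_text):
        """Map each category id to its description and corresp target (hypothetical helper)."""
        root = ET.fromstring(xml_text)
        categories = {}
        for group in root.iter():
            if local_name(group) != "interpGrp" or group.get("type") != "transcriptional":
                continue
            for interp in group:
                if local_name(interp) != "interp":
                    continue
                corresp = interp.get("corresp")
                categories[interp.get(XML_ID)] = {
                    "description": (interp.text or "").strip(),
                    "corresp": corresp.lstrip("#") if corresp else None,
                }
        return categories


    for name, info in read_categories(SNIPPET).items():
        print(name, "->", info["corresp"], "|", info["description"])

Matching on the local tag name means the same helper works on the bare snippet above and on a full TEI file whose elements sit in the TEI namespace.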

Then classify transitions manually in the browser-based graphical user interface (GUI), using buttons or keyboard navigation:

.. code-block:: bash

    rdgai gui apparatus.xml output.xml

Or export classifications to Excel for collaborative editing:

.. code-block:: bash

    rdgai export apparatus.xml reading-pairs.xlsx

Edit in Excel and re-import with:

.. code-block:: bash

    rdgai import-classifications apparatus.xml reading-pairs.xlsx output.xml

More information about preparing the TEI XML file can be found in the `Preparation <https://rbturnbull.github.io/rdgai/docs/preparation>`_ documentation.

Validation
==================================

The accuracy of Rdgai depends on the type of text, the categories and their definitions, and the LLM used.
It therefore needs to be validated on each document used with Rdgai.
For this purpose, Rdgai comes with a validation tool that makes a proportion of the manual annotations available for use in the prompt
and holds back the remainder as ground truth annotations for evaluating the results from Rdgai.
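
The idea of holding back part of the annotations can be sketched in a few lines of Python. This is only an illustration of the principle, not Rdgai's implementation: the function name ``split_annotations``, the random shuffling, and the fixed seed are all assumptions.

.. code-block:: python

    import random


    def split_annotations(annotations, proportion=0.5, seed=0):
        """Split annotations into prompt examples and held-out ground truth.

        Hypothetical helper: `proportion` is the share of annotations that
        may be shown to the LLM; the rest is kept for evaluation only.
        """
        shuffled = list(annotations)
        random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
        cutoff = round(len(shuffled) * proportion)
        return shuffled[:cutoff], shuffled[cutoff:]


    # Ten made-up annotations, identified only by number for this example.
    prompt_examples, ground_truth = split_annotations(range(10), proportion=0.5)
    print(len(prompt_examples), len(ground_truth))

Keeping the two groups disjoint is what makes the evaluation meaningful: the LLM is never scored on an annotation it was shown as an example.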

To run the validation tool, use the following command:

.. code-block:: bash

    rdgai validate apparatus.xml output.xml --report output.html --proportion 0.5 --llm claude-3-5-sonnet-20241022 --examples 20

The HTML report shows the accuracy, precision, recall, and F1 scores, a confusion matrix, and the detailed classifications, each marked as correct or incorrect.
The LLM then suggests ways to clarify the definitions of the categories and alerts the user to any inconsistencies in the ground truth annotations.
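
For readers unfamiliar with these metrics, the following sketch computes them from two parallel lists of labels using the standard definitions. It is a plain-Python illustration rather than the code behind Rdgai's report; the function name ``evaluate`` and the sample labels are assumptions.

.. code-block:: python

    from collections import Counter


    def evaluate(truth, predicted):
        """Return accuracy, per-category scores and a confusion matrix (hypothetical helper)."""
        # confusion[(true_label, predicted_label)] counts how often that pair occurs.
        confusion = Counter(zip(truth, predicted))
        labels = sorted(set(truth) | set(predicted))
        accuracy = sum(confusion[(label, label)] for label in labels) / len(truth)

        scores = {}
        for label in labels:
            true_positives = confusion[(label, label)]
            predicted_total = sum(confusion[(other, label)] for other in labels)
            truth_total = sum(confusion[(label, other)] for other in labels)
            precision = true_positives / predicted_total if predicted_total else 0.0
            recall = true_positives / truth_total if truth_total else 0.0
            total = precision + recall
            f1 = 2 * precision * recall / total if total else 0.0
            scores[label] = {"precision": precision, "recall": recall, "f1": f1}
        return accuracy, scores, confusion


    truth = ["Addition", "Addition", "Omission", "Substitution"]
    predicted = ["Addition", "Omission", "Omission", "Substitution"]
    accuracy, scores, confusion = evaluate(truth, predicted)
    print(accuracy)
    print(scores["Addition"])

In this made-up sample, one of the two true additions is predicted as an omission, which lowers the recall for ``Addition`` and the precision for ``Omission``.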

More information about validating the results of Rdgai for your TEI XML file can be found in the `Validation <https://rbturnbull.github.io/rdgai/docs/validation>`_ documentation.


Classification
==================================

After validating, you can classify the unclassified reading changes using the following command:

.. code-block:: bash

    rdgai classify apparatus.xml output.xml --llm claude-3-5-sonnet-20241022 --examples 20

View the output TEI XML in the Rdgai GUI with:

.. code-block:: bash

    rdgai gui output.xml --inplace

More information about making automated classifications using Rdgai can be found in the `Classification <https://rbturnbull.github.io/rdgai/docs/classification>`_ documentation.
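
The validation and classification commands shown above can also be assembled from a script. The sketch below only builds the argument lists, using the flags that appear in this README, and prints them; the helper names ``build_commands`` and ``run_workflow`` are assumptions, and ``run_workflow`` is defined but not called here.

.. code-block:: python

    import subprocess


    def build_commands(apparatus, output, llm, examples=20, proportion=0.5, report="output.html"):
        """Build the validate and classify command lines (hypothetical helper)."""
        validate = [
            "rdgai", "validate", apparatus, output,
            "--report", report,
            "--proportion", str(proportion),
            "--llm", llm,
            "--examples", str(examples),
        ]
        classify = [
            "rdgai", "classify", apparatus, output,
            "--llm", llm,
            "--examples", str(examples),
        ]
        return [validate, classify]


    def run_workflow(commands):
        """Run each command in turn, stopping at the first failure."""
        for command in commands:
            subprocess.run(command, check=True)


    commands = build_commands("apparatus.xml", "output.xml", "claude-3-5-sonnet-20241022")
    for command in commands:
        print(" ".join(command))

In practice it is worth reading the validation report before running the classification step, so the two commands are better run separately than chained unattended.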

.. end-quickstart


Credits
==================================

.. start-credits

Robert Turnbull

For more information contact: <robert.turnbull@unimelb.edu.au>

The article about Rdgai will be published in the near future. For now, please cite the repository and some of the following articles:

- Robert Turnbull. "Transmission History." Pages 156–204 in *Codex Sinaiticus Arabicus and Its Family: A Bayesian Approach*. Vol. 66. New Testament Tools, Studies and Documents. Brill, 2025. `https://doi.org/10.1163/9789004704619_007 <https://doi.org/10.1163/9789004704619_007>`_
- Joey McCollum and Robert Turnbull. "teiphy: A Python Package for Converting TEI XML Collations to NEXUS and Other Formats." *Journal of Open Source Software* 7, no. 80 (2022): 4879. `https://doi.org/10.21105/joss.04879 <https://doi.org/10.21105/joss.04879>`_
- Joey McCollum and Robert Turnbull. "Using Bayesian Phylogenetics to Infer Manuscript Transmission History." *Digital Scholarship in the Humanities* 39, no. 1 (2024): 258–79. `https://doi.org/10.1093/llc/fqad089 <https://doi.org/10.1093/llc/fqad089>`_

.. end-credits


            
