bio-terrier


:Name: bio-terrier
:Version: 0.3.4
:Home page: https://github.com/rbturnbull/terrier/
:Summary: Transposable Element Repeat Result classifIER
:Upload time: 2025-10-14 04:29:50
:Author: Robert Turnbull
:Requires Python: <3.13,>=3.10
:License: Apache-2.0
:Keywords: torchapp, pytorch, deep learning, command-line interface

.. image:: https://raw.githubusercontent.com/rbturnbull/terrier/main/docs/images/terrier-banner.png

.. start-badges

|pypi badge| |colab badge| |testing badge| |docs badge| |black badge| |torchapp badge| |doi badge|

.. |pypi badge| image:: https://img.shields.io/pypi/v/bio-terrier?color=blue
   :alt: PyPI - Version
   :target: https://pypi.org/project/bio-terrier/

.. |testing badge| image:: https://github.com/rbturnbull/terrier/actions/workflows/testing.yml/badge.svg
    :target: https://github.com/rbturnbull/terrier/actions

.. |docs badge| image:: https://github.com/rbturnbull/terrier/actions/workflows/docs.yml/badge.svg
    :target: https://rbturnbull.github.io/terrier
    
.. |black badge| image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/psf/black
    
.. |coverage badge| image:: https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/rbturnbull/5e0c3115955fde132a8b7c131da68b86/raw/coverage-badge.json
    :target: https://rbturnbull.github.io/terrier/coverage/

.. |torchapp badge| image:: https://img.shields.io/badge/torch-app-B1230A.svg
    :target: https://rbturnbull.github.io/torchapp/

.. |colab badge| image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/rbturnbull/terrier/blob/main/terrier_colab.ipynb

.. |doi badge| image:: https://img.shields.io/badge/DOI-10.1093%2Fbib%2Fbbaf442-blue
   :target: https://doi.org/10.1093/bib/bbaf442
    
.. end-badges

.. start-quickstart

Transposable Element Repeat Result classifIER

Terrier is a neural network model for classifying transposable element sequences.

It is based on 'corgi', which was trained to perform hierarchical taxonomic classification of DNA sequences.

Terrier was trained on the Repbase library of repetitive DNA elements to perform hierarchical classification according to the RepeatMasker schema.

An online version of Terrier (using CPUs only) is available at `https://portal.cpg.unimelb.edu.au/tools/terrier <https://portal.cpg.unimelb.edu.au/tools/terrier>`_.

Installation
==================================

Install using pip:

.. code-block:: bash

    pip install bio-terrier

.. warning::

    Do not run just ``pip install terrier``, because that installs a different package.

Or install the latest version from GitHub:

.. code-block:: bash

    pip install git+https://github.com/rbturnbull/terrier.git


Google Colab Version
==================================

Follow this link to launch a Google Colab notebook where you can run the model on your own data: |colab badge2|

.. |colab badge2| image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/rbturnbull/terrier/blob/main/terrier_colab.ipynb

Usage
==================================

To run inference on a FASTA file, run this command:

.. code-block:: bash

    terrier --input INPUT.fa --output-fasta OUTPUT.fa

This adds the classification after the sequence ID in the ``OUTPUT.fa`` FASTA file.

To save the probabilities for all classes, run:

.. code-block:: bash

    terrier --input INPUT.fa --output-csv OUTPUT.csv

Each column holds the probability of one classification, and each row corresponds to one sequence in ``INPUT.fa``.
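
As a minimal sketch of working with this CSV, the following Python snippet reports the highest-probability class for each row. The exact column layout is an assumption: the sketch treats any cell that parses as a number as a probability and any other cell (for example a sequence identifier) as a label, so a purely numeric identifier would be misread. The sample data and class names are made up for illustration.

.. code-block:: python

    import csv
    import io


    def top_class_per_row(csv_text):
        """Return (non-numeric cells, best column name, best probability) per row."""
        results = []
        for row in csv.DictReader(io.StringIO(csv_text)):
            labels, probabilities = [], {}
            for column, cell in row.items():
                try:
                    probabilities[column] = float(cell)
                except (TypeError, ValueError):
                    labels.append(cell)  # e.g. a sequence identifier
            best = max(probabilities, key=probabilities.get)
            results.append((labels, best, probabilities[best]))
        return results


    # Hypothetical example data; real column names come from Terrier's output.
    example = "id,LINE,SINE,DNA\nseq1,0.8,0.1,0.1\nseq2,0.2,0.3,0.5\n"
    for labels, best, probability in top_class_per_row(example):
        print(labels, best, probability)

To apply this to a real file, pass the file contents instead of the example string, for instance ``top_class_per_row(open("OUTPUT.csv").read())``.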

You can also use a URL as the input:

.. code-block:: bash

    terrier --input https://example.com/INPUT.fasta.gz --output-fasta OUTPUT.fa

To output a visualization of the prediction probabilities, run:

.. code-block:: bash

    terrier --input INPUT.fa --image-dir OUTPUT-IMAGES/

The output options above can be combined in a single command. For more options, run:

.. code-block:: bash

    terrier --help
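
As a rough sketch of combining several outputs in one run, the Python snippet below assembles a single ``terrier`` command from the options shown above (``--input``, ``--output-fasta``, ``--output-csv`` and ``--image-dir``). The helper names ``build_terrier_command`` and ``run_terrier`` are hypothetical and exist only for this illustration; the command is only executed when ``run_terrier`` is called.

.. code-block:: python

    import shutil
    import subprocess


    def build_terrier_command(input_path, output_fasta=None, output_csv=None, image_dir=None):
        """Assemble a terrier command line from the options documented above."""
        command = ["terrier", "--input", str(input_path)]
        if output_fasta:
            command += ["--output-fasta", str(output_fasta)]
        if output_csv:
            command += ["--output-csv", str(output_csv)]
        if image_dir:
            command += ["--image-dir", str(image_dir)]
        return command


    def run_terrier(command):
        """Run the command, failing early if terrier is not on the PATH."""
        if shutil.which(command[0]) is None:
            raise FileNotFoundError("terrier executable not found; is bio-terrier installed?")
        subprocess.run(command, check=True)


    command = build_terrier_command("INPUT.fa", output_fasta="OUTPUT.fa", output_csv="OUTPUT.csv")
    print(" ".join(command))

Building the argument list separately from running it makes it easy to inspect or log the exact command before starting a long inference job.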

To see the options to train the model, run:

.. code-block:: bash

    terrier-tools --help

Programmatic Usage
==================================

You can also use the model programmatically:

.. code-block:: python

    from terrier import Terrier

    terrier = Terrier()
    terrier(file="INPUT.fa", output_fasta="OUTPUT.fa")
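
Building on the call above, here is a minimal sketch for classifying several FASTA files in a loop. It uses only the ``file`` and ``output_fasta`` arguments shown above. It assumes that one ``Terrier`` instance can be called repeatedly, and the helper names and the ``.classified.fa`` suffix are hypothetical choices for this example.

.. code-block:: python

    from pathlib import Path


    def plan_outputs(input_paths, output_dir):
        """Pair each input FASTA with an output path inside output_dir."""
        output_dir = Path(output_dir)
        return [
            (Path(path), output_dir / f"{Path(path).stem}.classified.fa")
            for path in input_paths
        ]


    def classify_all(input_paths, output_dir):
        """Run Terrier on each input file (assumes an instance can be reused)."""
        # Imported here so that plan_outputs works without the package installed.
        from terrier import Terrier

        terrier = Terrier()
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        for input_path, output_path in plan_outputs(input_paths, output_dir):
            terrier(file=str(input_path), output_fasta=str(output_path))

For example, ``classify_all(["a.fa", "b.fa"], "classified")`` would write ``classified/a.classified.fa`` and ``classified/b.classified.fa``.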


Potential Use Case
==================================

A potential workflow is to use `RepeatModeler <https://github.com/Dfam-consortium/RepeatModeler>`_ first to generate a repeat library.
Then you can use Terrier to attempt to classify the remaining unknown repeats. 
If you only want highly confident classifications from Terrier, you can set the threshold to 0.9 or higher.
If you wish to have more coverage, then you can set the threshold lower (or keep it at the default value of 0.7). 
The modified repeat library can then be used with `RepeatMasker <http://www.repeatmasker.org/>`_ to mask the repeats in your genome assembly.
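
The trade-off between confidence and coverage can be illustrated with a small Python sketch. This is not Terrier's internal logic, which classifies hierarchically. It only shows how a stricter threshold leaves more sequences unclassified. The sequence IDs, labels and probabilities are made up, and the ``Unknown`` placeholder is an assumption for this example. See ``terrier --help`` for how the threshold is actually set.

.. code-block:: python

    def apply_threshold(predictions, threshold=0.7):
        """Keep a label only when its probability reaches the threshold."""
        return {
            sequence_id: (label if probability >= threshold else "Unknown")
            for sequence_id, (label, probability) in predictions.items()
        }


    # Made-up top predictions: sequence ID -> (label, probability).
    predictions = {
        "rnd-1_family-1": ("LINE/L1", 0.95),
        "rnd-1_family-2": ("DNA/hAT", 0.75),
        "rnd-1_family-3": ("LTR/Gypsy", 0.40),
    }

    for threshold in (0.7, 0.9):
        labelled = apply_threshold(predictions, threshold)
        kept = sum(label != "Unknown" for label in labelled.values())
        print(f"threshold={threshold}: {kept} of {len(labelled)} sequences classified")

With these example values, the default threshold of 0.7 keeps two of the three labels, while 0.9 keeps only one.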

.. end-quickstart


Credits
==================================

.. start-credits

Terrier was developed by:

- `Robert Turnbull <https://robturnbull.com>`_
- `Neil D. Young <https://findanexpert.unimelb.edu.au/profile/249669-neil-young>`_
- `Edoardo Tescari <https://findanexpert.unimelb.edu.au/profile/428364-edoardo-tescari>`_
- `Lee F. Skerratt <https://findanexpert.unimelb.edu.au/profile/451921-lee-skerratt>`_
- `Tiffany A. Kosch <https://findanexpert.unimelb.edu.au/profile/775927-tiffany-kosch>`_

If you use this software, please cite the following preprint:

    Robert Turnbull, Neil D. Young, Edoardo Tescari, Lee F. Skerratt, and Tiffany A. Kosch. (2025). 'Terrier: A Deep Learning Repeat Classifier'. `arXiv:2503.09312 <https://arxiv.org/abs/2503.09312>`_.

`Wytamma Wirth <https://wytamma.com/>`_ set up Terrier as a tool at the `Centre for Pathogen Genomics Portal <https://portal.cpg.unimelb.edu.au/>`_ at the University of Melbourne.

This command will generate a bibliography for the Terrier project.

.. code-block:: bash

    terrier --bibliography

Here is the entry for the published article in BibTeX format:

.. code-block:: bibtex

    @article{terier,
        author = {Turnbull, Robert and Young, Neil D and Tescari, Edoardo and Skerratt, Lee F and Kosch, Tiffany A},
        title = {Terrier: a deep learning repeat classifier},
        journal = {Briefings in Bioinformatics},
        volume = {26},
        number = {4},
        pages = {bbaf442},
        year = {2025},
        month = {08},
        abstract = {Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Terrier is a deep learning model designed to overcome these challenges by classifying repetitive DNA sequences using a publicly available, curated repeat sequence library trained under the RepeatMasker schema. Poor representation of taxa within repeat databases often limits the classification accuracy and reproducibility of current repeat annotation methods, limiting our understanding of repeat evolution and function. Terrier overcomes these challenges by leveraging deep learning for improved accuracy. Trained on Repbase, which includes over 100,000 repeat families—four times more than Dfam—Terrier maps 97.1\% of Repbase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice, fruit flies, humans, and mice), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation in non-model amphibian, flatworm, and Northern krill genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.},
        issn = {1477-4054},
        doi = {10.1093/bib/bbaf442},
        url = {https://doi.org/10.1093/bib/bbaf442},
        eprint = {https://academic.oup.com/bib/article-pdf/26/4/bbaf442/64143069/bbaf442.pdf},
    }

Run the following command to get the latest BibTeX entry:

.. code-block:: bash

    terrier --bibtex


This will be updated with the final publication details when available.



Created using torchapp (https://github.com/rbturnbull/torchapp).

.. end-credits


            
