ensembl_tui


- Name: ensembl_tui
- Version: 0.1a3
- Summary: Ensembl cli tools
- Upload time: 2024-09-02 22:20:43
- Requires Python: <3.13,>=3.10
- Keywords: biology, genomics, evolution, bioinformatics

[![CI](https://github.com/cogent3/EnsemblLite/actions/workflows/testing_develop.yml/badge.svg)](https://github.com/cogent3/EnsemblLite/actions/workflows/testing_develop.yml)
[![CodeQL](https://github.com/cogent3/EnsemblLite/actions/workflows/codeql.yml/badge.svg)](https://github.com/cogent3/EnsemblLite/actions/workflows/codeql.yml)
[![Coverage Status](https://coveralls.io/repos/github/cogent3/EnsemblLite/badge.svg?branch=develop)](https://coveralls.io/github/cogent3/EnsemblLite?branch=develop)

# ensembl-tui

ensembl-tui provides the `eti` command line application for obtaining a subset of the data provided by Ensembl which can then be queried locally. You can have multiple such subsets on your machine, each corresponding to a different selection of species and data types.

> **Warning**
> ensembl-tui is in a preliminary phase of development with a limited feature set and incomplete test coverage! Please validate results against the web version. If you discover errors, please post a [bug report](https://github.com/cogent3/EnsemblLite/issues).

## Installing the software

<details>
  <summary>Developer installation instructions</summary>
  Fork the repo and clone your fork to your local machine. In the terminal, create either a Python virtual environment or a new conda environment and activate it. In that environment, install flit:

  ```
  $ pip install flit
  ```

  Then do the flit version of a "developer install", which essentially creates a symlink to the repo's source directory:

  ```
  $ flit install -s --python `which python`
  ```
</details>

<details>
  <summary>General user installation instructions</summary>

  We have not yet released on PyPI. We will provide instructions here for a Docker-based installation soon!
</details>

## Resources required to subset Ensembl data

Ensembl hosts some very large data sets. You need to have a machine with sufficient disk space to store the data you want to download. At present we do not have support for predicting how much storage would be required for a given selection of species and data types. We advise you to experiment.
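
As a starting point for that experimentation, the following minimal sketch reports the free space on the filesystem where you plan to put the download. It uses only Python's standard library; `free_gib` is a hypothetical helper name and is not part of ensembl-tui.

```python
import shutil


def free_gib(path: str = ".") -> float:
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 2**30


# Check the current directory; point this at your intended download location.
print(f"{free_gib('.'):.1f} GiB free")
```

Comparing the figure before and after a trial download of a small selection gives a rough idea of what a larger selection might need.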

Some commands can be run in parallel but have moderate memory requirements. If you have a machine with limited RAM, you may need to reduce the number of parallel processes. Again, we advise you to experiment.

## Getting set up

<details>
  <summary>Specifying what data you want to download and where to put it</summary>

  We use a plain text file to indicate the Ensembl domain, release and types of genomic data to download. Start by using the `exportrc` subcommand.

  <!-- [[[cog
  import cog
  from ensembl_tui import cli
  from click.testing import CliRunner
  runner = CliRunner()
  result = runner.invoke(cli.main, ["exportrc", "--help"])
  help = result.output.replace("Usage: main", "Usage: eti")
  cog.out(
      "```\n{}\n```".format(help)
  )
  ]]] -->
  ```
  Usage: eti exportrc [OPTIONS]

    exports sample config and species table to the nominated path

  Options:
    -o, --outpath PATH  Path to directory to export all rc contents.
    --help              Show this message and exit.

  ```
  <!-- [[[end]]] -->

  ```shell
  $ eti exportrc -o ~/Desktop/Outbox/ensembl_download
  ```
  This command creates an `ensembl_download` directory and writes two plain text files into it:

  1. `species.tsv`: contains the Latin names, common names, etc. of the species accessible on the ensembl.org website.
  2. `sample.cfg`: a sample configuration file that you can edit to specify the data you want to download.

  The latter file includes comments on how to edit it in order to specify the genomic resources that you want.
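
Once exported, `species.tsv` can be searched with a few lines of Python to find the names you want to put in your config. This is only a sketch: it assumes the file is tab-separated with a header row, and `read_species_table` and `find_rows` are hypothetical helpers that are not part of ensembl-tui.

```python
import csv
from pathlib import Path


def read_species_table(path):
    """Return (header, rows) from a tab-separated file.

    Assumes the first non-empty line is a header row.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        rows = [row for row in reader if row]  # skip blank lines
    return header, rows


def find_rows(rows, text):
    """Return the rows in which any cell contains `text` (case-insensitive)."""
    text = text.lower()
    return [row for row in rows if any(text in cell.lower() for cell in row)]
```

For example, `find_rows(rows, "gorilla")` on the rows read from `~/Desktop/Outbox/ensembl_download/species.tsv` would list every entry mentioning that word in any column.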
</details>

<details>
  <summary>Downloading the data</summary>
  The `download` subcommand downloads the data indicated in the config file to a local directory.

  <!-- [[[cog
  import cog
  from ensembl_tui import cli
  from click.testing import CliRunner
  runner = CliRunner()
  result = runner.invoke(cli.main, ["download", "--help"])
  help = result.output.replace("Usage: main", "Usage: eti")
  cog.out(
      "```\n{}\n```".format(help)
  )
  ]]] -->
  ```
  Usage: eti download [OPTIONS]

    download data from Ensembl's ftp site

  Options:
    -c, --configpath PATH  Path to config file specifying databases, (only species
                           or compara at present).
    -d, --debug            Maximum verbosity, and reduces number of downloads,
                           etc...
    -v, --verbose
    --help                 Show this message and exit.

  ```
  <!-- [[[end]]] -->

  For a config file named `config.cfg`, the download command would be:

  ```shell
  $ cd to/directory/with/config.cfg
  $ eti download -c config.cfg
  ```

  > **Note**
  > Downloads can be interrupted and resumed. The software deletes partially downloaded files.

The download creates a new `.cfg` file inside the download directory. This file is used by the `install` command.
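
Because `install` depends on that file, it can be worth confirming it exists before starting a long install. The sketch below assumes the `.cfg` file sits directly inside the download directory rather than in a subdirectory; `find_cfg_files` and `ready_to_install` are hypothetical helper names, not part of ensembl-tui.

```python
from pathlib import Path


def find_cfg_files(download_dir):
    """Return the sorted `.cfg` files found directly inside `download_dir`."""
    return sorted(Path(download_dir).glob("*.cfg"))


def ready_to_install(download_dir) -> bool:
    """True if at least one `.cfg` file is present in `download_dir`."""
    return len(find_cfg_files(download_dir)) > 0
```

If `ready_to_install` returns `False`, the download probably did not finish, and rerunning `eti download` with the same config should resume it.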

</details>

<details>
  <summary>Installing the data</summary>
  
  <!-- [[[cog
  import cog
  from ensembl_tui import cli
  from click.testing import CliRunner
  runner = CliRunner()
  result = runner.invoke(cli.main, ["install", "--help"])
  help = result.output.replace("Usage: main", "Usage: eti")
  cog.out(
      "```\n{}\n```".format(help)
  )
  ]]] -->
  ```
  Usage: eti install [OPTIONS]

    create the local representations of the data

  Options:
    -d, --download PATH       Path to local download directory containing a cfg
                              file.
    -np, --num_procs INTEGER  Number of procs to use.  [default: 1]
    -f, --force_overwrite     Overwrite existing data.
    -v, --verbose
    --help                    Show this message and exit.

  ```
  <!-- [[[end]]] -->

The following command uses 2 CPUs and has run safely on systems with only 16 GB of RAM for 10 primate genomes, including homology data and whole genome:

```shell
$ cd to/directory/with/downloaded_data
$ eti install -d downloaded_data -np 2
```
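
If you are unsure what value to pass to `-np`, one conservative approach is to cap the process count by both CPU count and available memory. The sketch below is an illustration only: the 8 GB per process default is an assumed budget loosely extrapolated from the single data point above, not a measured requirement, and `pick_num_procs` is a hypothetical helper that is not part of ensembl-tui.

```python
import os


def pick_num_procs(total_ram_gb, ram_per_proc_gb=8.0, cpu_count=None):
    """Return a process count limited by CPU count and a per-process RAM budget.

    `ram_per_proc_gb` is an assumed budget; adjust it after experimenting.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    by_memory = int(total_ram_gb // ram_per_proc_gb)
    return max(1, min(cpus, by_memory))  # always use at least one process


num_procs = pick_num_procs(total_ram_gb=16, cpu_count=8)
print(f"eti install -d downloaded_data -np {num_procs}")
```

Start low and increase the value only if memory usage stays comfortable during a trial run.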

</details>

<details>
  <summary>Checking what has been installed</summary>
  
  <!-- [[[cog
  import cog
  from ensembl_tui import cli
  from click.testing import CliRunner
  runner = CliRunner()
  result = runner.invoke(cli.main, ["installed", "--help"])
  help = result.output.replace("Usage: main", "Usage: eti")
  cog.out(
      "```\n{}\n```".format(help)
  )
  ]]] -->
  ```
  Usage: eti installed [OPTIONS]

    show what is installed

  Options:
    -i, --installed TEXT  Path to root directory of an installation.  [required]
    --help                Show this message and exit.

  ```
  <!-- [[[end]]] -->
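
The `installed` subcommand can also be called from a script, for instance to log what an installation contains. This is a minimal sketch using only the `-i` option shown in the help text above; `installed_cmd` and `show_installed` are hypothetical helper names, and the format of the output is not assumed.

```python
import shutil
import subprocess


def installed_cmd(install_root):
    """Build the argument list for `eti installed -i <install_root>`."""
    return ["eti", "installed", "-i", str(install_root)]


def show_installed(install_root):
    """Run `eti installed` and return its stdout, or None if `eti` is not on PATH."""
    if shutil.which("eti") is None:
        return None
    result = subprocess.run(
        installed_cmd(install_root),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller
    )
    return result.stdout
```

Returning `None` when `eti` is missing keeps the helper usable in environments where ensembl-tui has not been installed yet.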

</details>

## Interrogating the data

We provide a conventional command line interface for querying the data with subcommands.

<details>
  <summary>The full list of subcommands</summary>

  You can get help on individual subcommands by running `eti <subcommand> --help` in the terminal.

  <!-- [[[cog
  import cog
  from ensembl_tui import cli
  from click.testing import CliRunner
  runner = CliRunner()
  result = runner.invoke(cli.main)
  help = result.output.replace("Usage: main", "Usage: eti")
  cog.out(
      "```\n{}\n```".format(help)
  )
  ]]] -->
  ```
  Usage: eti [OPTIONS] COMMAND [ARGS]...

    Tools for obtaining and interrogating subsets of https://ensembl.org genomic
    data.

  Options:
    --version  Show the version and exit.
    --help     Show this message and exit.

  Commands:
    alignments       export multiple alignments in fasta format for named genes
    download         download data from Ensembl's ftp site
    dump-genes       export meta-data table for genes from one species to...
    exportrc         exports sample config and species table to the nominated...
    homologs         exports CDS sequence data in fasta format for homology...
    install          create the local representations of the data
    installed        show what is installed
    species-summary  genome summary data for a species
    tui              Open Textual TUI.

  ```
  <!-- [[[end]]] -->

</details>

We also provide an experimental terminal user interface (TUI) that allows you to explore the data in a more interactive way. This is invoked with the `tui` subcommand.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "ensembl_tui",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.10",
    "maintainer_email": null,
    "keywords": "biology, genomics, evolution, bioinformatics",
    "author": null,
    "author_email": "Gavin Huttley <Gavin.Huttley@anu.edu.au>",
    "download_url": "https://files.pythonhosted.org/packages/29/0a/8ce79914c0d128b907e23c7b21f6c0b2073641a3852fbf725509bf668d4a/ensembl_tui-0.1a3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Ensembl cli tools",
    "version": "0.1a3",
    "project_urls": {
        "Bug Tracker": "https://github.com/cogent3/EnsemblLite/issues",
        "Documentation": "https://github.com/cogent3/EnsemblLite",
        "Source Code": "https://github.com/cogent3/EnsemblLite"
    },
    "split_keywords": [
        "biology",
        " genomics",
        " evolution",
        " bioinformatics"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3a11f07a25b154bdfed1c24c7fb8ab975c0a80bc0e529edbd730d43c5a54f9ff",
                "md5": "84f4aa709f14af5071a0598650ac73ae",
                "sha256": "c9279fea16bfae425ac43b84830bd49cfe2323fb1a59cad46c99d34acd47032b"
            },
            "downloads": -1,
            "filename": "ensembl_tui-0.1a3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "84f4aa709f14af5071a0598650ac73ae",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.10",
            "size": 65071,
            "upload_time": "2024-09-02T22:20:38",
            "upload_time_iso_8601": "2024-09-02T22:20:38.221300Z",
            "url": "https://files.pythonhosted.org/packages/3a/11/f07a25b154bdfed1c24c7fb8ab975c0a80bc0e529edbd730d43c5a54f9ff/ensembl_tui-0.1a3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "290a8ce79914c0d128b907e23c7b21f6c0b2073641a3852fbf725509bf668d4a",
                "md5": "8da52c34f155daa0768ad54c04548f84",
                "sha256": "6f8a84af1401412d8a07eda8ceb1717597b34c5f32009df96e4b2e654b8f06ae"
            },
            "downloads": -1,
            "filename": "ensembl_tui-0.1a3.tar.gz",
            "has_sig": false,
            "md5_digest": "8da52c34f155daa0768ad54c04548f84",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.10",
            "size": 434559,
            "upload_time": "2024-09-02T22:20:43",
            "upload_time_iso_8601": "2024-09-02T22:20:43.468896Z",
            "url": "https://files.pythonhosted.org/packages/29/0a/8ce79914c0d128b907e23c7b21f6c0b2073641a3852fbf725509bf668d4a/ensembl_tui-0.1a3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-02 22:20:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cogent3",
    "github_project": "EnsemblLite",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "ensembl_tui"
}
        