[CI (develop)](https://github.com/cogent3/ensembl_tui/actions/workflows/testing_develop.yml)
[CodeQL](https://github.com/cogent3/ensembl_tui/actions/workflows/codeql.yml)
[Coverage](https://coveralls.io/github/cogent3/ensembl_tui?branch=develop)
[Ruff](https://github.com/astral-sh/ruff)
[DOI: 10.5281/zenodo.15098645](https://doi.org/10.5281/zenodo.15098645)
[Documentation](https://ensembl-tui.readthedocs.io/en/latest/)
# ensembl-tui
ensembl-tui provides the `eti` terminal application for obtaining a subset of the data provided by Ensembl which can then be queried locally. You can have multiple such subsets on your machine, each corresponding to a different selection of species and data types.
> **Warning**
> We currently **only support accessing data from the main ensembl.org** site. If you discover errors, please post a [bug report](https://github.com/cogent3/ensembl_tui/issues).
## Installing the software
<details>
<summary>General user installation instructions</summary>
```
$ pip install ensembl-tui
```
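The package metadata for version 0.4.3 lists `requires_python` as `<3.14,>=3.10`. The sketch below, which uses only the standard library, checks an interpreter version against that range before you install. The helper name `python_is_supported` is our own illustration and is not part of ensembl-tui.

```python
import sys

# Bounds copied from the package metadata: requires_python "<3.14,>=3.10".
MIN_VERSION = (3, 10)
MAX_EXCLUSIVE = (3, 14)


def python_is_supported(version=None):
    """Return True if the (major, minor) version is inside the declared range.

    `python_is_supported` is a hypothetical helper, for illustration only.
    With no argument it checks the interpreter that is running this code.
    """
    major_minor = tuple(version or sys.version_info)[:2]
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


if __name__ == "__main__":
    print(python_is_supported())
```

If the check prints `False`, creating a virtual environment with a supported Python version first is probably the simplest fix.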
</details>
<details>
<summary>Developer installation instructions</summary>
Fork the repo and clone your fork to your local machine. In the terminal, create either a Python virtual environment or a new conda environment and activate it. In that environment, install flit:
```
$ pip install flit
```
Then do the flit version of a "developer install". (This essentially creates a symlink to the repo's source directory.)
```
$ flit install -s --python `which python`
```
</details>
## Resources required to subset Ensembl data
Ensembl hosts some very large data sets. You need to have a machine with sufficient disk space to store the data you want to download. At present we do not have support for predicting how much storage would be required for a given selection of species and data types. You will need to experiment.
Some commands can be run in parallel but have moderate memory requirements. If you have a machine with limited RAM, you may need to reduce the number of parallel processes. Again, run some experiments.
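Since storage needs cannot yet be predicted, it may help to at least record how much free space and how many CPUs a machine has before experimenting. The following is a minimal sketch using only the Python standard library. The function names and the default cap of 2 processes are our own assumptions, not something ensembl-tui provides.

```python
import os
import shutil


def free_gigabytes(path="."):
    """Free space, in GB (10**9 bytes), on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def conservative_num_procs(cap=2):
    """Suggest a small process count: at most `cap`, never fewer than 1.

    The cap is an arbitrary starting point; raise it only after watching
    memory use on your own system.
    """
    available = os.cpu_count() or 1
    return max(1, min(cap, available))


if __name__ == "__main__":
    print(f"free disk space here: {free_gigabytes():.1f} GB")
    print(f"suggested starting process count: {conservative_num_procs()}")
```

Running this in the directory where you plan to download gives a baseline to compare against after a trial download of one or two species.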
## Getting set up
<details>
<summary>Specifying what data you want to download and where to put it</summary>
We use a plain text file to indicate the Ensembl domain, release and types of genomic data to download. Start by using the `demo-config` subcommand.
<!-- [[[cog
import cog
from ensembl_tui import cli
from click.testing import CliRunner
runner = CliRunner()
result = runner.invoke(cli.main, ["demo-config", "--help"])
help = result.output.replace("Usage: main", "Usage: eti")
cog.out(
"```\n{}\n```".format(help)
)
]]] -->
```
Usage: eti demo-config [OPTIONS]

  exports sample config and species table to the nominated path

Options:
  -o, --outpath PATH     Path to directory to export all rc contents.
  -f, --force_overwrite  Overwrite existing data.
  --help                 Show this message and exit.
```
<!-- [[[end]]] -->
```shell
$ eti demo-config -o ensembl_download
```
This command creates an `ensembl_download` directory and writes two plain text files into it:
1. `species.tsv`: contains the Latin names, common names, etc. of the species accessible at the ensembl.org website.
2. `sample.cfg`: a sample configuration file that you can edit to specify the data you want to download.
The latter file includes comments on how to edit it in order to specify the genomic resources that you want.
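To browse `species.tsv` programmatically before editing the config, a sketch like the one below may be enough. It assumes only that the file is tab-separated with a header on its first line. The column names are not documented here, so the code prints whatever header it finds rather than assuming any. `read_species_table` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def read_species_table(path):
    """Return (header, rows) from a tab-separated file such as species.tsv.

    Assumes the first line is a header; blank lines are skipped.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        rows = [row for row in reader if row]
    return header, rows


if __name__ == "__main__":
    table_path = Path("ensembl_download") / "species.tsv"
    if table_path.exists():
        header, rows = read_species_table(table_path)
        print(header)
        print(f"{len(rows)} species listed")
```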
</details>
<details>
<summary>Downloading the data</summary>
Downloads the data indicated in the config file to a local directory.
<!-- [[[cog
import cog
from ensembl_tui import cli
from click.testing import CliRunner
runner = CliRunner()
result = runner.invoke(cli.main, ["download", "--help"])
help = result.output.replace("Usage: main", "Usage: eti")
cog.out(
"```\n{}\n```".format(help)
)
]]] -->
```
Usage: eti download [OPTIONS]

  download data from Ensembl's ftp site

Options:
  -c, --configpath PATH  Path to config file specifying databases, (only species
                         or compara at present).
  -d, --debug            Maximum verbosity, and reduces number of downloads,
                         etc...
  -v, --verbose
  --help                 Show this message and exit.
```
<!-- [[[end]]] -->
For a config file named `config.cfg`, the download command would be:
```shell
$ cd to/directory/with/config.cfg
$ eti download -c config.cfg
```
> **Note**
> This is the only step for which the internet is required. Downloads can be interrupted and resumed. The software will delete partially downloaded files.
The download creates a new `.cfg` file inside the download directory. This file is used by the `install` command.
</details>
<details>
<summary>Installing the data</summary>
Converts the downloaded data into data formats designed to enhance querying performance.
<!-- [[[cog
import cog
from ensembl_tui import cli
from click.testing import CliRunner
runner = CliRunner()
result = runner.invoke(cli.main, ["install", "--help"])
help = result.output.replace("Usage: main", "Usage: eti")
cog.out(
"```\n{}\n```".format(help)
)
]]] -->
```
Usage: eti install [OPTIONS]

  create the local representations of the data

Options:
  -d, --download PATH       Path to local download directory containing a cfg
                            file.
  -np, --num_procs INTEGER  Number of procs to use.  [default: 1]
  -f, --force_overwrite     Overwrite existing data.
  -v, --verbose
  --help                    Show this message and exit.
```
<!-- [[[end]]] -->
This step can be run in parallel, but the memory requirements will scale with the number of genomes. We therefore suggest starting with a small number of CPUs and monitoring performance on your system. The following command uses 2 CPUs and has been safe on systems with only 16 GB of RAM for 10 primate genomes, including homology data and whole genome alignments.
```shell
$ cd to/directory/with/downloaded_data
$ eti install -d downloaded_data -np 2
```
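If you prefer to drive the download and install steps from a script, the sketch below builds the two command lines from the options shown in the help output above (`-c`, `-d`, `-np`) and runs them with `subprocess`. The function names are our own, and the script assumes `eti` is on your `PATH`. It is one possible wrapper, not an interface provided by ensembl-tui.

```python
import shutil
import subprocess


def download_command(config_path):
    """Command line for `eti download`, using the documented -c option."""
    return ["eti", "download", "-c", str(config_path)]


def install_command(download_dir, num_procs=1):
    """Command line for `eti install`, using the documented -d and -np options."""
    return ["eti", "install", "-d", str(download_dir), "-np", str(num_procs)]


def run_pipeline(config_path, download_dir, num_procs=1):
    """Run download then install, stopping at the first failure."""
    if shutil.which("eti") is None:
        raise RuntimeError("the `eti` executable was not found on PATH")
    commands = (
        download_command(config_path),
        install_command(download_dir, num_procs),
    )
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

For example, `run_pipeline("config.cfg", "downloaded_data", num_procs=2)` mirrors the two shell commands shown in this section.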
</details>
<details>
<summary>Checking what has been installed</summary>
This will give a summary of what data has been installed at a provided path.
<!-- [[[cog
import cog
from ensembl_tui import cli
from click.testing import CliRunner
runner = CliRunner()
result = runner.invoke(cli.main, ["installed", "--help"])
help = result.output.replace("Usage: main", "Usage: eti")
cog.out(
"```\n{}\n```".format(help)
)
]]] -->
```
Usage: eti installed [OPTIONS]

  show what is installed

Options:
  -i, --installed TEXT  Path to root directory of an installation.  [required]
  --help                Show this message and exit.
```
<!-- [[[end]]] -->
</details>
## Interrogating the data
We provide a conventional command line interface for querying the data with subcommands.
<details>
<summary>The full list of subcommands</summary>
You can get help on individual subcommands by running `eti <subcommand> --help` in the terminal.
<!-- [[[cog
import cog
from ensembl_tui import cli
from click.testing import CliRunner
runner = CliRunner()
result = runner.invoke(cli.main)
help = result.output.replace("Usage: main", "Usage: eti")
cog.out(
"```\n{}\n```".format(help)
)
]]] -->
```
Usage: eti [OPTIONS] COMMAND [ARGS]...

  Tools for obtaining and interrogating subsets of https://ensembl.org genomic
  data.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  tui              Open Textual TUI.
  demo-config      exports sample config and species table to the nominated...
  download         download data from Ensembl's ftp site
  install          create the local representations of the data
  installed        show what is installed
  species-summary  genome summary data for a species
  dump-genes       export meta-data table for genes from one species to...
  compara-summary  summary data for compara
  homologs         exports CDS sequence data in fasta format for homology...
  alignments       export multiple alignments in fasta format for named genes
```
<!-- [[[end]]] -->
</details>
We also provide an experimental terminal user interface (TUI) that allows you to explore the data in a more interactive way. This is invoked with the `tui` subcommand.
### Getting a summary of a genome
A command like the following
```
eti species-summary -i primates10_113/install --species human
```
displays two tables for the indicated genome. The first lists the biotypes and their counts; the second lists the repeat classes/types and their counts.
### Getting a summary of homology data
A command like the following
```
eti compara-summary -i primates10_113/install
```
displays the homology types and counts. The values under `homology_type` can be passed to the `--homology_type` option of the `homologs` command.
### Exporting related sequences
A command like the following
```
eti homologs -i primates10_113/install/ --outdir sampled_100 --ref human --coord_names 1 --limit 100
```
will sample 100 one-to-one orthologs (the default homology type) of protein-coding genes on human chromosome 1 (protein coding is the only biotype supported at present). The canonical CDS sequences will be written in fasta format to the directory `sampled_100`.
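Once sequences have been exported, you may want a quick look at them without extra dependencies. The following is a minimal FASTA reader written as a sketch. It works on text rather than on a path because the naming of the files written to `sampled_100` is not described here, and `parse_fasta` is a hypothetical helper rather than part of ensembl-tui.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {label: sequence} dict.

    Lines starting with '>' begin a new record; all following lines up to
    the next '>' are joined to form that record's sequence.
    """
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:].strip()
            records[label] = []
        elif label is not None:
            records[label].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    # Made-up records, purely to show the call.
    example = ">seq_a\nATGGCC\nTAA\n>seq_b\nATGTGA\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq))
```

To apply it to an exported file, read the file with `pathlib.Path(...).read_text()` and pass the result to `parse_fasta`.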
### Exporting whole genome alignments
A command like the following
```
eti alignments -i primates10_113/install --outdir sampled_aligns_100 --align_name '*primate*' --coord_names 1 --ref human --limit 10
```
samples 10 alignments that include protein-coding genes on human chromosome 1. These are from the Ensembl whole genome alignment whose name matches the glob pattern `*primate*`.
> **Warning**
>
> If this pattern matches more than one installed Ensembl alignment, the program will exit.
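To see why a loose pattern can be ambiguous, the sketch below applies shell-style glob matching to a list of alignment names using Python's `fnmatch` module. The names in the list are invented for illustration, and whether `eti` uses exactly these matching rules internally is an assumption.

```python
from fnmatch import fnmatchcase


def matching_names(pattern, names):
    """Return the names that match a shell-style glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Hypothetical alignment names, NOT taken from a real installation.
    installed = ["10_primates.epo", "24_primates.epo_extended", "21_murinae.epo"]
    for pattern in ("*primate*", "10_primate*"):
        hits = matching_names(pattern, installed)
        status = "ambiguous" if len(hits) > 1 else "ok"
        print(pattern, hits, status)
```

Tightening the pattern until it selects a single name avoids the exit described in the warning.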
Raw data
{
"_id": null,
"home_page": null,
"name": "ensembl-tui",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.14,>=3.10",
"maintainer_email": "Gavin Huttley <Gavin.Huttley@anu.edu.au>, Arne Becker <arne@ebi.ac.uk>, Stefano Giorgetti <sgiorgetti@ebi.ac.uk>",
"keywords": "biology, genomics, evolution, bioinformatics",
"author": null,
"author_email": "Gavin Huttley <Gavin.Huttley@anu.edu.au>, Arne Becker <arne@ebi.ac.uk>, Stefano Giorgetti <sgiorgetti@ebi.ac.uk>",
"download_url": "https://files.pythonhosted.org/packages/65/77/a50ba6a4465c4e252d2464adce5cf63fbc8631586a7b29cd9f2c368c9553/ensembl_tui-0.4.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Ensembl terminal user interface tools",
"version": "0.4.3",
"project_urls": {
"Bug Tracker": "https://github.com/cogent3/ensembl_tui/issues",
"Documentation": "https://ensembl-tui.readthedocs.io/",
"Source Code": "https://github.com/cogent3/ensembl_tui"
},
"split_keywords": [
"biology",
" genomics",
" evolution",
" bioinformatics"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "2ba4a40d557df1072d218b63e6295cc0617b4d5792f2f410932cae77251d5046",
"md5": "dee2c92e367af8a8dcae0d6efe42e7f0",
"sha256": "9cb08e0561565202705adc581ec2d0d42260bb17b07488374452f7c2dea86a08"
},
"downloads": -1,
"filename": "ensembl_tui-0.4.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dee2c92e367af8a8dcae0d6efe42e7f0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.14,>=3.10",
"size": 75768,
"upload_time": "2025-08-12T05:30:47",
"upload_time_iso_8601": "2025-08-12T05:30:47.130303Z",
"url": "https://files.pythonhosted.org/packages/2b/a4/a40d557df1072d218b63e6295cc0617b4d5792f2f410932cae77251d5046/ensembl_tui-0.4.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6577a50ba6a4465c4e252d2464adce5cf63fbc8631586a7b29cd9f2c368c9553",
"md5": "54f93b62378195d46e915454f2e5f92d",
"sha256": "d24225a996bda48950957f28439f1bbf7f2636948558fd07bf0eb1953700f673"
},
"downloads": -1,
"filename": "ensembl_tui-0.4.3.tar.gz",
"has_sig": false,
"md5_digest": "54f93b62378195d46e915454f2e5f92d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.14,>=3.10",
"size": 111979,
"upload_time": "2025-08-12T05:30:48",
"upload_time_iso_8601": "2025-08-12T05:30:48.888061Z",
"url": "https://files.pythonhosted.org/packages/65/77/a50ba6a4465c4e252d2464adce5cf63fbc8631586a7b29cd9f2c368c9553/ensembl_tui-0.4.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-12 05:30:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cogent3",
"github_project": "ensembl_tui",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "ensembl-tui"
}