Name | ibdpainting |
Version | 0.2.0 |
home_page | https://github.com/ellisztamas/ibdpainting |
Summary | Identify parents of a crossed individual by comparing identity in windows across their genomes |
upload_time | 2024-09-09 11:48:35 |
maintainer | None |
docs_url | None |
author | Tom Ellis |
requires_python | None |
license | MIT |
# ibdpainting
`ibdpainting` is a Python tool to visually validate the identity of crossed individuals
from genetic data.
## Contents
- [Premise](#premise)
- [Installation](#installation)
- [Input data files](#input-data-files)
- [Usage](#usage)
- [Author information](#author-information)
- [Contributing](#contributing)
## Premise
`ibdpainting` addresses the situation where you have multiple individuals
derived from crosses between individuals in a reference panel, and you want to
verify that the offspring really are the genotypes you think they are. Taking the
simple example of a biparental cross, you would expect an offspring of the F2
generation or later to be a mosaic of regions homozygous for either parent,
potentially interspersed with heterozygous regions, depending on the generation.
`ibdpainting` is a tool to visualise this mosaic pattern.
## Installation
Install with `pip`:
```
pip install ibdpainting
```
## Input data files
The program requires two HDF5 files created from VCF files:
* **Input panel**: An HDF5 file containing SNPs for the crossed individual(s).
This can contain multiple individuals, but the program will only work on one at
a time.
* **Reference panel**: An HDF5 file containing SNP information for a panel of reference candidate
parents.
The reason for using HDF5 is that it allows for loading data in chunks,
which is much quicker than loading an entire VCF file into memory every time you
want to check a single sample. I recommend creating these files using
[vcf_to_hdf5](https://scikit-allel.readthedocs.io/en/latest/io.html#allel.vcf_to_hdf5)
from `scikit-allel`. For example:
```
import allel
# Convert a VCF file to HDF5, keeping all fields and replacing any existing output file.
allel.vcf_to_hdf5('example.vcf', 'example.h5', fields='*', overwrite=True)
```
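
Before running `ibdpainting`, it can be useful to confirm which samples ended up in the HDF5 file. The following is a minimal sketch, assuming `h5py` is installed and that the file was written by `vcf_to_hdf5` with its default layout, in which sample names are stored in a top-level `samples` dataset (usually as bytes). The helper names `decode_names` and `read_sample_names` are hypothetical and are not part of `ibdpainting`.

```python
def decode_names(raw):
    """Convert sample names stored as bytes (or str) to a list of str."""
    return [x.decode("utf-8") if isinstance(x, bytes) else str(x) for x in raw]


def read_sample_names(path):
    """Return the sample names stored in an HDF5 file written by vcf_to_hdf5.

    Assumes the default scikit-allel layout with a top-level 'samples' dataset.
    """
    import h5py  # imported here so decode_names works without h5py installed

    with h5py.File(path, mode="r") as h5:
        return decode_names(h5["samples"][:])


# Example usage (hypothetical file name):
# names = read_sample_names("example.h5")
# print("my_cross" in names)
```

Checking names this way avoids going back to the original VCF file once the HDF5 file exists.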
Tips for preparing the data:
* `ibdpainting` will only compare SNPs that are present in both the input and reference files.
On the one hand, this means that it does not matter if the offspring and reference
files contain SNPs that do not match exactly.
On the other, this may cause problems if you are comparing samples with *loads*
of structural variation.
* It is better to have a smaller number of reliable SNPs than a larger number of
dubious SNPs. For example, in *Arabidopsis thaliana* that means only using
common SNPs located in genes.
* `ibdpainting` creates a subplot for every contig label in the input/reference
panel. If you work on an organism with many chromosomes or incompletely assembled
contigs, this could get messy. There is currently no way to subset which
contigs are shown, so it is probably easiest to supply input data based on only
a subset of contigs. The longest contigs are likely to be most informative
because you are more likely to be able to spot recombination break points.
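
The first tip above, that only SNPs shared between the two files are compared, can be illustrated with a small sketch. This is not `ibdpainting`'s own code: the site lists are made up, and the function name `shared_site_indices` is hypothetical. It only shows how shared sites, and the row indices needed to subset each file, might be found by matching on contig and position.

```python
# Made-up (contig, position) pairs for the SNPs in each file.
input_sites = [("Chr1", 100), ("Chr1", 250), ("Chr1", 400), ("Chr2", 90)]
reference_sites = [("Chr1", 100), ("Chr1", 400), ("Chr2", 65), ("Chr2", 90)]


def shared_site_indices(sites_a, sites_b):
    """Return (index_in_a, index_in_b) for every site present in both lists."""
    lookup = {site: j for j, site in enumerate(sites_b)}
    return [(i, lookup[site]) for i, site in enumerate(sites_a) if site in lookup]


pairs = shared_site_indices(input_sites, reference_sites)
print(pairs)  # rows to keep from the input and reference genotypes
```

Sites found in only one file, such as `("Chr1", 250)` here, drop out of the comparison, which is why heavy structural variation between samples can leave few SNPs to compare.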
## Usage
After installing, `ibdpainting` can be run as a command line tool as follows:
```
ibdpainting \
    --input input_file.hdf5 \
    --reference reference_panel.hdf5 \
    --window_size 500000 \
    --sample_name "my_cross" \
    --expected_match "mother" "father" \
    --outdir path/to/output/directory
Explanation of the parameters (see also `ibdpainting --help`):
* `--input`: HDF5 file containing the crossed individuals. See [above](#input-data-files).
* `--reference`: HDF5 file containing the reference panel. See [above](#input-data-files).
* `--window_size`: Window size in base pairs.
* `--sample_name`: Name of the crossed individual to compare to the reference
panel. This must be present in the input file - you can check the original VCF file with something
like `bcftools query -l $input_vcf.vcf.gz | grep "my_cross"`.
* `--expected_match`: List of one or more expected parents of the test individual.
These names should be among the samples in the reference panel. Names should be
separated by spaces.
* `--outdir`: Path to the directory to save the output.
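
Because the program works on one individual at a time, checking many crosses means calling it repeatedly. The following is a rough sketch of how that could be scripted from Python using only the flags listed above. The sample names, the `crosses` dictionary and the `build_command` helper are hypothetical, and the actual call is left commented out.

```python
import shlex
import subprocess


def build_command(sample, parents, input_file, reference, window_size, outdir):
    """Assemble the ibdpainting call for one crossed individual."""
    return [
        "ibdpainting",
        "--input", str(input_file),
        "--reference", str(reference),
        "--window_size", str(window_size),
        "--sample_name", sample,
        "--expected_match", *parents,
        "--outdir", str(outdir),
    ]


# Hypothetical sample names mapped to their expected parents.
crosses = {
    "my_cross_1": ["mother", "father"],
    "my_cross_2": ["mother", "father"],
}

for sample, parents in crosses.items():
    cmd = build_command(
        sample, parents,
        input_file="input_file.hdf5",
        reference="reference_panel.hdf5",
        window_size=500000,
        outdir="path/to/output/directory",
    )
    print(shlex.join(cmd))  # show the command that would be run
    # Uncomment to actually run it (requires ibdpainting to be installed):
    # subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids quoting problems if sample names contain spaces.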
Additional optional parameters:
* `--keep_ibd_table`: Write an intermediate text file giving genetic distance
between the crossed individual and each candidate at each window in the genome.
Defaults to False, because these can be quite large.
* `--max_to_plot`: Integer number of candidates to plot.
`ibdpainting` makes an initial ranking of candidates based on genome-wide
similarity to the test individual, and plots only the top candidates.
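
The idea of comparing an individual to each candidate in windows, and then ranking candidates by genome-wide similarity, can be illustrated with a toy sketch. This is not the algorithm `ibdpainting` implements: the distance measure (mean absolute difference in alternate-allele counts), the data and the function name `window_distances` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def window_distances(positions, test, candidate, window_size):
    """Mean absolute difference in alt-allele counts (0, 1, 2) per window.

    Returns a dict mapping window start (bp) to the mean difference.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, g_test, g_cand in zip(positions, test, candidate):
        start = (pos // window_size) * window_size
        sums[start] += abs(g_test - g_cand)
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Made-up data: six SNPs on one contig, genotypes coded as alt-allele counts.
positions = [100, 400, 900, 1200, 1700, 1900]
offspring = [0, 0, 0, 2, 2, 2]
candidates = {
    "mother": [0, 0, 0, 0, 0, 0],
    "father": [2, 2, 2, 2, 2, 2],
    "unrelated": [2, 0, 2, 0, 2, 0],
}

per_window = {
    name: window_distances(positions, offspring, geno, window_size=1000)
    for name, geno in candidates.items()
}

# Rank candidates by their mean distance across windows, closest first.
ranking = sorted(
    per_window,
    key=lambda name: sum(per_window[name].values()) / len(per_window[name]),
)

for name in ranking:
    print(name, per_window[name])
```

In this toy data the offspring matches one parent in the first window and the other parent in the second, which is the mosaic pattern expected for a true cross, whereas the unrelated candidate matches poorly in both windows.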
## Author information
Tom Ellis
## Contributing
I will repeat the following from the [documentation](https://scikit-allel.readthedocs.io/en/stable/) for `scikit-allel`:
> This is academic software, written in the cracks of free time between other commitments, by people who are often learning as we code. We greatly appreciate bug reports, pull requests, and any other feedback or advice. If you do find a bug, we’ll do our best to fix it, but apologies in advance if we are not able to respond quickly. If you are doing any serious work with this package, please do not expect everything to work perfectly first time or be 100% correct. Treat everything with a healthy dose of suspicion, and don’t be afraid to dive into the source code if you have to. Pull requests are always welcome.
Raw data
{
"_id": null,
"home_page": "https://github.com/ellisztamas/ibdpainting",
"name": "ibdpainting",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Tom Ellis",
"author_email": "thomas.ellis@gmi.oeaw.ac.at",
"download_url": "https://files.pythonhosted.org/packages/54/19/226f0426d24d7e10594427e3a4559d17b80e0304ab4e9cdcfe66a53ba3c6/ibdpainting-0.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Identify parents of a crossed individual by comparing identity in windows across their genomes",
"version": "0.2.0",
"project_urls": {
"Homepage": "https://github.com/ellisztamas/ibdpainting"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5419226f0426d24d7e10594427e3a4559d17b80e0304ab4e9cdcfe66a53ba3c6",
"md5": "039121b90639ab13b66c3274ffcb67b2",
"sha256": "73160de8c858aa9a06a2fceb31f2699978d0c05484bd930616df527c4d826dda"
},
"downloads": -1,
"filename": "ibdpainting-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "039121b90639ab13b66c3274ffcb67b2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 14699,
"upload_time": "2024-09-09T11:48:35",
"upload_time_iso_8601": "2024-09-09T11:48:35.972867Z",
"url": "https://files.pythonhosted.org/packages/54/19/226f0426d24d7e10594427e3a4559d17b80e0304ab4e9cdcfe66a53ba3c6/ibdpainting-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-09 11:48:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ellisztamas",
"github_project": "ibdpainting",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "ibdpainting"
}