## UPD.py
Simple software to call UPD regions from germline exome/wgs trios. Developed by Björn Hallström. This repository was forked from https://github.com/bjhall/upd.
## Documentation
### Installation
`pip install --editable .` or `python setup.py install`
The only dependencies are click and coloredlogs.
### Input VCF requirements
* Trio (proband-mother-father) VCF is required (i.e. all three individuals in the same VCF).
* GQ must be present in genotype fields.
* Must be annotated with a population allele frequency. The frequency may also be stored in the CSQ annotation (default tag: MAX_AF); if so, use `--vep`.
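
The first two requirements above can be checked before running the tool. The following is a minimal sketch, not part of upd itself: the function name `check_trio_header` and its messages are hypothetical, and it only inspects the header, so it cannot confirm that GQ is actually filled in for every genotype.

```python
def check_trio_header(lines, proband, mother, father):
    """Return a list of problems found in the VCF header (empty if none).

    `lines` is any iterable of text lines from a VCF file.
    """
    has_gq = False
    samples = []
    for line in lines:
        if line.startswith("##FORMAT=<ID=GQ,"):
            has_gq = True
        elif line.startswith("#CHROM"):
            # Sample IDs start at the tenth column of the #CHROM line.
            samples = line.rstrip("\n").split("\t")[9:]
            break  # the header ends here; no need to read the records

    problems = []
    for role, sample_id in (("proband", proband), ("mother", mother), ("father", father)):
        if sample_id not in samples:
            problems.append(f"{role} sample '{sample_id}' not found in header")
    if not has_gq:
        problems.append("GQ is not declared as a FORMAT field")
    return problems
```

For a bgzipped file, the lines can be supplied with `gzip.open("input.vcf.gz", "rt")` from the standard library.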
### Running
```bash
upd --vcf input.vcf.gz --proband PB_ID --mother MOTHER_ID --father FATHER_ID regions --out upd_regions.bed
```
Here PB_ID, MOTHER_ID and FATHER_ID are the sample IDs from the VCF header.
#### Optional parameters
Command |Parameter | Description
------- |--------- | -----------
base | **--min-af (DEFAULT: 0.05)** | Minimum population frequency required to include the SNP in the analysis.
base | **--af-tag (DEFAULT: MAX_AF)** | Specifies which frequency field to be used when selecting SNPs.
base | **--min-gq (DEFAULT: 30)** | Minimum GQ required to include a variant in the analysis. All three individuals must have a GQ greater than or equal to this value.
base | **--vep (flag)** | If given, search the CSQ field for `af-tag`
regions | **--min-sites (DEFAULT: 3)** | Minimum number of consecutive UPD sites needed to call a UPD region.
regions | **--min-size (DEFAULT: 1000)** | Minimum number of base pairs between first and last UPD site in a region required to call it.
regions/sites | **--out (DEFAULT: stdout)** | File to write the results to instead of stdout
regions/sites | **--iso-het-pct (DEFAULT: 0.01)** | Threshold ratio for calling isodisomy
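
To illustrate how `--min-sites` and `--min-size` interact, here is a rough sketch of grouping consecutive UPD sites into regions. It is not the tool's actual implementation: the function `call_regions`, its input format, and the use of "greater than or equal" for both thresholds are assumptions made for the example.

```python
UPD_LABELS = ("UPD_MATERNAL_ORIGIN", "UPD_PATERNAL_ORIGIN")


def call_regions(sites, min_sites=3, min_size=1000):
    """Group runs of same-origin UPD sites on one chromosome into regions.

    `sites` is a position-sorted list of (position, label) tuples that
    contains only informative sites (UPD or ANTI_UPD labels).
    Returns (first_pos, last_pos, label, n_sites) tuples.
    """
    regions = []
    run_label, run_positions = None, []
    # A trailing sentinel makes the loop close the final run as well.
    for pos, label in list(sites) + [(None, None)]:
        if label is not None and label == run_label:
            run_positions.append(pos)
            continue
        # The label changed: keep the finished run if it passes both filters.
        if (run_label in UPD_LABELS
                and len(run_positions) >= min_sites
                and run_positions[-1] - run_positions[0] >= min_size):
            regions.append((run_positions[0], run_positions[-1],
                            run_label, len(run_positions)))
        run_label, run_positions = label, [pos]
    return regions
```

In this sketch a single ANTI_UPD site, or a site of the opposite parental origin, ends the current run.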
### Output
#### Called regions (upd regions)
BED file of all called regions fulfilling the --min-sites and --min-size filter criteria, including some annotation.
Example call:
```
15 22958103 101910550 ORIGIN=PATERNAL;TYPE=HETERODISOMY;LOW_SIZE=78952446;INF_SITES=182;SNPS=2896;HET_HOM=1275/1440;OPP_SITES=0;START_LOW=20170150;END_HIGH=102516492;HIGH_SIZE=82346342
```
Field | Description
----- | -----------
ORIGIN | The parental origin of the UPD region
TYPE | Indicates whether region is heterodisomic or isodisomic/deletion. See caveats
LOW_SIZE | Lower bound of the estimated size of the called region. Distance between first and last UPD site of the region
INF_SITES | Number of UPD informative sites in the called region
SNPS | Total number of SNPs in the region
HET_HOM | Number of heterozygous and homozygous sites in the region. Used to indicate if region has LOH
OPP_SITES | Number of sites in the region that have the opposite parental origin. So far, no value other than 0 has been observed.
START_LOW | Earliest possible start position of the UPD region (position of last anti-UPD site prior to the region)
END_HIGH | Last possible end position of the UPD region (position of first anti-UPD site after the region)
HIGH_SIZE | Upper bound of the estimated size of the called region. Distance between the surrounding anti-UPD sites.
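
The annotation column is a semicolon-separated list of KEY=VALUE pairs, so it is straightforward to parse downstream. The helper below is a hypothetical sketch (the name `parse_region_line` and the returned keys are not part of upd); it splits on any whitespace so that it works on the example line as printed above.

```python
def parse_region_line(line):
    """Parse one line of the regions output into a dictionary."""
    chrom, start, end, annotation = line.split()[:4]
    fields = dict(item.split("=", 1) for item in annotation.split(";"))
    # HET_HOM holds two counts separated by a slash, e.g. 1275/1440.
    het, hom = (int(n) for n in fields["HET_HOM"].split("/"))
    total = het + hom
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "origin": fields["ORIGIN"],
        "type": fields["TYPE"],
        "het": het,
        "hom": hom,
        "het_fraction": het / total if total else 0.0,
        "start_low": int(fields["START_LOW"]),
        "end_high": int(fields["END_HIGH"]),
        "high_size": int(fields["HIGH_SIZE"]),
    }
```

For the example call, `end_high - start_low` equals `high_size` (82346342), which matches the description of HIGH_SIZE as the distance between the surrounding anti-UPD sites.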
#### Informative sites (upd sites)
The fourth field indicates the type of site:
Type | Description
---- | -----------
ANTI_UPD | Site where proband has inherited alleles from both parents (e.g. AA x BB = AB)
UPD_MATERNAL_ORIGIN | Site where both alleles are inherited from the mother, or the father's allele was deleted (e.g. AA x BB = AA)
UPD_PATERNAL_ORIGIN | Site where both alleles are inherited from the father, or the mother's allele was deleted (e.g. AA x BB = BB)
PB_HETEROZYGOUS | Heterozygous site in the proband (used only to call hetero/isodisomy)
PB_HOMOZYGOUS | Homozygous site in the proband (used only to call hetero/isodisomy)
UNINFORMATIVE | Various sites excluded from the analysis. These can be ignored.
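
The three examples in the table can be expressed as a small classification function. This is only a sketch of the logic implied by those examples, assuming the notation is mother x father = proband; the function `classify_site` is hypothetical, and the actual tool may treat further genotype combinations as informative.

```python
def classify_site(mother, father, proband):
    """Classify a site from three genotypes given as strings such as 'AA' or 'AB'."""
    # Sort the alleles so that 'AB' and 'BA' compare equal.
    m, f, p = (tuple(sorted(g)) for g in (mother, father, proband))
    # Only sites where the parents are homozygous for different alleles
    # are handled here, as in the table examples.
    parents_opposite = m[0] == m[1] and f[0] == f[1] and m != f
    if not parents_opposite:
        return "UNINFORMATIVE"
    if p == m:
        return "UPD_MATERNAL_ORIGIN"
    if p == f:
        return "UPD_PATERNAL_ORIGIN"
    if p == tuple(sorted((m[0], f[0]))):
        return "ANTI_UPD"
    return "UNINFORMATIVE"
```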
### Caveats
The isodisomy/heterodisomy classification depends on estimating whether there is a run of homozygosity in the called region. It is based on the presence of heterozygous sites within the call and should be seen only as a rough estimate for smaller calls. For a more statistically sound detection, use e.g. bcftools roh to detect regions of homozygosity and combine the results.
Isodisomies cannot be differentiated from deletions using this software. Combine the calls with SV/CNV calls to determine if there is an overlapping deletion. If there is an overlapping deletion, the "UPD" call can be used to determine which parental allele was deleted.
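
One simple way to combine the calls is an interval overlap check between a called region and deletion calls from a separate SV/CNV caller. The sketch below assumes both are available as (chrom, start, end) tuples with half-open coordinates; the function name and the data layout are illustrative and not provided by upd.

```python
def overlapping_deletions(region, deletions):
    """Return the deletions that overlap a called region.

    `region` and each entry of `deletions` are (chrom, start, end) tuples.
    """
    chrom, start, end = region
    return [
        d for d in deletions
        # Two half-open intervals overlap if each starts before the other ends.
        if d[0] == chrom and d[1] < end and start < d[2]
    ]
```

A region with one or more overlapping deletions is then better interpreted as a deletion of the other parent's allele than as an isodisomy.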
Raw data
{
"_id": null,
"home_page": "https://github.com/hrydbeck/upd",
"name": "upd",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "vcf,variants,upd",
"author": "Halfdan Rydbeck",
"author_email": "halfdan.rydbeck@regionostergotland.se",
"download_url": "https://files.pythonhosted.org/packages/11/71/f46b3e56dd4a26fbbfcdf00139f1245d577157c1fe37b577ba3c79458868/upd-0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Simple software to call UPD regions from germline exome/wgs trios.",
"version": "0.2",
"project_urls": {
"Homepage": "https://github.com/hrydbeck/upd"
},
"split_keywords": [
"vcf",
"variants",
"upd"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5ac1033783ecf1dd588b7fcd992cfb2c69299fb4798a2f89b7b02b4a2f46a882",
"md5": "c3b895ad9669dfa7893707e549b5171a",
"sha256": "3281d8fe79eb36c97fb325a470519569578f7fd10ed86c54a53d4f13abc8d657"
},
"downloads": -1,
"filename": "upd-0.2-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "c3b895ad9669dfa7893707e549b5171a",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.6",
"size": 12532,
"upload_time": "2023-05-16T11:48:54",
"upload_time_iso_8601": "2023-05-16T11:48:54.650631Z",
"url": "https://files.pythonhosted.org/packages/5a/c1/033783ecf1dd588b7fcd992cfb2c69299fb4798a2f89b7b02b4a2f46a882/upd-0.2-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1171f46b3e56dd4a26fbbfcdf00139f1245d577157c1fe37b577ba3c79458868",
"md5": "b7ed3dda5ced3ec0d3aa07c4436073f5",
"sha256": "d48b6eae6b095a2d2727087df082b9b7ff3e8b48e935a12e34d9bc5741765209"
},
"downloads": -1,
"filename": "upd-0.2.tar.gz",
"has_sig": false,
"md5_digest": "b7ed3dda5ced3ec0d3aa07c4436073f5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 15128,
"upload_time": "2023-05-16T11:48:57",
"upload_time_iso_8601": "2023-05-16T11:48:57.789875Z",
"url": "https://files.pythonhosted.org/packages/11/71/f46b3e56dd4a26fbbfcdf00139f1245d577157c1fe37b577ba3c79458868/upd-0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-16 11:48:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hrydbeck",
"github_project": "upd",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "upd"
}