| Name | localqtl |
| --- | --- |
| Version | 0.1.2 |
| Summary | GPU-accelerated ancestry-aware cis-QTL mapping library with tensorQTL-compatible APIs |
| upload_time | 2025-11-03 18:17:45 |
| maintainer | Kynon J Benjamin |
| author | Kynon J Benjamin |
| requires_python | <3.15,>=3.11 |
| license | GPL-3.0-or-later |
| keywords | genomics, gpu, xqtl, qtl |
# localQTL
localQTL is a **pure-Python** library for local-ancestry-aware xQTL mapping that
lets researchers run end-to-end analyses on large cohorts **without any R/rpy2
dependencies**. It preserves the familiar
[tensorQTL](https://github.com/broadinstitute/tensorqtl) data model with **GPU-first
execution paths**, flexible genotype loaders, and streaming outputs for large-scale
workflows, while adding support for local-ancestry-aware analyses.
## Features
- **GPU-accelerated workflows** powered by PyTorch, with automatic fallbacks to CPU
execution when CUDA is unavailable.
- **Modular cis-QTL mapping API** exposed through functional helpers such as
`map_nominal`, `map_permutations`, and `map_independent`, or via the convenience
wrapper `CisMapper`.
- **Multiple genotype backends** including PLINK (BED/BIM/FAM), PLINK 2 (PGEN/PSAM/PVAR),
and BED/Parquet inputs, with helpers to stream data in manageable windows.
- **Local ancestry integration** by pairing genotype windows with haplotype panels
produced by RFMix through `RFMixReader` and `InputGeneratorCisWithHaps`.
- **Parquet streaming sinks** that make it easy to materialise association statistics
without loading the entire result set in memory.
- **Pure-Python statistics (no R/rpy2 required)**: tensorQTL's `rfunc` calls have been replaced with `scipy.stats` for p-values, and q-values are computed with `py-qvalue`, a Python port of Storey's `qvalue`.
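For readers replacing an R-based pipeline, the sketch below shows roughly what this statistics layer involves, assuming a correlation-based test as in tensorQTL: a (partial) correlation is converted to a t-statistic and evaluated with `scipy.stats.t`, and q-values follow Storey's step-up formula. `correlation_pvalue` and `storey_qvalues` are hypothetical helpers, not localQTL's or `py-qvalue`'s API. The q-value estimate is deliberately simplified: it uses a single λ = 0.5 to estimate π₀ instead of the smoothed λ grid that `qvalue` uses by default.

```python
import numpy as np
from scipy import stats


def correlation_pvalue(r, n, n_covariates=0):
    """Two-sided p-value for a (partial) Pearson correlation r on n samples.

    Converts r to a t-statistic with dof = n - 2 - n_covariates and
    evaluates the Student t survival function from scipy.stats.
    """
    dof = n - 2 - n_covariates
    r = np.asarray(r, dtype=float)
    t = r * np.sqrt(dof / (1.0 - r**2))
    return 2.0 * stats.t.sf(np.abs(t), dof)


def storey_qvalues(pvals, lam=0.5):
    """Storey q-values using a single lambda to estimate pi0 (simplified)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    # Fraction of p-values above lambda, rescaled, estimates the true-null proportion
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    order = np.argsort(p)
    scaled = pi0 * m * p[order] / np.arange(1, m + 1)
    # The q-value at rank i is the minimum of the scaled values over ranks j >= i
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

With only the intercept as a covariate, `correlation_pvalue` agrees with `scipy.stats.pearsonr`, which is a convenient sanity check when porting code.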
## Installation
Install the latest release from PyPI:
```bash
pip install localqtl
```
The project uses [Poetry](https://python-poetry.org/) for dependency management.
To install from source, clone the repository and install the package into a virtual environment:
```bash
git clone https://github.com/heart-gen/localQTL.git
cd localQTL
poetry install
```
If you prefer `pip`, you can install the library in editable mode after exporting
Poetry's dependency specification:
```bash
pip install -e .
```
> **Note:** GPU acceleration relies on PyTorch, CuPy, and cuDF. Make sure you use
> versions that match the CUDA toolkit available on your system. The versions in
> `pyproject.toml` target CUDA 12.
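Because the CUDA build of PyTorch must match your toolkit, it can help to check what PyTorch sees before running a scan. The sketch below uses only standard PyTorch attributes (`torch.cuda.is_available()` and `torch.version.cuda`) to mimic the CPU fallback described under Features. `resolve_device` and `torch_cuda_build` are hypothetical helpers, not part of localQTL, and localQTL's own handling of `device="auto"` may differ.

```python
def resolve_device(device="auto"):
    """Map a device request to a concrete PyTorch device string (sketch)."""
    if device != "auto":
        # Explicit requests such as "cpu" or "cuda:1" are passed through unchanged
        return device
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate, so report CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def torch_cuda_build():
    """CUDA version PyTorch was built against (e.g. "12.1"), or None."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.cuda is None for CPU-only builds
    return torch.version.cuda


print(resolve_device(), torch_cuda_build())
```

If `torch_cuda_build()` reports a major version other than 12, the installed PyTorch wheel probably does not match the CUDA 12 target in `pyproject.toml`.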
## Quickstart
Below is a minimal example that runs a nominal cis-QTL scan against PLINK-formatted
genotypes and BED-formatted phenotypes. The example mirrors the data layout
expected by tensorQTL, so **existing preprocessing pipelines can be reused**.
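To make the tensorQTL data layout concrete, the sketch below assumes the usual tensorQTL phenotype BED convention (columns `#chr`, `start`, `end`, `phenotype_id`, then one column per sample) and splits it into a phenotype matrix and a position table with pandas. `split_phenotype_bed` is a hypothetical stand-in for `read_phenotype_bed`, whose exact output columns may differ; here the BED `end` coordinate is used as the phenotype position, and the two phenotypes are toy data.

```python
import io

import pandas as pd

# A two-phenotype, three-sample BED in the tensorQTL layout (illustrative data)
BED_TEXT = (
    "#chr\tstart\tend\tphenotype_id\tS1\tS2\tS3\n"
    "chr1\t999\t1000\tGENE_A\t0.5\t-1.2\t0.7\n"
    "chr1\t49999\t50000\tGENE_B\t1.1\t0.3\t-0.4\n"
)


def split_phenotype_bed(handle):
    """Split a phenotype BED into (phenotype_df, phenotype_pos_df)."""
    bed = pd.read_csv(handle, sep="\t")
    bed = bed.rename(columns={"#chr": "chr"}).set_index("phenotype_id")
    # Use the BED end coordinate as the phenotype position (e.g. a 1-bp TSS interval)
    pos_df = bed[["chr", "end"]].rename(columns={"end": "pos"})
    # Everything after the coordinate/ID columns is one column per sample
    pheno_df = bed.drop(columns=["chr", "start", "end"])
    return pheno_df, pos_df


phenotype_df, phenotype_pos_df = split_phenotype_bed(io.StringIO(BED_TEXT))
print(phenotype_df.shape)  # (2, 3)
```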
### Standard mapping (tensorQTL equivalent)
```python
from localqtl import PlinkReader, read_phenotype_bed
from localqtl.cis import map_nominal
# Load genotypes and variant metadata
plink = PlinkReader("data/genotypes")
genotype_df = plink.load_genotypes()
variant_df = plink.bim.set_index("snp")[["chrom", "pos"]]
# Load phenotypes (BED-style) and their genomic coordinates
phenotype_df, phenotype_pos_df = read_phenotype_bed("data/phenotypes.bed")
# Optional: load covariates as a DataFrame indexed by sample IDs
covariates_df = None
results = map_nominal(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    window=1_000_000,          # ±1 Mb cis window
    maf_threshold=0.01,        # filter on in-sample MAF
    device="auto",             # picks CUDA when available, otherwise CPU
    out_prefix="cis_nominal",  # default prefix
    return_df=True,            # default False streams results to a Parquet sink instead
)
print(results.head())
```
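Conceptually, `window` and `maf_threshold` restrict each phenotype's tests to variants on the same chromosome within ±`window` bp of its position whose in-sample minor allele frequency passes the threshold. The pandas/NumPy sketch below illustrates both filters under those assumptions; `cis_variant_ids` and `maf_mask` are hypothetical helpers, not localQTL functions, and the inclusive `<=`/`>=` boundaries are a choice made here. Chromosome labels must compare equal (`"1"` is not `1`), which is why the ancestry-aware example casts `chrom` to `str`.

```python
import numpy as np
import pandas as pd


def cis_variant_ids(variant_df, chrom, position, window=1_000_000):
    """IDs of variants on `chrom` within +/- `window` bp of `position`."""
    same_chrom = variant_df["chrom"] == chrom
    in_window = (variant_df["pos"] - position).abs() <= window
    return variant_df.index[same_chrom & in_window]


def maf_mask(dosages, maf_threshold=0.01):
    """True for variants whose in-sample minor allele frequency >= threshold.

    `dosages` is a (variants, samples) array of alt-allele counts in [0, 2].
    """
    af = np.nanmean(dosages, axis=1) / 2.0
    maf = np.minimum(af, 1.0 - af)
    return maf >= maf_threshold


variant_df = pd.DataFrame(
    {"chrom": ["1", "1", "1", "2"], "pos": [100, 600_000, 1_500_000, 100]},
    index=["v1", "v2", "v3", "v4"],
)
print(list(cis_variant_ids(variant_df, "1", position=200_000, window=500_000)))
# ['v1', 'v2']
```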
For analyses that combine nominal scans, permutations, and independent signal
calling, the `CisMapper` class offers a thin object-oriented façade:
```python
from localqtl.cis import CisMapper
mapper = CisMapper(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    window=500_000,
)
mapper.map_nominal(nperm=1_000)
perm_df = mapper.map_permutations(nperm=1_000, beta_approx=True)
perm_df = mapper.calculate_qvalues(perm_df, fdr=0.05)
lead_df = mapper.map_independent(cis_df=perm_df, fdr=0.05)
```
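In tensorQTL's approach, a permutation pass with `beta_approx=True` permutes each phenotype, records the strongest cis association per permutation, and fits a Beta distribution to the resulting minimum p-values so that small p-values can be estimated beyond the resolution of `nperm`. The sketch below illustrates that idea for a single phenotype with NumPy and `scipy.stats.beta.fit`. `permutation_test` and its helpers are hypothetical, and the sketch omits the effective-degrees-of-freedom adjustment tensorQTL applies, so it should not be expected to reproduce localQTL's numbers.

```python
import numpy as np
from scipy import stats


def max_abs_corr(G, y):
    """Largest |Pearson r| between y and any row of G (variants x samples).

    Assumes G has no monomorphic (constant) rows.
    """
    Gc = G - G.mean(axis=1, keepdims=True)
    yc = y - y.mean()
    r = (Gc @ yc) / (np.linalg.norm(Gc, axis=1) * np.linalg.norm(yc))
    return np.max(np.abs(r))


def r_to_p(r, dof):
    """Two-sided p-value of a correlation via its t-statistic."""
    r = np.asarray(r, dtype=float)
    t = r * np.sqrt(dof / (1.0 - r**2))
    return 2.0 * stats.t.sf(np.abs(t), dof)


def permutation_test(G, y, nperm=1_000, seed=0):
    """Return (nominal p, empirical permutation p, beta-approximated p)."""
    rng = np.random.default_rng(seed)
    dof = y.size - 2
    r_obs = max_abs_corr(G, y)
    # Permuting y breaks the genotype-phenotype link but keeps LD among variants
    r_perm = np.array([max_abs_corr(G, rng.permutation(y)) for _ in range(nperm)])
    p_emp = (np.sum(r_perm >= r_obs) + 1) / (nperm + 1)
    p_nominal = r_to_p(r_obs, dof)
    # Fit Beta(a, b) on [0, 1] to the per-permutation minimum p-values
    a, b, _, _ = stats.beta.fit(r_to_p(r_perm, dof), floc=0, fscale=1)
    p_beta = stats.beta.cdf(p_nominal, a, b)
    return float(p_nominal), float(p_emp), float(p_beta)
```

The empirical p-value can never fall below `1 / (nperm + 1)`; the Beta fit is what allows a strong signal to receive a smaller, more informative p-value.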
### Local ancestry-aware mapping
localQTL can incorporate local ancestry (e.g., from RFMix) so that cis-xQTL tests are performed with ancestry-aware genotype inputs. To enable this, pass **`haplotypes`** (per-variant × per-sample × per-ancestry dosages) and **`loci_df`** (variant positions corresponding to the haplotype tensor) to `CisMapper`. When these are provided, `CisMapper` switches to the ancestry-aware input generator under the hood.
```python
from localqtl import PlinkReader, read_phenotype_bed
from localqtl.haplotypeio import RFMixReader
from localqtl.cis import CisMapper
plink = PlinkReader("data/genotypes")
genotype_df = plink.load_genotypes() # samples as columns, variants as rows
variant_df = plink.bim.set_index("snp")[["chrom", "pos"]]
phenotype_df, phenotype_pos_df = read_phenotype_bed("data/phenotypes.bed")
covariates_df = None # optional
# Local ancestry from RFMix
rfmix = RFMixReader(
    prefix_path="data/rfmix/prefix",  # directory with per-chrom outputs + fb.tsv
    binary_path="data/rfmix",         # where prebuilt binaries live (if used)
    verbose=True,
)
# Materialize ancestry haplotypes into memory (NumPy array)
# Shape: (variants, samples, ancestries) for >2 ancestries.
# For 2 ancestries, the reader exposes the ancestry channel selected internally.
H = rfmix.load_haplotypes()
loci_df = rfmix.loci_df # DataFrame with ['chrom','pos', 'ancestry', 'hap', 'index'] (indexed by 'hap')
# Align samples to the RFMix order (recommended)
# Keep only intersection and put everything in the same order as rfmix.sample_ids.
keep = [
    sid for sid in rfmix.sample_ids
    if sid in genotype_df.columns
    and sid in phenotype_df.columns
    and (covariates_df is None or sid in covariates_df.index)
]
genotype_df = genotype_df[keep]
phenotype_df = phenotype_df[keep]
if covariates_df is not None:
    covariates_df = covariates_df.loc[keep]
# (Optional) Ensure chromosome dtype matches between variant_df and loci_df
variant_df = variant_df.copy()
variant_df["chrom"] = variant_df["chrom"].astype(str)
loci_df = loci_df.copy()
loci_df["chrom"] = loci_df["chrom"].astype(str)
# Ancestry-aware mapping
mapper = CisMapper(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    haplotypes=H,       # <-- enable local ancestry-aware mode
    loci_df=loci_df,    # <-- positions that align to H
    window=1_000_000,
    device="auto",
)
# Run nominal scans and permutations as usual; ancestry-awareness is automatic
mapper.map_nominal(nperm=0)
perm_df = mapper.map_permutations(nperm=1_000, beta_approx=True)
# FDR using the pure-Python q-value port (no R/rpy2 needed)
perm_df = mapper.calculate_qvalues(perm_df, fdr=0.05)
# Identify conditionally independent signals
lead_df = mapper.map_independent(cis_df=perm_df, fdr=0.05)
print(lead_df.head())
```
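`map_independent` identifies conditionally independent signals. As a rough illustration of the conditioning idea only, the sketch below greedily adds the strongest variant, regresses it out of both the phenotype and the genotypes, and repeats until the best remaining nominal p-value misses a fixed threshold. `forward_select` and `residualize` are hypothetical helpers; tensorQTL's procedure is a forward-backward stepwise regression with permutation-based thresholds, and localQTL's ancestry-aware version may add further terms.

```python
import numpy as np
from scipy import stats


def residualize(v, C):
    """Residuals of v after least-squares regression on an intercept plus C."""
    X = np.ones((v.size, 1)) if C is None else np.column_stack([np.ones(v.size), C])
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ coef


def forward_select(G, y, p_threshold=1e-5, max_signals=10):
    """Greedy forward selection of conditionally independent variants.

    G is (variants, samples); returns row indices of the selected variants.
    """
    n = y.size
    selected = []
    for _ in range(max_signals):
        C = G[selected].T if selected else None
        y_res = residualize(y, C)
        G_res = np.array([residualize(g, C) for g in G])
        denom = np.linalg.norm(G_res, axis=1) * np.linalg.norm(y_res)
        r = np.zeros(G.shape[0])
        ok = denom > 1e-12  # skip variants fully explained by the conditioning set
        r[ok] = (G_res[ok] @ y_res) / denom[ok]
        r[selected] = 0.0  # never pick the same variant twice
        best = int(np.argmax(np.abs(r)))
        dof = n - 2 - len(selected)
        t = r[best] * np.sqrt(dof / (1.0 - r[best] ** 2))
        if 2.0 * stats.t.sf(abs(t), dof) >= p_threshold:
            break
        selected.append(best)
    return selected
```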
**Input expectations**
* `H` (from `RFMixReader.load_haplotypes()`) holds ancestry dosages with shape `(variants, samples, ancestries)` for ≥3 ancestries; for 2 ancestries, the reader exposes the configured channel.
* `loci_df` rows must correspond 1:1 to the variant axis (first dimension) of `H` and be joinable (on `chrom`/`pos`) to the `variant_df` used for genotypes.
* Sample order in `genotype_df`, `phenotype_df`, and `covariates_df` should match `rfmix.sample_ids` (reindex as shown).
* The cis-mapping helpers default to `preload_haplotypes=True`, so ancestry blocks are staged as contiguous tensors on the requested device (GPU or CPU). Set it to `False` when working under strict memory constraints.
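These expectations can be checked before constructing `CisMapper`. The sketch below is a hypothetical pre-flight helper, `check_ancestry_inputs` (not part of localQTL), that reports a mismatched variant axis, incompatible `chrom` dtypes or no `(chrom, pos)` overlap, and genotype columns that are out of `rfmix.sample_ids` order. The arrays in the usage lines are toy data.

```python
import numpy as np
import pandas as pd


def check_ancestry_inputs(H, loci_df, variant_df, genotype_df, sample_ids):
    """Return a list of problems with ancestry-aware inputs (empty if none)."""
    problems = []
    # 1. loci_df must describe the variant axis (first dimension) of H
    if H.shape[0] != len(loci_df):
        problems.append(
            f"H has {H.shape[0]} variants but loci_df has {len(loci_df)} rows"
        )
    # 2. loci_df must be joinable to variant_df on chrom/pos
    if variant_df["chrom"].dtype != loci_df["chrom"].dtype:
        problems.append("chrom dtype differs between variant_df and loci_df")
    elif variant_df.merge(loci_df, on=["chrom", "pos"]).empty:
        problems.append("no shared (chrom, pos) between variant_df and loci_df")
    # 3. genotype columns must follow the RFMix sample order
    present = set(genotype_df.columns)
    expected = [sid for sid in sample_ids if sid in present]
    if list(genotype_df.columns) != expected:
        problems.append("genotype_df columns are not in rfmix.sample_ids order")
    return problems


H = np.zeros((3, 2, 2))  # 3 variants, 2 samples, 2 ancestries (toy data)
loci = pd.DataFrame({"chrom": ["1", "1", "1"], "pos": [10, 20, 30]})
variants = pd.DataFrame({"chrom": ["1", "1"], "pos": [10, 30]}, index=["rsA", "rsB"])
geno = pd.DataFrame(np.zeros((2, 2)), index=["rsA", "rsB"], columns=["s1", "s2"])
print(check_ancestry_inputs(H, loci, variants, geno, ["s1", "s2", "s3"]))  # []
```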
## Testing
Run the test suite with:
```bash
poetry run pytest
```
This exercises the core cis-QTL mapping routines using small synthetic datasets.
Raw data
{
"_id": null,
"home_page": null,
"name": "localqtl",
"maintainer": "Kynon J Benjamin",
"docs_url": null,
"requires_python": "<3.15,>=3.11",
"maintainer_email": "kj.benjamin90@gmail.com",
"keywords": "genomics, gpu, xQTL, qtl",
"author": "Kynon J Benjamin",
"author_email": "kj.benjamin90@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/e6/87/69c567f98fb18118e10874cf7fe2860fcd3eabd893be1049e7740e9f236b/localqtl-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL-3.0-or-later",
"summary": "GPU-accelerated ancestry-aware cis-QTL mapping library with tensorQTL-compatible APIs",
"version": "0.1.2",
"project_urls": {
"Bug Tracker": "https://github.com/heart-gen/localQTL/issues",
"homepage": "https://github.com/heart-gen/localQTL",
"repository": "https://github.com/heart-gen/localQTL"
},
"split_keywords": [
"genomics",
" gpu",
" xqtl",
" qtl"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4635bd0b8c31fad39e3f537a943a339d6083049deda26dd36d943cd77e24e350",
"md5": "1087b585f2286b6c8c219bc1d277dc68",
"sha256": "c9cceff00b6269b5e9e7545f19de8f166a8d4097995131bd9707211f1fee288c"
},
"downloads": -1,
"filename": "localqtl-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1087b585f2286b6c8c219bc1d277dc68",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.15,>=3.11",
"size": 57297,
"upload_time": "2025-11-03T18:17:43",
"upload_time_iso_8601": "2025-11-03T18:17:43.914123Z",
"url": "https://files.pythonhosted.org/packages/46/35/bd0b8c31fad39e3f537a943a339d6083049deda26dd36d943cd77e24e350/localqtl-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e68769c567f98fb18118e10874cf7fe2860fcd3eabd893be1049e7740e9f236b",
"md5": "3fa64e86d6207d9cd5267590ca748f53",
"sha256": "8a2e40c7d352f1c494ec233eaa9f47041266ce4529c81d3d001f9f9a0e241d53"
},
"downloads": -1,
"filename": "localqtl-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "3fa64e86d6207d9cd5267590ca748f53",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.15,>=3.11",
"size": 48979,
"upload_time": "2025-11-03T18:17:45",
"upload_time_iso_8601": "2025-11-03T18:17:45.222042Z",
"url": "https://files.pythonhosted.org/packages/e6/87/69c567f98fb18118e10874cf7fe2860fcd3eabd893be1049e7740e9f236b/localqtl-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-03 18:17:45",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "heart-gen",
"github_project": "localQTL",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "localqtl"
}