localqtl


- **Name:** localqtl
- **Version:** 0.1.2
- **Summary:** GPU-accelerated ancestry-aware cis-QTL mapping library with tensorQTL-compatible APIs
- **Upload time:** 2025-11-03 18:17:45
- **Author and maintainer:** Kynon J Benjamin
- **Requires Python:** `<3.15,>=3.11`
- **License:** GPL-3.0-or-later
- **Keywords:** genomics, gpu, xqtl, qtl

# localQTL

localQTL is a **pure-Python** library for local-ancestry-aware xQTL mapping that
lets researchers run end-to-end analyses on large cohorts **without any R/rpy2
dependencies**. It preserves the familiar
[tensorQTL](https://github.com/broadinstitute/tensorqtl) data model and provides **GPU-first
execution paths**, flexible genotype loaders, and streaming outputs for large-scale
workflows, while adding ancestry-aware use cases.

## Features

- **GPU-accelerated workflows** powered by PyTorch, with automatic fallbacks to CPU
  execution when CUDA is unavailable.
- **Modular cis-QTL mapping API** exposed through functional helpers such as
  `map_nominal`, `map_permutations`, and `map_independent`, or via the convenience
  wrapper `CisMapper`.
- **Multiple genotype backends** including PLINK (BED/BIM/FAM), PLINK 2 (PGEN/PSAM/PVAR),
  and BED/Parquet inputs, with helpers to stream data in manageable windows.
- **Local ancestry integration** by pairing genotype windows with haplotype panels
  produced by RFMix through `RFMixReader` and `InputGeneratorCisWithHaps`.
- **Parquet streaming sinks** that make it easy to materialise association statistics
  without loading the entire result set in memory.
- **Pure-Python statistics (no R/rpy2 required)**: tensorQTL's `rfunc` calls have been
  replaced with `scipy.stats` for p-values, and q-values are computed with the Python
  port of Storey's `qvalue` (`py-qvalue`).
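
The CPU fallback mentioned in the first bullet can be pictured with a short sketch. This is not localQTL's internal code: `resolve_device` is a hypothetical helper, and the only library call it relies on is PyTorch's `torch.cuda.is_available()`. It also falls back to `"cpu"` when PyTorch is not installed at all.

```python
def resolve_device(requested="auto"):
    """Hypothetical helper: resolve "auto" to "cuda" or "cpu"."""
    if requested != "auto":
        return requested  # honour an explicit choice such as "cpu" or "cuda"
    try:
        import torch  # treated as optional in this sketch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = resolve_device("auto")
print(f"Running on: {device}")
```

Resolving the device once, up front, keeps the rest of a pipeline free of CUDA-specific branches.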

## Installation

Install the latest release from PyPI:

```bash
pip install localqtl
```

To work from source instead, note that the project uses
[Poetry](https://python-poetry.org/) for dependency management.
Clone the repository and install the package into a virtual environment:

```bash
poetry install
```

If you prefer `pip`, you can install the library in editable mode after exporting
Poetry's dependency specification:

```bash
pip install -e .
```

> **Note:** GPU acceleration relies on PyTorch, CuPy, and cuDF. Make sure you use
> versions that match the CUDA toolkit available on your system. The versions in
> `pyproject.toml` target CUDA 12.
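
Before debugging a CUDA mismatch, it can help to list which GPU packages are actually installed. The sketch below uses only the standard library's `importlib.metadata`. The distribution names are assumptions: CuPy and cuDF wheels are often published under CUDA-specific names, so adjust the tuple to match your environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; CUDA-specific wheels such as "cupy-cuda12x"
# or "cudf-cu12" are common alternatives to the plain names.
GPU_PACKAGES = ("torch", "cupy", "cupy-cuda12x", "cudf", "cudf-cu12")


def installed_versions(packages=GPU_PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:14s} {ver or 'not installed'}")
```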

## Quickstart

Below is a minimal example that runs a nominal cis-QTL scan against PLINK-formatted
genotypes and BED-formatted phenotypes. The example mirrors the data layout
expected by tensorQTL, so **existing preprocessing pipelines can be reused**.

### Standard mapping (tensorQTL equivalent)

```python
from localqtl import PlinkReader, read_phenotype_bed
from localqtl.cis import map_nominal

# Load genotypes and variant metadata
plink = PlinkReader("data/genotypes")
genotype_df = plink.load_genotypes()
variant_df = plink.bim.set_index("snp")[["chrom", "pos"]]

# Load phenotypes (BED-style) and their genomic coordinates
phenotype_df, phenotype_pos_df = read_phenotype_bed("data/phenotypes.bed")

# Optional: load covariates as a DataFrame indexed by sample IDs
covariates_df = None

results = map_nominal(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    window=1_000_000,         # ±1 Mb cis window
    maf_threshold=0.01,       # filter on in-sample MAF
    device="auto",            # picks CUDA when available, otherwise CPU
    out_prefix="cis_nominal", # output prefix (this is the default)
    return_df=True            # default is False, which streams results to a Parquet sink
)

print(results.head())
```
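
To make the `window` and `maf_threshold` arguments concrete, here is a minimal pure-Python sketch of the two filters on toy data. It is an illustration, not localQTL's implementation: the helper names are hypothetical, and it assumes the window is measured from a single phenotype position and that missing genotype calls are simply skipped when computing the in-sample MAF.

```python
def in_sample_maf(dosages):
    """Minor allele frequency from 0/1/2 alt-allele dosages; None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def variants_in_window(positions, center, window=1_000_000):
    """Indices of variants whose position lies within +/- window of center."""
    return [i for i, pos in enumerate(positions) if abs(pos - center) <= window]


# Toy data: three variants on one chromosome, four samples each (all hypothetical).
positions = [100, 1_500_000, 2_600_000]
dosages = [[0, 1, 2, 0], [0, 0, 0, 0], [1, 1, None, 2]]

# First restrict to the cis window, then drop variants below the MAF threshold.
candidates = variants_in_window(positions, center=1_000_000)
kept = [i for i in candidates if in_sample_maf(dosages[i]) >= 0.01]
print(kept)
```

The third variant falls outside the window and the second is monomorphic in this sample, so only the first survives both filters.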

For analyses that combine nominal scans, permutations, and independent signal
calling, the `CisMapper` class offers a thin object-oriented façade:

```python
from localqtl.cis import CisMapper

mapper = CisMapper(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    window=500_000,
)

mapper.map_nominal(nperm=1_000)
perm_df = mapper.map_permutations(nperm=1_000, beta_approx=True)
perm_df = mapper.calculate_qvalues(perm_df, fdr=0.05)
lead_df = mapper.map_independent(cis_df=perm_df, fdr=0.05)
```
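
For readers new to the permutation and FDR steps, the sketch below shows the underlying arithmetic in plain Python. It is a simplification under stated assumptions, not the library's code: the function names are hypothetical, the beta approximation is not shown, and the q-values use the Benjamini-Hochberg adjustment, which is what Storey's estimator reduces to when the null proportion π0 is fixed at 1.

```python
def empirical_pvalue(observed, permuted):
    """(r + 1) / (n + 1), where r counts permuted statistics >= observed."""
    r = sum(1 for stat in permuted if stat >= observed)
    return (r + 1) / (len(permuted) + 1)


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Toy inputs (hypothetical): one phenotype's permuted statistics,
# then one top p-value per phenotype.
p_perm = empirical_pvalue(2.0, [0.5, 1.0, 2.5, 3.0])
q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
significant = [i for i, value in enumerate(q) if value <= 0.05]
print(p_perm, q, significant)
```

Adding 1 to both numerator and denominator keeps the empirical p-value strictly above zero, which matters when it is later fed into an FDR procedure.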

### Local ancestry-aware mapping

localQTL can incorporate local ancestry (e.g., from RFMix) so that cis-xQTL tests are performed with ancestry-aware genotype inputs. To enable this, pass **`haplotypes`** (per-variant × per-sample × per-ancestry dosages) and **`loci_df`** (variant positions corresponding to the haplotype tensor) to `CisMapper`. When these are provided, `CisMapper` switches to the ancestry-aware input generator under the hood.

```python
from localqtl import PlinkReader, read_phenotype_bed
from localqtl.haplotypeio import RFMixReader
from localqtl.cis import CisMapper

plink = PlinkReader("data/genotypes")
genotype_df = plink.load_genotypes()  # samples as columns, variants as rows
variant_df = plink.bim.set_index("snp")[["chrom", "pos"]]

phenotype_df, phenotype_pos_df = read_phenotype_bed("data/phenotypes.bed")
covariates_df = None  # optional

# Local ancestry from RFMix
rfmix = RFMixReader(
    prefix_path="data/rfmix/prefix",   # directory with per-chrom outputs + fb.tsv
    binary_path="data/rfmix",          # where prebuilt binaries live (if used)
    verbose=True
)

# Materialize ancestry haplotypes into memory (NumPy array)
# Shape: (variants, samples, ancestries) for >2 ancestries.
# For 2 ancestries, the reader exposes the ancestry channel selected internally.
H = rfmix.load_haplotypes()
loci_df = rfmix.loci_df  # DataFrame with ['chrom','pos', 'ancestry', 'hap', 'index'] (indexed by 'hap')

# Align samples to the RFMix order (recommended)
# Keep only the samples shared by all inputs, ordered as in rfmix.sample_ids.
keep = [
    sid for sid in rfmix.sample_ids
    if sid in genotype_df.columns and sid in phenotype_df.columns
]
genotype_df = genotype_df[keep]
phenotype_df = phenotype_df[keep]
if covariates_df is not None:
    covariates_df = covariates_df.loc[keep]

# (Optional) Ensure chromosome dtype matches between variant_df and loci_df
variant_df["chrom"] = variant_df["chrom"].astype(str)
loci_df = loci_df.copy()
loci_df["chrom"] = loci_df["chrom"].astype(str)

# Ancestry-aware mapping
mapper = CisMapper(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    haplotypes=H,          # <-- enable local ancestry-aware mode
    loci_df=loci_df,       # <-- positions that align to H
    window=1_000_000,
    device="auto",
)

# Run nominal scans and permutations as usual; ancestry-awareness is automatic
mapper.map_nominal(nperm=0)
perm_df = mapper.map_permutations(nperm=1_000, beta_approx=True)

# FDR using the pure-Python q-value port (no R/rpy2 needed)
perm_df = mapper.calculate_qvalues(perm_df, fdr=0.05)

# Identify conditionally independent signals
lead_df = mapper.map_independent(cis_df=perm_df, fdr=0.05)

print(lead_df.head())
```

**Input expectations**

* `H` (from `RFMixReader.load_haplotypes()`) contains ancestry dosages with shape `(variants, samples, ancestries)` for ≥3 ancestries; for 2 ancestries the reader exposes the configured channel.
* `loci_df` rows must correspond 1:1 to the variant axis of `H` and be joinable (by `chrom`/`pos`) to the `variant_df` used for genotypes.
* Sample order in `genotype_df`, `phenotype_df`, and `covariates_df` should match `rfmix.sample_ids` (reindex as shown).
* The cis-mapping helpers default to `preload_haplotypes=True` so ancestry blocks are staged as contiguous tensors on the requested device (GPU or CPU). Override this flag when working under strict memory constraints.
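
The first three expectations can be checked before constructing `CisMapper`. The sketch below is a hypothetical pre-flight helper, not part of localQTL. It takes plain Python values so it runs on its own, and it assumes the first axis of `H` indexes variants, as described above.

```python
def check_ancestry_inputs(h_shape, n_loci, sample_ids, rfmix_sample_ids):
    """Return a list of human-readable problems; an empty list means the inputs look consistent."""
    problems = []

    # H needs at least a variant axis and a sample axis.
    if len(h_shape) < 2:
        problems.append(f"H should have at least 2 dimensions, got shape {h_shape}")
    elif h_shape[0] != n_loci:
        problems.append(f"H has {h_shape[0]} variants but loci_df has {n_loci} rows")

    # Samples should appear in the same relative order as in the RFMix output.
    wanted = set(sample_ids)
    expected = [sid for sid in rfmix_sample_ids if sid in wanted]
    if list(sample_ids) != expected:
        problems.append("sample order does not follow the RFMix sample order")

    return problems


# Toy values (hypothetical): 5 variants, 3 RFMix samples, 3 ancestries.
ok = check_ancestry_inputs((5, 3, 3), 5, ["s1", "s3"], ["s1", "s2", "s3"])
bad = check_ancestry_inputs((5, 3, 3), 4, ["s3", "s1"], ["s1", "s2", "s3"])
print(ok)
print(bad)
```

With the objects from the example above, a call might look like `check_ancestry_inputs(H.shape, len(loci_df), list(genotype_df.columns), rfmix.sample_ids)`.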

## Testing

Run the test suite with:

```bash
poetry run pytest
```

This exercises the core cis-QTL mapping routines using small synthetic datasets.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "localqtl",
    "maintainer": "Kynon J Benjamin",
    "docs_url": null,
    "requires_python": "<3.15,>=3.11",
    "maintainer_email": "kj.benjamin90@gmail.com",
    "keywords": "genomics, gpu, xQTL, qtl",
    "author": "Kynon J Benjamin",
    "author_email": "kj.benjamin90@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e6/87/69c567f98fb18118e10874cf7fe2860fcd3eabd893be1049e7740e9f236b/localqtl-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL-3.0-or-later",
    "summary": "GPU-accelerated ancestry-aware cis-QTL mapping library with tensorQTL-compatible APIs",
    "version": "0.1.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/heart-gen/localQTL/issues",
        "homepage": "https://github.com/heart-gen/localQTL",
        "repository": "https://github.com/heart-gen/localQTL"
    },
    "split_keywords": [
        "genomics",
        " gpu",
        " xqtl",
        " qtl"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4635bd0b8c31fad39e3f537a943a339d6083049deda26dd36d943cd77e24e350",
                "md5": "1087b585f2286b6c8c219bc1d277dc68",
                "sha256": "c9cceff00b6269b5e9e7545f19de8f166a8d4097995131bd9707211f1fee288c"
            },
            "downloads": -1,
            "filename": "localqtl-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1087b585f2286b6c8c219bc1d277dc68",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.15,>=3.11",
            "size": 57297,
            "upload_time": "2025-11-03T18:17:43",
            "upload_time_iso_8601": "2025-11-03T18:17:43.914123Z",
            "url": "https://files.pythonhosted.org/packages/46/35/bd0b8c31fad39e3f537a943a339d6083049deda26dd36d943cd77e24e350/localqtl-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e68769c567f98fb18118e10874cf7fe2860fcd3eabd893be1049e7740e9f236b",
                "md5": "3fa64e86d6207d9cd5267590ca748f53",
                "sha256": "8a2e40c7d352f1c494ec233eaa9f47041266ce4529c81d3d001f9f9a0e241d53"
            },
            "downloads": -1,
            "filename": "localqtl-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "3fa64e86d6207d9cd5267590ca748f53",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.15,>=3.11",
            "size": 48979,
            "upload_time": "2025-11-03T18:17:45",
            "upload_time_iso_8601": "2025-11-03T18:17:45.222042Z",
            "url": "https://files.pythonhosted.org/packages/e6/87/69c567f98fb18118e10874cf7fe2860fcd3eabd893be1049e7740e9f236b/localqtl-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-03 18:17:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "heart-gen",
    "github_project": "localQTL",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "localqtl"
}
        