rfmix-reader

- Name: rfmix-reader
- Version: 0.1.15
- Home page: https://rfmix-reader.readthedocs.io/en/latest/
- Repository: https://github.com/heart-gen/rfmix_reader.git
- Summary: RFMix-reader is a Python package designed to efficiently read and process output files generated by RFMix, a popular tool for estimating local ancestry in admixed populations. The package employs a lazy loading approach, which minimizes memory consumption by reading only the loci that are accessed by the user, rather than loading the entire dataset into memory at once.
- Upload time: 2024-07-02 23:10:12
- Maintainer: Kynon JM Benjamin (kj.benjamin90@gmail.com)
- Author: Kynon JM Benjamin
- Requires Python: <4.0,>=3.9
- License: GPL-3.0
- Keywords: file parser, rfmix, gpu acceleration, local ancestry
- Requirements: none recorded
# RFMix-reader
`RFMix-reader` is a Python package designed to efficiently read and process output 
files generated by [`RFMix`](https://github.com/slowkoni/rfmix), 
a popular tool for estimating local ancestry in admixed 
populations. The package employs a lazy loading approach, which minimizes memory 
consumption by reading only the loci that are accessed by the user, rather than 
loading the entire dataset into memory at once. Additionally, we leverage GPU
acceleration to improve computational speed.

## Install
`rfmix-reader` can be installed using [pip](https://pypi.python.org/pypi/pip):

```bash
pip install rfmix-reader
```

**GPU Acceleration:**
`rfmix-reader` leverages GPU acceleration for improved performance. To use this
functionality, you will need to install the following libraries for your specific
CUDA version:
- `RAPIDS`: Refer to the official installation guide [here](https://docs.rapids.ai/install)
- `PyTorch`: Installation instructions can be found [here](https://pytorch.org/)

**Additional Notes:**
- We have not tested installation in `Docker` or `Conda` environments, so compatibility may vary.
- If you do not have a GPU, you can still use the basic functionality of `rfmix-reader`, which is still much faster than processing the files with standard scripting.
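If you are unsure whether the GPU libraries are usable in your environment, a quick check such as the one below can help before launching a large job. This is a minimal sketch and not how `rfmix-reader` itself chooses a backend; it only assumes that RAPIDS cuDF is importable as `cudf` and uses PyTorch's standard `torch.cuda.is_available()` call. The helper names are hypothetical.

```python
import importlib.util


def module_installed(name: str) -> bool:
    """Return True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def gpu_backend_available() -> bool:
    """Report whether cuDF (RAPIDS) is installed and PyTorch can see a CUDA device.

    Illustrative check only; it does not mirror how rfmix-reader selects a backend.
    """
    if not (module_installed("torch") and module_installed("cudf")):
        return False
    import torch  # imported only once we know it is installed

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("GPU libraries usable:", gpu_backend_available())
```

If this returns `False`, the CPU path described above still applies.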


## Key Features

**Lazy Loading**
- Reads data on-the-fly as requested, reducing memory footprint.
- Ideal for working with large RFMix output files that may not fit entirely in memory.
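To make the lazy-loading idea concrete, here is a minimal sketch using only the standard library: a float32 matrix with one row per locus is written to a raw binary file, and rows are read back by seeking to their byte offset, so only the requested loci are pulled into memory. It illustrates the general technique only; `write_matrix`, `LazyMatrix`, and the file layout are hypothetical and are not the `rfmix-reader` implementation.

```python
import os
import tempfile
from array import array

ITEM_SIZE = array("f").itemsize  # bytes per stored value (float32, usually 4)


def write_matrix(path: str, rows: list[list[float]]) -> None:
    """Write a row-major matrix of floats to a raw binary file."""
    flat = array("f", (value for row in rows for value in row))
    with open(path, "wb") as handle:
        flat.tofile(handle)


class LazyMatrix:
    """Read one row (locus) at a time from disk, only when it is requested."""

    def __init__(self, path: str, n_cols: int) -> None:
        self.path = path
        self.n_cols = n_cols
        self.n_rows = os.path.getsize(path) // (n_cols * ITEM_SIZE)

    def __len__(self) -> int:
        return self.n_rows

    def __getitem__(self, row: int) -> list[float]:
        if not 0 <= row < self.n_rows:
            raise IndexError(row)
        values = array("f")
        with open(self.path, "rb") as handle:
            # Seek directly to the row's byte offset; nothing else is read.
            handle.seek(row * self.n_cols * ITEM_SIZE)
            values.fromfile(handle, self.n_cols)
        return values.tolist()


def demo() -> tuple[int, list[float]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_loci.bin")
        write_matrix(path, [[0.0, 1.0], [0.25, 0.75], [1.0, 0.0]])
        lazy = LazyMatrix(path, n_cols=2)
        return len(lazy), lazy[1]
```

Fixed-width binary rows are what make this possible: the offset of any locus can be computed directly, with no text parsing.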

**Efficient Data Access**
- Provides convenient access to specific loci or regions of interest.
- Allows for selective loading of data, enabling faster processing times.
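Selecting a region of interest usually reduces to finding which loci fall inside a genomic window. A minimal sketch, assuming the loci for one chromosome are sorted by physical position: binary search with `bisect` finds the matching row range in O(log n) time, after which only those rows need to be read. `region_rows` is a hypothetical helper, not part of the `rfmix-reader` API.

```python
from bisect import bisect_left, bisect_right


def region_rows(positions: list[int], start: int, end: int) -> range:
    """Return row indices whose position lies in the closed interval [start, end].

    Assumes `positions` (physical positions on one chromosome) is sorted ascending.
    """
    first = bisect_left(positions, start)  # first row with position >= start
    last = bisect_right(positions, end)    # first row with position > end
    return range(first, last)


def demo() -> list[int]:
    # Toy loci for a single chromosome (hypothetical positions).
    positions = [1_000, 5_000, 12_000, 20_000, 35_000]
    # Rows 1, 2 and 3 fall inside the 4,000-25,000 bp window.
    return list(region_rows(positions, 4_000, 25_000))
```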

**Seamless Integration**
- Designed to work seamlessly with existing Python data analysis workflows.
- Facilitates downstream analysis and manipulation of `RFMix` output data.

Whether you're working with large-scale genomic datasets or have limited 
computational resources, `RFMix-reader` offers an efficient and memory-conscious 
solution for reading and processing `RFMix` output files. Because its lazy loading
approach reads only the loci you access, memory use stays low, making it a valuable
tool for researchers and bioinformaticians working with admixed population data.

## Usage
This works similarly to `pandas-plink`:

### Two population admixture example
This is a two-part process.

#### Generate binary files
To reduce computational time and memory, we use binary files.
Because RFMix does not generate these, we provide `create_binaries`
to create them before reading the results.
```python
from rfmix_reader import create_binaries

# Generate binary files
file_path = "examples/two_populations/out/"
binary_dir = "./binary_files"
create_binaries(file_path, binary_dir=binary_dir)
```
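The benefit of converting to binary is that every value then occupies a fixed number of bytes, so a reader can jump straight to any locus instead of re-parsing text. The sketch below shows that general idea with the standard library only. The function name, the assumed column layout (loosely modeled on RFMix `*.fb.tsv` files: `#` comment lines, one header line, four locus-metadata columns, then probability columns), and the float32 encoding are all illustrative assumptions; they do not describe the format `create_binaries` actually writes.

```python
import io
from array import array
from typing import BinaryIO, TextIO


def fb_text_to_binary(text_stream: TextIO, binary_stream: BinaryIO,
                      n_meta_cols: int = 4) -> int:
    """Convert a tab-separated probability table into packed float32 values.

    Assumed layout: '#' lines are comments, the first other line is a column
    header, and the first `n_meta_cols` fields of each row are locus metadata.
    The remaining fields are written to `binary_stream`. Returns the row count.
    """
    header_seen = False
    n_rows = 0
    for line in text_stream:
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        if not header_seen:
            header_seen = True  # column names are not stored in the binary output
            continue
        fields = line.rstrip("\n").split("\t")
        array("f", map(float, fields[n_meta_cols:])).tofile(binary_stream)
        n_rows += 1
    return n_rows


def demo() -> tuple[int, list[float]]:
    text = io.StringIO(
        "#reference_panel_population:\tAFR\tEUR\n"
        "chromosome\tphysical_position\tgenetic_position\tgenetic_marker_index"
        "\ts1:::hap1:::AFR\ts1:::hap1:::EUR\n"
        "chr22\t16000000\t0.0\t0\t0.25\t0.75\n"
        "chr22\t16050000\t0.1\t1\t1.0\t0.0\n"
    )
    out = io.BytesIO()
    n_rows = fb_text_to_binary(text, out)
    values = array("f")
    values.frombytes(out.getvalue())
    return n_rows, values.tolist()
```

Parsing text once and storing packed numbers is also why this step is slow relative to later reads: the full text file must be scanned a single time.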

Alternatively, you can generate the binary files on the fly by setting `generate_binary=True` in `read_rfmix`:

```python
from rfmix_reader import read_rfmix

file_path = "examples/two_populations/out/"
binary_dir = "./binary_files"
loci, rf_q, admix = read_rfmix(file_path, binary_dir=binary_dir,
                               generate_binary=True)
```

`generate_binary` is off by default because binary generation is the
rate-limiting step: depending on the size of the `*fb.tsv` files, it can
take 20 to 25 minutes or longer.
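Because binary generation is the slow step, it is worth running it once and reusing the output. One way to do this, sketched below, is to request generation only when `binary_dir` does not yet contain files. `should_generate_binaries` is a hypothetical helper, and it assumes that any non-empty directory already holds a complete set of binaries from a previous run; it does not inspect the files themselves.

```python
import tempfile
from pathlib import Path


def should_generate_binaries(binary_dir: str) -> bool:
    """Return True when `binary_dir` is missing or empty.

    Assumption: an existing, non-empty directory holds complete binary files
    from an earlier run. File contents are not validated.
    """
    directory = Path(binary_dir)
    return not (directory.is_dir() and any(directory.iterdir()))


# Possible usage with the read_rfmix call shown above (not executed here):
# loci, rf_q, admix = read_rfmix(
#     file_path, binary_dir=binary_dir,
#     generate_binary=should_generate_binaries(binary_dir),
# )


def demo() -> tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        missing = should_generate_binaries(str(Path(tmp) / "absent"))
        empty = should_generate_binaries(tmp)
        (Path(tmp) / "chunk.bin").write_bytes(b"\x00")  # hypothetical file name
        populated = should_generate_binaries(tmp)
    return missing, empty, populated
```

If you point the same `binary_dir` at a different dataset, clear it first, since this check cannot tell stale files from current ones.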

#### Main function
Once the binary files are generated, you can run the main function
to process the RFMix results. With a GPU, this takes less than
5 minutes.

```python
from rfmix_reader import read_rfmix

file_path = "examples/two_populations/out/"
loci, rf_q, admix = read_rfmix(file_path)
```
**Note:** `./binary_files` is the default for `binary_dir`, 
so this is an optional parameter.
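Downstream summaries can follow the same memory-conscious pattern by processing loci in fixed-size chunks instead of materializing every row at once. The sketch below computes per-column means (for example, a mean ancestry probability per haplotype column) one chunk at a time. Since the exact structure of `loci`, `rf_q`, and `admix` is not described here, the data source is abstracted as a hypothetical `read_rows(start, stop)` callback; `chunked_column_means` is likewise a hypothetical helper.

```python
from typing import Callable, Sequence

RowReader = Callable[[int, int], Sequence[Sequence[float]]]


def chunked_column_means(read_rows: RowReader, n_rows: int, n_cols: int,
                         chunk_size: int = 1_000) -> list[float]:
    """Average each column over all rows while holding at most one chunk in memory.

    `read_rows(start, stop)` must return the rows in [start, stop); it stands
    in for whatever lazily loaded source you use.
    """
    if n_rows == 0:
        raise ValueError("no rows to average")
    totals = [0.0] * n_cols
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        for row in read_rows(start, stop):
            for col, value in enumerate(row):
                totals[col] += value
    return [total / n_rows for total in totals]


def demo() -> list[float]:
    # Toy in-memory data standing in for a lazily loaded matrix (4 loci x 2 columns).
    data = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [1.0, 0.0]]
    return chunked_column_means(lambda a, b: data[a:b], n_rows=4, n_cols=2,
                                chunk_size=3)
```

The `chunk_size` trades memory for the number of read calls; larger chunks mean fewer reads but a bigger peak footprint.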

<!-- #### BED format -->
<!-- One helpful function we provide is `export_loci_admix_to_bed`.  -->
<!-- This function takes the output of the `read_rfmix` and  -->
<!-- exports a BED format with haplotypes condensed to regional -->
<!-- variation in parquet files per chromosome.  -->

<!-- ```python -->
<!-- export_loci_admix_to_bed(loci, rf_q, admix) -->
<!-- ``` -->

<!-- Unlike generating binary files, this takes a large amount of  -->
<!-- memory to write files, so it must be called separately outside -->
<!-- of the main function. -->

### Three population admixture example
`RFMix-reader` works with admixture involving any number of
populations. However, because of some conservative parameters, BED
formatting takes longer than it does in a two-population analysis.

```python
from rfmix_reader import read_rfmix

file_path = "examples/three_populations/out/"
binary_dir = "./binary_files"
loci, rf_q, admix = read_rfmix(file_path, binary_dir=binary_dir,
                               generate_binary=True)
```
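Handling any number of populations mainly means not hard-coding how many probability columns each haplotype has. A minimal sketch, assuming two RFMix `*.fb.tsv` conventions that you should check against your own files: a first comment line of the form `#reference_panel_population:` followed by tab-separated population labels, and probability columns named `sample:::hap:::pop`. Both helper functions are hypothetical and not part of `rfmix-reader`.

```python
from collections import defaultdict


def populations_from_comment(line: str) -> list[str]:
    """Parse population labels from a '#reference_panel_population:' comment line."""
    label, *pops = line.rstrip("\n").split("\t")
    if not label.startswith("#reference_panel_population"):
        raise ValueError(f"unexpected comment line: {line!r}")
    return pops


def group_haplotype_columns(header: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group 'sample:::hap:::pop' columns by (sample, haplotype).

    Works for any number of populations: each group has one entry per population.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for column in header:
        parts = column.split(":::")
        if len(parts) == 3:  # locus metadata columns have no ':::' separators
            sample, hap, pop = parts
            groups[(sample, hap)].append(pop)
    return dict(groups)


def demo() -> tuple[list[str], dict[tuple[str, str], list[str]]]:
    pops = populations_from_comment("#reference_panel_population:\tAFR\tEUR\tNAT\n")
    header = ["chromosome", "physical_position",
              "s1:::hap1:::AFR", "s1:::hap1:::EUR", "s1:::hap1:::NAT",
              "s1:::hap2:::AFR", "s1:::hap2:::EUR", "s1:::hap2:::NAT"]
    return pops, group_haplotype_columns(header)
```

Deriving the population count from the file, rather than fixing it at two, is what lets the same reading logic serve two-, three-, or more-way admixture.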

## Authors
* [Kynon JM Benjamin](https://github.com/Krotosbenjamin)

## Citation

Please cite: XXXX.

            
