vcf2pandas


- Name: vcf2pandas
- Version: 0.1.2
- Home page: https://github.com/trentzz/vcf2pandas
- Summary: Package to convert a vcf into a pandas dataframe.
- Upload time: 2024-12-06 00:57:56
- Author: Trent Zeng
- Requires Python: <4.0,>=3.10
- License: MIT
- Keywords: vcf, python, pandas

# vcf2pandas

![PyPI Downloads](https://static.pepy.tech/badge/vcf2pandas/month)
![PyPI Downloads](https://static.pepy.tech/badge/vcf2pandas)

`vcf2pandas` is a Python package that converts VCF files to `pandas` dataframes.

## Install

```bash
pip install vcf2pandas
```

## Dependencies

- pandas (2.1.0)
- pysam (0.22.1)

## Usage

### Selecting all columns

```python
from vcf2pandas import vcf2pandas
import pandas

df_all = vcf2pandas("path_to_vcf.vcf")
```

### Selecting custom columns and samples

```python
info_fields = ["info_field_1", "info_field_2"]
sample_list = ["sample_name_1", "sample_name_2"]
format_fields = ["format_name_1", "format_name_2"]

df_selected = vcf2pandas(
    "path_to_vcf.vcf",
    info_fields=info_fields,
    sample_list=sample_list,
    format_fields=format_fields,
)
```

## Custom column ordering

`vcf2pandas` can select custom/specific:

- INFO fields
- samples
- FORMAT fields

The selected columns are ordered to match the input list.

For example, the following list:

```python
info_fields = ["DP", "MQM", "QA"]
```

produces these columns, in that order:

```txt
INFO:DP    INFO:MQM    INFO:QA
```

Note that this **applies only to INFO and FORMAT columns**. Samples are ordered as they appear in the VCF, not as they appear in the input list.
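
The ordering rule can be sketched in plain Python. This illustrates the documented behaviour only and is not the package's internal code; the helper names `expected_info_columns` and `expected_sample_order`, and the sample names other than `HG002`, are hypothetical.

```python
def expected_info_columns(info_fields):
    """Return INFO column headings in the order the caller listed them."""
    return [f"INFO:{field}" for field in info_fields]


def expected_sample_order(vcf_samples, sample_list):
    """Return the selected samples in VCF header order, ignoring input order."""
    wanted = set(sample_list)
    return [sample for sample in vcf_samples if sample in wanted]


print(expected_info_columns(["DP", "MQM", "QA"]))
# Hypothetical VCF header order: HG002, HG003, HG004
print(expected_sample_order(["HG002", "HG003", "HG004"], ["HG004", "HG002"]))
```

Here the INFO headings follow the input list, while the samples come back as `HG002` then `HG004` because that is their order in the (assumed) VCF header.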

## Output

### INFO and FORMAT headings

```txt
INFO:INFO_FIELD                     e.g. INFO:DP
FORMAT:SAMPLE_NAME:FORMAT_FIELD     e.g. FORMAT:HG002:GT
```
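
Because the headings follow a fixed pattern, they can be split back into their parts when filtering columns. The helper below is a minimal sketch, not part of `vcf2pandas`; `split_heading` is a hypothetical name, and it assumes that sample and field names contain no `:` characters.

```python
def split_heading(column):
    """Split a column heading into (kind, sample, field).

    Assumes sample names and field names contain no ':' characters.
    Headings that are neither INFO nor FORMAT return (column, None, None).
    """
    parts = column.split(":")
    if parts[0] == "INFO" and len(parts) == 2:
        return ("INFO", None, parts[1])
    if parts[0] == "FORMAT" and len(parts) == 3:
        return ("FORMAT", parts[1], parts[2])
    return (column, None, None)


print(split_heading("INFO:DP"))
print(split_heading("FORMAT:HG002:GT"))
```

With a dataframe `df`, an expression such as `[c for c in df.columns if split_heading(c)[0] == "INFO"]` would then pick out the INFO columns.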

### INFO fields not present for some variants

When an INFO field is not present for a variant, `vcf2pandas` puts a `.` in that cell. For example, in `vcf3_all.txt` the `INFO:GENE` column contains `.` for the first 7 variants.
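
If the `.` placeholders get in the way of later analysis, they can be turned into missing values with ordinary `pandas` calls. The sketch below uses a hand-built dataframe with made-up values rather than real `vcf2pandas` output, and it assumes the cells are strings; `mark_missing` is a hypothetical helper, not part of the package.

```python
import pandas as pd


def mark_missing(df, numeric_columns=()):
    """Return a copy of df where '.' placeholders become missing values."""
    out = df.copy()
    for column in out.columns:
        if column in numeric_columns:
            # Entries that cannot be parsed as numbers, such as ".", become NaN
            out[column] = pd.to_numeric(out[column], errors="coerce")
        else:
            # Keep every value that is not ".", blank out the rest
            out[column] = out[column].where(out[column] != ".")
    return out


# Hand-built stand-in for a vcf2pandas result (values are made up)
df = pd.DataFrame(
    {
        "INFO:DP": ["10", ".", "25"],
        "INFO:GENE": [".", ".", "BRCA1"],
    }
)
cleaned = mark_missing(df, numeric_columns=["INFO:DP"])
print(cleaned.isna().sum())
```

Which columns to treat as numeric is left to the caller here, since that depends on how each INFO field is declared in the VCF header.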

## Examples

Example VCF files and their outputs (dataframes saved as `.txt` files) are available in `examples/`.

### Example Usage

```python
from vcf2pandas import vcf2pandas

# Convert each example VCF with all columns
df1_all = vcf2pandas("examples/vcf1.vcf")
df2_all = vcf2pandas("examples/vcf2.vcf")

df3_all = vcf2pandas("examples/vcf3.vcf")

info_fields = ["DP"]
sample_list = ["HG002"]
format_fields = ["GT", "AO"]

df3_selected = vcf2pandas(
    "examples/vcf3.vcf",
    info_fields=info_fields,
    sample_list=sample_list,
    format_fields=format_fields
)
```

To print to a text file:

```python
# df is any dataframe returned by vcf2pandas
with open("path_to_txt_file.txt", "w", encoding="utf-8") as f:
    f.write(df.to_string())
```

To recreate the examples, run:

```bash
poetry run python tests/run_examples.py
```

## Changelog

### v0.1.0

- Initial project

### v0.1.1

- Fixed conversion of the variant filter to a string

### v0.1.2

- Updated pysam version to `0.22.1`

            
