vcf2pandas


- Name: vcf2pandas
- Version: 0.1.0
- Home page: https://github.com/trentzz/vcf2pandas
- Summary: Package to convert a vcf into a pandas dataframe.
- Upload time: 2023-09-14 18:36:01
- Author: Trent Zeng
- Requires Python: >=3.10,<4.0
- License: MIT
- Keywords: vcf, python, pandas

# vcf2pandas

`vcf2pandas` is a Python package that converts VCF files to `pandas` dataframes.

## Install

```bash
pip install vcf2pandas
```

## Dependencies

- pandas (2.1.0)
- pysam (0.21.0)

## Usage

### Selecting all columns

```python
from vcf2pandas import vcf2pandas
import pandas

df_all = vcf2pandas("path_to_vcf.vcf")
```

### Selecting custom columns and samples

```python
info_fields = ["info_field_1", "info_field_2"]
sample_list = ["sample_name_1", "sample_name_2"]
format_fields = ["format_name_1", "format_name_2"]

df_selected = vcf2pandas(
    "path_to_vcf.vcf",
    info_fields=info_fields,
    sample_list=sample_list,
    format_fields=format_fields,
)
```
## Custom column ordering

`vcf2pandas` can select custom/specific:
- INFO fields
- samples
- FORMAT fields

It also orders the selected columns to match the input list.

For example, the following list:
```python
info_fields = ["DP", "MQM", "QA"]
```
produces these columns, in that order:
```
INFO:DP    INFO:MQM    INFO:QA
```

Note that this ordering **applies only to INFO and FORMAT columns**. Samples are ordered as they appear in the VCF, not as they appear in the input list.
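
The ordering rule can be sketched in plain Python. The two helpers below are hypothetical and not part of `vcf2pandas`; they only model the behaviour described above, where INFO headings follow the input list and samples keep their VCF order. The sample names `HG003` and `HG004` are made up for illustration.

```python
def info_headings(info_fields):
    """Build INFO headings in the same order as the input list."""
    return [f"INFO:{field}" for field in info_fields]


def samples_in_vcf_order(vcf_samples, sample_list):
    """Keep only the requested samples, ordered as they appear in the VCF."""
    requested = set(sample_list)
    return [sample for sample in vcf_samples if sample in requested]


print(info_headings(["DP", "MQM", "QA"]))
# The samples are requested as HG003, HG002 but come back in VCF order.
print(samples_in_vcf_order(["HG002", "HG003", "HG004"], ["HG003", "HG002"]))
```

To get samples in a different order, the resulting dataframe columns would have to be reordered afterwards.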


## Output

### INFO and FORMAT headings
```
INFO:INFO_FIELD                     e.g. INFO:DP
FORMAT:SAMPLE_NAME:FORMAT_FIELD     e.g. FORMAT:HG002:GT
```
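
Because the headings follow a fixed pattern, they can be split back into their parts. The helper below is a minimal sketch, not part of `vcf2pandas`: `parse_heading` is a hypothetical name, and it assumes that sample and field names contain no `:` characters.

```python
def parse_heading(heading):
    """Split a column heading into (kind, sample, field).

    Assumes the two layouts shown above; `sample` is None for INFO columns.
    """
    parts = heading.split(":")
    if parts[0] == "INFO" and len(parts) == 2:
        return ("INFO", None, parts[1])
    if parts[0] == "FORMAT" and len(parts) == 3:
        return ("FORMAT", parts[1], parts[2])
    # Any other heading is passed through unchanged under the label "OTHER".
    return ("OTHER", None, heading)


print(parse_heading("INFO:DP"))
print(parse_heading("FORMAT:HG002:GT"))
```

This can be useful for grouping columns by sample or by field after the dataframe has been built.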

### INFO fields not present for some variants

When an INFO field is absent for a variant, `vcf2pandas` inserts a `.` in that cell. For example, in `vcf3_all.txt` the `INFO:GENE` column contains `.` for the first 7 variants.
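
If the `.` placeholder gets in the way of later processing, it can be mapped to a proper missing value. The following is a minimal sketch under the assumption that cells hold scalar values; `missing_to_none` is a hypothetical helper and the gene name is invented for illustration.

```python
def missing_to_none(value):
    """Map the `.` placeholder to None and leave other values unchanged."""
    return None if value == "." else value


# Hypothetical INFO:GENE cells: the first two variants lack the field.
gene_column = [".", ".", "BRCA1"]
cleaned = [missing_to_none(cell) for cell in gene_column]
print(cleaned)
```

On a dataframe, the same function could be applied to one column with `df["INFO:GENE"].map(missing_to_none)`.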


## Examples

Example VCF files and their output (dataframes saved as .txt files) are available in `examples/`.

### Example Usage
```python
df1 = vcf2pandas("examples/vcf1.vcf")
df2 = vcf2pandas("examples/vcf2.vcf")

df3_all = vcf2pandas("examples/vcf3.vcf")

info = ["DP"]
samples = ["HG002"]
format_fields = ["GT", "AO"]
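# Pass the selections so that only the chosen fields and samples are kept.
df3_selected = vcf2pandas(
    "examples/vcf3.vcf",
    info_fields=info,
    sample_list=samples,
    format_fields=format_fields,
)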
```

To print to a text file:
```python
with open("path_to_txt_file.txt", "w") as f:  # open in write mode
    f.write(df.to_string())
```
            
