# vcf2pandas
`vcf2pandas` is a Python package that converts VCF files to `pandas` DataFrames.
## Install
```bash
pip install vcf2pandas
```
## Dependencies
- pandas (2.1.0)
- pysam (0.21.0)
## Usage
### Selecting all columns
```python
from vcf2pandas import vcf2pandas
import pandas
df_all = vcf2pandas("path_to_vcf.vcf")
```
### Selecting custom columns and samples
```python
info_fields = ["info_field_1", "info_field_2"]
sample_list = ["sample_name_1", "sample_name_2"]
format_fields = ["format_name_1", "format_name_2"]
df_selected = vcf2pandas(
    "path_to_vcf.vcf",
    info_fields=info_fields,
    sample_list=sample_list,
    format_fields=format_fields,
)
```
## Custom column ordering
`vcf2pandas` can select specific:
- INFO fields
- samples
- FORMAT fields

The selected columns are ordered to match the input list. For example, the following list:
```python
info_fields = ["DP", "MQM", "QA"]
```
produces these columns, in that order:
```
INFO:DP INFO:MQM INFO:QA
```
Note that this ordering **applies only to INFO and FORMAT columns**. Samples are always ordered as they appear in the VCF, not as they appear in the input list.
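Since sample order follows the VCF, a specific sample order has to be applied after conversion. The sketch below does that with plain `pandas` on a hand-built DataFrame. `reorder_samples` is a hypothetical helper, not part of `vcf2pandas`; it assumes the `FORMAT:SAMPLE_NAME:FORMAT_FIELD` heading convention described under Output and sample names that contain no colon. The `CHROM` column and the `HG003` sample are illustrative only.

```python
import pandas as pd


def reorder_samples(df: pd.DataFrame, sample_order: list[str]) -> pd.DataFrame:
    """Return df with FORMAT columns grouped by sample in the requested order.

    Hypothetical helper (not part of vcf2pandas). Assumes FORMAT columns are
    named FORMAT:SAMPLE_NAME:FORMAT_FIELD and sample names contain no colon.
    FORMAT columns of samples missing from sample_order are dropped.
    """
    # Columns that are not per-sample FORMAT columns stay up front, in place.
    other = [c for c in df.columns if not c.startswith("FORMAT:")]
    ordered = []
    for sample in sample_order:
        prefix = f"FORMAT:{sample}:"
        # Within one sample, keep the existing FORMAT field order.
        ordered += [c for c in df.columns if c.startswith(prefix)]
    return df[other + ordered]


# Hand-built stand-in for a converted VCF with two samples.
df = pd.DataFrame(
    {
        "CHROM": ["chr1"],
        "INFO:DP": [30],
        "FORMAT:HG002:GT": ["0/1"],
        "FORMAT:HG002:AO": [12],
        "FORMAT:HG003:GT": ["1/1"],
        "FORMAT:HG003:AO": [25],
    }
)

reordered = reorder_samples(df, ["HG003", "HG002"])
print(list(reordered.columns))
```

The FORMAT field order within each sample is left untouched, so it still follows whatever `format_fields` order was requested.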
## Output
### INFO and FORMAT headings
```
INFO:INFO_FIELD e.g. INFO:DP
FORMAT:SAMPLE_NAME:FORMAT_FIELD e.g. FORMAT:HG002:GT
```
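Because the headings carry a predictable prefix, groups of columns can be picked out by name. The following is a minimal sketch on a hand-built DataFrame that imitates these headings; it uses only `pandas` and does not call `vcf2pandas` itself.

```python
import pandas as pd

# Hand-built stand-in whose column names follow the heading convention above.
df = pd.DataFrame(
    {
        "INFO:DP": [30, 41],
        "INFO:MQM": [60.0, 59.5],
        "FORMAT:HG002:GT": ["0/1", "1/1"],
        "FORMAT:HG002:AO": [12, 40],
    }
)

# All INFO columns, in their existing order.
info_cols = [c for c in df.columns if c.startswith("INFO:")]

# All FORMAT columns of one sample, with the "FORMAT:HG002:" prefix stripped.
hg002 = df.filter(regex=r"^FORMAT:HG002:").rename(
    columns=lambda c: c.split(":", 2)[2]
)

print(info_cols)
print(hg002)
```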
### INFO fields not present for some variants
When an INFO field is not present for a variant, `vcf2pandas` inserts a `.` in that cell. For example, in `vcf3_all.txt` the `INFO:GENE` column contains `.` for the first 7 variants.
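For numeric work it can help to turn the `.` placeholder into a real missing value. The sketch below shows one way to do that with `pandas` on a hand-built DataFrame. It assumes the affected cells hold the string `"."` and that the other values in the column are numeric strings; the actual cell types produced by `vcf2pandas` may differ, and the gene name is illustrative.

```python
import pandas as pd

# Hand-built stand-in: "." marks INFO fields absent for a variant.
df = pd.DataFrame(
    {
        "INFO:DP": ["30", ".", "12"],
        "INFO:GENE": [".", ".", "BRCA1"],
    }
)

# Replace every "." cell with NaN so pandas treats it as missing.
cleaned = df.mask(df == ".")

# Convert a column that should be numeric; missing cells stay NaN.
cleaned["INFO:DP"] = pd.to_numeric(cleaned["INFO:DP"])

print(cleaned)
```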
## Examples
Example VCF files and their outputs (DataFrames saved as `.txt` files) are available in `examples/`.
### Example Usage
```python
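from vcf2pandas import vcf2pandas

df1 = vcf2pandas("examples/vcf1.vcf")
df2 = vcf2pandas("examples/vcf2.vcf")
df3_all = vcf2pandas("examples/vcf3.vcf")

# Keep only the chosen INFO fields, samples, and FORMAT fields.
info = ["DP"]
samples = ["HG002"]
format_fields = ["GT", "AO"]
df3_selected = vcf2pandas(
    "examples/vcf3.vcf",
    info_fields=info,
    sample_list=samples,
    format_fields=format_fields,
)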
```
To write a DataFrame to a text file:
```python
# Open in write mode; the default mode is read-only.
with open("path_to_txt_file.txt", "w") as f:
    f.write(df.to_string())
```
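`to_string()` produces a fixed-width layout meant for reading. If the file needs to be loaded again later, a tab-separated file is usually easier to parse. The sketch below uses `DataFrame.to_csv` and `pandas.read_csv` on a hand-built DataFrame; the file name is a placeholder and a temporary directory is used only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hand-built stand-in for a DataFrame returned by vcf2pandas.
df = pd.DataFrame(
    {
        "INFO:DP": [30, 41],
        "FORMAT:HG002:GT": ["0/1", "1/1"],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "variants.tsv"  # placeholder file name
    # Write tab-separated values without the row index.
    df.to_csv(path, sep="\t", index=False)
    # Read the file back to confirm it round-trips.
    round_trip = pd.read_csv(path, sep="\t")

print(round_trip)
```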
Raw data
{
"_id": null,
"home_page": "https://github.com/trentzz/vcf2pandas",
"name": "vcf2pandas",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.10,<4.0",
"maintainer_email": "",
"keywords": "vcf,python,pandas",
"author": "Trent Zeng",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/6e/71/2d5fd04e0b87e270510b6891856527f66bb4988278d8bef493855b32e072/vcf2pandas-0.1.0.tar.gz",
"platform": null,
"description": "# vcf2pandas\n\n`vcf2pandas` is a python package to convert vcf files to `pandas` dataframes. \n\n## Install\n\n```bash\npip install vcf2pandas\n```\n\n## Dependencies\n\n- pandas (2.1.0)\n- pysam (0.21.0)\n\n## Usage\n\n### Selecting all columns\n\n```python\nfrom vcf2pandas import vcf2pandas\nimport pandas\n\ndf_all = vcf2pandas(\"path_to_vcf.vcf\")\n```\n\n### Selecting custom custom columns and samples\n\n```python\ninfo_fields = [\"info_field_1\", \"info_field_2\"]\nsample_list = [\"sample_name_1\", \"sample_name_2\"]\nformat_fields = [\"format_name_1\", \"format_name_2\"]\n\ndf_selected = vcf2pandas(\n \"path_to_vcf.vcf\",\n info_fields=info_fields,\n sample_list=sample_list,\n format_fields=format_fields,\n)\n```\n## Custom column ordering\n\n`vcf2pandas` can select custom/specific:\n- INFO fields\n- samples\n- FORMAT fields\n\nAnd order the selected columns based on the input list. \n\nE.g. The following list:\n```python\ninfo_fields = [\"DP\", \"MQM\", \"QA\"]\n```\nGets the columns (in that order)\n```\nINFO:DP INFO:MQM INFO:QA\n```\n\nNote that this **only applies for INFO and FORMAT columns**. That is, the samples will be ordered based on the VCF and not the input list.\n\n\n## Output\n\n### INFO and FORMAT headings\n```\nINFO:INFO_FIELD e.g. INFO:DP\nFORMAT:SAMPLE_NAME:FORMAT_FIELD e.g. FORMAT:HG002:GT\n```\n\n### INFO fields not present for some variants\n\nWhen certain INFO fields are not present for certain variants, `vcf2pandas` inserts a `.` instead in that cell. E.g. 
for `vcf3_all.txt` you can see `INFO:GENE` column has `.` for the first 7 variants.\n\n\n## Examples\n\nExample vcf and output files (dataframes as a .txt file) are available in `examples/`\n\n### Example Usage\n```python\ndf1 = vcf2pandas(\"examples/vcf1.vcf\")\ndf2 = vcf2pandas(\"examples/vcf2.vcf\")\n\ndf3_all = vcf2pandas(\"examples/vcf3.vcf\")\n\ninfo = [\"DP\"]\nsamples = [\"HG002\"]\nformat_fields = [\"GT\", \"AO\"]\ndf3_selected = vcf2pandas(\"examples/vcf3.vcf\")\n```\n\nTo print to a text file:\n```python\nwith open(\"path_to_txt_file.txt\") as f:\n f.write(df.to_string())\n```",
"bugtrack_url": null,
"license": "MIT",
"summary": "Package to convert a vcf into a pandas dataframe.",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/trentzz/vcf2pandas",
"Repository": "https://github.com/trentzz/vcf2pandas"
},
"split_keywords": [
"vcf",
"python",
"pandas"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d7de7b1e2ed3579e1928263bde6a63112c9bd164601b7cb36160bde2910a96a5",
"md5": "c7d5d4bd5046276a2ad57cf89fa77872",
"sha256": "f18a56c01bc06785ac479b179682686889fedd3152615e1653eb36835a593415"
},
"downloads": -1,
"filename": "vcf2pandas-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c7d5d4bd5046276a2ad57cf89fa77872",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10,<4.0",
"size": 3584,
"upload_time": "2023-09-14T18:36:00",
"upload_time_iso_8601": "2023-09-14T18:36:00.090090Z",
"url": "https://files.pythonhosted.org/packages/d7/de/7b1e2ed3579e1928263bde6a63112c9bd164601b7cb36160bde2910a96a5/vcf2pandas-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6e712d5fd04e0b87e270510b6891856527f66bb4988278d8bef493855b32e072",
"md5": "f5896972a1b17258168cd4f20780aa6f",
"sha256": "ec424e4671a61bba35cff460443ebe2a2bfd160cf06e621314637b061cdc23f5"
},
"downloads": -1,
"filename": "vcf2pandas-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "f5896972a1b17258168cd4f20780aa6f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10,<4.0",
"size": 2943,
"upload_time": "2023-09-14T18:36:01",
"upload_time_iso_8601": "2023-09-14T18:36:01.984276Z",
"url": "https://files.pythonhosted.org/packages/6e/71/2d5fd04e0b87e270510b6891856527f66bb4988278d8bef493855b32e072/vcf2pandas-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-14 18:36:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "trentzz",
"github_project": "vcf2pandas",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "vcf2pandas"
}