# vcf2pandas
![PyPI Downloads](https://static.pepy.tech/badge/vcf2pandas/month)
![PyPI Downloads](https://static.pepy.tech/badge/vcf2pandas)
`vcf2pandas` is a Python package that converts VCF files to `pandas` dataframes.
## Install
```bash
pip install vcf2pandas
```
## Dependencies
- pandas (2.1.0)
- pysam (0.22.1)
## Usage
### Selecting all columns
```python
from vcf2pandas import vcf2pandas
import pandas
df_all = vcf2pandas("path_to_vcf.vcf")
```
### Selecting custom columns and samples
```python
info_fields = ["info_field_1", "info_field_2"]
sample_list = ["sample_name_1", "sample_name_2"]
format_fields = ["format_name_1", "format_name_2"]
df_selected = vcf2pandas(
    "path_to_vcf.vcf",
    info_fields=info_fields,
    sample_list=sample_list,
    format_fields=format_fields,
)
```
## Custom column ordering
`vcf2pandas` can select specific:
- INFO fields
- samples
- FORMAT fields
The selected columns are then ordered according to the input list.
For example, the following list:
```python
info_fields = ["DP", "MQM", "QA"]
```
produces these columns, in that order:
```txt
INFO:DP INFO:MQM INFO:QA
```
Note that this **only applies to INFO and FORMAT columns**. Samples are always ordered as they appear in the VCF, not as they appear in the input list.
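The ordering rules above can be sketched in plain Python. This is only an illustration of the documented behaviour, not the package's implementation: the helper names `expected_info_columns` and `expected_sample_order` and the sample names `HG003` and `HG004` are hypothetical.

```python
def expected_info_columns(info_fields):
    """INFO columns follow the order of the input list."""
    return [f"INFO:{field}" for field in info_fields]


def expected_sample_order(vcf_samples, sample_list):
    """Samples keep the order they have in the VCF, whatever the input order."""
    wanted = set(sample_list)
    return [sample for sample in vcf_samples if sample in wanted]


info_columns = expected_info_columns(["DP", "MQM", "QA"])

# The VCF lists HG002 before HG004, so the requested order is not preserved.
samples = expected_sample_order(["HG002", "HG003", "HG004"], ["HG004", "HG002"])
```

Here `info_columns` mirrors the input list, while `samples` comes back in VCF order.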
## Output
### INFO and FORMAT headings
```txt
INFO:INFO_FIELD e.g. INFO:DP
FORMAT:SAMPLE_NAME:FORMAT_FIELD e.g. FORMAT:HG002:GT
```
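Because the headings follow a fixed pattern, they can be split back into their parts when filtering columns. The following is a minimal sketch, assuming sample and field names contain no `:`; `parse_column` is a hypothetical helper, and the `CHROM` column name is used only as an example of a heading that is neither INFO nor FORMAT.

```python
def parse_column(name):
    """Split a heading into (kind, sample, field).

    `sample` is None for INFO columns; `kind` is None for any other heading.
    """
    parts = name.split(":")
    if parts[0] == "INFO" and len(parts) == 2:
        return ("INFO", None, parts[1])
    if parts[0] == "FORMAT" and len(parts) == 3:
        return ("FORMAT", parts[1], parts[2])
    return (None, None, name)


# Illustrative headings; with a real dataframe this would be list(df.columns).
columns = ["CHROM", "INFO:DP", "FORMAT:HG002:GT", "FORMAT:HG002:AO"]

# Keep only the FORMAT columns that belong to sample HG002.
hg002_columns = [c for c in columns if parse_column(c)[1] == "HG002"]
```

With a real dataframe, `df[hg002_columns]` would then select that sample's FORMAT columns.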
### INFO fields not present for some variants
When an INFO field is absent for a variant, `vcf2pandas` writes `.` in that cell. For example, in `vcf3_all.txt` the `INFO:GENE` column contains `.` for the first 7 variants.
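The `.` placeholder usually needs handling before any numeric work. The sketch below uses a hand-built dataframe as a stand-in for a `vcf2pandas` result; it assumes the cells are stored as strings, which may not match the actual dtypes, and the values shown are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for a vcf2pandas result (assumed string cells).
df = pd.DataFrame(
    {
        "INFO:DP": ["10", ".", "25"],
        "INFO:GENE": [".", ".", "BRCA1"],
    }
)

# Keep only the variants where the INFO field was actually present.
with_gene = df[df["INFO:GENE"] != "."]

# Convert a numeric field; "." cannot be parsed, so it is coerced to NaN.
depth = pd.to_numeric(df["INFO:DP"], errors="coerce")
```

Using `errors="coerce"` keeps the row count unchanged, so `depth` can be assigned back to the dataframe as a numeric column.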
## Examples
Example VCF files and their outputs (dataframes saved as `.txt` files) are available in `examples/`.
### Example Usage
```python
from vcf2pandas import vcf2pandas

# Convert every column of each example VCF.
df1_all = vcf2pandas("examples/vcf1.vcf")
df2_all = vcf2pandas("examples/vcf2.vcf")
df3_all = vcf2pandas("examples/vcf3.vcf")

# Restrict vcf3 to one INFO field, one sample, and two FORMAT fields.
info_fields = ["DP"]
sample_list = ["HG002"]
format_fields = ["GT", "AO"]

df3_selected = vcf2pandas(
    "examples/vcf3.vcf",
    info_fields=info_fields,
    sample_list=sample_list,
    format_fields=format_fields,
)
```
To print to a text file:
```python
# `df` is any dataframe returned by vcf2pandas, e.g. df3_selected.
with open("path_to_txt_file.txt", "w", encoding="utf-8") as f:
    f.write(df.to_string())
```
To recreate the examples, run:
```bash
poetry run python tests/run_examples.py
```
## Changelog
### v0.1.0
- Initial project
### v0.1.1
- Fixed conversion of the variant filter to a string
### v0.1.2
- Updated pysam version to `0.22.1`