========================
Data File Profiler Utils
========================
A Python module for simple data file profiling.
Usage
-----
Installation
------------
.. code-block:: shell

    pip install data-file-profiler-utils

Integration
-----------
.. code-block:: python

    from data_file_profiler_utils import Manager as ProfileManager

    # Create a profile manager and profile a single data file
    pm = ProfileManager()
    pm.profile_file("/tmp/patient002.vcf")

Exported Console Script
-----------------------
Contents of the sample data file:

.. code-block:: shell

    cat -n sample.tsv
    1 #CHROM POS ID REF ALT QUAL FILTER INFO
    2 1 12345 rs567 A G 50 PASS DP=30;AF=0.2;AN=1000;CSQ=missense_variant|HIGH|GeneA|ENSG00000112345|transcriptA|ENST00000234567|protein_coding|1/10|c.123C>T|p.Arg41Trp|123/1000|ensembl
    3 2 56789 rs890 T C 44 PASS DP=25;AF=0.1;AN=1200;CSQ=synonymous_variant|MEDIUM|GeneB|ENSG00000123456|transcriptB|ENST00000345678|protein_coding|5/20|c.567A>G|p.Ala189Ala|567/1200|ensembl
    4 3 98765 rs123 G T 60 PASS DP=40;AF=0.3;AN=800;CSQ=splice_acceptor_variant|HIGH|GeneC|ENSG00000134567|transcriptC|ENST00000456789|protein_coding|2/15|c.987+1G>T|p.?|987/800|ensembl
    5 1 34567 rs456 C A 55 PASS DP=35;AF=0.15;AN=900;CSQ=frameshift_variant|HIGH|GeneX|ENSG00000145678|transcriptX|ENST00000567890|protein_coding|8/25|c.345_346insT|p.Leu116Phefs*12|345/900|ensembl

Invocation of the exported console script:

.. code-block:: shell

    profile-data-file --infile /tmp/demo-data-file-profiler-utils/sample.tsv --verbose --outdir /tmp/demo-data-file-profiler-utils/
    --logfile was not specified and therefore was set to '/tmp/demo-data-file-profiler-utils/profile_data_file.log'
    Wrote profile metadata file '/tmp/demo-data-file-profiler-utils/sample.tsv.profile.txt'
    The log file is '/tmp/demo-data-file-profiler-utils/profile_data_file.log'
    Execution of '/tmp/data-file-profiler-utils/venv/lib/python3.10/site-packages/data_file_profiler_utils/profile_data_file.py' completed

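The same invocation could also be driven from Python. The following is a minimal sketch, assuming the ``profile-data-file`` script is on the ``PATH`` and using only the flags that appear in the invocation above (``--infile``, ``--outdir``, ``--verbose``). The helper names ``build_profile_command`` and ``run_profile`` are hypothetical and are not part of the package.

```python
import subprocess


def build_profile_command(infile, outdir, verbose=True):
    """Build the argument list for the profile-data-file console script.

    Hypothetical helper: only flags shown in the README invocation are used.
    """
    cmd = ["profile-data-file", "--infile", infile, "--outdir", outdir]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_profile(infile, outdir, verbose=True):
    """Run the console script and return the completed process.

    Assumes the package is installed so that the script is on the PATH;
    check=True raises CalledProcessError on a non-zero exit status.
    """
    return subprocess.run(
        build_profile_command(infile, outdir, verbose),
        check=True,
        capture_output=True,
        text=True,
    )
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.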
Contents of the profile report:

.. code-block:: shell

    cat -n /tmp/demo-data-file-profiler-utils/sample.tsv.profile.txt
    1 ## method-profiled: /tmp/data-file-profiler-utils/venv/lib/python3.10/site-packages/data_file_profiler_utils/manager.py
    2 ## date-profiled: 2025-02-15-142732
    3 ## profiled-by: sundaram
    4 file: /tmp/demo-data-file-profiler-utils/sample.tsv
    5 md5sum: 786b82b2414d3acf7af34c068e358759
    6 date_created: 2025-02-15 14:06:37.202165
    7 file_size: 776
    8 line_count: 5

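To make the report fields more concrete, here is a minimal sketch of how values such as ``md5sum``, ``file_size``, ``line_count`` and ``date_created`` could be computed with the Python standard library. This is an illustration only and not the package's actual implementation; the function name ``profile_file_sketch`` is hypothetical, and using ``st_ctime`` for ``date_created`` is an assumption.

```python
import hashlib
import os
from datetime import datetime


def profile_file_sketch(path):
    """Collect basic file metadata resembling the report fields above.

    Hypothetical illustration; the package may compute these differently.
    """
    md5 = hashlib.md5()
    line_count = 0
    # Read in binary mode so the checksum covers the exact bytes on disk
    with open(path, "rb") as handle:
        for line in handle:
            md5.update(line)
            line_count += 1
    stat = os.stat(path)
    return {
        "file": os.path.abspath(path),
        "md5sum": md5.hexdigest(),
        # Assumption: st_ctime is the creation time on Windows but the
        # last metadata change time on most Unix systems
        "date_created": str(datetime.fromtimestamp(stat.st_ctime)),
        "file_size": stat.st_size,  # size in bytes
        "line_count": line_count,
    }
```

Reading the file once and updating the checksum line by line keeps memory use low for large data files.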
=======
History
=======
0.1.0 (2024-02-10)
------------------
* First release on PyPI.
Raw data
{
"_id": null,
"home_page": "https://github.com/jai-python3/data-file-profiler-utils",
"name": "data-file-profiler-utils",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "data_file_profiler_utils",
"author": "Jaideep Sundaram",
"author_email": "jai.python3@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/78/e9/3b7d5c8202dea3e6856ce838564fdd9dbd7fd8f7d300ad05c7ab41173bc6/data_file_profiler_utils-0.1.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A Python module for simple data file profiling.",
"version": "0.1.7",
"project_urls": {
"Homepage": "https://github.com/jai-python3/data-file-profiler-utils"
},
"split_keywords": [
"data_file_profiler_utils"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c54d1726ae493b1fce7c6c8d0b440f6512fc50bd583a18fe006f4259a21381c0",
"md5": "49d0ca19356b4eb040a9ff9eb1f45748",
"sha256": "52d9ac6e474f74344926119d26561f1d9d317ce198ed88abb6a6b4bf15af2610"
},
"downloads": -1,
"filename": "data_file_profiler_utils-0.1.7-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "49d0ca19356b4eb040a9ff9eb1f45748",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.10",
"size": 9504,
"upload_time": "2025-02-15T19:53:10",
"upload_time_iso_8601": "2025-02-15T19:53:10.924427Z",
"url": "https://files.pythonhosted.org/packages/c5/4d/1726ae493b1fce7c6c8d0b440f6512fc50bd583a18fe006f4259a21381c0/data_file_profiler_utils-0.1.7-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "78e93b7d5c8202dea3e6856ce838564fdd9dbd7fd8f7d300ad05c7ab41173bc6",
"md5": "bf9d9907ef467e7e4da590ce1cb38022",
"sha256": "e78ada4c85bb514fa597ea50169d235fa8c156cb7591189de5cfb7fe1112b6ad"
},
"downloads": -1,
"filename": "data_file_profiler_utils-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "bf9d9907ef467e7e4da590ce1cb38022",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 14557,
"upload_time": "2025-02-15T19:53:12",
"upload_time_iso_8601": "2025-02-15T19:53:12.735560Z",
"url": "https://files.pythonhosted.org/packages/78/e9/3b7d5c8202dea3e6856ce838564fdd9dbd7fd8f7d300ad05c7ab41173bc6/data_file_profiler_utils-0.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-15 19:53:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jai-python3",
"github_project": "data-file-profiler-utils",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "click",
"specs": []
},
{
"name": "colorama",
"specs": []
}
],
"tox": true,
"lcname": "data-file-profiler-utils"
}