==========
tsv2sqlite
==========
Package for parsing a tab-delimited data file and loading the lines into SQLite3 database
Installation
.. code-block:: shell
pip install tsv2sqlite
tsv2sqlite
----------
The software will automatically assign datatype TEXT for all columns.
The target SQLite3 database file name can be specified on invocation via ``--database_file``.
The default database file name will be the basename of the input file with the filename extension replaced with .sqlite.
The target table name can be specified on invocation via ``--table_name``.
The default table name will be the basename of the input file with the filename extension removed.
The configuration file can be used to specify the datatype for each of the columns.
The configuration file can be used to specify which columns should be excluded.
Configure column datatypes
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: yaml
table_schema:
sample:
CHROM: str
POS: int
ID: str
REF: str
ALT: str
QUAL: int
FILTER: str
INFO: str
Configure column exclusion list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: yaml
ignore_columns:
- FILTER
- INFO
add-header-row
~~~~~~~~~~~~~~
The software will write a copy of the input tab-delimited file that does not have a header row.
The inserted header row will have column names:
- col_1
- col_2
- col_3
- etc.
Edit configuration file
~~~~~~~~~~~~~~~~~~~~~~~
Edit your configuration file e.g.: ``tsv2sqlite/conf/config.yaml``.
No column mapping mode
----------------------
Run tsv2sqlite with ``--no_column_mapping`` option.
.. code-block:: shell
tsv2sqlite --infile ~/projects/tsv2sqlite/sample.tsv --config_file ~/projects/tsv2sqlite/tsv2sqlite/conf/config.yaml --no_column_mapping
--outdir was not specified and therefore was set to '/tmp/tsv2sqlite/2023-12-22-102420'
Created output directory '/tmp/tsv2sqlite/2023-12-22-102420'
--logfile was not specified and therefore was set to '/tmp/tsv2sqlite/2023-12-22-102420/main.log'
--database_file was not specified and therefore was set to '/home/sundaram/projects/tsv2sqlite/sample.tsv.sqlite3'
The log file is '/tmp/tsv2sqlite/2023-12-22-102420/main.log'
Execution of '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/main.py' completed
Observe the SQLite3 database file created.
.. code-block:: shell
(venv) ➜ tsv2sqlite git:(main) ✗ ls -ltr sample.tsv.sqlite3
-rw-r--r-- 1 sundaram sundaram 12288 Dec 22 10:24 sample.tsv.sqlite3
Connect to the SQLite3 database.
.. code-block:: shell
(venv) ➜ tsv2sqlite git:(main) ✗ sqlite3 sample.tsv.sqlite3
SQLite version 3.37.2 2022-01-06 13:25:41
Enter ".help" for usage hints.
sqlite>
Observe the two tables created.
.. code-block:: sql
sqlite> .schema
CREATE TABLE provenance (
bytesize INTEGER NOT NULL,
abspath TEXT NOT NULL,
md5checksum TEXT NOT NULL,
date_created TEXT NOT NULL
);
CREATE TABLE sample (CHROM TEXT,
POS TEXT,
ID TEXT,
REF TEXT,
ALT TEXT,
QUAL TEXT,
line_number INTEGER NOT NULL);
sqlite>
Execute queries against both tables.
.. code-block:: sql
sqlite> select * from provenance;
776|/home/sundaram/projects/tsv2sqlite/sample.tsv|786b82b2414d3acf7af34c068e358759|2023-11-12 21:31:17.307717
sqlite> select * from sample;
1|12345|rs567|A|G|50|3
2|56789|rs890|T|C|44|4
3|98765|rs123|G|T|60|5
1|34567|rs456|C|A|55|6
sqlite>
Column mapping mode
-------------------
Column mapping was implemented to address a particular use case and may be deprecated in the near future.
To use the exported script for parsing a tab-delimited file to be loaded into a SQLite3 database instance:
.. code-block:: shell
tsv2sqlite --infile sample.tsv
--config_file was not specified and therefore was set to '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/conf/config.yaml'
--outdir was not specified and therefore was set to '/tmp/tsv2sqlite/2023-11-13-053537'
Created output directory '/tmp/tsv2sqlite/2023-11-13-053537'
--logfile was not specified and therefore was set to '/tmp/tsv2sqlite/2023-11-13-053537/main.log'
--database_file was not specified and therefore was set to 'sample.tsv.sqlite3'
The log file is '/tmp/tsv2sqlite/2023-11-13-053537/main.log'
Execution of '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/main.py' completed
Execute SQL queries
~~~~~~~~~~~~~~~~~~~
Connect to the SQLite3 database:
.. code-block:: shell
sqlite3 sample.tsv.sqlite3
Inspect the file metadata:
.. code-block:: sql
sqlite> .schema provenance
CREATE TABLE provenance (
bytesize INTEGER NOT NULL,
abspath TEXT NOT NULL,
md5checksum TEXT NOT NULL,
date_created TEXT NOT NULL
);
sqlite> select * from provenance;
776|/home/sundaram/projects/tsv2sqlite/sample.tsv|786b82b2414d3acf7af34c068e358759|2023-11-12 21:31:17.307717
Inspect the column mappings:
.. code-block:: sql
sqlite> .schema columnmaps
CREATE TABLE columnmaps (
column_num INTEGER NOT NULL PRIMARY KEY,
name TEXT NOT NULL,
norm_name TEXT NOT NULL,
UNIQUE (name, norm_name)
);
sqlite>
sqlite> select * from columnmaps;
0|#CHROM|_CHROM
1|POS|POS
2|ID|ID
3|REF|REF
4|ALT|ALT
5|QUAL|QUAL
sqlite>
Inspect the records table:
.. code-block:: sql
sqlite> .schema records
CREATE TABLE records (
line_num INTEGER NOT NULL,
column_num INTEGER NOT NULL,
value TEXT NOT NULL,
UNIQUE (line_num, column_num),
FOREIGN KEY (column_num) REFERENCES columnmaps (column_num)
);
sqlite> select * from records;
3|0|1
3|1|12345
3|2|rs567
3|3|A
3|4|G
3|5|50
4|0|2
4|1|56789
4|2|rs890
4|3|T
4|4|C
4|5|44
5|0|3
5|1|98765
5|2|rs123
5|3|G
5|4|T
5|5|60
6|0|1
6|1|34567
6|2|rs456
6|3|C
6|4|A
6|5|55
sqlite>
Get the ID values for each line:
.. code-block:: sql
sqlite> select r_id.value, r_id.line_num
from records r_id, columnmaps c_id
where c_id.name = "ID"
and c_id.column_num = r_id.column_num;
rs567|3
rs890|4
rs123|5
rs456|6
sqlite>
Get the ID, REF and line number:
.. code-block:: sql
sqlite> select r_id.value, r_ref.value, r_ref.line_num
from records r_id, records r_ref, columnmaps c_id, columnmaps c_ref
where c_ref.name = "REF"
and c_id.name = "ID"
and c_id.column_num = r_id.column_num
and c_ref.column_num = r_ref.column_num
and r_id.line_num = r_ref.line_num;
rs567|A|3
rs890|T|4
rs123|G|5
rs456|C|6
sqlite>
=======
History
=======
0.1.0 (2023-11-12)
------------------
* First release on PyPI.
Raw data
{
"_id": null,
"home_page": "https://github.com/jai-python3/tsv2sqlite",
"name": "tsv2sqlite",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "tsv2sqlite",
"author": "Jaideep Sundaram",
"author_email": "jai.python3@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/24/5f/39206a584ed984c12fc509cac980d0893d59af9d611c756f4f2f27450136/tsv2sqlite-0.5.1.tar.gz",
"platform": null,
"description": "==========\ntsv2sqlite\n==========\n\nPackage for parsing a tab-delimited data file and loading the lines into SQLite3 database\n\n\nInstallation\n\n.. code-block:: shell\n\n pip install tsv2sqlite\n\n\ntsv2sqlite\n----------\nThe software will automatically assign datatype TEXT for all columns.\nThe target SQLite3 database file name can be specified on invocation via ``--database_file``.\nThe default database file name will be the basename of the input file with the filename extension replaced with .sqlite.\n\nThe target table name can be specified on invocation via ``--table_name``.\nThe default table name will be the basename of the input file with the filename extension removed.\n\nThe configuration file can be used to specify the datatype for each of the columns.\nThe configuration file can be used to specify which columns should be excluded.\n\nConfigure column datatypes\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: yaml\n\n table_schema:\n sample:\n CHROM: str\n POS: int\n ID: str\n REF: str\n ALT: str\n QUAL: int\n FILTER: str\n INFO: str\n\nConfigure column exclusion list\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: yaml\n\n ignore_columns:\n - FILTER\n - INFO\n\nadd-header-row\n~~~~~~~~~~~~~~\n\nThe software will write a copy of the input tab-delimited file that does not have a header row.\nThe inserted header row will have column names:\n- col_1\n- col_2\n- col_3\n- etc.\n\n\nEdit configuration file\n~~~~~~~~~~~~~~~~~~~~~~~\n\nEdit your configuration file e.g.: ``tsv2sqlite/conf/config.yaml``.\n\n\nNo column mapping mode\n----------------------\n\nRun tsv2sqlite with ``--no_column_mapping`` option.\n\n.. code-block:: shell\n\n tsv2sqlite --infile ~/projects/tsv2sqlite/sample.tsv --config_file ~/projects/tsv2sqlite/tsv2sqlite/conf/config.yaml --no_column_mapping\n --outdir was not specified and therefore was set to '/tmp/tsv2sqlite/2023-12-22-102420'\n Created output directory '/tmp/tsv2sqlite/2023-12-22-102420'\n --logfile was not specified and therefore was set to '/tmp/tsv2sqlite/2023-12-22-102420/main.log'\n --database_file was not specified and therefore was set to '/home/sundaram/projects/tsv2sqlite/sample.tsv.sqlite3'\n The log file is '/tmp/tsv2sqlite/2023-12-22-102420/main.log'\n Execution of '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/main.py' completed\n\nObserve the SQLite3 database file created.\n\n.. code-block:: shell\n\n (venv) \u279c tsv2sqlite git:(main) \u2717 ls -ltr sample.tsv.sqlite3\n -rw-r--r-- 1 sundaram sundaram 12288 Dec 22 10:24 sample.tsv.sqlite3\n\nConnect to the SQLite3 database.\n\n.. code-block:: shell\n\n (venv) \u279c tsv2sqlite git:(main) \u2717 sqlite3 sample.tsv.sqlite3\n SQLite version 3.37.2 2022-01-06 13:25:41\n Enter \".help\" for usage hints.\n sqlite>\n\nObserve the two tables created.\n\n.. code-block:: sql\n\n sqlite> .schema\n CREATE TABLE provenance (\n bytesize INTEGER NOT NULL,\n abspath TEXT NOT NULL,\n md5checksum TEXT NOT NULL,\n date_created TEXT NOT NULL\n );\n CREATE TABLE sample (CHROM TEXT,\n POS TEXT,\n ID TEXT,\n REF TEXT,\n ALT TEXT,\n QUAL TEXT,\n line_number INTEGER NOT NULL);\n sqlite> \n\nExecute queries against both tables.\n\n.. code-block:: sql\n\n sqlite> select * from provenance;\n 776|/home/sundaram/projects/tsv2sqlite/sample.tsv|786b82b2414d3acf7af34c068e358759|2023-11-12 21:31:17.307717\n sqlite> select * from sample;\n 1|12345|rs567|A|G|50|3\n 2|56789|rs890|T|C|44|4\n 3|98765|rs123|G|T|60|5\n 1|34567|rs456|C|A|55|6\n sqlite> \n\nColumn mapping mode\n-------------------\n\nColumn mapping was implemented to address a particular use case and may be deprecated in the near future.\n\nTo use the exported script for parsing a tab-delimited file to be loaded into a SQLite3 database instance:\n\n.. code-block:: shell\n\n tsv2sqlite --infile sample.tsv\n --config_file was not specified and therefore was set to '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/conf/config.yaml'\n --outdir was not specified and therefore was set to '/tmp/tsv2sqlite/2023-11-13-053537'\n Created output directory '/tmp/tsv2sqlite/2023-11-13-053537'\n --logfile was not specified and therefore was set to '/tmp/tsv2sqlite/2023-11-13-053537/main.log'\n --database_file was not specified and therefore was set to 'sample.tsv.sqlite3'\n The log file is '/tmp/tsv2sqlite/2023-11-13-053537/main.log'\n Execution of '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/main.py' completed\n\nExecute SQL queries\n~~~~~~~~~~~~~~~~~~~\n\nConnect to the SQLite3 database:\n\n.. code-block:: shell\n\n sqlite3 sample.tsv.sqlite3\n\nInspect the file metadata:\n\n.. code-block:: sql\n\n sqlite> .schema provenance\n CREATE TABLE provenance (\n bytesize INTEGER NOT NULL,\n abspath TEXT NOT NULL,\n md5checksum TEXT NOT NULL,\n date_created TEXT NOT NULL\n );\n sqlite> select * from provenance;\n 776|/home/sundaram/projects/tsv2sqlite/sample.tsv|786b82b2414d3acf7af34c068e358759|2023-11-12 21:31:17.307717\n\nInspect the column mappings:\n\n.. code-block:: sql\n\n sqlite> .schema columnmaps\n CREATE TABLE columnmaps (\n column_num INTEGER NOT NULL PRIMARY KEY,\n name TEXT NOT NULL,\n norm_name TEXT NOT NULL,\n UNIQUE (name, norm_name)\n );\n sqlite>\n\n sqlite> select * from columnmaps;\n 0|#CHROM|_CHROM\n 1|POS|POS\n 2|ID|ID\n 3|REF|REF\n 4|ALT|ALT\n 5|QUAL|QUAL\n sqlite>\n\nInspect the records table:\n\n.. code-block:: sql\n\n sqlite> .schema records\n CREATE TABLE records (\n line_num INTEGER NOT NULL,\n column_num INTEGER NOT NULL,\n value TEXT NOT NULL,\n UNIQUE (line_num, column_num),\n FOREIGN KEY (column_num) REFERENCES columnmaps (column_num)\n );\n sqlite> select * from records;\n 3|0|1\n 3|1|12345\n 3|2|rs567\n 3|3|A\n 3|4|G\n 3|5|50\n 4|0|2\n 4|1|56789\n 4|2|rs890\n 4|3|T\n 4|4|C\n 4|5|44\n 5|0|3\n 5|1|98765\n 5|2|rs123\n 5|3|G\n 5|4|T\n 5|5|60\n 6|0|1\n 6|1|34567\n 6|2|rs456\n 6|3|C\n 6|4|A\n 6|5|55\n sqlite>\n\nGet the ID values for each line:\n\n.. code-block:: sql\n\n sqlite> select r_id.value, r_id.line_num\n from records r_id, columnmaps c_id\n where c_id.name = \"ID\"\n and c_id.column_num = r_id.column_num;\n rs567|3\n rs890|4\n rs123|5\n rs456|6\n sqlite>\n\nGet the ID, REF and line number:\n\n.. code-block:: sql\n\n sqlite> select r_id.value, r_ref.value, r_ref.line_num\n from records r_id, records r_ref, columnmaps c_id, columnmaps c_ref\n where c_ref.name = \"REF\"\n and c_id.name = \"ID\"\n and c_id.column_num = r_id.column_num\n and c_ref.column_num = r_ref.column_num\n and r_id.line_num = r_ref.line_num;\n rs567|A|3\n rs890|T|4\n rs123|G|5\n rs456|C|6\n sqlite>\n\n\n=======\nHistory\n=======\n\n0.1.0 (2023-11-12)\n------------------\n\n* First release on PyPI.\n",
"bugtrack_url": null,
"license": "GNU General Public License v3",
"summary": "Package for parsing a tab-delimited data file and loading the lines into SQLite3 database",
"version": "0.5.1",
"project_urls": {
"Homepage": "https://github.com/jai-python3/tsv2sqlite"
},
"split_keywords": [
"tsv2sqlite"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "36c722187199eda39a9472d98aef8a5436c05b21bd6d18e81a2bb8d6b14bd6ea",
"md5": "1b3af20e60a3682551065e599879d34d",
"sha256": "ae059f300662edd7a940ea441f2f118a35952cebc1ef7e8dda298d067b804632"
},
"downloads": -1,
"filename": "tsv2sqlite-0.5.1-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "1b3af20e60a3682551065e599879d34d",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.10",
"size": 16591,
"upload_time": "2025-02-15T18:10:54",
"upload_time_iso_8601": "2025-02-15T18:10:54.897825Z",
"url": "https://files.pythonhosted.org/packages/36/c7/22187199eda39a9472d98aef8a5436c05b21bd6d18e81a2bb8d6b14bd6ea/tsv2sqlite-0.5.1-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "245f39206a584ed984c12fc509cac980d0893d59af9d611c756f4f2f27450136",
"md5": "e03fc1d27480a48122a1ae1b13372074",
"sha256": "b108c0fa41da05d753346466d62ebc51dbcbac2056b08e9f8dc9fa784fa8f26e"
},
"downloads": -1,
"filename": "tsv2sqlite-0.5.1.tar.gz",
"has_sig": false,
"md5_digest": "e03fc1d27480a48122a1ae1b13372074",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 19510,
"upload_time": "2025-02-15T18:10:57",
"upload_time_iso_8601": "2025-02-15T18:10:57.018001Z",
"url": "https://files.pythonhosted.org/packages/24/5f/39206a584ed984c12fc509cac980d0893d59af9d611c756f4f2f27450136/tsv2sqlite-0.5.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-15 18:10:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jai-python3",
"github_project": "tsv2sqlite",
"github_not_found": true,
"lcname": "tsv2sqlite"
}