tsv2sqlite


Nametsv2sqlite JSON
Version 0.5.1 PyPI version JSON
download
home_pagehttps://github.com/jai-python3/tsv2sqlite
SummaryPackage for parsing a tab-delimited data file and loading the lines into SQLite3 database
upload_time2025-02-15 18:10:57
maintainerNone
docs_urlNone
authorJaideep Sundaram
requires_python>=3.10
licenseGNU General Public License v3
keywords tsv2sqlite
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            ==========
tsv2sqlite
==========

Package for parsing a tab-delimited data file and loading the lines into SQLite3 database


Installation

.. code-block:: shell

    pip install tsv2sqlite


tsv2sqlite
----------
The software will automatically assign datatype TEXT for all columns.
The target SQLite3 database file name can be specified on invocation via ``--database_file``.
The default database file name will be the basename of the input file with the filename extension replaced with .sqlite.

The target table name can be specified on invocation via ``--table_name``.
The default table name will be the basename of the input file with the filename extension removed.

The configuration file can be used to specify the datatype for each of the columns.
The configuration file can be used to specify which columns should be excluded.

Configure column datatypes
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

    table_schema:
      sample:
        CHROM: str
        POS: int
        ID: str
        REF: str
        ALT: str
        QUAL: int
        FILTER: str
        INFO: str

Configure column exclusion list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

    ignore_columns:
      - FILTER
      - INFO

add-header-row
~~~~~~~~~~~~~~

The software will write a copy of the input tab-delimited file that does not have a header row.
The inserted header row will have column names:
- col_1
- col_2
- col_3
- etc.


Edit configuration file
~~~~~~~~~~~~~~~~~~~~~~~

Edit your configuration file e.g.: ``tsv2sqlite/conf/config.yaml``.


No column mapping mode
----------------------

Run tsv2sqlite with ``--no_column_mapping`` option.

.. code-block:: shell

    tsv2sqlite --infile ~/projects/tsv2sqlite/sample.tsv --config_file ~/projects/tsv2sqlite/tsv2sqlite/conf/config.yaml --no_column_mapping
    --outdir was not specified and therefore was set to '/tmp/tsv2sqlite/2023-12-22-102420'
    Created output directory '/tmp/tsv2sqlite/2023-12-22-102420'
    --logfile was not specified and therefore was set to '/tmp/tsv2sqlite/2023-12-22-102420/main.log'
    --database_file was not specified and therefore was set to '/home/sundaram/projects/tsv2sqlite/sample.tsv.sqlite3'
    The log file is '/tmp/tsv2sqlite/2023-12-22-102420/main.log'
    Execution of '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/main.py' completed

Observe the SQLite3 database file created.

.. code-block:: shell

    (venv) ➜  tsv2sqlite git:(main) ✗ ls -ltr sample.tsv.sqlite3
    -rw-r--r-- 1 sundaram sundaram 12288 Dec 22 10:24 sample.tsv.sqlite3

Connect to the SQLite3 database.

.. code-block:: shell

    (venv) ➜  tsv2sqlite git:(main) ✗ sqlite3 sample.tsv.sqlite3
    SQLite version 3.37.2 2022-01-06 13:25:41
    Enter ".help" for usage hints.
    sqlite>

Observe the two tables created.

.. code-block:: sql

    sqlite> .schema
    CREATE TABLE provenance (
                bytesize INTEGER NOT NULL,
                abspath TEXT NOT NULL,
                md5checksum TEXT NOT NULL,
                date_created TEXT NOT NULL
            );
    CREATE TABLE sample (CHROM TEXT,
    POS TEXT,
    ID TEXT,
    REF TEXT,
    ALT TEXT,
    QUAL TEXT,
    line_number INTEGER NOT NULL);
    sqlite> 

Execute queries against both tables.

.. code-block:: sql

    sqlite> select * from provenance;
    776|/home/sundaram/projects/tsv2sqlite/sample.tsv|786b82b2414d3acf7af34c068e358759|2023-11-12 21:31:17.307717
    sqlite> select * from sample;
    1|12345|rs567|A|G|50|3
    2|56789|rs890|T|C|44|4
    3|98765|rs123|G|T|60|5
    1|34567|rs456|C|A|55|6
    sqlite> 

Column mapping mode
-------------------

Column mapping was implemented to address a particular use case and may be deprecated in the near future.

To use the exported script for parsing a tab-delimited file to be loaded into a SQLite3 database instance:

.. code-block:: shell

     tsv2sqlite --infile sample.tsv
    --config_file was not specified and therefore was set to '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/conf/config.yaml'
    --outdir was not specified and therefore was set to '/tmp/tsv2sqlite/2023-11-13-053537'
    Created output directory '/tmp/tsv2sqlite/2023-11-13-053537'
    --logfile was not specified and therefore was set to '/tmp/tsv2sqlite/2023-11-13-053537/main.log'
    --database_file was not specified and therefore was set to 'sample.tsv.sqlite3'
    The log file is '/tmp/tsv2sqlite/2023-11-13-053537/main.log'
    Execution of '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/main.py' completed

Execute SQL queries
~~~~~~~~~~~~~~~~~~~

Connect to the SQLite3 database:

.. code-block:: shell

    sqlite3 sample.tsv.sqlite3

Inspect the file metadata:

.. code-block:: sql

    sqlite> .schema provenance
    CREATE TABLE provenance (
                bytesize INTEGER NOT NULL,
                abspath TEXT NOT NULL,
                md5checksum TEXT NOT NULL,
                date_created TEXT NOT NULL
            );
    sqlite> select * from provenance;
    776|/home/sundaram/projects/tsv2sqlite/sample.tsv|786b82b2414d3acf7af34c068e358759|2023-11-12 21:31:17.307717

Inspect the column mappings:

.. code-block:: sql

    sqlite> .schema columnmaps
    CREATE TABLE columnmaps (
                column_num INTEGER NOT NULL PRIMARY KEY,
                name TEXT NOT NULL,
                norm_name TEXT NOT NULL,
                UNIQUE (name, norm_name)
            );
    sqlite>

    sqlite> select * from columnmaps;
    0|#CHROM|_CHROM
    1|POS|POS
    2|ID|ID
    3|REF|REF
    4|ALT|ALT
    5|QUAL|QUAL
    sqlite>

Inspect the records table:

.. code-block:: sql

    sqlite> .schema records
    CREATE TABLE records (
                line_num INTEGER NOT NULL,
                column_num INTEGER NOT NULL,
                value TEXT NOT NULL,
                UNIQUE (line_num, column_num),
                FOREIGN KEY (column_num) REFERENCES columnmaps (column_num)
            );
    sqlite> select * from records;
    3|0|1
    3|1|12345
    3|2|rs567
    3|3|A
    3|4|G
    3|5|50
    4|0|2
    4|1|56789
    4|2|rs890
    4|3|T
    4|4|C
    4|5|44
    5|0|3
    5|1|98765
    5|2|rs123
    5|3|G
    5|4|T
    5|5|60
    6|0|1
    6|1|34567
    6|2|rs456
    6|3|C
    6|4|A
    6|5|55
    sqlite>

Get the ID values for each line:

.. code-block:: sql

    sqlite> select r_id.value, r_id.line_num
    from records r_id, columnmaps c_id
    where c_id.name = "ID"
    and c_id.column_num = r_id.column_num;
    rs567|3
    rs890|4
    rs123|5
    rs456|6
    sqlite>

Get the ID, REF and line number:

.. code-block:: sql

    sqlite> select r_id.value, r_ref.value, r_ref.line_num
    from records r_id, records r_ref, columnmaps c_id, columnmaps c_ref
    where c_ref.name = "REF"
    and c_id.name = "ID"
    and c_id.column_num = r_id.column_num
    and c_ref.column_num = r_ref.column_num
    and r_id.line_num = r_ref.line_num;
    rs567|A|3
    rs890|T|4
    rs123|G|5
    rs456|C|6
    sqlite>


=======
History
=======

0.1.0 (2023-11-12)
------------------

* First release on PyPI.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/jai-python3/tsv2sqlite",
    "name": "tsv2sqlite",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "tsv2sqlite",
    "author": "Jaideep Sundaram",
    "author_email": "jai.python3@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/24/5f/39206a584ed984c12fc509cac980d0893d59af9d611c756f4f2f27450136/tsv2sqlite-0.5.1.tar.gz",
    "platform": null,
    "description": "==========\ntsv2sqlite\n==========\n\nPackage for parsing a tab-delimited data file and loading the lines into SQLite3 database\n\n\nInstallation\n\n.. code-block:: shell\n\n    pip install tsv2sqlite\n\n\ntsv2sqlite\n----------\nThe software will automatically assign datatype TEXT for all columns.\nThe target SQLite3 database file name can be specified on invocation via ``--database_file``.\nThe default database file name will be the basename of the input file with the filename extension replaced with .sqlite.\n\nThe target table name can be specified on invocation via ``--table_name``.\nThe default table name will be the basename of the input file with the filename extension removed.\n\nThe configuration file can be used to specify the datatype for each of the columns.\nThe configuration file can be used to specify which columns should be excluded.\n\nConfigure column datatypes\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: yaml\n\n    table_schema:\n      sample:\n        CHROM: str\n        POS: int\n        ID: str\n        REF: str\n        ALT: str\n        QUAL: int\n        FILTER: str\n        INFO: str\n\nConfigure column exclusion list\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: yaml\n\n    ignore_columns:\n      - FILTER\n      - INFO\n\nadd-header-row\n~~~~~~~~~~~~~~\n\nThe software will write a copy of the input tab-delimited file that does not have a header row.\nThe inserted header row will have column names:\n- col_1\n- col_2\n- col_3\n- etc.\n\n\nEdit configuration file\n~~~~~~~~~~~~~~~~~~~~~~~\n\nEdit your configuration file e.g.: ``tsv2sqlite/conf/config.yaml``.\n\n\nNo column mapping mode\n----------------------\n\nRun tsv2sqlite with ``--no_column_mapping`` option.\n\n.. code-block:: shell\n\n    tsv2sqlite --infile ~/projects/tsv2sqlite/sample.tsv --config_file ~/projects/tsv2sqlite/tsv2sqlite/conf/config.yaml --no_column_mapping\n    --outdir was not specified and therefore was set to '/tmp/tsv2sqlite/2023-12-22-102420'\n    Created output directory '/tmp/tsv2sqlite/2023-12-22-102420'\n    --logfile was not specified and therefore was set to '/tmp/tsv2sqlite/2023-12-22-102420/main.log'\n    --database_file was not specified and therefore was set to '/home/sundaram/projects/tsv2sqlite/sample.tsv.sqlite3'\n    The log file is '/tmp/tsv2sqlite/2023-12-22-102420/main.log'\n    Execution of '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/main.py' completed\n\nObserve the SQLite3 database file created.\n\n.. code-block:: shell\n\n    (venv) \u279c  tsv2sqlite git:(main) \u2717 ls -ltr sample.tsv.sqlite3\n    -rw-r--r-- 1 sundaram sundaram 12288 Dec 22 10:24 sample.tsv.sqlite3\n\nConnect to the SQLite3 database.\n\n.. code-block:: shell\n\n    (venv) \u279c  tsv2sqlite git:(main) \u2717 sqlite3 sample.tsv.sqlite3\n    SQLite version 3.37.2 2022-01-06 13:25:41\n    Enter \".help\" for usage hints.\n    sqlite>\n\nObserve the two tables created.\n\n.. code-block:: sql\n\n    sqlite> .schema\n    CREATE TABLE provenance (\n                bytesize INTEGER NOT NULL,\n                abspath TEXT NOT NULL,\n                md5checksum TEXT NOT NULL,\n                date_created TEXT NOT NULL\n            );\n    CREATE TABLE sample (CHROM TEXT,\n    POS TEXT,\n    ID TEXT,\n    REF TEXT,\n    ALT TEXT,\n    QUAL TEXT,\n    line_number INTEGER NOT NULL);\n    sqlite> \n\nExecute queries against both tables.\n\n.. code-block:: sql\n\n    sqlite> select * from provenance;\n    776|/home/sundaram/projects/tsv2sqlite/sample.tsv|786b82b2414d3acf7af34c068e358759|2023-11-12 21:31:17.307717\n    sqlite> select * from sample;\n    1|12345|rs567|A|G|50|3\n    2|56789|rs890|T|C|44|4\n    3|98765|rs123|G|T|60|5\n    1|34567|rs456|C|A|55|6\n    sqlite> \n\nColumn mapping mode\n-------------------\n\nColumn mapping was implemented to address a particular use case and may be deprecated in the near future.\n\nTo use the exported script for parsing a tab-delimited file to be loaded into a SQLite3 database instance:\n\n.. code-block:: shell\n\n     tsv2sqlite --infile sample.tsv\n    --config_file was not specified and therefore was set to '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/conf/config.yaml'\n    --outdir was not specified and therefore was set to '/tmp/tsv2sqlite/2023-11-13-053537'\n    Created output directory '/tmp/tsv2sqlite/2023-11-13-053537'\n    --logfile was not specified and therefore was set to '/tmp/tsv2sqlite/2023-11-13-053537/main.log'\n    --database_file was not specified and therefore was set to 'sample.tsv.sqlite3'\n    The log file is '/tmp/tsv2sqlite/2023-11-13-053537/main.log'\n    Execution of '/home/sundaram/projects/tsv2sqlite/venv/lib/python3.10/site-packages/tsv2sqlite/main.py' completed\n\nExecute SQL queries\n~~~~~~~~~~~~~~~~~~~\n\nConnect to the SQLite3 database:\n\n.. code-block:: shell\n\n    sqlite3 sample.tsv.sqlite3\n\nInspect the file metadata:\n\n.. code-block:: sql\n\n    sqlite> .schema provenance\n    CREATE TABLE provenance (\n                bytesize INTEGER NOT NULL,\n                abspath TEXT NOT NULL,\n                md5checksum TEXT NOT NULL,\n                date_created TEXT NOT NULL\n            );\n    sqlite> select * from provenance;\n    776|/home/sundaram/projects/tsv2sqlite/sample.tsv|786b82b2414d3acf7af34c068e358759|2023-11-12 21:31:17.307717\n\nInspect the column mappings:\n\n.. code-block:: sql\n\n    sqlite> .schema columnmaps\n    CREATE TABLE columnmaps (\n                column_num INTEGER NOT NULL PRIMARY KEY,\n                name TEXT NOT NULL,\n                norm_name TEXT NOT NULL,\n                UNIQUE (name, norm_name)\n            );\n    sqlite>\n\n    sqlite> select * from columnmaps;\n    0|#CHROM|_CHROM\n    1|POS|POS\n    2|ID|ID\n    3|REF|REF\n    4|ALT|ALT\n    5|QUAL|QUAL\n    sqlite>\n\nInspect the records table:\n\n.. code-block:: sql\n\n    sqlite> .schema records\n    CREATE TABLE records (\n                line_num INTEGER NOT NULL,\n                column_num INTEGER NOT NULL,\n                value TEXT NOT NULL,\n                UNIQUE (line_num, column_num),\n                FOREIGN KEY (column_num) REFERENCES columnmaps (column_num)\n            );\n    sqlite> select * from records;\n    3|0|1\n    3|1|12345\n    3|2|rs567\n    3|3|A\n    3|4|G\n    3|5|50\n    4|0|2\n    4|1|56789\n    4|2|rs890\n    4|3|T\n    4|4|C\n    4|5|44\n    5|0|3\n    5|1|98765\n    5|2|rs123\n    5|3|G\n    5|4|T\n    5|5|60\n    6|0|1\n    6|1|34567\n    6|2|rs456\n    6|3|C\n    6|4|A\n    6|5|55\n    sqlite>\n\nGet the ID values for each line:\n\n.. code-block:: sql\n\n    sqlite> select r_id.value, r_id.line_num\n    from records r_id, columnmaps c_id\n    where c_id.name = \"ID\"\n    and c_id.column_num = r_id.column_num;\n    rs567|3\n    rs890|4\n    rs123|5\n    rs456|6\n    sqlite>\n\nGet the ID, REF and line number:\n\n.. code-block:: sql\n\n    sqlite> select r_id.value, r_ref.value, r_ref.line_num\n    from records r_id, records r_ref, columnmaps c_id, columnmaps c_ref\n    where c_ref.name = \"REF\"\n    and c_id.name = \"ID\"\n    and c_id.column_num = r_id.column_num\n    and c_ref.column_num = r_ref.column_num\n    and r_id.line_num = r_ref.line_num;\n    rs567|A|3\n    rs890|T|4\n    rs123|G|5\n    rs456|C|6\n    sqlite>\n\n\n=======\nHistory\n=======\n\n0.1.0 (2023-11-12)\n------------------\n\n* First release on PyPI.\n",
    "bugtrack_url": null,
    "license": "GNU General Public License v3",
    "summary": "Package for parsing a tab-delimited data file and loading the lines into SQLite3 database",
    "version": "0.5.1",
    "project_urls": {
        "Homepage": "https://github.com/jai-python3/tsv2sqlite"
    },
    "split_keywords": [
        "tsv2sqlite"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "36c722187199eda39a9472d98aef8a5436c05b21bd6d18e81a2bb8d6b14bd6ea",
                "md5": "1b3af20e60a3682551065e599879d34d",
                "sha256": "ae059f300662edd7a940ea441f2f118a35952cebc1ef7e8dda298d067b804632"
            },
            "downloads": -1,
            "filename": "tsv2sqlite-0.5.1-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1b3af20e60a3682551065e599879d34d",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.10",
            "size": 16591,
            "upload_time": "2025-02-15T18:10:54",
            "upload_time_iso_8601": "2025-02-15T18:10:54.897825Z",
            "url": "https://files.pythonhosted.org/packages/36/c7/22187199eda39a9472d98aef8a5436c05b21bd6d18e81a2bb8d6b14bd6ea/tsv2sqlite-0.5.1-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "245f39206a584ed984c12fc509cac980d0893d59af9d611c756f4f2f27450136",
                "md5": "e03fc1d27480a48122a1ae1b13372074",
                "sha256": "b108c0fa41da05d753346466d62ebc51dbcbac2056b08e9f8dc9fa784fa8f26e"
            },
            "downloads": -1,
            "filename": "tsv2sqlite-0.5.1.tar.gz",
            "has_sig": false,
            "md5_digest": "e03fc1d27480a48122a1ae1b13372074",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 19510,
            "upload_time": "2025-02-15T18:10:57",
            "upload_time_iso_8601": "2025-02-15T18:10:57.018001Z",
            "url": "https://files.pythonhosted.org/packages/24/5f/39206a584ed984c12fc509cac980d0893d59af9d611c756f4f2f27450136/tsv2sqlite-0.5.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-15 18:10:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jai-python3",
    "github_project": "tsv2sqlite",
    "github_not_found": true,
    "lcname": "tsv2sqlite"
}
        
Elapsed time: 1.00466s