:Name: seqbank
:Version: 0.1.3
:Summary: A database to quickly read and write DNA sequence data in numerical form.
:Author: Robert Turnbull
:License: Apache-2.0
:Requires-Python: <3.13,>=3.10
:Uploaded: 2024-10-07 02:47:12

================
SeqBank
================

.. start-badges

|pypi badge| |testing badge| |coverage badge| |docs badge| |black badge| |git3moji badge|

.. |pypi badge| image:: https://img.shields.io/pypi/v/seqbank
    :target: https://pypi.org/project/seqbank/

.. |testing badge| image:: https://github.com/rbturnbull/seqbank/actions/workflows/testing.yml/badge.svg
    :target: https://github.com/rbturnbull/seqbank/actions

.. |docs badge| image:: https://github.com/rbturnbull/seqbank/actions/workflows/docs.yml/badge.svg
    :target: https://rbturnbull.github.io/seqbank
    
.. |black badge| image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/psf/black
    
.. |coverage badge| image:: https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/rbturnbull/b1625e7f45428007f0982543d9d346d0/raw/coverage-badge.json
    :target: https://rbturnbull.github.io/seqbank/coverage/

.. |git3moji badge| image:: https://img.shields.io/badge/git3moji-%E2%9A%A1%EF%B8%8F%F0%9F%90%9B%F0%9F%93%BA%F0%9F%91%AE%F0%9F%94%A4-fffad8.svg
    :target: https://robinpokorny.github.io/git3moji/
        
.. end-badges

.. start-quickstart

SeqBank is a command-line application for managing and processing large DNA sequence datasets. It can import sequences from local files, from remote URLs, or directly from the RefSeq and Dfam databases.

SeqBank stores sequences in a numerical format designed for fast reading and writing, so sequences can be added, organized, and retrieved quickly. It is aimed at bioinformaticians who regularly work with large volumes of genomic data.
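
As an illustration of what storing sequences "in numerical form" involves, the shell function below maps each nucleotide to a one-byte integer. This is a sketch only: the function name ``encode_bases`` and the mapping A=1, C=2, G=3, T=4, N=5 are hypothetical choices for this example and are not taken from SeqBank, whose internal encoding may differ.

.. code-block:: bash

    # Hypothetical illustration: this base-to-integer mapping is an assumption,
    # not necessarily the encoding SeqBank uses internally.
    encode_bases() {
        # Upper-case the input, then replace each base with one byte (values 1-5).
        # Characters other than ACGTN are left unchanged.
        tr 'acgtn' 'ACGTN' | tr 'ACGTN' '\001\002\003\004\005'
    }

    # Print the byte values produced for a short mixed-case sequence.
    printf 'ACGTNacgt' | encode_bases | od -An -tu1 | xargs

Running the last line prints ``1 2 3 4 5 1 2 3 4``.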


Installation
============

To install the latest release from PyPI:

.. code-block:: bash

    pip install seqbank

Or install directly from the GitHub repository:

.. code-block:: bash

    pip install git+https://github.com/rbturnbull/seqbank.git

.. note::

    Installation with conda is planned but not yet available.
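
The package metadata declares support for Python ``>=3.10,<3.13``. A minimal sketch for checking an interpreter against that range before installing is shown below; the function name ``version_supported`` is hypothetical, and the check simply mirrors the declared range.

.. code-block:: bash

    # Succeeds if a MAJOR.MINOR[.PATCH] version string is within >=3.10,<3.13,
    # the range declared in seqbank's package metadata.
    version_supported() {
        local major=${1%%.*}
        local rest=${1#*.}
        local minor=${rest%%.*}
        [ "$major" -eq 3 ] && [ "$minor" -ge 10 ] && [ "$minor" -lt 13 ]
    }

    # If python3 is on the PATH, report whether it is in the supported range.
    if command -v python3 >/dev/null 2>&1; then
        pyver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
        if version_supported "$pyver"; then
            echo "Python $pyver is supported; install with: python3 -m pip install seqbank"
        else
            echo "Python $pyver is outside the supported range >=3.10,<3.13" >&2
        fi
    fi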

Usage
===========
    
SeqBank provides a command-line interface (CLI) for managing DNA sequence data. The sections below describe its main subcommands, each with examples and a typical use case.

Adding Sequences
----------------

SeqBank can import sequence data into a database from local files (``seqbank add``) or from URLs (``seqbank url``). The format of the input files is given with the ``--format`` option.

**Example:**

To add sequences from one or more local files:

.. code-block:: bash

    seqbank add /path/to/seqbank /path/to/sequence1.fasta /path/to/sequence2.fasta --format fasta

To download and add sequences from a list of URLs, using 4 parallel download workers:

.. code-block:: bash

    seqbank url /path/to/seqbank https://example.com/sequence1.fasta https://example.com/sequence2.fasta --format fasta --workers 4

**Use case:**  
Suppose you have a new set of genome sequences in FASTA format stored locally or accessible via URLs. You can quickly import these sequences into your SeqBank database for centralized storage and further analysis.
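
When the files to import sit in one directory tree, a small wrapper can collect them and pass them to ``seqbank add`` in the argument order shown above. This is a sketch under stated assumptions: the helper name ``add_fasta_dir``, the ``*.fasta`` and ``*.fa`` name patterns, and the ``SEQBANK_CMD`` variable (set it to ``echo`` for a dry run) are illustrative choices, not part of SeqBank.

.. code-block:: bash

    # Hypothetical helper: add every *.fasta / *.fa file under a directory to a SeqBank.
    # Set SEQBANK_CMD=echo to print the command instead of running it.
    add_fasta_dir() {
        local db=$1 dir=$2 f
        local -a files=()
        # Collect matching files in sorted order; NUL delimiters keep names with spaces intact.
        while IFS= read -r -d '' f; do
            files+=("$f")
        done < <(find "$dir" -type f \( -name '*.fasta' -o -name '*.fa' \) -print0 | sort -z)
        if [ "${#files[@]}" -eq 0 ]; then
            echo "no FASTA files found under $dir" >&2
            return 1
        fi
        ${SEQBANK_CMD:-seqbank} add "$db" "${files[@]}" --format fasta
    }

For example, ``SEQBANK_CMD=echo add_fasta_dir /path/to/seqbank /path/to/genomes`` prints the command that would be run without modifying the database.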


Managing Databases
------------------

SeqBank provides commands to list, count, and delete the sequences stored in a database.

**Example:**

To list all sequences in the database:

.. code-block:: bash

    seqbank ls /path/to/seqbank

To count the number of sequences stored:

.. code-block:: bash

    seqbank count /path/to/seqbank

To delete a specific sequence by accession number:

.. code-block:: bash

    seqbank delete /path/to/seqbank ABC123DEF456

**Use case:**  
If you're managing a growing sequence database, the ``ls`` command can help you track the sequences, while ``delete`` can be used to remove outdated or incorrect entries.
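
To remove several entries at once, one option is to call ``seqbank delete`` once per accession, exactly as in the single-accession example above, while looping over a file. The helper name ``delete_accessions``, the one-accession-per-line file format, and the ``SEQBANK_CMD`` dry-run variable are assumptions made for this sketch.

.. code-block:: bash

    # Hypothetical helper: delete each accession listed (one per line) in a file.
    # Blank lines and lines starting with '#' are skipped; the loop stops at the
    # first failing delete. Set SEQBANK_CMD=echo to preview the commands.
    delete_accessions() {
        local db=$1 list=$2 acc
        # The extra test also processes a last line that has no trailing newline.
        while IFS= read -r acc || [ -n "$acc" ]; do
            case $acc in
                ''|'#'*) continue ;;
            esac
            ${SEQBANK_CMD:-seqbank} delete "$db" "$acc" || return
        done < "$list"
    }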


Exporting Sequences
-------------------

You can export stored sequences to standard formats such as FASTA, for sharing or for use with other bioinformatics tools.

**Example:**

To export the sequences in FASTA format to an output file:

.. code-block:: bash

    seqbank export /path/to/seqbank /path/to/output.fasta --format fasta

**Use case:**  
After storing a collection of curated sequences, you may need to export them in FASTA format for downstream analysis using tools like BLAST or multiple sequence alignment software.
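
After exporting, a quick sanity check is to count the records and bases in the output file, for instance to compare the record count with the number reported by ``seqbank count``. The sketch below assumes a plain (uncompressed) FASTA file; the function name ``fasta_summary`` is hypothetical.

.. code-block:: bash

    # Hypothetical helper: report the number of records and total bases in one or
    # more uncompressed FASTA files (header lines start with '>').
    fasta_summary() {
        awk '
            /^>/ { n++; next }
            { bases += length($0) }
            END { printf "%d records, %d bases\n", n, bases }
        ' "$@"
    }

For example, ``fasta_summary exported.fasta`` prints a single line of the form ``N records, M bases``.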


Integration with RefSeq and Dfam
--------------------------------

SeqBank can download sequences from the RefSeq and Dfam databases and add them directly to a SeqBank database.

**Example:**

To download and add RefSeq sequences with a maximum of 1000 sequences using 4 workers:

.. code-block:: bash

    seqbank refseq /path/to/seqbank --max 1000 --workers 4

To download and add curated Dfam sequences from the current release:

.. code-block:: bash

    seqbank dfam /path/to/seqbank --release current --curated

**Use case:**  
If you are studying repetitive elements in a genome, you can import sequences from Dfam, a database of repetitive DNA element families, into your SeqBank database and analyze them alongside your own data.
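
The ``--workers 4`` used in these examples is only an example value. One starting point is the number of available CPU cores, as sketched below; because downloads are usually limited by network bandwidth and the remote server rather than by CPU, the best value may be lower or higher. The helper name ``cpu_count`` is hypothetical.

.. code-block:: bash

    # Hypothetical helper: print the number of online CPU cores, falling back to 1.
    cpu_count() {
        nproc 2>/dev/null ||
            getconf _NPROCESSORS_ONLN 2>/dev/null ||
            sysctl -n hw.ncpu 2>/dev/null ||
            echo 1
    }

    workers=$(cpu_count)
    echo "Using $workers workers"
    # e.g. seqbank refseq /path/to/seqbank --max 1000 --workers "$workers"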


Visualization of Sequence Data
------------------------------

SeqBank can plot a histogram of the sequence lengths in a database, giving a quick visual summary of the data.

**Example:**

To generate and save a histogram of sequence lengths:

.. code-block:: bash

    seqbank histogram /path/to/seqbank --output histogram.png --nbins 50

To generate and display the histogram interactively:

.. code-block:: bash

    seqbank histogram /path/to/seqbank --show --nbins 50

**Use case:**  
When working with a dataset of varying sequence lengths, generating a histogram can help visualize the distribution and detect outliers or inconsistencies in the data.
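
For a quick look in the terminal rather than an image, a rough text-only view of the same distribution can be produced from a FASTA file, for example one written by ``seqbank export``. This sketch uses fixed-width bins and omits empty bins; the function name ``length_histogram`` and its bin-width argument are hypothetical and unrelated to the ``--nbins`` option of ``seqbank histogram``.

.. code-block:: bash

    # Hypothetical helper: print "bin_start<TAB>count" for the sequence lengths in a
    # FASTA file, using fixed-width bins (default 1000 bp). Empty bins are omitted.
    length_histogram() {
        awk -v width="${2:-1000}" '
            # A new header closes the previous record: count its length in a bin.
            /^>/ {
                if (seen) count[int(len / width)]++
                seen = 1
                len = 0
                next
            }
            { len += length($0) }
            END {
                if (seen) count[int(len / width)]++
                for (b in count) printf "%d\t%d\n", b * width, count[b]
            }
        ' "$1" | sort -n
    }

For example, ``length_histogram sequences.fasta 500`` prints one line per non-empty 500 bp bin.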


Copying Databases
-----------------

SeqBank can copy the sequences in one SeqBank database to another, which is useful for migrating or backing up data.

**Example:**

To copy sequences from a source SeqBank to a destination SeqBank:

.. code-block:: bash

    seqbank cp /path/to/source_seqbank /path/to/destination_seqbank

**Use case:**  
For maintaining backups of your sequence database or migrating data to a new location, the ``cp`` command provides a straightforward method to duplicate your SeqBank data.
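
For regular backups, it can help to copy into a new, dated destination each time. The sketch below wraps ``seqbank cp`` this way; the helper name ``backup_seqbank``, the ``NAME-YYYY-MM-DD`` naming scheme, and the ``SEQBANK_CMD`` dry-run variable are illustrative assumptions.

.. code-block:: bash

    # Hypothetical helper: copy a SeqBank to a dated path under a backup directory,
    # e.g. /backups/myseqbank-2024-10-07. An optional third argument overrides the date.
    # Set SEQBANK_CMD=echo to print the command instead of running it.
    backup_seqbank() {
        local src=$1 backup_dir=$2
        local stamp=${3:-$(date +%Y-%m-%d)}
        local dest="$backup_dir/$(basename "$src")-$stamp"
        ${SEQBANK_CMD:-seqbank} cp "$src" "$dest"
    }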


Filtering Sequences and Custom Workflows
----------------------------------------

When adding sequences, you can restrict which records are stored by passing a filter file with ``--filter``. When adding sequences from URLs, ``--workers`` sets the number of workers used for multi-threaded downloading and ``--tmp-dir`` sets a directory for temporary files.

**Example:**

To add only the sequences selected by a filter file (for example, a file listing the accessions of sequences longer than 1000 bp):

.. code-block:: bash

    seqbank add /path/to/seqbank /path/to/sequences.fasta --format fasta --filter /path/to/filter_file
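
One way to prepare a filter file that selects sequences longer than 1000 bp is sketched below with ``awk``. It assumes the filter file is a plain-text list of accessions, one per line, and that an accession is the first word of a FASTA header line; both are assumptions, so check ``seqbank add --help`` to confirm how ``--filter`` is interpreted in your installed version. The function name ``long_accessions`` is hypothetical.

.. code-block:: bash

    # Hypothetical helper: print the accession (first word of each header, without
    # the leading '>') of every record whose sequence is longer than MIN bp
    # (default 1000). Sequences may be wrapped over several lines.
    long_accessions() {
        awk -v min="${2:-1000}" '
            /^>/ {
                if (id != "" && len > min) print id
                id = substr($1, 2)
                len = 0
                next
            }
            { len += length($0) }
            END { if (id != "" && len > min) print id }
        ' "$1"
    }

    # Example (placeholder paths):
    #   long_accessions /path/to/sequences.fasta 1000 > /path/to/filter_file
    #   seqbank add /path/to/seqbank /path/to/sequences.fasta --format fasta --filter /path/to/filter_file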

To enable multi-threaded downloading when adding sequences from URLs:

.. code-block:: bash

    seqbank url /path/to/seqbank https://example.com/sequence1.fasta https://example.com/sequence2.fasta --format fasta --workers 4 --tmp-dir /path/to/tmp

**Use case:**  
In projects that need only sequences above a length threshold, a filter file keeps the database limited to the relevant records. For large datasets, downloading with several workers can reduce the total time.
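
When the URLs to download are kept in a text file, a wrapper can read them and give ``seqbank url`` a private temporary directory via ``--tmp-dir`` that is deleted afterwards. This sketch assumes ``--tmp-dir`` accepts an existing, empty directory; the helper name ``import_url_list``, the list-file format (one URL per line, ``#`` comments allowed), and the ``SEQBANK_CMD`` dry-run variable are illustrative assumptions.

.. code-block:: bash

    # Hypothetical helper: read URLs from a file and pass them to `seqbank url`
    # with a private temporary directory that is removed afterwards.
    # Set SEQBANK_CMD=echo to print the command instead of running it.
    import_url_list() {
        local db=$1 list=$2 workers=${3:-4}
        local url tmp status
        local -a urls=()
        # Skip blank lines and '#' comments; the extra test keeps a final line
        # that lacks a trailing newline.
        while IFS= read -r url || [ -n "$url" ]; do
            case $url in
                ''|'#'*) continue ;;
            esac
            urls+=("$url")
        done < "$list"
        if [ "${#urls[@]}" -eq 0 ]; then
            echo "no URLs found in $list" >&2
            return 1
        fi
        # Scratch directory for downloads, removed once the import finishes.
        tmp=$(mktemp -d) || return 1
        ${SEQBANK_CMD:-seqbank} url "$db" "${urls[@]}" --format fasta \
            --workers "$workers" --tmp-dir "$tmp"
        status=$?
        rm -rf "$tmp"
        return "$status"
    }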

.. end-quickstart


Credits
==================================

.. start-credits

* Robert Turnbull <robert.turnbull@unimelb.edu.au>
* Rafsan Al Mamun <rafsan7238@gmail.com>

.. end-credits


            

        