scbamtools


Name: scbamtools
Version: 0.5
Home page: https://github.com/marvin-jens/scbamtools
Summary: High performance Cython + Python tools to process BAM files with tags as they arise in single-cell sequencing
Upload time: 2024-03-14 20:54:22
Author: Marvin Jens
Requires Python: >=3.8
License: MIT License
Keywords: bioinformatics, single cell, bam, sam, cram, genetics, cell barcode, biology, gene expression

# scbamtools

High performance Cython + Python tools to process BAM files with tags as they arise in single-cell sequencing

## Status

This is alpha software. The plan is mainly to move useful functionality developed within [spacemake](https://github.com/rajewsky-lab/spacemake) into a standalone package, so that it can be re-used without pulling in all the dependencies of a heavy-weight spatial transcriptomics package.
Currently, the umbilical has not been cut: the code is almost certainly not functional without spacemake installed.

### Useful things include:

  * converting FASTQ files to uBAM files with barcode information (single cell and spatial workflows)
  * trimming adapters (uses cutadapt functions under the hood)
  * making histograms and statistics about cell barcodes, UMIs and possibly other BAM tags
  * annotating aligned BAM records against a transcript annotation such as GENCODE
  * building digital gene expression counts from annotated BAM files, directly as scanpy AnnData (h5ad)
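
To make the tag-statistics item above concrete, here is a minimal sketch of a barcode histogram. It is not the scbamtools API: it works on plain SAM text lines using only the standard library, and the function names `get_tag` and `tag_histogram` are made up for illustration. It assumes the cell barcode is stored in the `CB` tag and the molecular identifier in `MI`, which are the tags the SAM specification reserves for those purposes; your data may use different tags.

```python
from collections import Counter
from typing import Iterable, Optional


def get_tag(sam_line: str, tag: str) -> Optional[str]:
    """Return the value of an optional SAM field such as 'CB:Z:ACGT', or None."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None


def tag_histogram(sam_lines: Iterable[str], tag: str = "CB") -> Counter:
    """Count how often each value of `tag` occurs, skipping header lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        value = get_tag(line, tag)
        if value is not None:
            counts[value] += 1
    return counts


# Three made-up unaligned records (FLAG 4); two of them share a cell barcode.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC\tMI:Z:GGTT",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\tCB:Z:AAAC\tMI:Z:CCAA",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII\tCB:Z:TTTG\tMI:Z:GGTT",
]
cb_counts = tag_histogram(example, "CB")
```

Because `tag_histogram` accepts any iterable of lines, it can consume a stream (for example the text output of a BAM-to-SAM converter) without loading the whole file into memory.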

## Why is this better than ...

Depends on what you need. We are building these tools to be as fast as possible while keeping as much of the functionality as we can in Python (with the occasional Cython) for flexibility and maintainability. We care less about total CPU use than about throughput and scalability. So, some principles:

  * avoid temp files, streaming is better
  * parallelize with [mrfifo](https://github.com/marvin-jens/mrfifo) for low-overhead parallelism
  * put some effort into efficient data structures where it pays off
  * make simple things simple, while hard things should be possible
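
The "streaming is better" principle above can be sketched with plain Python generators. This is an illustration only, not code from scbamtools or mrfifo: the function names are hypothetical, and the read layout (cell barcode in the first `cb_len` bases of read 1, the remainder treated as UMI) is an assumption chosen to keep the example short. Records flow from FASTQ input to unaligned SAM lines one at a time, with no intermediate file.

```python
import io
from typing import Iterator, TextIO, Tuple

FastqRecord = Tuple[str, str, str]  # (name, sequence, qualities)


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one FASTQ record at a time from an open text handle."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        # Drop the leading '@' and anything after the first whitespace.
        yield header[1:].split()[0], seq, qual


def to_unaligned_sam(
    read1: Iterator[FastqRecord],
    read2: Iterator[FastqRecord],
    cb_len: int = 4,  # assumed barcode length, purely illustrative
) -> Iterator[str]:
    """Pair up reads lazily and emit unaligned SAM lines with CB/MI tags."""
    for (name, seq1, _), (_, seq2, qual2) in zip(read1, read2):
        cb, umi = seq1[:cb_len], seq1[cb_len:]
        yield "\t".join(
            [name, "4", "*", "0", "0", "*", "*", "0", "0", seq2, qual2,
             f"CB:Z:{cb}", f"MI:Z:{umi}"]
        )


# In-memory stand-ins for two FASTQ files; real input would be open files or pipes.
r1 = io.StringIO("@r1 1:N\nAAACGGTT\n+\nIIIIIIII\n")
r2 = io.StringIO("@r1 2:N\nACGTACGT\n+\nIIIIIIII\n")
records = list(to_unaligned_sam(read_fastq(r1), read_fastq(r2)))
```

Each stage holds only the current record, so memory use stays flat regardless of input size, and the output could be written straight to a pipe feeding a SAM-to-BAM converter.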

The code here is the same code that we use to process open-st spatial transcriptomics data, which is *very deep*: typical runs have billions of reads and hundreds of millions of spatial barcodes. While we make sure that the tools here don't break and have manageable resource usage, we do not aim to be the most CPU-efficient or to let you process open-st data on your laptop. YMMV.

## Roadmap

  * port everything from spacemake [ongoing]
  * full suite of tools to replace dropseq-tools in spacemake [v1.0]
  * optimizations
  * tutorials and example uses outside of spacemake





            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/marvin-jens/scbamtools",
    "name": "scbamtools",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "bioinformatics,single cell,BAM,SAM,CRAM,genetics,cell barcode,biology,gene expression",
    "author": "Marvin Jens",
    "author_email": "marvin.jens@charite.de",
    "download_url": "https://files.pythonhosted.org/packages/60/fe/4740173cbc7a3bc2c6b041d19db568741f4e5339a3bb12951bb652467fd1/scbamtools-0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "High performance Cython + Python tools to process BAM files with tags as they arise in single-cell sequencing",
    "version": "0.5",
    "project_urls": {
        "Homepage": "https://github.com/marvin-jens/scbamtools",
        "Issues": "https://github.com/marvin-jens/scbamtools/issues",
        "Repository": "https://github.com/marvin-jens/scbamtools.git"
    },
    "split_keywords": [
        "bioinformatics",
        "single cell",
        "bam",
        "sam",
        "cram",
        "genetics",
        "cell barcode",
        "biology",
        "gene expression"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "60fe4740173cbc7a3bc2c6b041d19db568741f4e5339a3bb12951bb652467fd1",
                "md5": "c5a456dc9e727d3fe0dc537b1dee9fda",
                "sha256": "173be26d8b6e998e520dbc049aed3ade2f76cc605f15b27a157881d9dc538b4e"
            },
            "downloads": -1,
            "filename": "scbamtools-0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "c5a456dc9e727d3fe0dc537b1dee9fda",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 374381,
            "upload_time": "2024-03-14T20:54:22",
            "upload_time_iso_8601": "2024-03-14T20:54:22.044968Z",
            "url": "https://files.pythonhosted.org/packages/60/fe/4740173cbc7a3bc2c6b041d19db568741f4e5339a3bb12951bb652467fd1/scbamtools-0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-14 20:54:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "marvin-jens",
    "github_project": "scbamtools",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "scbamtools"
}
        