vecmap

Name: vecmap
Version: 1.0.0
Home page: https://github.com/the-jordan-lab/VecMap
Summary: Ultrafast exact sequence matching in pure Python using NumPy vectorization
Upload time: 2025-07-16 02:42:36
Author: James M. Jordan
Requires Python: >=3.9
Keywords: bioinformatics, sequence alignment, crispr, vectorization, exact matching
Requirements: numpy, pandas, biopython, matplotlib

# VecMap

[![PyPI version](https://badge.fury.io/py/vecmap.svg)](https://badge.fury.io/py/vecmap)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)

Ultrafast exact sequence matching using NumPy vectorization. Designed for CRISPR screens and barcode mapping where exact matching is biologically required.
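
To illustrate what NumPy-vectorized exact matching can look like, here is a minimal sketch. It is not VecMap's actual implementation: the function name `find_exact_matches` is hypothetical, and the approach (comparing every reference window against the query in one array operation via `numpy.lib.stride_tricks.sliding_window_view`, available in NumPy 1.20 or later) is an assumption chosen for clarity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_exact_matches(reference: str, query: str) -> list[int]:
    """Return every 0-based position where `query` occurs exactly in `reference`.

    Hypothetical helper for illustration only; not part of the vecmap API.
    """
    # Encode both sequences as arrays of ASCII byte values.
    ref = np.frombuffer(reference.encode("ascii"), dtype=np.uint8)
    qry = np.frombuffer(query.encode("ascii"), dtype=np.uint8)
    if len(qry) == 0 or len(qry) > len(ref):
        return []
    # View of shape (len(ref) - len(qry) + 1, len(qry)): one row per window.
    windows = sliding_window_view(ref, len(qry))
    # A window is a hit only if all of its bases equal the query's bases.
    hits = np.all(windows == qry, axis=1)
    return np.flatnonzero(hits).tolist()


print(find_exact_matches("ACGTACGT", "CGT"))  # [1, 5]
```

The comparison materializes a boolean array with one entry per window base, so for long references a real tool would likely process the reference in chunks or filter candidate positions first.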

**Paper**: [bioRxiv preprint](https://doi.org/10.1101/2025.XX.XX.XXXXXX)

## Installation

```bash
pip install vecmap
```

## Quick Start

```bash
# Command line
vecmap -r reference.fa -q reads.fq -o alignments.txt
```

```python
# Python API
from vecmap import vecmap

# Pass the reference sequence and a list of (read_seq, read_id) tuples.
alignments = vecmap(reference_seq, [(read_seq, read_id), ...])
```
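
The Python API call above takes reads as `(read_seq, read_id)` tuples. As a rough sketch of how such tuples could be built from a FASTQ file, the following uses only the standard library. The helper `read_fastq` is hypothetical (it is not part of vecmap) and assumes uncompressed, strictly four-line FASTQ records.

```python
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (read_seq, read_id) tuples from a four-line-per-record FASTQ stream.

    Hypothetical helper for illustration; assumes no line-wrapped sequences.
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality string, not needed for exact matching
        # The read ID is the first whitespace-delimited token after '@'.
        read_id = header[1:].split()[0]
        yield seq, read_id


example = io.StringIO("@read1 sample\nACGTACGT\n+\nIIIIIIII\n@read2\nGGCC\n+\nIIII\n")
reads = list(read_fastq(example))
print(reads)  # [('ACGTACGT', 'read1'), ('GGCC', 'read2')]
```

For real data, replace the `io.StringIO` object with an open file handle, for example `open("reads.fq")`.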

## Performance

Benchmarks on the human transcriptome (VecMap reaches 42,027 ± 1,856 reads/second in pure Python):

| Tool | Reads/sec (mean ± SD) | Language | Notes |
|------|----------------------|----------|-------|
| VecMap | 42,027 ± 1,856 | Python | Exact matching only |
| Minimap2 | 173,460 ± 5,203 | C | General purpose aligner |
| BWA-MEM | 60,306 ± 2,418 | C | General purpose aligner |

For CRISPR screening specifically:
- VecMap: 18,948 ± 892 reads/sec
- 1.9× faster than MAGeCK
- 3.8× faster than CRISPResso2

*Performance measured on Apple M1 Max, 32 GB RAM, Python 3.11.5, NumPy 2.0.0.*
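
The figures above are reported as mean ± SD in reads per second. A throughput number of that shape can be measured roughly as sketched below. This is not the benchmark harness used for the paper (see the Reproducibility section for that); `reads_per_second` and the stand-in mapper are hypothetical names introduced here for illustration.

```python
import statistics
import time
from typing import Callable, Sequence


def reads_per_second(
    map_reads: Callable[[Sequence[str]], object],
    reads: Sequence[str],
    repeats: int = 5,
) -> tuple[float, float]:
    """Return (mean, sample SD) of throughput in reads/sec over `repeats` runs.

    Hypothetical helper; `repeats` must be at least 2 for the SD to be defined.
    """
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        map_reads(reads)
        # Guard against a zero interval on timers with coarse resolution.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(len(reads) / elapsed)
    return statistics.mean(rates), statistics.stdev(rates)


# Stand-in mapper: locate each read in a toy reference with str.find.
reference = "ACGT" * 1000
toy_reads = ["GTAC", "TTTT", "CGTA"] * 1000
mean_rate, sd_rate = reads_per_second(
    lambda rs: [reference.find(r) for r in rs], toy_reads
)
print(f"{mean_rate:,.0f} ± {sd_rate:,.0f} reads/sec")
```

Absolute numbers depend heavily on hardware, read length, and reference size, so results from this sketch are not comparable to the table above.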

## Applications

### CRISPR Guide Detection

```python
from vecmap.applications import CRISPRGuideDetector

# Map each guide name to its 20-nt guide sequence.
guides = {
    "KRAS_sg1": "ACGTACGTACGTACGTACGT",
    "TP53_sg1": "GGCCGGCCGGCCGGCCGGCC"
}

detector = CRISPRGuideDetector(guides)
# `reads` holds the sequencing reads and must be defined beforehand.
results = detector.detect_guides(reads)
counts = detector.summarize_detection(results)
```
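
Conceptually, exact guide detection amounts to asking which guide sequences occur verbatim in each read. The pure-Python sketch below shows that idea on the two example guides. It is only an illustration: `count_guides` is a hypothetical function, and it does not reflect how `CRISPRGuideDetector` is implemented or what its return values look like.

```python
from collections import Counter


def count_guides(guides: dict[str, str], reads: list[str]) -> Counter:
    """Count, per guide, the reads that contain the guide sequence exactly.

    Hypothetical helper for illustration; not part of the vecmap API.
    """
    counts = Counter({name: 0 for name in guides})
    for read in reads:
        for name, seq in guides.items():
            if seq in read:  # exact substring match, no mismatches allowed
                counts[name] += 1
    return counts


guides = {
    "KRAS_sg1": "ACGTACGTACGTACGTACGT",
    "TP53_sg1": "GGCCGGCCGGCCGGCCGGCC",
}
example_reads = [
    "TT" + guides["KRAS_sg1"] + "AA",  # KRAS guide with flanking bases
    guides["TP53_sg1"],                # TP53 guide, exact
    "ACGTACGTACGTACGTACGA",            # one mismatch in the last base: no hit
]
print(dict(count_guides(guides, example_reads)))
# {'KRAS_sg1': 1, 'TP53_sg1': 1}
```

The third read differs from the KRAS guide by a single base and is not counted, which is the behavior exact matching is meant to guarantee.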

### Barcode Demultiplexing

```python
from vecmap.applications import BarcodeProcessor

# `whitelist` and `reads` must be defined beforehand.
processor = BarcodeProcessor(
    barcode_whitelist=whitelist,
    barcode_length=16,
    umi_length=10
)

corrected = processor.correct_barcodes(processor.extract_barcodes(reads))
```
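
To make the two steps (extraction, then whitelist correction) concrete, here is a minimal standalone sketch. It rests on assumptions that the source does not confirm: that the barcode occupies the first `barcode_length` bases of a read and is immediately followed by the UMI, and that correction means accepting a unique whitelist entry at Hamming distance 1. The functions `extract_barcode` and `correct_barcode` are hypothetical and may differ from what `BarcodeProcessor` does.

```python
from typing import Optional


def extract_barcode(
    read: str, barcode_length: int = 16, umi_length: int = 10
) -> tuple[str, str]:
    """Split the start of a read into (barcode, umi); the layout is assumed."""
    barcode = read[:barcode_length]
    umi = read[barcode_length:barcode_length + umi_length]
    return barcode, umi


def correct_barcode(barcode: str, whitelist: set[str]) -> Optional[str]:
    """Return the whitelisted barcode, or a unique one-mismatch neighbor, else None."""
    if barcode in whitelist:
        return barcode
    candidates = set()
    # Try every single-base substitution and keep those found in the whitelist.
    for i, base in enumerate(barcode):
        for alt in "ACGT":
            if alt != base:
                candidate = barcode[:i] + alt + barcode[i + 1:]
                if candidate in whitelist:
                    candidates.add(candidate)
    # Ambiguous (several neighbors) or unmatched barcodes are rejected.
    return candidates.pop() if len(candidates) == 1 else None


whitelist = {"A" * 16, "C" * 16}
read = "A" * 15 + "G" + "T" * 10 + "ACGT"
barcode, umi = extract_barcode(read)
print(barcode, umi)                         # AAAAAAAAAAAAAAAG TTTTTTTTTT
print(correct_barcode(barcode, whitelist))  # AAAAAAAAAAAAAAAA
```

Rejecting ambiguous barcodes rather than guessing keeps the correction conservative, which matters when downstream counts are compared between samples.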

## Reproducibility

To reproduce all benchmarks and figures from the paper:

```bash
git clone https://github.com/the-jordan-lab/VecMap.git
cd VecMap
git checkout v1.0.0
pip install -e .
./reproduce.sh
```

See [DATA_AVAILABILITY.md](DATA_AVAILABILITY.md) for complete reproduction details.

## Documentation

- [Examples](examples/) - Usage examples and tutorials
- [Benchmarks](benchmarks/) - Performance benchmarks and comparisons
- [API Reference](https://vecmap.readthedocs.io) - Full API documentation

## Citation

```bibtex
@article{jordan2025vecmap,
  title={VecMap: Ultrafast Exact Sequence Matching for CRISPR Screens},
  author={Jordan, James M},
  journal={bioRxiv},
  year={2025},
  doi={10.1101/2025.XX.XX.XXXXXX}
}
```

## License

MIT License. See [LICENSE](LICENSE) for details.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/the-jordan-lab/VecMap",
    "name": "vecmap",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "\"James M. Jordan\" <jjordan@bio.fsu.edu>",
    "keywords": "bioinformatics, sequence alignment, CRISPR, vectorization, exact matching",
    "author": "James M. Jordan",
    "author_email": "\"James M. Jordan\" <jjordan@bio.fsu.edu>",
    "download_url": "https://files.pythonhosted.org/packages/52/13/48d1623cb82eb5049bbf2a937c28f8dc76078b6c0df9ff4fd8f81a3e9f51/vecmap-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Ultrafast exact sequence matching in pure Python using NumPy vectorization",
    "version": "1.0.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/the-jordan-lab/VecMap/issues",
        "Documentation": "https://github.com/the-jordan-lab/VecMap/blob/main/README.md",
        "Homepage": "https://github.com/the-jordan-lab/VecMap"
    },
    "split_keywords": [
        "bioinformatics",
        " sequence alignment",
        " crispr",
        " vectorization",
        " exact matching"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "47f9a225978643766add6fc33d7efd78298d2fd8111442f7ddb9de8ce88aba24",
                "md5": "b1d3de777239e1a64a8ee77565f9d02f",
                "sha256": "599e868a01dd460800d3bf9b4f174965943e8795911fbc2c81bc7c93a2babe22"
            },
            "downloads": -1,
            "filename": "vecmap-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b1d3de777239e1a64a8ee77565f9d02f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 14154,
            "upload_time": "2025-07-16T02:42:35",
            "upload_time_iso_8601": "2025-07-16T02:42:35.338363Z",
            "url": "https://files.pythonhosted.org/packages/47/f9/a225978643766add6fc33d7efd78298d2fd8111442f7ddb9de8ce88aba24/vecmap-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "521348d1623cb82eb5049bbf2a937c28f8dc76078b6c0df9ff4fd8f81a3e9f51",
                "md5": "266601db9279e254f65284f1d3057018",
                "sha256": "092b7105367847b5a46af36b9b9cfc0eed63e269e2e0e1800420102a6fc00698"
            },
            "downloads": -1,
            "filename": "vecmap-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "266601db9279e254f65284f1d3057018",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 17058,
            "upload_time": "2025-07-16T02:42:36",
            "upload_time_iso_8601": "2025-07-16T02:42:36.398430Z",
            "url": "https://files.pythonhosted.org/packages/52/13/48d1623cb82eb5049bbf2a937c28f8dc76078b6c0df9ff4fd8f81a3e9f51/vecmap-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-16 02:42:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "the-jordan-lab",
    "github_project": "VecMap",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.24.3"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.0.3"
                ]
            ]
        },
        {
            "name": "biopython",
            "specs": [
                [
                    "==",
                    "1.81"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "==",
                    "3.7.2"
                ]
            ]
        }
    ],
    "lcname": "vecmap"
}
        