# VecMap
Ultrafast exact sequence matching using NumPy vectorization. Designed for CRISPR screens and barcode mapping where exact matching is biologically required.
**Paper**: [bioRxiv preprint](https://doi.org/10.1101/2025.XX.XX.XXXXXX)
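To illustrate what "exact matching using NumPy vectorization" can look like, here is a minimal sketch. It is not VecMap's actual implementation: the function name `find_exact_matches` is hypothetical, and the sliding-window comparison is just one simple way to vectorize an exact substring search.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_exact_matches(reference: str, read: str) -> np.ndarray:
    """Return every 0-based offset where `read` occurs exactly in `reference`.

    Hypothetical helper for illustration only; not part of the vecmap API.
    """
    ref = np.frombuffer(reference.encode("ascii"), dtype=np.uint8)
    qry = np.frombuffer(read.encode("ascii"), dtype=np.uint8)
    if qry.size == 0 or qry.size > ref.size:
        return np.empty(0, dtype=np.int64)
    # One row per candidate offset; this is a view, so the reference is not copied.
    windows = sliding_window_view(ref, qry.size)
    # Compare all windows against the read at once and keep offsets with no mismatch.
    return np.flatnonzero((windows == qry).all(axis=1))


positions = find_exact_matches("TTACGTACGTAA", "ACGTA")
print(positions.tolist())  # [2, 6]
```

The comparison builds a boolean matrix with one row per offset, so memory grows with reference length times read length. A production tool would likely narrow the candidate offsets first, for example with a k-mer seed index.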
## Installation
```bash
pip install vecmap
```
## Quick Start
```bash
# Command line
vecmap -r reference.fa -q reads.fq -o alignments.txt
```
```python
# Python API
from vecmap import vecmap
alignments = vecmap(reference_seq, [(read_seq, read_id), ...])
```
## Performance
Benchmarks on the human transcriptome, where VecMap reaches 42,027 ± 1,856 reads/second in pure Python:
| Tool | Reads/sec (mean ± SD) | Language | Notes |
|------|----------------------|----------|-------|
| VecMap | 42,027 ± 1,856 | Python | Exact matching only |
| Minimap2 | 173,460 ± 5,203 | C | General purpose aligner |
| BWA-MEM | 60,306 ± 2,418 | C | General purpose aligner |
For CRISPR screening specifically:
- VecMap: 18,948 ± 892 reads/sec
- 1.9× faster than MAGeCK
- 3.8× faster than CRISPResso2
*Performance measured on an Apple M1 Max with 32 GB RAM, Python 3.11.5, and NumPy 2.0.0.*
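The relative figures implied by the numbers above can be worked out directly. The short sketch below only rearranges values quoted in this section. The MAGeCK and CRISPResso2 throughputs it derives are not reported measurements; they are back-calculated from the rounded 1.9× and 3.8× factors and are therefore approximate.

```python
# Mean throughputs (reads/sec) copied from the transcriptome table above.
throughput = {"VecMap": 42_027, "Minimap2": 173_460, "BWA-MEM": 60_306}

# Throughput of each tool relative to VecMap.
relative = {tool: rate / throughput["VecMap"] for tool, rate in throughput.items()}
for tool, ratio in relative.items():
    print(f"{tool}: {ratio:.2f}x VecMap throughput")

# CRISPR benchmark: baselines implied by the quoted speedups (approximate,
# because the 1.9x and 3.8x factors are rounded).
crispr_vecmap = 18_948
implied = {"MAGeCK": crispr_vecmap / 1.9, "CRISPResso2": crispr_vecmap / 3.8}
for tool, rate in implied.items():
    print(f"{tool}: roughly {rate:,.0f} reads/sec implied")
```

On the transcriptome benchmark this puts Minimap2 at about 4.1 times and BWA-MEM at about 1.4 times VecMap's throughput.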
## Applications
### CRISPR Guide Detection
```python
from vecmap.applications import CRISPRGuideDetector
guides = {
    "KRAS_sg1": "ACGTACGTACGTACGTACGT",
    "TP53_sg1": "GGCCGGCCGGCCGGCCGGCC",
}
detector = CRISPRGuideDetector(guides)
results = detector.detect_guides(reads)
counts = detector.summarize_detection(results)
```
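For intuition about what exact-match guide counting computes, the following is a plain-Python sketch. It does not use or mirror the `CRISPRGuideDetector` internals; `count_guides` is a hypothetical helper, and representing reads as plain strings is an assumption.

```python
def count_guides(reads, guides):
    """Count the reads that contain each guide as an exact substring.

    Hypothetical helper for illustration only; not part of the vecmap API.
    """
    counts = {name: 0 for name in guides}
    for read in reads:
        for name, seq in guides.items():
            if seq in read:  # exact match only: a single mismatch means no hit
                counts[name] += 1
    return counts


guides = {
    "KRAS_sg1": "ACGTACGTACGTACGTACGT",
    "TP53_sg1": "GGCCGGCCGGCCGGCCGGCC",
}
example_reads = [
    "TTTTACGTACGTACGTACGTACGTGG",  # contains KRAS_sg1 with flanking bases
    "GGCCGGCCGGCCGGCCGGCC",        # exactly TP53_sg1
    "AAAAAAAAAAAAAAAAAAAAAAAA",    # matches no guide
]
print(count_guides(example_reads, guides))  # {'KRAS_sg1': 1, 'TP53_sg1': 1}
```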
### Barcode Demultiplexing
```python
from vecmap.applications import BarcodeProcessor
processor = BarcodeProcessor(
    barcode_whitelist=whitelist,
    barcode_length=16,
    umi_length=10,
)
corrected = processor.correct_barcodes(processor.extract_barcodes(reads))
```
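The sketch below shows one common way barcode extraction and whitelist correction can work. It rests on assumptions that the README does not confirm: the barcode occupies the first 16 bases of the read, the UMI the next 10, and correction accepts a non-whitelisted barcode only when exactly one whitelist entry is a single substitution away. `split_read` and `correct_barcode` are hypothetical names, not `BarcodeProcessor` methods.

```python
def split_read(read, barcode_length=16, umi_length=10):
    """Split a read into (barcode, umi), assuming the barcode comes first."""
    barcode = read[:barcode_length]
    umi = read[barcode_length:barcode_length + umi_length]
    return barcode, umi


def correct_barcode(barcode, whitelist):
    """Return a whitelisted barcode, or None if no unambiguous match exists.

    Exact whitelist hits are returned unchanged. Otherwise every single-base
    substitution is tried, and a correction is accepted only if it is unique.
    """
    if barcode in whitelist:
        return barcode
    candidates = set()
    for i, base in enumerate(barcode):
        for alt in "ACGT":
            if alt != base:
                mutated = barcode[:i] + alt + barcode[i + 1:]
                if mutated in whitelist:
                    candidates.add(mutated)
    return candidates.pop() if len(candidates) == 1 else None


example_whitelist = {"AAAACCCCGGGGTTTT", "TTTTGGGGCCCCAAAA"}
example_read = "AAAACCCCGGGGTTTA" + "ACGTACGTAC" + "GATTACA"  # barcode + UMI + insert
barcode, umi = split_read(example_read)
print(correct_barcode(barcode, example_whitelist), umi)  # AAAACCCCGGGGTTTT ACGTACGTAC
```

Rejecting ambiguous corrections keeps a read from being assigned to the wrong barcode when two whitelist entries are equally close.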
## Reproducibility
To reproduce all benchmarks and figures from the paper:
```bash
git clone https://github.com/the-jordan-lab/VecMap.git
cd VecMap
git checkout v1.0.0
pip install -e .
./reproduce.sh
```
See [DATA_AVAILABILITY.md](DATA_AVAILABILITY.md) for complete reproduction details.
## Documentation
- [Examples](examples/) - Usage examples and tutorials
- [Benchmarks](benchmarks/) - Performance benchmarks and comparisons
- [API Reference](https://vecmap.readthedocs.io) - Full API documentation
## Citation
```bibtex
@article{jordan2025vecmap,
title={VecMap: Ultrafast Exact Sequence Matching for CRISPR Screens},
author={Jordan, James M},
journal={bioRxiv},
year={2025},
doi={10.1101/2025.XX.XX.XXXXXX}
}
```
## License
MIT License. See [LICENSE](LICENSE) for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/the-jordan-lab/VecMap",
"name": "vecmap",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "\"James M. Jordan\" <jjordan@bio.fsu.edu>",
"keywords": "bioinformatics, sequence alignment, CRISPR, vectorization, exact matching",
"author": "James M. Jordan",
"author_email": "\"James M. Jordan\" <jjordan@bio.fsu.edu>",
"download_url": "https://files.pythonhosted.org/packages/52/13/48d1623cb82eb5049bbf2a937c28f8dc76078b6c0df9ff4fd8f81a3e9f51/vecmap-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Ultrafast exact sequence matching in pure Python using NumPy vectorization",
"version": "1.0.0",
"project_urls": {
"Bug Tracker": "https://github.com/the-jordan-lab/VecMap/issues",
"Documentation": "https://github.com/the-jordan-lab/VecMap/blob/main/README.md",
"Homepage": "https://github.com/the-jordan-lab/VecMap"
},
"split_keywords": [
"bioinformatics",
" sequence alignment",
" crispr",
" vectorization",
" exact matching"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "47f9a225978643766add6fc33d7efd78298d2fd8111442f7ddb9de8ce88aba24",
"md5": "b1d3de777239e1a64a8ee77565f9d02f",
"sha256": "599e868a01dd460800d3bf9b4f174965943e8795911fbc2c81bc7c93a2babe22"
},
"downloads": -1,
"filename": "vecmap-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b1d3de777239e1a64a8ee77565f9d02f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 14154,
"upload_time": "2025-07-16T02:42:35",
"upload_time_iso_8601": "2025-07-16T02:42:35.338363Z",
"url": "https://files.pythonhosted.org/packages/47/f9/a225978643766add6fc33d7efd78298d2fd8111442f7ddb9de8ce88aba24/vecmap-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "521348d1623cb82eb5049bbf2a937c28f8dc76078b6c0df9ff4fd8f81a3e9f51",
"md5": "266601db9279e254f65284f1d3057018",
"sha256": "092b7105367847b5a46af36b9b9cfc0eed63e269e2e0e1800420102a6fc00698"
},
"downloads": -1,
"filename": "vecmap-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "266601db9279e254f65284f1d3057018",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 17058,
"upload_time": "2025-07-16T02:42:36",
"upload_time_iso_8601": "2025-07-16T02:42:36.398430Z",
"url": "https://files.pythonhosted.org/packages/52/13/48d1623cb82eb5049bbf2a937c28f8dc76078b6c0df9ff4fd8f81a3e9f51/vecmap-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-16 02:42:36",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "the-jordan-lab",
"github_project": "VecMap",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": [
[
"==",
"1.24.3"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.0.3"
]
]
},
{
"name": "biopython",
"specs": [
[
"==",
"1.81"
]
]
},
{
"name": "matplotlib",
"specs": [
[
"==",
"3.7.2"
]
]
}
],
"lcname": "vecmap"
}