Python bindings for the [Rust FFI](https://github.com/jguhlin/minimap2-rs/) [minimap2](https://github.com/lh3/minimap2/) library. In development! Feedback appreciated!
# Why?
[PyO3](https://github.com/PyO3/pyo3) makes it very easy to create Python libraries via Rust. Further, we can use [Polars](https://github.com/pola-rs/polars) to export results as a dataframe (which can be used as-is, or converted to Pandas). Python allows for faster experimentation with novel algorithms and integration into machine learning pipelines, and it gives those not familiar with Rust or C/C++ a way to use minimap2.
# Current State
Very early alpha. Please try it, and open an issue for any missing features you need or any bugs you find.
# How to use
## Requirements
Polars and PyArrow. Both should be installed when you install minimappers2.
## Creating an Aligner Instance
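```python
# Assumes the preset functions are exported at the package top level
from minimappers2 import map_ont

aligner = map_ont()   # aligner with the map_ont preset
aligner.threads(4)    # use 4 threads
```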
If you want a full alignment performed, rather than just matches, enable `.cigar()`:
```python
aligner = map_hifi()
aligner.cigar()
```
Please note, at this time the following syntax is **NOT** supported:
```python
aligner = map_ont().threads(4).cigar()
```
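A common reason chaining fails in Python bindings is that configuration methods modify the object in place and return `None`. The toy class below is purely hypothetical: `ToyAligner` is not part of minimappers2, and whether the real methods return `None` is an assumption. It only illustrates why each call has to be its own statement.

```python
class ToyAligner:
    """Hypothetical stand-in; NOT the minimappers2 aligner."""

    def __init__(self):
        self.n_threads = 1
        self.with_cigar = False

    def threads(self, n):
        # Mutates in place and implicitly returns None.
        self.n_threads = n

    def cigar(self):
        # Same pattern: in-place change, no return value.
        self.with_cigar = True


aligner = ToyAligner()
aligner.threads(4)  # fine: one call per statement
aligner.cigar()

try:
    # threads() returned None, so .cigar() is looked up on None.
    ToyAligner().threads(4).cigar()
except AttributeError as err:
    chain_error = str(err)
```

If the real library behaves this way, the fix on the user side is simply to keep configuration calls on separate lines, as in the supported examples above.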
## Creating an index
```python
aligner.index("ref.fa")
```
To build an index and save it for future use:
```python
aligner.index_and_save("ref.fa", "ref.mmi")
```
Next time, loading the saved index will be faster than building it again:
```python
aligner.load_index("ref.mmi")
```
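The build-once, load-afterwards pattern can be wrapped in a small helper. This is a minimal sketch: `prepare_index` is a hypothetical name, not part of minimappers2, and it relies only on the `index_and_save` and `load_index` methods shown above.

```python
from pathlib import Path


def prepare_index(aligner, reference, index_path):
    """Load a saved index if present; otherwise build it and save it.

    Hypothetical helper, not part of minimappers2. Returns "loaded" or
    "built" so the caller can log which branch ran.
    """
    if Path(index_path).exists():
        aligner.load_index(str(index_path))
        return "loaded"
    aligner.index_and_save(str(reference), str(index_path))
    return "built"
```

Usage would look like `prepare_index(aligner, "ref.fa", "ref.mmi")`. Note that this sketch does not detect a stale index: if `ref.fa` changes, delete `ref.mmi` so it is rebuilt.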
## Aligning a Single Sequence
```python
query = Sequence(seq_name, seq)
aligner.map1(query)
# Example
seq = "CCAGAACGTACAAGGAAATATCCTCAAATTATCCCAAGAATTGTCCGCAGGAAATGGGGATAATTTCAGAAATGAGAG"
result = aligner.map1(Sequence("MySeq", seq))
```
Where seq_name and seq are both strings. The output is a Polars DataFrame.
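Because the library does little error checking at this stage, it may be worth validating inputs in Python before they reach the Rust side. The check below is a sketch under that assumption; `check_query` is a hypothetical helper, not part of minimappers2, and it only verifies the "both strings" requirement stated above plus non-emptiness.

```python
def check_query(seq_name, seq):
    """Hypothetical pre-flight check; not part of minimappers2."""
    if not isinstance(seq_name, str) or not isinstance(seq, str):
        raise TypeError("seq_name and seq must both be str")
    if not seq_name or not seq:
        raise ValueError("seq_name and seq must be non-empty")
    return seq_name, seq
```

It could then be used as `Sequence(*check_query("MySeq", seq))`, so that a bad input raises an ordinary Python exception instead of reaching the native code.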
## Aligning Multiple Sequences
```python
# seq1 and seq2 are sequence strings
seqs = [Sequence("name of seq 1", seq1),
        Sequence("name of seq 2", seq2)]
result = aligner.map(seqs)
```
# Example Notebook
Please see the [example notebook](https://github.com/jguhlin/minimap2-rs/blob/main/minimappers2/example/Exampe.ipynb) for more examples.
## Mapping a file
Please [open an issue](https://github.com/jguhlin/minimap2-rs/issues/new) if you need to map files from this API.
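In the meantime, one possible workaround is to read the records yourself and pass them to `aligner.map`. The reader below is a minimal sketch using only the standard library; `read_fasta` and its `make_record` parameter are hypothetical names, not part of minimappers2, and it handles plain (uncompressed) FASTA only.

```python
def read_fasta(path, make_record=lambda name, seq: (name, seq)):
    """Read a plain-text FASTA file into a list of records.

    Hypothetical helper, not part of minimappers2. `make_record` is called
    as make_record(name, sequence) for each entry; by default it returns
    a (name, sequence) tuple.
    """
    records = []
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    records.append(make_record(name, "".join(chunks)))
                # Keep only the ID: header text up to the first whitespace.
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        records.append(make_record(name, "".join(chunks)))
    return records
```

Passing the `Sequence` class as `make_record`, for example `seqs = read_fasta("reads.fa", Sequence)`, would give a list suitable for `aligner.map(seqs)`. The whole file is held in memory, so very large inputs would need to be processed in batches.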
# Results
All results are returned as [Polars](https://github.com/pola-rs/polars) dataframes. You can convert Polars dataframes to Pandas dataframes with [.to_pandas()](https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.to_pandas.html#polars.DataFrame.to_pandas).
* Polars is one of the fastest dataframe libraries in the Python ecosystem.
* Polars provides a nice data bridge between Rust and Python.
For more information, please see the [Polars User Guide](https://pola-rs.github.io/polars-book/user-guide/index.html) or the [Polars Guide for Pandas users](https://pola-rs.github.io/polars-book/user-guide/coming_from_pandas.html).
## Example of Results
Here is an image of the resulting dataframe:
![Resulting Dataframe Image](https://raw.githubusercontent.com/jguhlin/minimap2-rs/main/minimappers2/images/minimappers2_df.png)
**NOTE:** Mapq, Cigar, and some other columns will not show up unless `.cigar()` is enabled on the aligner itself.
# Errors
As this is a very early-stage library, error checking is not yet implemented. When things crash, you will likely need to restart your Python interpreter (or Jupyter kernel). Please [open an issue](https://github.com/jguhlin/minimap2-rs/issues/new) describing what happened and I will get to it.
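If a crash in native code would otherwise take down a long-running session, one option is to run the mapping step in a child interpreter. The helper below is a sketch of that idea using only the standard library; `run_isolated` is a hypothetical name, not part of minimappers2.

```python
import subprocess
import sys


def run_isolated(script, timeout=600):
    """Run Python source in a separate interpreter process.

    Hypothetical helper, not part of minimappers2. A crash in the child
    shows up as a non-zero return code instead of killing the caller.
    Returns (returncode, stdout, stderr).
    """
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

The script passed in would build the aligner, run the mapping, and write its results to a file that the parent process reads afterwards. The non-zero return code and captured stderr are also useful details to include when opening an issue.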
## Compatibility
* Linux: Yes
* Mac: Unknown
* Windows: Unlikely
* x86_64: Yes
* aarch64: Unknown (open an issue)
* neon: No (Open an issue)
* Google Colab: Yes
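To see which row of the list above applies to the current machine, the standard `platform` module can be used. This is a sketch only: `support_status` is a hypothetical helper, and treating `AMD64` as x86_64 and `arm64` as aarch64 reflects the names different operating systems report for the same architectures.

```python
import platform


def support_status(system=None, machine=None):
    """Map the current (or given) platform onto the compatibility list.

    Hypothetical helper, not part of minimappers2.
    Returns (os_status, arch_status) as lowercase strings.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    os_status = {
        "Linux": "yes",
        "Darwin": "unknown",    # macOS
        "Windows": "unlikely",
    }.get(system, "unknown")
    arch_status = {
        "x86_64": "yes",
        "amd64": "yes",         # x86_64 as reported on Windows
        "aarch64": "unknown",
        "arm64": "unknown",     # aarch64 as reported on macOS
    }.get(machine, "unknown")
    return os_status, arch_status
```

Calling `support_status()` with no arguments inspects the running interpreter's platform.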
# Performance
Effort has been made to make this as performant as possible, but if you need more performance, please use minimap2 directly and import the results.
# Citation
You should cite the minimap2 papers if you use this in your work.
> Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences.
> *Bioinformatics*, **34**:3094-3100. [doi:10.1093/bioinformatics/bty191](https://doi.org/10.1093/bioinformatics/bty191)
and/or:
> Li, H. (2021). New strategies to improve minimap2 alignment accuracy.
> *Bioinformatics*, **37**:4572-4574. [doi:10.1093/bioinformatics/btab705](https://doi.org/10.1093/bioinformatics/btab705)
# Changelog
## 0.1.5
* Updated minimap2-rs, polars, pyo3 deps
* Add new presets
## 0.1.4
* Update pyo3, polars, minimap2-rs, and mimalloc deps
## 0.1.1
* Update pyo3 and polars deps
* Add with_seq for indexing TODO
## 0.1.0
* Initial Functions implemented
* Return results as Polars dfs
# Funding
![Genomics Aotearoa](https://github.com/jguhlin/minimap2-rs/blob/main/info/genomics-aotearoa.png)