# Augmented Interval List
[![Build Status](https://travis-ci.org/kylessmith/ailist.svg?branch=master)](https://travis-ci.org/kylessmith/ailist) [![PyPI version](https://badge.fury.io/py/ailist.svg)](https://badge.fury.io/py/ailist)
[![Coffee](https://img.shields.io/badge/-buy_me_a%C2%A0coffee-gray?logo=buy-me-a-coffee&color=ff69b4)](https://www.buymeacoffee.com/kylessmith)
Augmented interval list (AIList) is a data structure for enumerating intersections
between a query interval and an interval set. AILists have previously been shown
to be faster than interval tree, NCList, and BEDTools.
This implementation is a Python wrapper of the one used in the original [AIList library][AIList_github].
Additonal wrapper functions have been created which allow easy user interface.
All citations should reference to [original paper][paper].
For full usage and installation [documentation][AIList_docs]
## Install
If you dont already have numpy and scipy installed, it is best to download
`Anaconda`, a python distribution that has them included.
```
https://continuum.io/downloads
```
Dependencies can be installed by:
```
pip install -r requirements.txt
```
PyPI install, presuming you have all its requirements installed:
```
pip install ailist
```
## Benchmark
Test numpy random integers:
```python
# ailist version: 0.1.7
from ailist import AIList
# ncls version: 0.0.53
from ncls import NCLS
# numpy version: 1.18.4
import numpy as np
# pandas version: 1.0.3
import pandas as pd
# quicksect version: 0.2.2
import quicksect
# Set seed
np.random.seed(100)
# First values
starts1 = np.random.randint(0, 100000, 100000)
ends1 = starts1 + np.random.randint(1, 10000, 100000)
ids1 = np.arange(len(starts1))
values1 = np.ones(len(starts1))
# Second values
starts2 = np.random.randint(0, 100000, 100000)
ends2 = starts2 + np.random.randint(1, 10000, 100000)
ids2 = np.arange(len(starts2))
values2 = np.ones(len(starts2))
```
| Library | Function | Time (µs) |
| --- | --- | --- |
| ncls | single overlap | 1170 |
| pandas | single overlap | 924 |
| quicksect | single overlap | 550 |
| ailist | single overlap | 73 |
| Library | Function | Time (s) | Max Memory (GB) |
| --- | --- | --- | --- |
| ncls | bulk overlap | 151 s | >50 |
| ailist | bulk overlap | 17.8 s | ~9 |
## Usage
```python
from ailist import AIList
import numpy as np
i = AIList()
i.add(15, 20)
i.add(10, 30)
i.add(17, 19)
i.add(5, 20)
i.add(12, 15)
i.add(30, 40)
# Print intervals
i.display()
# (15-20) (10-30) (17-19) (5-20) (12-15) (30-40)
# Find overlapping intervals
o = i.intersect(6, 15)
o.display()
# (5-20) (10-30) (12-15)
# Find index of overlaps
i.intersect_index(6, 15)
# array([3, 1, 4])
# Now i has been constructed/sorted
i.display()
# (5-20) (10-30) (12-15) (15-20) (17-19) (30-40)
# Can be done manually as well at any time
i.construct()
# Iterate over intervals
for x in i:
print(x)
# Interval(5-20, 3)
# Interval(10-30, 1)
# Interval(12-15, 4)
# Interval(15-20, 0)
# Interval(17-19, 2)
# Interval(30-40, 5)
# Interval comparisons
j = AIList()
j.add(5, 15)
j.add(50, 60)
# Subtract regions
s = i - j #also: i.subtract(j)
s.display()
# (15-20) (15-30) (15-20) (17-19) (30-40)
# Common regions
i + j #also: i.common(j)
# AIList
# range: (5-15)
# (5-15, 3)
# (10-15, 1)
# (12-15, 4)
# AIList can also add to from arrays
starts = np.arange(10,1000,100)
ends = starts + 50
ids = starts
values = np.ones(10)
i.from_array(starts, ends, ids, values)
i.display()
# (5-20) (10-30) (12-15) (15-20) (17-19) (30-40)
# (10-60) (110-160) (210-260) (310-360) (410-460)
# (510-560) (610-660) (710-760) (810-860) (910-960)
# Merge overlapping intervals
m = i.merge(gap=10)
m.display()
# (5-60) (110-160) (210-260) (310-360) (410-460)
# (510-560) (610-660) (710-760) (810-860) (910-960)
# Find array of coverage
c = i.coverage()
c.head()
# 5 1.0
# 6 1.0
# 7 1.0
# 8 1.0
# 9 1.0
# dtype: float64
# Calculate window protection score
w = i.wps(5)
w.head()
# 5 -1.0
# 6 -1.0
# 7 1.0
# 8 -1.0
# 9 -1.0
# dtype: float64
# Filter to interval lengths between 3 and 20
fi = i.filter(3,20)
fi.display()
# (5-20) (10-30) (15-20) (30-40)
# Query by array
i.intersect_from_array(starts, ends, ids)
# (array([ 10, 10, 10, 10, 10, 10, 10, 110, 210, 310, 410, 510, 610,
# 710, 810, 910]),
# array([ 5, 2, 0, 4, 10, 1, 3, 110, 210, 310, 410, 510, 610,
# 710, 810, 910]))
```
## Original paper
> Jianglin Feng, Aakrosh Ratan, Nathan C Sheffield; Augmented Interval List: a novel data structure for efficient genomic interval search, Bioinformatics, btz407, https://doi.org/10.1093/bioinformatics/btz407
[AIList_github]: https://github.com/databio/AIList
[paper]: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz407/5509521
[AIList_docs]: https://www.biosciencestack.com/static/ailist/docs/index.html
Raw data
{
"_id": null,
"home_page": "https://github.com/kylessmith/ailist",
"name": "ailist",
"maintainer": "Kyle S. Smith",
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": "kyle.smith@stjude.org",
"keywords": "cython, interval, ailist, c",
"author": "Kyle S. Smith",
"author_email": "kyle.smith@stjude.org",
"download_url": "https://files.pythonhosted.org/packages/32/6b/d51299f5fdec5ddb714b7193456a4f0365ab37e9fa07122893f1d8476169/ailist-2.1.3.tar.gz",
"platform": null,
"description": "# Augmented Interval List\n\n[![Build Status](https://travis-ci.org/kylessmith/ailist.svg?branch=master)](https://travis-ci.org/kylessmith/ailist) [![PyPI version](https://badge.fury.io/py/ailist.svg)](https://badge.fury.io/py/ailist)\n[![Coffee](https://img.shields.io/badge/-buy_me_a%C2%A0coffee-gray?logo=buy-me-a-coffee&color=ff69b4)](https://www.buymeacoffee.com/kylessmith)\n\nAugmented interval list (AIList) is a data structure for enumerating intersections \nbetween a query interval and an interval set. AILists have previously been shown \nto be faster than interval tree, NCList, and BEDTools.\n\nThis implementation is a Python wrapper of the one used in the original [AIList library][AIList_github].\n\n\nAdditonal wrapper functions have been created which allow easy user interface.\n\nAll citations should reference to [original paper][paper].\n\nFor full usage and installation [documentation][AIList_docs]\n\n## Install\n\nIf you dont already have numpy and scipy installed, it is best to download\n`Anaconda`, a python distribution that has them included. \n```\n https://continuum.io/downloads\n```\n\nDependencies can be installed by:\n\n```\n pip install -r requirements.txt\n```\n\nPyPI install, presuming you have all its requirements installed:\n```\n pip install ailist\n```\n\n## Benchmark\n\nTest numpy random integers:\n\n```python\n# ailist version: 0.1.7\nfrom ailist import AIList\n# ncls version: 0.0.53\nfrom ncls import NCLS\n# numpy version: 1.18.4\nimport numpy as np\n# pandas version: 1.0.3\nimport pandas as pd\n# quicksect version: 0.2.2\nimport quicksect\n\n# Set seed\nnp.random.seed(100)\n\n\n# First values\nstarts1 = np.random.randint(0, 100000, 100000)\nends1 = starts1 + np.random.randint(1, 10000, 100000)\nids1 = np.arange(len(starts1))\nvalues1 = np.ones(len(starts1))\n\n# Second values\nstarts2 = np.random.randint(0, 100000, 100000)\nends2 = starts2 + np.random.randint(1, 10000, 100000)\nids2 = np.arange(len(starts2))\nvalues2 = np.ones(len(starts2))\n\n```\n\n| Library | Function | Time (\u00b5s) |\n| --- | --- | --- |\n| ncls | single overlap | 1170 |\n| pandas | single overlap | 924 |\n| quicksect | single overlap | 550 |\n| ailist | single overlap | 73 |\n\n| Library | Function | Time (s) | Max Memory (GB) |\n| --- | --- | --- | --- |\n| ncls | bulk overlap | 151 s | >50 |\n| ailist | bulk overlap | 17.8 s | ~9 |\n\n## Usage\n\n```python\nfrom ailist import AIList\nimport numpy as np\n\ni = AIList()\ni.add(15, 20)\ni.add(10, 30)\ni.add(17, 19)\ni.add(5, 20)\ni.add(12, 15)\ni.add(30, 40)\n\n# Print intervals\ni.display()\n# (15-20) (10-30) (17-19) (5-20) (12-15) (30-40)\n\n# Find overlapping intervals\no = i.intersect(6, 15)\no.display()\n# (5-20) (10-30) (12-15)\n\n# Find index of overlaps\ni.intersect_index(6, 15)\n# array([3, 1, 4])\n\n# Now i has been constructed/sorted\ni.display()\n# (5-20) (10-30) (12-15) (15-20) (17-19) (30-40)\n\n# Can be done manually as well at any time\ni.construct()\n\n# Iterate over intervals\nfor x in i:\n print(x)\n# Interval(5-20, 3)\n# Interval(10-30, 1)\n# Interval(12-15, 4)\n# Interval(15-20, 0)\n# Interval(17-19, 2)\n# Interval(30-40, 5)\n\n# Interval comparisons\nj = AIList()\nj.add(5, 15)\nj.add(50, 60)\n\n# Subtract regions\ns = i - j #also: i.subtract(j)\ns.display()\n# (15-20) (15-30) (15-20) (17-19) (30-40) \n\n# Common regions\ni + j #also: i.common(j)\n# AIList\n# range: (5-15)\n# (5-15, 3)\n# (10-15, 1)\n# (12-15, 4)\n\n# AIList can also add to from arrays\nstarts = np.arange(10,1000,100)\nends = starts + 50\nids = starts\nvalues = np.ones(10)\ni.from_array(starts, ends, ids, values)\ni.display()\n# (5-20) (10-30) (12-15) (15-20) (17-19) (30-40) \n# (10-60) (110-160) (210-260) (310-360) (410-460) \n# (510-560) (610-660) (710-760) (810-860) (910-960)\n\n# Merge overlapping intervals\nm = i.merge(gap=10)\nm.display()\n# (5-60) (110-160) (210-260) (310-360) (410-460) \n# (510-560) (610-660) (710-760) (810-860) (910-960)\n\n# Find array of coverage\nc = i.coverage()\nc.head()\n# 5 1.0\n# 6 1.0\n# 7 1.0\n# 8 1.0\n# 9 1.0\n# dtype: float64\n\n# Calculate window protection score\nw = i.wps(5)\nw.head()\n# 5 -1.0\n# 6 -1.0\n# 7 1.0\n# 8 -1.0\n# 9 -1.0\n# dtype: float64\n\n# Filter to interval lengths between 3 and 20\nfi = i.filter(3,20)\nfi.display()\n# (5-20) (10-30) (15-20) (30-40)\n\n# Query by array\ni.intersect_from_array(starts, ends, ids)\n# (array([ 10, 10, 10, 10, 10, 10, 10, 110, 210, 310, 410, 510, 610,\n# 710, 810, 910]),\n# array([ 5, 2, 0, 4, 10, 1, 3, 110, 210, 310, 410, 510, 610,\n# 710, 810, 910]))\n\n```\n\n\n## Original paper\n\n> Jianglin Feng, Aakrosh Ratan, Nathan C Sheffield; Augmented Interval List: a novel data structure for efficient genomic interval search, Bioinformatics, btz407, https://doi.org/10.1093/bioinformatics/btz407\n\n\n[AIList_github]: https://github.com/databio/AIList\n[paper]: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz407/5509521\n[AIList_docs]: https://www.biosciencestack.com/static/ailist/docs/index.html",
"bugtrack_url": null,
"license": "GPL-2.0-or-later",
"summary": "Python package for Augmented Interval List",
"version": "2.1.3",
"project_urls": {
"Documentation": "https://www.biosciencestack.com/static/ailist/docs/index.html",
"Homepage": "https://github.com/kylessmith/ailist",
"Repository": "https://github.com/kylessmith/ailist"
},
"split_keywords": [
"cython",
" interval",
" ailist",
" c"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "547da1b88a19895c70081949c448e67213b850f932352605066eff6f748f909a",
"md5": "7534ecfa67b691201262e6e56d4e28bd",
"sha256": "e6f4c7d24759c3b0b962fc242beeef9cb962500dec28b78eb85d6f0a60eeeab9"
},
"downloads": -1,
"filename": "ailist-2.1.3-cp311-cp311-macosx_14_0_x86_64.whl",
"has_sig": false,
"md5_digest": "7534ecfa67b691201262e6e56d4e28bd",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": "<4.0,>=3.10",
"size": 1046042,
"upload_time": "2024-04-16T11:54:51",
"upload_time_iso_8601": "2024-04-16T11:54:51.311528Z",
"url": "https://files.pythonhosted.org/packages/54/7d/a1b88a19895c70081949c448e67213b850f932352605066eff6f748f909a/ailist-2.1.3-cp311-cp311-macosx_14_0_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "326bd51299f5fdec5ddb714b7193456a4f0365ab37e9fa07122893f1d8476169",
"md5": "8b092601f4d23ca7cabcc435b041c77d",
"sha256": "313d0d326294ebdc6d9f8ff046fabc910aa161bbfed39ebc7deb70bf96bfc66f"
},
"downloads": -1,
"filename": "ailist-2.1.3.tar.gz",
"has_sig": false,
"md5_digest": "8b092601f4d23ca7cabcc435b041c77d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 1024265,
"upload_time": "2024-04-16T11:54:53",
"upload_time_iso_8601": "2024-04-16T11:54:53.479516Z",
"url": "https://files.pythonhosted.org/packages/32/6b/d51299f5fdec5ddb714b7193456a4f0365ab37e9fa07122893f1d8476169/ailist-2.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-16 11:54:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kylessmith",
"github_project": "ailist",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "ailist"
}