ailist


Nameailist JSON
Version 2.1.3 PyPI version JSON
download
home_pagehttps://github.com/kylessmith/ailist
SummaryPython package for Augmented Interval List
upload_time2024-04-16 11:54:53
maintainerKyle S. Smith
docs_urlNone
authorKyle S. Smith
requires_python<4.0,>=3.10
licenseGPL-2.0-or-later
keywords cython interval ailist c
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Augmented Interval List

[![Build Status](https://travis-ci.org/kylessmith/ailist.svg?branch=master)](https://travis-ci.org/kylessmith/ailist) [![PyPI version](https://badge.fury.io/py/ailist.svg)](https://badge.fury.io/py/ailist)
[![Coffee](https://img.shields.io/badge/-buy_me_a%C2%A0coffee-gray?logo=buy-me-a-coffee&color=ff69b4)](https://www.buymeacoffee.com/kylessmith)

Augmented interval list (AIList) is a data structure for enumerating intersections 
between a query interval and an interval set. AILists have previously been shown 
to be faster than interval tree, NCList, and BEDTools.

This implementation is a Python wrapper of the one used in the original [AIList library][AIList_github].


Additonal wrapper functions have been created which allow easy user interface.

All citations should reference to [original paper][paper].

For full usage and installation [documentation][AIList_docs]

## Install

If you dont already have numpy and scipy installed, it is best to download
`Anaconda`, a python distribution that has them included.  
```
    https://continuum.io/downloads
```

Dependencies can be installed by:

```
    pip install -r requirements.txt
```

PyPI install, presuming you have all its requirements installed:
```
    pip install ailist
```

## Benchmark

Test numpy random integers:

```python
# ailist version: 0.1.7
from ailist import AIList
# ncls version: 0.0.53
from ncls import NCLS
# numpy version: 1.18.4
import numpy as np
# pandas version: 1.0.3
import pandas as pd
# quicksect version: 0.2.2
import quicksect

# Set seed
np.random.seed(100)


# First values
starts1 = np.random.randint(0, 100000, 100000)
ends1 = starts1 + np.random.randint(1, 10000, 100000)
ids1 = np.arange(len(starts1))
values1 = np.ones(len(starts1))

# Second values
starts2 = np.random.randint(0, 100000, 100000)
ends2 = starts2 + np.random.randint(1, 10000, 100000)
ids2 = np.arange(len(starts2))
values2 = np.ones(len(starts2))

```

| Library | Function | Time (µs) |
| --- | --- | --- |
| ncls | single overlap | 1170 |
| pandas | single overlap | 924 |
| quicksect | single overlap |  550 |
| ailist | single overlap | 73 |

| Library | Function | Time (s) | Max Memory (GB) |
| --- | --- | --- | --- |
| ncls | bulk overlap | 151 s | >50 |
| ailist | bulk overlap | 17.8 s | ~9 |

## Usage

```python
from ailist import AIList
import numpy as np

i = AIList()
i.add(15, 20)
i.add(10, 30)
i.add(17, 19)
i.add(5, 20)
i.add(12, 15)
i.add(30, 40)

# Print intervals
i.display()
# (15-20) (10-30) (17-19) (5-20) (12-15) (30-40)

# Find overlapping intervals
o = i.intersect(6, 15)
o.display()
# (5-20) (10-30) (12-15)

# Find index of overlaps
i.intersect_index(6, 15)
# array([3, 1, 4])

# Now i has been constructed/sorted
i.display()
# (5-20) (10-30) (12-15) (15-20) (17-19) (30-40)

# Can be done manually as well at any time
i.construct()

# Iterate over intervals
for x in i:
   print(x)
# Interval(5-20, 3)
# Interval(10-30, 1)
# Interval(12-15, 4)
# Interval(15-20, 0)
# Interval(17-19, 2)
# Interval(30-40, 5)

# Interval comparisons
j = AIList()
j.add(5, 15)
j.add(50, 60)

# Subtract regions
s = i - j #also: i.subtract(j)
s.display()
# (15-20) (15-30) (15-20) (17-19) (30-40) 

# Common regions
i + j #also: i.common(j)
# AIList
#  range: (5-15)
#    (5-15, 3)
#    (10-15, 1)
#    (12-15, 4)

# AIList can also add to from arrays
starts = np.arange(10,1000,100)
ends = starts + 50
ids = starts
values = np.ones(10)
i.from_array(starts, ends, ids, values)
i.display()
# (5-20) (10-30) (12-15) (15-20) (17-19) (30-40) 
# (10-60) (110-160) (210-260) (310-360) (410-460) 
# (510-560) (610-660) (710-760) (810-860) (910-960)

# Merge overlapping intervals
m = i.merge(gap=10)
m.display()
# (5-60) (110-160) (210-260) (310-360) (410-460) 
# (510-560) (610-660) (710-760) (810-860) (910-960)

# Find array of coverage
c = i.coverage()
c.head()
# 5    1.0
# 6    1.0
# 7    1.0
# 8    1.0
# 9    1.0
# dtype: float64

# Calculate window protection score
w = i.wps(5)
w.head()
# 5   -1.0
# 6   -1.0
# 7    1.0
# 8   -1.0
# 9   -1.0
# dtype: float64

# Filter to interval lengths between 3 and 20
fi = i.filter(3,20)
fi.display()
# (5-20) (10-30) (15-20) (30-40)

# Query by array
i.intersect_from_array(starts, ends, ids)
# (array([ 10,  10,  10,  10,  10,  10,  10, 110, 210, 310, 410, 510, 610,
#         710, 810, 910]),
# array([  5,   2,   0,   4,  10,   1,   3, 110, 210, 310, 410, 510, 610,
#        710, 810, 910]))

```


## Original paper

> Jianglin Feng,  Aakrosh Ratan,  Nathan C Sheffield; Augmented Interval List: a novel data structure for efficient genomic interval search, Bioinformatics, btz407, https://doi.org/10.1093/bioinformatics/btz407


[AIList_github]: https://github.com/databio/AIList
[paper]: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz407/5509521
[AIList_docs]: https://www.biosciencestack.com/static/ailist/docs/index.html
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/kylessmith/ailist",
    "name": "ailist",
    "maintainer": "Kyle S. Smith",
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": "kyle.smith@stjude.org",
    "keywords": "cython, interval, ailist, c",
    "author": "Kyle S. Smith",
    "author_email": "kyle.smith@stjude.org",
    "download_url": "https://files.pythonhosted.org/packages/32/6b/d51299f5fdec5ddb714b7193456a4f0365ab37e9fa07122893f1d8476169/ailist-2.1.3.tar.gz",
    "platform": null,
    "description": "# Augmented Interval List\n\n[![Build Status](https://travis-ci.org/kylessmith/ailist.svg?branch=master)](https://travis-ci.org/kylessmith/ailist) [![PyPI version](https://badge.fury.io/py/ailist.svg)](https://badge.fury.io/py/ailist)\n[![Coffee](https://img.shields.io/badge/-buy_me_a%C2%A0coffee-gray?logo=buy-me-a-coffee&color=ff69b4)](https://www.buymeacoffee.com/kylessmith)\n\nAugmented interval list (AIList) is a data structure for enumerating intersections \nbetween a query interval and an interval set. AILists have previously been shown \nto be faster than interval tree, NCList, and BEDTools.\n\nThis implementation is a Python wrapper of the one used in the original [AIList library][AIList_github].\n\n\nAdditonal wrapper functions have been created which allow easy user interface.\n\nAll citations should reference to [original paper][paper].\n\nFor full usage and installation [documentation][AIList_docs]\n\n## Install\n\nIf you dont already have numpy and scipy installed, it is best to download\n`Anaconda`, a python distribution that has them included.  \n```\n    https://continuum.io/downloads\n```\n\nDependencies can be installed by:\n\n```\n    pip install -r requirements.txt\n```\n\nPyPI install, presuming you have all its requirements installed:\n```\n    pip install ailist\n```\n\n## Benchmark\n\nTest numpy random integers:\n\n```python\n# ailist version: 0.1.7\nfrom ailist import AIList\n# ncls version: 0.0.53\nfrom ncls import NCLS\n# numpy version: 1.18.4\nimport numpy as np\n# pandas version: 1.0.3\nimport pandas as pd\n# quicksect version: 0.2.2\nimport quicksect\n\n# Set seed\nnp.random.seed(100)\n\n\n# First values\nstarts1 = np.random.randint(0, 100000, 100000)\nends1 = starts1 + np.random.randint(1, 10000, 100000)\nids1 = np.arange(len(starts1))\nvalues1 = np.ones(len(starts1))\n\n# Second values\nstarts2 = np.random.randint(0, 100000, 100000)\nends2 = starts2 + np.random.randint(1, 10000, 100000)\nids2 = np.arange(len(starts2))\nvalues2 = np.ones(len(starts2))\n\n```\n\n| Library | Function | Time (\u00b5s) |\n| --- | --- | --- |\n| ncls | single overlap | 1170 |\n| pandas | single overlap | 924 |\n| quicksect | single overlap |  550 |\n| ailist | single overlap | 73 |\n\n| Library | Function | Time (s) | Max Memory (GB) |\n| --- | --- | --- | --- |\n| ncls | bulk overlap | 151 s | >50 |\n| ailist | bulk overlap | 17.8 s | ~9 |\n\n## Usage\n\n```python\nfrom ailist import AIList\nimport numpy as np\n\ni = AIList()\ni.add(15, 20)\ni.add(10, 30)\ni.add(17, 19)\ni.add(5, 20)\ni.add(12, 15)\ni.add(30, 40)\n\n# Print intervals\ni.display()\n# (15-20) (10-30) (17-19) (5-20) (12-15) (30-40)\n\n# Find overlapping intervals\no = i.intersect(6, 15)\no.display()\n# (5-20) (10-30) (12-15)\n\n# Find index of overlaps\ni.intersect_index(6, 15)\n# array([3, 1, 4])\n\n# Now i has been constructed/sorted\ni.display()\n# (5-20) (10-30) (12-15) (15-20) (17-19) (30-40)\n\n# Can be done manually as well at any time\ni.construct()\n\n# Iterate over intervals\nfor x in i:\n   print(x)\n# Interval(5-20, 3)\n# Interval(10-30, 1)\n# Interval(12-15, 4)\n# Interval(15-20, 0)\n# Interval(17-19, 2)\n# Interval(30-40, 5)\n\n# Interval comparisons\nj = AIList()\nj.add(5, 15)\nj.add(50, 60)\n\n# Subtract regions\ns = i - j #also: i.subtract(j)\ns.display()\n# (15-20) (15-30) (15-20) (17-19) (30-40) \n\n# Common regions\ni + j #also: i.common(j)\n# AIList\n#  range: (5-15)\n#    (5-15, 3)\n#    (10-15, 1)\n#    (12-15, 4)\n\n# AIList can also add to from arrays\nstarts = np.arange(10,1000,100)\nends = starts + 50\nids = starts\nvalues = np.ones(10)\ni.from_array(starts, ends, ids, values)\ni.display()\n# (5-20) (10-30) (12-15) (15-20) (17-19) (30-40) \n# (10-60) (110-160) (210-260) (310-360) (410-460) \n# (510-560) (610-660) (710-760) (810-860) (910-960)\n\n# Merge overlapping intervals\nm = i.merge(gap=10)\nm.display()\n# (5-60) (110-160) (210-260) (310-360) (410-460) \n# (510-560) (610-660) (710-760) (810-860) (910-960)\n\n# Find array of coverage\nc = i.coverage()\nc.head()\n# 5    1.0\n# 6    1.0\n# 7    1.0\n# 8    1.0\n# 9    1.0\n# dtype: float64\n\n# Calculate window protection score\nw = i.wps(5)\nw.head()\n# 5   -1.0\n# 6   -1.0\n# 7    1.0\n# 8   -1.0\n# 9   -1.0\n# dtype: float64\n\n# Filter to interval lengths between 3 and 20\nfi = i.filter(3,20)\nfi.display()\n# (5-20) (10-30) (15-20) (30-40)\n\n# Query by array\ni.intersect_from_array(starts, ends, ids)\n# (array([ 10,  10,  10,  10,  10,  10,  10, 110, 210, 310, 410, 510, 610,\n#         710, 810, 910]),\n# array([  5,   2,   0,   4,  10,   1,   3, 110, 210, 310, 410, 510, 610,\n#        710, 810, 910]))\n\n```\n\n\n## Original paper\n\n> Jianglin Feng,  Aakrosh Ratan,  Nathan C Sheffield; Augmented Interval List: a novel data structure for efficient genomic interval search, Bioinformatics, btz407, https://doi.org/10.1093/bioinformatics/btz407\n\n\n[AIList_github]: https://github.com/databio/AIList\n[paper]: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz407/5509521\n[AIList_docs]: https://www.biosciencestack.com/static/ailist/docs/index.html",
    "bugtrack_url": null,
    "license": "GPL-2.0-or-later",
    "summary": "Python package for Augmented Interval List",
    "version": "2.1.3",
    "project_urls": {
        "Documentation": "https://www.biosciencestack.com/static/ailist/docs/index.html",
        "Homepage": "https://github.com/kylessmith/ailist",
        "Repository": "https://github.com/kylessmith/ailist"
    },
    "split_keywords": [
        "cython",
        " interval",
        " ailist",
        " c"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "547da1b88a19895c70081949c448e67213b850f932352605066eff6f748f909a",
                "md5": "7534ecfa67b691201262e6e56d4e28bd",
                "sha256": "e6f4c7d24759c3b0b962fc242beeef9cb962500dec28b78eb85d6f0a60eeeab9"
            },
            "downloads": -1,
            "filename": "ailist-2.1.3-cp311-cp311-macosx_14_0_x86_64.whl",
            "has_sig": false,
            "md5_digest": "7534ecfa67b691201262e6e56d4e28bd",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": "<4.0,>=3.10",
            "size": 1046042,
            "upload_time": "2024-04-16T11:54:51",
            "upload_time_iso_8601": "2024-04-16T11:54:51.311528Z",
            "url": "https://files.pythonhosted.org/packages/54/7d/a1b88a19895c70081949c448e67213b850f932352605066eff6f748f909a/ailist-2.1.3-cp311-cp311-macosx_14_0_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "326bd51299f5fdec5ddb714b7193456a4f0365ab37e9fa07122893f1d8476169",
                "md5": "8b092601f4d23ca7cabcc435b041c77d",
                "sha256": "313d0d326294ebdc6d9f8ff046fabc910aa161bbfed39ebc7deb70bf96bfc66f"
            },
            "downloads": -1,
            "filename": "ailist-2.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "8b092601f4d23ca7cabcc435b041c77d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 1024265,
            "upload_time": "2024-04-16T11:54:53",
            "upload_time_iso_8601": "2024-04-16T11:54:53.479516Z",
            "url": "https://files.pythonhosted.org/packages/32/6b/d51299f5fdec5ddb714b7193456a4f0365ab37e9fa07122893f1d8476169/ailist-2.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-16 11:54:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kylessmith",
    "github_project": "ailist",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "ailist"
}
        
Elapsed time: 0.23141s