pyslow5

Name	pyslow5
Version	1.1.0
home_page	https://github.com/hasindu2008/slow5lib
Summary	slow5lib python bindings
upload_time	2023-08-12 08:13:21
maintainer	Hasindu Gamaarachchi
docs_url	None
author	Hasindu Gamaarachchi, Sasha Jenner, James Ferguson
requires_python	>=3.4.3
license	MIT
keywords	nanopore slow5 signal
requirements	No requirements were recorded.

# pyslow5 python library

The slow5 python library (pyslow5) allows a user to read and write slow5/blow5 files.

## Installation

Initial setup and example info for environment

###### slow5lib needs python 3.4.3 or higher.

If you only want to use the python library, you can simply install it using pip.

The commands below use a virtual environment (see below if you need to install python).

#### Optional zstd compression

You can optionally enable [*zstd* compression](https://facebook.github.io/zstd) support when building *slow5lib/pyslow5*. This requires __zstd 1.3 or higher development libraries__ installed on your system:

```sh
# On Debian/Ubuntu
sudo apt-get install libzstd1-dev
# On Fedora/CentOS
sudo yum install libzstd-devel
# On OS X
brew install zstd
```

BLOW5 files compressed with *zstd* offer smaller file size and better performance compared to the default *zlib*. However, the *zlib* runtime library is available by default on almost all distributions, unlike *zstd*, so files compressed with *zlib* are more 'portable'.

### Install from pypi

```bash
python3 -m venv path/to/slow5libvenv
source path/to/slow5libvenv/bin/activate
python3 -m pip install --upgrade pip

# do this separately, after the libs above
# zlib only build
python3 -m pip install pyslow5

# for zstd build, run the following
export PYSLOW5_ZSTD=1
python3 -m pip install pyslow5
```

### Dev install

```bash
# If your native python3 meets this requirement, you can use that, or use a
# specific version installed with deadsnakes below. If you install with deadsnakes,
# you will need to call that specific python, such as python3.8 or python3.9,
# in all the following commands until you create a virtual environment with venv.
# Then once activated, you can just use python3.

# To install a specific version of python, the deadsnakes ppa is a good place to start
# This is an example for installing python3.8
# you can then call that specific python version
# > python3.8 -m pip --version
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt install python3.8 python3.8-dev python3.8-venv


# get zlib1g-dev
sudo apt-get update && sudo apt-get install -y zlib1g-dev

# Check with
python3 --version

# You will also need the python headers if you don't already have them installed.

sudo apt-get install python3-dev
```

Building and installing the python library.

```bash
python3 -m venv /path/to/slow5libvenv
source /path/to/slow5libvenv/bin/activate
python3 -m pip install --upgrade pip

git clone git@github.com:hasindu2008/slow5lib.git
cd slow5lib

# New build method to work with setuptools deprecation
python3 -m pip install .

# This should not require sudo if using a python virtual environment/venv
# confirm installation, and find pyslow5==<version>
python3 -m pip freeze

# Ensure slow5 library is working by running the basic tests
python3 ./python/example.py


# To Remove the library
python3 -m pip uninstall pyslow5



# Legacy build methods - not recommended
# CHOOSE A OR B:
# (B is the cleanest method)
# |=======================================================================|
# |A. Install with pip if wheel is present, otherwise it uses setuptools  |
    python3 -m pip install . --use-feature=in-tree-build
# |=======================================================================|
# |B. Or build and install manually with setup.py                         |
# |build the package                                                      |
    python3 setup.py build
# |If all went well, install the package                                  |
    python3 setup.py install
# |=======================================================================|

```

## Usage

### Reading/writing a file

#### `Open(FILE, mode, rec_press="zlib", sig_press="svb-zd", DEBUG=0)`:

The pyslow5 library has one main class, `pyslow5.Open`, which opens a slow5/blow5 file (referred to as slow5 for brevity) for reading or writing.

`FILE`: the file or filepath of the slow5 file to open

`mode`: mode in which to open the file.
+ `r`= read only
+ `w`= write/overwrite
+ `a`= append

This is designed to mimic Python's native `open()` to help users remember the syntax.

To set the record and signal compression methods, use the optional `rec_press` and `sig_press` arguments. These are only used with `mode='w'`; an append uses whatever is already set in the file.

Compression Options:

`rec_press`:
- "none"
- "zlib" [default]
- "zstd" [requires `export PYSLOW5_ZSTD=1` when building]

`sig_press`:
- "none"
- "svb-zd" [default]

Example:

```python
import pyslow5

# open file
s5 = pyslow5.Open('examples/example.slow5','r')
```

When opening a slow5 file for the first time, an index will be created and saved in the same directory as the file being read. This index will then be loaded. For files that already have an index, that index will be loaded.
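
The mode and compression options listed above can be checked before a file is created. The following is a minimal sketch: `check_open_args` is a hypothetical helper, not part of pyslow5, and it only encodes the option names given in this section.

```python
# Option names copied from the lists above; the helper itself is hypothetical.
REC_PRESS_OPTIONS = ("none", "zlib", "zstd")
SIG_PRESS_OPTIONS = ("none", "svb-zd")


def check_open_args(mode, rec_press="zlib", sig_press="svb-zd"):
    """Validate the arguments and return the keyword arguments worth passing on."""
    if mode not in ("r", "w", "a"):
        raise ValueError("mode must be 'r', 'w' or 'a', got {!r}".format(mode))
    if rec_press not in REC_PRESS_OPTIONS:
        raise ValueError("unknown rec_press: {!r}".format(rec_press))
    if sig_press not in SIG_PRESS_OPTIONS:
        raise ValueError("unknown sig_press: {!r}".format(sig_press))
    kwargs = {"mode": mode}
    # Compression settings are only used when creating a file with 'w'.
    if mode == "w":
        kwargs["rec_press"] = rec_press
        kwargs["sig_press"] = sig_press
    return kwargs


print(check_open_args("w", rec_press="zstd"))
print(check_open_args("a", rec_press="zstd"))
```

Assuming the keyword names in the `Open()` signature above, the result could then be passed on as `pyslow5.Open(path, **check_open_args("w"))`.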

#### `get_read_ids()`:

Returns a list of read IDs and the total number of reads from the index.
If there is no index, it creates one first.

Example:

```python
read_ids, num_reads = s5.get_read_ids()

print(read_ids)
print("number of reads: {}".format(num_reads))
```

#### `seq_reads(pA=False, aux=None)`:

Access all reads sequentially in an opened slow5.
+ If readID is not found, `None` is returned.
+ pA = Bool for converting signal to picoamps.
+ aux = `str` '<attr_name>'/'all' or list of names of auxiliary fields added to return dictionary, `None` if `<attr_name>` not found
+ returns `dict` = dictionary of main fields for read_id, with any aux fields added

Example:

```python
# create generator
reads = s5.seq_reads()

# print all readIDs
for read in reads:
    print(read['read_id'])

# or use directly in a for loop
for read in s5.seq_reads(pA=True, aux='all'):
    print("read_id:", read['read_id'])
    print("read_group:", read['read_group'])
    print("digitisation:", read['digitisation'])
    print("offset:", read['offset'])
    print("range:", read['range'])
    print("sampling_rate:", read['sampling_rate'])
    print("len_raw_signal:", read['len_raw_signal'])
    print("signal:", read['signal'][:10])
    print("================================")
```
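
To illustrate what `pA=True` means for the signal, the sketch below applies the conversion conventionally used for nanopore raw signal, `(raw + offset) * range / digitisation`, to a read dictionary with the field names printed above. The formula is an assumption based on the general nanopore convention and is not stated in this README; `raw_to_picoamps` is a hypothetical helper, not a pyslow5 function.

```python
def raw_to_picoamps(read):
    """Convert raw integer samples to picoamps using the read's own scaling fields."""
    scale = read["range"] / read["digitisation"]
    return [(raw + read["offset"]) * scale for raw in read["signal"]]


# Stand-in for a dictionary returned with pA=False; the values are made up.
read = {
    "read_id": "r1",
    "digitisation": 8192.0,
    "offset": 10.0,
    "range": 1000.0,
    "signal": [0, 100],
}
print(raw_to_picoamps(read))
```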


#### `seq_reads_multi(threads=4, batchsize=4096, pA=False, aux=None)`:

Access all reads sequentially in an opened slow5, using multiple threads.
+ If readID is not found, `None` is returned.
+ threads = number of threads to use in C backend.
+ batchsize = number of reads to fetch at a time. Higher numbers use more RAM, but are more efficient with more threads.
+ pA = Bool for converting signal to picoamps.
+ aux = `str` '<attr_name>'/'all' or list of names of auxiliary fields added to return dictionary, `None` if `<attr_name>` not found
+ returns `dict` = dictionary of main fields for read_id, with any aux fields added

Example:

```python
# create generator
reads = s5.seq_reads_multi(threads=2, batchsize=3)

# print all readIDs
for read in reads:
    print(read['read_id'])

# or use directly in a for loop
for read in s5.seq_reads_multi(threads=2, batchsize=3, pA=True, aux='all'):
    print("read_id:", read['read_id'])
    print("read_group:", read['read_group'])
    print("digitisation:", read['digitisation'])
    print("offset:", read['offset'])
    print("range:", read['range'])
    print("sampling_rate:", read['sampling_rate'])
    print("len_raw_signal:", read['len_raw_signal'])
    print("signal:", read['signal'][:10])
    print("================================")
```
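
The main fields of a read can be combined into derived values. As a small sketch, the hypothetical helper below computes the duration of a read from `len_raw_signal` and `sampling_rate`, assuming `sampling_rate` is given in samples per second.

```python
def read_duration_seconds(read):
    """Duration of the raw signal: number of samples divided by samples per second."""
    return read["len_raw_signal"] / read["sampling_rate"]


# Stand-in for one dictionary yielded by seq_reads() or seq_reads_multi();
# the values are made up.
read = {"read_id": "r1", "len_raw_signal": 8000, "sampling_rate": 4000.0}
print("{}: {:.2f} s".format(read["read_id"], read_duration_seconds(read)))
```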

#### `get_read(readID, pA=False, aux=None)`:

Access a specific read using a unique readID. This is a random access method, using the index.
+ If readID is not found, `None` is returned.
+ pA = Bool for converting signal to picoamps.
+ aux = `str` '<attr_name>'/'all' or list of names of auxiliary fields added to return dictionary, `None` if `<attr_name>` not found
+ returns `dict` = dictionary of main fields for read_id, with any aux fields added

Example:

```python
readID = "r1"
read = s5.get_read(readID, pA=True, aux=["read_number", "start_mux"])
if read is not None:
    print("read_id:", read['read_id'])
    print("len_raw_signal:", read['len_raw_signal'])
```


#### `get_read_list(read_list, pA=False, aux=None)`:

Access a list of specific reads using a list `read_list` of unique readIDs. This is a random access method using the index. If an index does not exist, it will create one first.
+ If readID is not found, `None` is returned.
+ pA = Bool for converting signal to picoamps.
+ aux = `str` '<attr_name>'/'all' or list of names of auxiliary fields added to return dictionary, `None` if `<attr_name>` not found
+ returns `dict` = dictionary of main fields for read_id, with any aux fields added

Example:

```python
read_list = ["r1", "r3", "null_read", "r5", "r2", "r1"]
selected_reads = s5.get_read_list(read_list)
for r, read in zip(read_list,selected_reads):
    if read is not None:
        print(r, read['read_id'])
    else:
        print(r, "read not found")
```
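
Because missing readIDs come back as `None`, it can be handy to separate the reads that were found from those that were not. A minimal sketch, using a stand-in list in place of the output of `get_read_list()`; `split_found_missing` is a hypothetical helper.

```python
def split_found_missing(read_list, selected_reads):
    """Pair each requested readID with its result and separate hits from misses."""
    found = {}
    missing = []
    for read_id, read in zip(read_list, selected_reads):
        if read is None:
            missing.append(read_id)
        else:
            found[read_id] = read
    return found, missing


# Stand-in for: selected_reads = s5.get_read_list(read_list)
read_list = ["r1", "null_read", "r2"]
selected_reads = [{"read_id": "r1"}, None, {"read_id": "r2"}]

found, missing = split_found_missing(read_list, selected_reads)
print("found:", sorted(found))
print("missing:", missing)
```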


#### `get_read_list_multi(read_list, threads=4, batchsize=100, pA=False, aux=None)`:

Access a list of specific reads using a list `read_list` of unique readIDs using multiple threads. This is a random access method using the index. If an index does not exist, it will create one first.
+ If readID is not found, `None` is returned.
+ threads = number of threads to use in C backend
+ batchsize = number of reads to fetch at a time. Higher numbers use more RAM, but are more efficient with more threads.
+ pA = Bool for converting signal to picoamps.
+ aux = `str` '<attr_name>'/'all' or list of names of auxiliary fields added to return dictionary, `None` if `<attr_name>` not found
+ returns `dict` = dictionary of main fields for read_id, with any aux fields added

Example:

```python
read_list = ["r1", "r3", "null_read", "r5", "r2", "r1"]
selected_reads = s5.get_read_list_multi(read_list, threads=2, batchsize=3)
for r, read in zip(read_list, selected_reads):
    if read is not None:
        print(r, read['read_id'])
    else:
        print(r, "read not found")
```

#### `get_num_read_groups()`:
**NEW: from version 1.1.0+**

Returns an int for the number of read_groups present in the file.

#### `get_header_names()`:

Returns a list containing the union of header names from all read_groups.

#### `get_header_value(attr, read_group=0)`:

Returns a `str` of the value of a header attribute (`attr`) for a particular read_group.
Returns `None` if value can't be found

#### `get_all_headers(read_group=0)`:

Returns a dictionary with all header attributes and values for a particular read_group.
If a value is present for one read_group but not for another, the attribute is still returned for the read_group that lacks it, with a value of `None`.
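
Since attributes missing from a read_group are reported as `None`, they are easy to list. The sketch below uses made-up dictionaries as stand-ins for the output of `get_all_headers()`; the attribute names and `attrs_missing_in_group` are illustrative only.

```python
def attrs_missing_in_group(headers):
    """Return the sorted attribute names whose value is None in a header dictionary."""
    return sorted(name for name, value in headers.items() if value is None)


# Stand-ins for s5.get_all_headers(read_group=0) and s5.get_all_headers(read_group=1)
group0 = {"run_id": "run_0", "asic_id": "1234"}
group1 = {"run_id": "run_1", "asic_id": None}

print(attrs_missing_in_group(group0))
print(attrs_missing_in_group(group1))
```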

#### `get_aux_names()`:

Returns an ordered list of auxiliary attribute names. (same order as get_aux_types())

This is used for understanding which auxiliary attributes are available within the slow5 file, and for providing selections to the `aux` keyword argument in the above functions.

#### `get_aux_types()`:

Returns an ordered list of auxiliary attribute types (same order as get_aux_names())

This can mostly be ignored, but will be used in error tracing in the future. Auxiliary fields have multiple types, each with its own call in C, and not all are used. If a call for an auxiliary field fails, knowing the type of that field helps identify which C function is being called and could be causing the error.

#### `get_aux_enum_labels(label)`:

Returns an ordered list representing the values in the enum struct in the type header.

The value in the read can then be used to access the labels as an index to the list.

Example:

```python
import pyslow5

# `file` is the path to an existing slow5/blow5 file, opened for reading
s5 = pyslow5.Open(file, 'r')
end_reason_labels = s5.get_aux_enum_labels('end_reason')
print(end_reason_labels)

# > ['unknown', 'partial', 'mux_change', 'unblock_mux_change', 'signal_positive', 'signal_negative']
# or from newer datasets
# > ["unknown", "mux_change", "unblock_mux_change", "data_service_unblock_mux_change", "signal_positive", "signal_negative"]

readID = "r1"
read = s5.get_read(readID, aux='all')
er_index = read['end_reason']
er = end_reason_labels[er_index]

print("{}: {}".format(er_index, er))

# > 4: signal_positive
```
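
The index-to-label lookup can be wrapped so that a missing field or an out-of-range index does not raise. This is a sketch only: `end_reason_label` is a hypothetical helper, and the label list is the first one printed in the example above.

```python
def end_reason_label(read, labels, field="end_reason"):
    """Return the label for the enum index stored in the read, or None if unavailable."""
    index = read.get(field)
    if index is None or not 0 <= index < len(labels):
        return None
    return labels[index]


# Label list as printed in the example above
labels = ['unknown', 'partial', 'mux_change', 'unblock_mux_change',
          'signal_positive', 'signal_negative']

print(end_reason_label({"read_id": "r1", "end_reason": 4}, labels))
print(end_reason_label({"read_id": "r2"}, labels))
```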

### Writing a file

To write a file, `mode` in `Open()` must be set to `'w'`; to append to an existing file, use `'a'`.

#### `get_empty_header(aux=False)`:

Returns a dictionary containing all known header attributes with their values set to `None`.

The user can modify each value, and add or remove attributes to be used as header items.
All values end up stored as strings, and anything left as `None` will be skipped.
To write the header, see `write_header()`.

If `aux=True`, an ordered list of strings for the enum `end_reason` is also returned.
This list can be modified to match the end reasons in your data.

Example:

```python
s5 = pyslow5.Open(file, 'w')
header = s5.get_empty_header()
```

`end_reason` enum example

```python
s5 = pyslow5.Open(file, 'w')
header, end_reason_labels = s5.get_empty_header(aux=True)
```
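
As noted above, header values end up stored as strings and anything left as `None` is skipped. The sketch below fills an empty header from a dictionary of known values and converts them to strings up front. `fill_header` is a hypothetical helper, and the three attribute names are illustrative stand-ins for the much longer dictionary that `get_empty_header()` returns.

```python
def fill_header(empty_header, values):
    """Return a copy of the empty header with the given values stored as strings."""
    header = dict(empty_header)
    for name, value in values.items():
        # Keep None as None so the attribute is skipped when the header is written.
        header[name] = None if value is None else str(value)
    return header


# Stand-in for: empty_header = s5.get_empty_header()
empty_header = {"run_id": None, "sample_frequency": None, "flow_cell_id": None}

header = fill_header(empty_header, {"run_id": "run_0", "sample_frequency": 4000})
print(header)
```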

#### `write_header(header, read_group=0, end_reason_labels=None)`:

Write header to file

+ `header` = populated dictionary from `get_empty_header()`
+ read_group = read group integer for when multiple runs are written to the same slow5 file
+ end_reason_labels = ordered list used for end_reason enum
+ returns 0 on success, <0 on error with error code

You must write `read_group=0` (default) first before writing any other read_groups, and it is advised to write read_groups in sequential order.

Example:

```python
# Get some empty headers
header = s5.get_empty_header()
header2 = s5.get_empty_header()

# Populate headers with some test data
counter = 0
for i in header:
    header[i] = "test_{}".format(counter)
    counter += 1

for i in header2:
    header2[i] = "test_{}".format(counter)
    counter += 1

# Write first read group
ret = s5.write_header(header)
print("ret: write_header(): {}".format(ret))
# Write second read group, etc
ret = s5.write_header(header2, read_group=1)
print("ret: write_header(): {}".format(ret))
```

`end_reason` example:

```python
# Get some empty headers
header, end_reason_labels = s5.get_empty_header(aux=True)

# Populate headers with some test data
counter = 0
for i in header:
    header[i] = "test_{}".format(counter)
    counter += 1

# Write first read group
ret = s5.write_header(header, end_reason_labels=end_reason_labels)
print("ret: write_header(): {}".format(ret))
```

#### `get_empty_record(aux=False)`:

Get empty read record for populating with data. Use with `write_record()`

+ aux = Bool for returning empty aux dictionary as well as read dictionary
+ returns a single read dictionary or a read and aux dictionary depending on aux flag

Example:
```python
# open some file to read. We will copy the data then write it
# including aux fields
s5_read = pyslow5.Open(read_file, 'r')
reads = s5_read.seq_reads(aux='all')

# For each read in s5_read...
for read in reads:
    # get an empty record and aux dictionary
    record, aux = s5.get_empty_record(aux=True)
    # for each field in read...
    for i in read:
        # if the field is in the record dictionary...
        if i in record:
            # copy the value over...
            record[i] = read[i]
        # do the same for the aux dictionary
        if i in aux:
            aux[i] = read[i]
    # write the record
    ret = s5.write_record(record, aux)
    print("ret: write_record(): {}".format(ret))
```
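
The copy loop above can also be written as a small function, which keeps the record and aux handling in one place. This is a sketch with made-up templates standing in for the output of `get_empty_record(aux=True)`; `split_read` is a hypothetical helper, not part of pyslow5.

```python
def split_read(read, record_template, aux_template):
    """Copy each field of a read into the record or aux dictionary that declares it."""
    record = dict(record_template)
    aux = dict(aux_template)
    for name, value in read.items():
        if name in record:
            record[name] = value
        if name in aux:
            aux[name] = value
    return record, aux


# Stand-ins for: record_template, aux_template = s5.get_empty_record(aux=True)
record_template = {"read_id": None, "read_group": None, "signal": None}
aux_template = {"read_number": None, "start_mux": None}

# Stand-in for one read from s5_read.seq_reads(aux='all')
read = {"read_id": "r1", "read_group": 0, "signal": [1, 2, 3], "read_number": 7}

record, aux = split_read(read, record_template, aux_template)
print(record)
print(aux)
```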

#### `write_record(record, aux=None)`:

Write a record and optional aux fields.

+ record = a populated dictionary from `get_empty_record()`
+ aux = a populated aux dictionary from `get_empty_record(aux=True)`
+ returns 0 on success and -1 on error/failure

Example:

```python

record, aux = s5.get_empty_record(aux=True)
# populate record, aux dictionaries
#....
# Write record
ret = s5.write_record(record, aux)
print("ret: write_record(): {}".format(ret))
```


#### `write_record_batch(records, threads=4, batchsize=4096, aux=None)`:

Write a batch of records and optional aux fields, using multiple threads.

+ records = a dictionary of dictionaries where each entry is a populated form of `get_empty_record()`, keyed by its `read['read_id']`.
+ threads = number of threads to use in the C backend.
+ batchsize = number of reads to write at a time. If passing 1000 records, with batchsize=250 and threads=4, 4 threads will be spawned 4 times to write 250 records to the file before returning
+ aux = a dictionary of populated aux dictionaries from `get_empty_record(aux=True)`, keyed by the same read_ids as `records`
+ returns 0 on success and -1 on error/failure

Example:

```python
records = {}
auxs = {}

record, aux = s5.get_empty_record(aux=True)
# populate record, aux
#....
# key both dictionaries by read_id
records[record['read_id']] = record
auxs[record['read_id']] = aux
# Write the batch of records
ret = s5.write_record_batch(records, threads=2, batchsize=3, aux=auxs)
print("ret: write_record_batch(): {}".format(ret))
```
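
The batch arithmetic described for `batchsize` (1000 records with `batchsize=250` giving 4 batches) can be reproduced in plain Python. This sketch only illustrates the counting; how the C backend actually splits and schedules the work is not shown in this README, and `iter_batches` is a hypothetical helper.

```python
from itertools import islice


def iter_batches(records, batchsize):
    """Yield successive dictionaries of at most `batchsize` records."""
    items = iter(records.items())
    while True:
        batch = dict(islice(items, batchsize))
        if not batch:
            return
        yield batch


# 1000 made-up records keyed by read_id
records = {"read_{}".format(i): {"read_id": "read_{}".format(i)} for i in range(1000)}

sizes = [len(batch) for batch in iter_batches(records, 250)]
print(len(sizes), sizes)
```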

#### `close()`:

Closes a file opened for writing or appending, and writes an End Of File (EOF) flag.

If not explicitly closed, the file will also be closed when the `s5` object goes out of scope in python, in an attempt to avoid a missing EOF.

Please call this when you are finished writing a file.

Example:

```python
s5 = pyslow5.Open(file, 'w')

# do some writing....

# Writes EOF and closes file
s5.close()
```
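
One way to make sure `close()` runs even if writing fails part-way is the standard library's `contextlib.closing`, which calls `close()` on exit from the `with` block. The sketch below uses a stand-in class so that it runs without a slow5 file. It assumes only that the object returned by `pyslow5.Open` has the `close()` method documented above; whether that object supports `with` directly is not stated here.

```python
from contextlib import closing


class FakeSlow5Writer:
    """Stand-in exposing a close() method like the object returned by pyslow5.Open."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


writer = FakeSlow5Writer()
try:
    # With the real library this would be: closing(pyslow5.Open(path, 'w'))
    with closing(writer) as s5:
        raise RuntimeError("simulated failure while writing")
except RuntimeError:
    pass

print("closed:", writer.closed)
```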

## Citation

Please cite the following in your publications when using *slow5lib/pyslow5*:

> Gamaarachchi, H., Samarakoon, H., Jenner, S.P. et al. Fast nanopore sequencing data analysis with SLOW5. Nat Biotechnol 40, 1026-1029 (2022). https://doi.org/10.1038/s41587-021-01147-4

```
@article{gamaarachchi2022fast,
  title={Fast nanopore sequencing data analysis with SLOW5},
  author={Gamaarachchi, Hasindu and Samarakoon, Hiruna and Jenner, Sasha P and Ferguson, James M and Amos, Timothy G and Hammond, Jillian M and Saadat, Hassaan and Smith, Martin A and Parameswaran, Sri and Deveson, Ira W},
  journal={Nature biotechnology},
  volume={40},
  pages={1026--1029},
  year={2022},
  publisher={Nature Publishing Group}
}
```

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/hasindu2008/slow5lib",
    "name": "pyslow5",
    "maintainer": "Hasindu Gamaarachchi",
    "docs_url": null,
    "requires_python": ">=3.4.3",
    "maintainer_email": "hasindu2008@gmail.com",
    "keywords": "nanopore,slow5,signal",
    "author": "Hasindu Gamaarachchi, Sasha Jenner, James Ferguson",
    "author_email": "hasindu2008@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/96/80/ef430871b57cb97e2ab2449140fd00deec6cd9e8c71249676d561a45862a/pyslow5-1.1.0.tar.gz",
    "platform": null,
    "description": "# pyslow5 python library\n\nThe slow5 python library (pyslow5) allows a user to read and write slow5/blow5 files.\n\n## Installation\n\nInitial setup and example info for environment\n\n###### slow5lib needs python3.4.2 or higher.\n\nIf you only want to use the python library, then you can simply install using pip\n\nUsing a virtual environment (see below if you need to install python)\n\n#### Optional zstd compression\n\nYou can optionally enable [*zstd* compression](https://facebook.github.io/zstd) support when building *slow5lib/pyslow5*. This requires __zstd 1.3 or higher development libraries__ installed on your system:\n\n```sh\nOn Debian/Ubuntu : sudo apt-get libzstd1-dev\nOn Fedora/CentOS : sudo yum libzstd-devel\nOn OS X : brew install zstd\n```\n\nBLOW5 files compressed with *zstd* offer smaller file size and better performance compared to the default *zlib*. However, *zlib* runtime library is available by default on almost all distributions unlike *zstd* and thus files compressed with *zlib* will be more 'portable'.\n\n### Install from pypi\n\n```bash\npython3 -m venv path/to/slow5libvenv\nsource path/to/slow5libvenv/bin/activate\npython3 -m pip install --upgrade pip\n\n# do this separately, after the libs above\n# zlib only build\npython3 -m pip install pyslow5\n\n# for zstd build, run the following\nexport PYSLOW5_ZSTD=1\npython3 -m pip install pyslow5\n```\n\n### Dev install\n\n```bash\n# If your native python3 meets this requirement, you can use that, or use a\n# specific version installed with deadsnakes below. 
If you install with deadsnakes,\n# you will need to call that specific python, such as python3.8 or python3.9,\n# in all the following commands until you create a virtual environment with venv.\n# Then once activated, you can just use python3.\n\n# To install a specific version of python, the deadsnakes ppa is a good place to start\n# This is an example for installing python3.8\n# you can then call that specific python version\n# > python3.8 -m pip --version\nsudo add-apt-repository ppa:deadsnakes/ppa\nsudo apt-get update\nsudo apt install python3.8 python3.8-dev python3.8-venv\n\n\n# get zlib1g-dev\nsudo apt-get update && sudo apt-get install -y zlib1g-dev\n\n# Check with\npython3 --version\n\n# You will also need the python headers if you don't already have them installed.\n\nsudo apt-get install python3-dev\n```\n\nBuilding and installing the python library.\n\n```bash\npython3 -m venv /path/to/slow5libvenv\nsource /path/to/slow5libvenv/bin/activate\npython3 -m pip install --upgrade pip\n\ngit clone git@github.com:hasindu2008/slow5lib.git\ncd slow5lib\n\n# New build method to work with setuptools deprication\npython3 -m pip install .\n\n# This should not require sudo if using a python virtual environment/venv\n# confirm installation, and find pyslow5==<version>\npython3 -m pip freeze\n\n# Ensure slow5 library is working by running the basic tests\npython3 ./python/example.py\n\n\n# To Remove the library\npython3 -m pip uninstall pyslow5\n\n\n\n# Legacy build methods - not recommended\n# CHOOSE A OR B:\n# (B is the cleanest method)\n# |=======================================================================|\n# |A. Install with pip if wheel is present, otherwise it uses setuptools  |\n    python3 -m pip install . --use-feature=in-tree-build\n# |=======================================================================|\n# |B. 
Or build and install manually with setup.py                         |\n# |build the package                                                      |\n    python3 setup.py build\n# |If all went well, install the package                                  |\n    python3 setup.py install\n# |=======================================================================|\n\n```\n\n## Usage\n\n### Reading/writing a file\n\n#### `Open(FILE, mode, rec_press=\"zlib\", sig_press=\"svb-zd\", DEBUG=0)`:\n\nThe pyslow5 library has one main Class, `pyslow5.Open` which opens a slow5/blow5 (slow5 for easy reference) file for reading/writing.\n\n`FILE`: the file or filepath of the slow5 file to open\n`mode`: mode in which to open the file.\n+ `r`= read only\n+ `w`= write/overwrite\n+ `a`= append\n\nThis is designed to mimic Python's native Open() to help users remember the syntax\n\nTo set the record and signal compression methods, use the following `rec_press` and `sig_press` optional args, however these are only used with `mode='w'`. Any append will use whatever is already set in the file.\n\nCompression Options:\n\n`rec_press`:\n- \"none\"\n- \"zlib\" [default]\n- \"zstd\" [requires `export PYSLOW5_ZSTD=1` when building]\n\n`sig_press`:\n- \"none\"\n- \"svb-zd\" [default]\n\nExample:\n\n```python\nimport pyslow5\n\n# open file\ns5 = pyslow5.Open('examples/example.slow5','r')\n```\n\nWhen opening a slow5 file for the first time, and index will be created and saved in the same directory as the file being read. This index will then be loaded. 
For files that already have an index, that index will be loaded.\n\n#### `get_read_ids()`:\n\nreturns a list and total number of reads from the index.\nIf there is no index, it creates one first.\n\nExample:\n\n```python\nread_ids, num_reads = s5.get_read_ids()\n\nprint(read_ids)\nprint(\"number of reads: {}\".format(num_reads))\n```\n\n#### `seq_reads(pA=False, aux=None)`:\n\nAccess all reads sequentially in an opened slow5.\n+ If readID is not found, `None` is returned.\n+ pA = Bool for converting signal to picoamps.\n+ aux = `str` '<attr_name>'/'all' or list of names of auxiliary fields added to return dictionary, `None` if `<attr_name>` not found\n+ returns `dict` = dictionary of main fields for read_id, with any aux fields added\n\nExample:\n\n```python\n# create generator\nreads = s5.seq_reads()\n\n# print all readIDs\nfor read in reads:\n    print(read['read_id'])\n\n# or use directly in a for loop\nfor read in s5.seq_reads(pA=True, aux='all'):\n    print(\"read_id:\", read['read_id'])\n    print(\"read_group:\", read['read_group'])\n    print(\"digitisation:\", read['digitisation'])\n    print(\"offset:\", read['offset'])\n    print(\"range:\", read['range'])\n    print(\"sampling_rate:\", read['sampling_rate'])\n    print(\"len_raw_signal:\", read['len_raw_signal'])\n    print(\"signal:\", read['signal'][:10])\n    print(\"================================\")\n```\n\n\n#### `seq_reads_multi(threads=4, batchsize=4096, pA=False, aux=None)`:\n\nAccess all reads sequentially in an opened slow5, using multiple threads.\n+ If readID is not found, `None` is returned.\n+ threads = number of threads to use in C backend.\n+ batchsize = number of reads to fetch at a time. 
Higher numbers use more ram, but is more efficient with more threads.\n+ pA = Bool for converting signal to picoamps.\n+ aux = `str` '<attr_name>'/'all' or list of names of auxiliary fields added to return dictionary, `None` if `<attr_name>` not found\n+ returns `dict` = dictionary of main fields for read_id, with any aux fields added\n\nExample:\n\n```python\n# create generator\nreads = s5.seq_reads_multi(threads=2, batchsize=3)\n\n# print all readIDs\nfor read in reads:\n    print(read['read_id'])\n\n# or use directly in a for loop\nfor read in s5.seq_reads_multi(threads=2, batchsize=3, pA=True, aux='all'):\n    print(\"read_id:\", read['read_id'])\n    print(\"read_group:\", read['read_group'])\n    print(\"digitisation:\", read['digitisation'])\n    print(\"offset:\", read['offset'])\n    print(\"range:\", read['range'])\n    print(\"sampling_rate:\", read['sampling_rate'])\n    print(\"len_raw_signal:\", read['len_raw_signal'])\n    print(\"signal:\", read['signal'][:10])\n    print(\"================================\")\n```\n\n#### `get_read(readID, pA=False, aux=None)`:\n\nAccess a specific read using a unique readID. This is a ranom access method, using the index.\n+ If readID is not found, `None` is returned.\n+ pA = Bool for converting signal to picoamps.\n+ aux = `str` '<attr_name>'/'all' or list of names of auxiliary fields added to return dictionary, `None` if `<attr_name>` not found\n+ returns `dict` = dictionary of main fields for read_id, with any aux fields added\n\nExample:\n\n```python\nreadID = \"r1\"\nread = s5.get_read(readID, pA=True, aux=[\"read_number\", \"start_mux\"])\nif read is not None:\n    print(\"read_id:\", read['read_id'])\n    print(\"len_raw_signal:\", read['len_raw_signal'])\n```\n\n\n#### `get_read_list(read_list, pA=False, aux=None)`:\n\nAccess a list of specific reads using a list `read_list` of unique readIDs. This is a random access method using the index. 
If an index does not exist, it will create one first.\n+ If readID is not found, `None` is returned.\n+ pA = Bool for converting signal to picoamps.\n+ aux = `str` '<attr_name>'/'all' or list of names of auxiliary fields added to return dictionary, `None` if `<attr_name>` not found\n+ returns `dict` = dictionary of main fields for read_id, with any aux fields added\n\nExample:\n\n```python\nread_list = [\"r1\", \"r3\", \"null_read\", \"r5\", \"r2\", \"r1\"]\nselected_reads = s5.get_read_list(read_list)\nfor r, read in zip(read_list,selected_reads):\n    if read is not None:\n        print(r, read['read_id'])\n    else:\n        print(r, \"read not found\")\n```\n\n\n#### `get_read_list_multi(read_list, threads=4, batchsize=100, pA=False, aux=None):`:\n\nAccess a list of specific reads using a list `read_list` of unique readIDs using multiple threads. This is a random access method using the index. If an index does not exist, it will create one first.\n+ If readID is not found, `None` is returned.\n+ threads = number of threads to use in C backend\n+ batchsize = number of reads to fetch at a time. 
Higher numbers use more ram, but is more efficient with more threads.\n+ pA = Bool for converting signal to picoamps.\n+ aux = `str` '<attr_name>'/'all' or list of names of auxiliary fields added to return dictionary, `None` if `<attr_name>` not found\n+ returns `dict` = dictionary of main fields for read_id, with any aux fields added\nExample:\n\n```python\nread_list = [\"r1\", \"r3\", \"null_read\", \"r5\", \"r2\", \"r1\"]\nselected_reads = s5.get_read_list_multi(read_list, threads=2, batchsize=3)\nfor r, read in zip(read_list, selected_reads):\n    if read is not None:\n        print(r, read['read_id'])\n    else:\n        print(r, \"read not found\")\n```\n\n#### `get_num_read_groups()`:\n**NEW: from version 1.1.0+**\n\nReturn an int for the number of read_groups present in file\n\n#### `get_header_names()`:\n\nReturns a list containing the uninon of header names from all read_groups\n\n#### `get_header_value(attr, read_group=0)`:\n\nReturns a `str` of the value of a header attribute (`attr`) for a particular read_group.\nReturns `None` if value can't be found\n\n#### `get_all_headers(read_group=0)`:\n\nReturns a dictionary with all header attributes and values for a particular read_group\nIf there are values present for one read_group, and not for another, the attribute will still be returned for the read_group without, but with a value of `None`.\n\n#### `get_aux_names()`:\n\nReturns an ordered list of auxiliary attribute names. (same order as get_aux_types())\n\nThis is used for understanding which auxiliary attributes are available within the slow5 file, and providing selections to the `aux` keyword argument in the above functoions\n\n#### `get_aux_types()`:\n\nReturns an ordered list of auxiliary attribute types (same order as get_aux_names())\n\nThis can mostly be ignored, but will be used in error tracing in the future, as auxiliary field requests have multiple types, each with their own calls, and not all are used. 
It could be the case a call for an auxiliary filed fails, and knowing which type the field is requesting is very helpful in understanding which function in C is being called, that could be causing the error.\n\n#### `get_aux_enum_labels(label)`:\n\nReturns an ordered list representing the values in the enum struct in the type header.\n\nThe value in the read can then be used to access the labels as an index to the list.\n\nExample:\n\n```python\ns5 = slow5.Open(file,'w')\nend_reason_labels = s5.get_aux_enum_labels('end_reason')\nprint(end_reason_labels)\n\n> ['unknown', 'partial', 'mux_change', 'unblock_mux_change', 'signal_positive', 'signal_negative']\n# or from newer datsets\n> [\"unknown\", \"mux_change\", \"unblock_mux_change\", \"data_service_unblock_mux_change\", \"signal_positive\", \"signal_negative\"]\n\nreadID = \"r1\"\nread = s5.get_read(readID, aux='all')\ner_index = read['end_reason']\ner = end_reason_labels[er_index]\n\nprint(\"{}: {}\".format(er_index, er))\n\n> 4: signal_positive\n```\n\n### Writing a file\n\nTo write a file, `mode` in `Open()` must be set to `'w'` and when appending, `'a'`\n\n#### `get_empty_header(aux=False)`:\n\nReturns a dictionary containing all known header attributes with their values set to `None`.\n\nUser can modify each value, and add or remove attributes to be used has header items.\nAll values end up stored as strings, and anything left as `None` will be skipped.\nTo write header, see `write_header()`\n\nIf `aux=True`, an ordered list of strings for the enum `end_reason` will be returned.\nThis can be modified depending on the end reason.\n\nExample:\n\n```python\ns5 = slow5.Open(file,'w')\nheader = s5.get_empty_header()\n```\n\n`end_reason` enum example\n\n```python\ns5 = slow5.Open(file, w)\nheader, end_reason_labels = s5.get_empty_header(aux=True)\n```\n\n#### `write_header(header, read_group=0, end_reason_labels=None)`:\n\nWrite header to file\n\n+ `header` = populated dictionary from `get_empty_header()`\n+ 
read_group = read group integer for when multiple runs are written to the same slow5 file\n+ end_reason_labels = ordered list used for end_reason enum\n+ returns 0 on success, <0 on error with error code\n\nYou must write `read_group=0` (default) first before writing any other read_groups, and it is advised to write read_groups in sequential order.\n\nExample:\n\n```python\n# Get some empty headers\nheader = s5.get_empty_header()\nheader2 = s5.get_empty_header()\n\n# Populate headers with some test data\ncounter = 0\nfor i in header:\n    header[i] = \"test_{}\".format(counter)\n    counter += 1\n\nfor i in header2:\n    header2[i] = \"test_{}\".format(counter)\n    counter += 1\n\n# Write first read group\nret = s5.write_header(header)\nprint(\"ret: write_header(): {}\".format(ret))\n# Write second read group, etc\nret = s5.write_header(header2, read_group=1)\nprint(\"ret: write_header(): {}\".format(ret))\n```\n\n`end_reason` example:\n\n```python\n# Get some empty headers\nheader, end_reason_labels = s5.get_empty_header(aux=True)\n\n# Populate headers with some test data\ncounter = 0\nfor i in header:\n    header[i] = \"test_{}\".format(counter)\n    counter += 1\n\n# Write first read group\nret = s5.write_header(header, end_reason_labels=end_reason_labels)\nprint(\"ret: write_header(): {}\".format(ret))\n```\n\n#### `get_empty_record(aux=False)`:\n\nGet empty read record for populating with data. Use with `write_record()`\n\n+ aux = Bool for returning empty aux dictionary as well as read dictionary\n+ returns a single read dictionary or a read and aux dictionary depending on aux flag\n\nExample:\n```python\n# open some file to read. 
We will copy the data then write it\n# including aux fields\ns5_read = slow5.Open(read_file,'r')\nreads = s5_read.seq_reads(aux='all')\n\n# For each read in s5_read...\nfor read in reads:\n    # get an empty record and aux dictionary\n    record, aux = s5.get_empty_record(aux=True)\n    # for each field in read...\n    for i in read:\n        # if the field is in the record dictionary...\n        if i in record:\n            # copy the value over...\n            record[i] = read[i]\n        # do the same for the aux dictionary\n        if i in aux:\n            aux[i] = read[i]\n    # write the record\n    ret = s5.write_record(record, aux)\n    print(\"ret: write_record(): {}\".format(ret))\n```\n\n#### `write_record(record, aux=None)`:\n\nWrite a record and optional aux fields.\n\n+ record = a populated dictionary from `get_empty_record()`\n+ aux = a populated aux dictionary from `get_empty_record(aux=True)`\n+ returns 0 on success and -1 on error/failure\n\nExample:\n\n```python\n\nrecord, aux = s5.get_empty_record(aux=True)\n# populate record, aux dictionaries\n#....\n# Write record\nret = s5.write_record(record, aux)\nprint(\"ret: write_record(): {}\".format(ret))\n```\n\n\n#### `write_record_batch(records, threads=4, batchsize=4096, aux=None)`:\n\nWrite a batch of records and optional aux fields, using multiple threads\n\n+ records = a dictionary of dictionaries where each entry is a populated form of `get_empty_record()` with the key of each being the read['read_id'].\n+ threads = number of threads to use in the C backend.\n+ batchsize = number of reads to write at a time. 
If writing 1000 records with batchsize=250 and threads=4, 4 threads will be spawned 4 times, each time writing 250 records to the file, before returning\n+ aux = a dictionary of populated aux dictionaries from `get_empty_record(aux=True)`, keyed by the read['read_id']\n+ returns 0 on success and -1 on error/failure\n\nExample:\n\n```python\n# dictionaries of records and aux fields, keyed by read_id\nrecords = {}\nauxs = {}\n\nrecord, aux = s5.get_empty_record(aux=True)\n# populate record, aux\n#....\nrecords[record['read_id']] = record\nauxs[record['read_id']] = aux\n# Write records\nret = s5.write_record_batch(records, threads=2, batchsize=3, aux=auxs)\nprint(\"ret: write_record_batch(): {}\".format(ret))\n```\n\n#### `close()`:\n\nCloses a file open for writing or appending, and writes an End Of File (EOF) flag.\n\nIf not explicitly closed, when the `s5` object goes out of scope in Python, it will also trigger a close to attempt to avoid having a missing EOF.\n\nPlease call this when you are finished writing a file.\n\nExample:\n\n```python\ns5 = slow5.Open(file,'w')\n\n# do some writing....\n\n# Writes EOF and closes file\ns5.close()\n```\n\n## Citation\n\nPlease cite the following in your publications when using *slow5lib/pyslow5*:\n\n> Gamaarachchi, H., Samarakoon, H., Jenner, S.P. et al. Fast nanopore sequencing data analysis with SLOW5. Nat Biotechnol 40, 1026-1029 (2022). https://doi.org/10.1038/s41587-021-01147-4\n\n```\n@article{gamaarachchi2022fast,\n  title={Fast nanopore sequencing data analysis with SLOW5},\n  author={Gamaarachchi, Hasindu and Samarakoon, Hiruna and Jenner, Sasha P and Ferguson, James M and Amos, Timothy G and Hammond, Jillian M and Saadat, Hassaan and Smith, Martin A and Parameswaran, Sri and Deveson, Ira W},\n  journal={Nature biotechnology},\n  pages={1--4},\n  year={2022},\n  publisher={Nature Publishing Group}\n}\n```",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "slow5lib python bindings",
    "version": "1.1.0",
    "project_urls": {
        "Homepage": "https://github.com/hasindu2008/slow5lib"
    },
    "split_keywords": [
        "nanopore",
        "slow5",
        "signal"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9680ef430871b57cb97e2ab2449140fd00deec6cd9e8c71249676d561a45862a",
                "md5": "b2cce5ae9d781c55e00cc4b56b236de0",
                "sha256": "2926e13dbf8b1360e7628c32ebf1ad71133ed5bece548f3837b08cd5b7d79811"
            },
            "downloads": -1,
            "filename": "pyslow5-1.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "b2cce5ae9d781c55e00cc4b56b236de0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.4.3",
            "size": 550378,
            "upload_time": "2023-08-12T08:13:21",
            "upload_time_iso_8601": "2023-08-12T08:13:21.510523Z",
            "url": "https://files.pythonhosted.org/packages/96/80/ef430871b57cb97e2ab2449140fd00deec6cd9e8c71249676d561a45862a/pyslow5-1.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-12 08:13:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hasindu2008",
    "github_project": "slow5lib",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pyslow5"
}
