Name | recode JSON |
Version |
0.1.30
JSON |
| download |
home_page | https://github.com/otosense/recode |
Summary | Tools to make codecs for time-series serialization and deserialization. |
upload_time | 2023-10-27 08:12:03 |
maintainer | |
docs_url | None |
author | OtoSense |
requires_python | |
license | apache-2.0 |
keywords |
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# recode
Make codecs for fixed size structured chunks serialization and deserialization of
sequences, tabular data, and time-series.
To install: ```pip install recode```
[Docs here](https://otosense.github.io/recode/module_docs/recode.html)
Make codecs for fixed size structured chunks serialization and deserialization of
sequences, tabular data, and time-series.
The easiest and bigest bang for your buck is ``mk_codec``
```python
>>> from recode import mk_codec
>>> encoder, decoder = mk_codec()
```
``encoder`` will encode a list (or any iterable) of numbers into bytes
```python
>>> b = encoder([0, -3, 3.14])
>>> b
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x08\xc0\x1f\x85\xebQ\xb8\x1e\t@'
```
``decoder`` will decode those bytes to get you back your numbers
```python
>>> decoder(b)
[0.0, -3.0, 3.14]
```
There's only really one argument you need to know about in ``mk_codec``.
The first argument, called `chk_format`, which is a string of characters from
the "Format" column of the python
[format characters](https://docs.python.org/3/library/struct.html#format-characters)
The one we've just been through is in fact
```python
>>> encoder, decoder = mk_codec('d')
```
That is, it will expect that your data is a list of numbers, and they'll be encoded
with the 'd' format character, that is 8-bytes doubles.
That default is good because it gives you a lot of room, but if you knew that you
would only be dealing with 2-byte integers (as in most WAV audio waveforms),
you would have chosen `h`:
```python
>>> encoder, decoder = mk_codec('h')
```
What about those channels?
Well, some times you need to encode/decode multi-channel streams, such as:
```python
>>> multi_channel_stream = [[3, -1], [4, -1], [5, -9]]
```
Say, for example, if you were dealing with stereo waveform
(with the standard PCM_16 format), you'd do it this way:
```python
>>> encoder, decoder = mk_codec('hh')
>>> pcm_bytes = encoder(multi_channel_stream)
>>> pcm_bytes
b'\x03\x00\xff\xff\x04\x00\xff\xff\x05\x00\xf7\xff'
>>> decoder(pcm_bytes)
[(3, -1), (4, -1), (5, -9)]
```
The `n_channels` and `chk_size_bytes` arguments are there if you want to assert
that your number of channels and chunk size are what you expect.
Again, these are just for verification, because we know how easy it is to
misspecify the `chk_format`, and how hard it can be to notice that we did.
It is advised to use these in any production code, for the sanity of everyone!
```python
>>> mk_codec('hhh', n_channels=2)
Traceback (most recent call last):
...
AssertionError: You said there'd be 2 channels, but I inferred 3
>>> mk_codec('hhh', chk_size_bytes=3)
Traceback (most recent call last):
...
AssertionError: The given chk_size_bytes 3 did not match the inferred (from chk_format) 6
```
Finally, so far we've done it this way:
```python
>>> encoder, decoder = mk_codec('hHifd')
```
But see that what's actually returned is a NAMED tuple, which means that you can
can also get one object that will have `.encode` and `.decode` properties:
```python
>>> codec = mk_codec('hHifd')
>>> to_encode = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
>>> encoded = codec.encode(to_encode)
>>> decoded = codec.decode(encoded)
>>> decoded
[(1, 2, 3, 4.0, 5.0), (6, 7, 8, 9.0, 10.0)]
```
And you can checkout the properties of your encoder and decoder (they
should be the same)
```python
>>> codec.encode.chk_format
'hHifd'
>>> codec.encode.n_channels
5
>>> codec.encode.chk_size_bytes
24
```
# Further functionality and under-the-hood peeps
This section shows various examples of recode and it how it can be used with:
- Single channel numerical streams
- Multi-channel numerical streams
- DataFrames
- Iterators
```python
from recode import (ChunkedEncoder,
ChunkedDecoder,
MetaEncoder,
MetaDecoder,
IterativeDecoder,
StructCodecSpecs,
specs_from_frames,
frame_to_meta,
meta_to_frame)
```
## Quick run through
First define the frame you want to encode
```python
frame = [1,2,3]
```
Next define the specifications for that encoding
```python
specs = StructCodecSpecs(chk_format='h')
specs
```
StructCodecSpecs(chk_format='h', n_channels=1, chk_size_bytes=2)
Next define an encoder and encode your frame to bytes
```python
encoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)
b = encoder(frame)
b
```
b'\x01\x00\x02\x00\x03\x00'
Once you need your original frame again, define a decoder and decode your frame
```python
decoder = ChunkedDecoder(specs.chk_to_frame)
decoded_frames = decoder(b)
decoded_frames
```
[1, 2, 3]
## Step-by-step explanation
### Define your StructCodecSpecs
StructCodecSpecs is used to define the specs for making codecs for fixed size structured chunks serialization and deserialization of sequences, tabular data, and time-series. This definition is based on format strings of the [python struct module](https://docs.python.org/3/library/struct.html#format-strings).
There are two ways to define StructCodecSpecs, the first being to explicitly
define it using the StructCodecSpecs class.
StructCodecSpecs takes three arguments: `chk_format`, `n_channels`,
and `chk_size_bytes`.
Only `chk_format` is required,
as `n_channels` and `chk_size_bytes` can be determined based on `chk_format`.
If `n_channels` or `chk_size_bytes` are given, they will be used to
assert that the values inferred from `chk_format` match.
If this is not the case, then do not provide an argument for `n_channels`
and instead pass a string with the format character matching the data type for
each channel in the frame to `chk_format`.
For example if the first channel
contains integers and the second contains floats, then `chk_format = 'hd'`.
```python
frame = [1,2,3]
specs = StructCodecSpecs(chk_format='h')
specs
```
StructCodecSpecs(chk_format='h', n_channels=1, chk_size_bytes=2)
The second way to define StructCodecSpecs is to use `specs_from_frames` which will implictly define StructCodecSpecs based on the frame that is going to be encoded/decoded. This function will return a tuple containing a reconstituted version of the iterator given as an input, and the defined StructCodecSpecs. The first element of the tuple can be ignored if the frame passed is not an iterator.
```python
_, specs = specs_from_frames(frame)
specs
```
StructCodecSpecs(chk_format='h', n_channels=1, chk_size_bytes=2)
If frame is an iterator, then redefine frame as the first argument of the tuple so the first element of frame is not lost for encoding.
```python
frame = iter([[1,2], [3,4]])
frame, specs = specs_from_frames(frame)
specs, list(frame)
```
(StructCodecSpecs(chk_format='hh', n_channels=2, chk_size_bytes=4),
[[1, 2], [3, 4]])
### Define your Encoder
Your Encoder will allow you to, you guessed it, encode your frames! There are two Encoders currently defined in recode: `ChunkedEncoder` and `MetaEncoder`.
`ChunkedEncoder` should be your goto encoder for sequences, while `MetaEncoder` works best for tabular data (currently must be in the format of list of dicts).
```python
frame = [1,2,3]
_, specs = specs_from_frames(frame)
encoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)
b = encoder(frame)
b
```
b'\x01\x00\x02\x00\x03\x00'
When using a `MetaEncoder`, an extra argument named frame_to_meta is required, which can be easily imported from recode!
```python
frame = [{'foo': 1, 'bar': 1}, {'foo': 2, 'bar': 2}, {'foo': 3, 'bar': 4}]
_, specs = specs_from_frames(frame)
encoder = MetaEncoder(frame_to_chk=specs.frame_to_chk, frame_to_meta=frame_to_meta)
b = encoder(frame)
b
```
b'\x07\x00foo.bar\x01\x00\x01\x00\x02\x00\x02\x00\x03\x00\x04\x00'
### Define your Decoder
Next up your Decoder will allow you to decode your encoded bytes. There are three Encoders currently defined in recode: `ChunkedDecoder`, `IterativeDecoder`, and `MetaDecoder`.
Either `ChunkedDecoder` or `IterativeDecoder` will work well for sequences, with the only difference being that `IterativeDecoder` returns an iterator of decoded chunks while `ChunkedDecoder` returns the whole list of decoded chunks. `MetaDecoder` works best for tabular data encoded with `MetaEncoder`.
```python
frame = [1,2,3]
_, specs = specs_from_frames(frame)
encoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)
decoder = ChunkedDecoder(specs.chk_to_frame)
b = encoder(frame)
decoder(b)
```
[1, 2, 3]
As is shown in the following example, an `IterativeDecoder` will return an unpack_iterator.
```python
frame = [[1,1],[2,2]]
_, specs = specs_from_frames(frame)
encoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)
decoder = IterativeDecoder(chk_to_frame=specs.chk_to_frame)
b = encoder(frame)
iter_frames = decoder(b)
print(type(iter_frames))
next(iter_frames), next(iter_frames)
```
<class 'unpack_iterator'>
((1, 1), (2, 2))
When using a `MetaDecoder`, an extra argument named meta_to_frame is required, which can be easily imported from recode!
```python
frame = [{'foo': 1, 'bar': 1}, {'foo': 2, 'bar': 2}, {'foo': 3, 'bar': 4}]
_, specs = specs_from_frames(frame)
encoder = MetaEncoder(frame_to_chk=specs.frame_to_chk, frame_to_meta=frame_to_meta)
decoder = MetaDecoder(chk_to_frame=specs.chk_to_frame, meta_to_frame=meta_to_frame)
b = encoder(frame)
decoded_frames = decoder(b)
decoded_frames
```
[{'foo': 1, 'bar': 1}, {'foo': 2, 'bar': 2}, {'foo': 3, 'bar': 4}]
## Encoding a waveform
### Create a synthetic waveform using `hum`
```python
from hum.gen.sine_mix import freq_based_stationary_wf
import matplotlib.pyplot as plt
import numpy as np
DFLT_N_SAMPLES = 21 * 2048
DFLT_SR = 44100
```
```python
wf_mix = freq_based_stationary_wf(freqs=(200, 400, 600, 800), weights=None,
n_samples = DFLT_N_SAMPLES, sr = DFLT_SR)
plt.plot(wf_mix[:300]);
```
![png](notebooks/recode_demo_files/recode_demo_35_0.png)
```python
wf_mix
```
array([0. , 0.07114157, 0.14170624, ..., 0.49480205, 0.54853009,
0.5979443 ])
### Encode the wf
```python
specs = StructCodecSpecs('d')
encoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)
decoder = ChunkedDecoder(chk_to_frame=specs.chk_to_frame)
b = encoder(wf_mix)
b[:100]
```
b'\x00\x00\x00\x00\x00\x00\x00\x00\x1b:\x11\x8dU6\xb2?\xb8!\xa2\x15n#\xc2?\xe6I\xedz\x15\x06\xcb?L;\xe6\xfah\xd8\xd1?}\x8c\x9b\x9a\xf5\x08\xd6?D\xcc\xf1\x897\x0c\xda?\xe3\xc5\xbeb/\xda\xdd?DE\xb6K\xb6\xb5\xe0?\x82|\xfcz\x90\\\xe2?\xae3:$\x99\xde\xe3?\x1e\x95\xbe\xef$9\xe5?\x81\xd7\xdc\xec'
### Decode and compare to wf_mix
```python
decoded_wf_mix = decoder(b)
plt.plot(decoded_wf_mix[:300]);
```
![png](notebooks/recode_demo_files/recode_demo_40_0.png)
```python
np.all(decoded_wf_mix == wf_mix)
```
True
## Encoding a pandas dataframe
```python
import pandas as pd
```
### Create/import your dataframe
```python
df = pd.DataFrame(data = [[1,2,3],[4,5,6]], columns = ['foo', 'bar', 'set'])
df
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>foo</th>
<th>bar</th>
<th>set</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<th>1</th>
<td>4</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>
</div>
### Prep the dataframe for encoding
```python
frame = df.to_dict('records')
frame
```
[{'foo': 1, 'bar': 2, 'set': 3}, {'foo': 4, 'bar': 5, 'set': 6}]
### Encode the list of dicts
```python
_, specs = specs_from_frames(frame)
encoder = MetaEncoder(frame_to_chk=specs.frame_to_chk, frame_to_meta=frame_to_meta)
decoder = MetaDecoder(chk_to_frame=specs.chk_to_frame, meta_to_frame=meta_to_frame)
b = encoder(frame)
b
```
b'\x0b\x00foo.bar.set\x01\x00\x02\x00\x03\x00\x04\x00\x05\x00\x06\x00'
```python
decoded_frame = decoder(b)
decoded_frame
```
[{'foo': 1, 'bar': 2, 'set': 3}, {'foo': 4, 'bar': 5, 'set': 6}]
```python
decoded_df = pd.DataFrame(decoded_frame)
print(np.all(decoded_df == df))
decoded_df
```
True
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>foo</th>
<th>bar</th>
<th>set</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<th>1</th>
<td>4</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>
</div>
## Miscellaneous information
### Byte order, Size, and Alignment
This table provides the characters associated with Byte order, Size, and Alignment. If one of these characters is not given as the first character of the format string, then `@` will be assumed. More information about Byte order, Size, and Alignment can be found [here](https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment).
|Character|Byte order|Size|Alignment|
|---------|----------|----|---------|
|@|native|native|native|
|=|native|standard|none|
|<|little-endian|standard|none|
|>|big-endian|standard|none|
|!|network (=big-endian)|standard|none|
# Full Examples
## Single channel numerical stream
```python
from recode import StructCodecSpecs, ChunkedEncoder, ChunkedDecoder
specs = StructCodecSpecs(chk_format='h')
encoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)
decoder = ChunkedDecoder(chk_to_frame=specs.chk_to_frame)
frames = [1, 2, 3]
b = encoder(frames)
assert b == b'\x01\x00\x02\x00\x03\x00'
decoded_frames = decoder(b)
assert decoded_frames == frames
```
## Multi-channel numerical stream
```python
from recode import StructCodecSpecs, ChunkedEncoder, ChunkedDecoder
specs = StructCodecSpecs(chk_format='@hh', n_channels = 2)
encoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)
decoder = ChunkedDecoder(chk_to_frame=specs.chk_to_frame)
frames = [(1, 2), (3, 4), (5, 6)]
b = encoder(frames)
assert b == b'\x01\x00\x02\x00\x03\x00\x04\x00\x05\x00\x06\x00'
decoded_frames = decoder(b)
assert decoded_frames == frames
```
## Iterative decoder
```python
from recode import StructCodecSpecs, ChunkedEncoder, IterativeDecoder
specs = StructCodecSpecs(chk_format = 'hdhd')
encoder = ChunkedEncoder(frame_to_chk = specs.frame_to_chk)
decoder = IterativeDecoder(chk_to_frame = specs.chk_to_frame)
frames = [(1,1.1,1,1.1),(2,2.2,2,2.2),(3,3.3,3,3.3)]
b = encoder(frames)
iter_frames = decoder(b)
assert next(iter_frames) == frames[0]
next(iter_frames)
```
(2, 2.2, 2, 2.2)
## DataFrame (as list of dicts) using MetaEncoder/MetaDecoder
```python
from recode import StructCodecSpecs, MetaEncoder, MetaDecoder, frame_to_meta, meta_to_frame
data = [{'foo': 1.1, 'bar': 2.2}, {'foo': 513.23, 'bar': 456.1}, {'foo': 32.0, 'bar': 6.7}]
specs = StructCodecSpecs(chk_format='dd', n_channels = 2)
encoder = MetaEncoder(frame_to_chk = specs.frame_to_chk, frame_to_meta = frame_to_meta)
decoder = MetaDecoder(chk_to_frame = specs.chk_to_frame, meta_to_frame = meta_to_frame)
b = encoder(data)
assert decoder(b) == data
```
## Implicitly define codec specs based on frames
```python
from recode import specs_from_frames, ChunkedEncoder, ChunkedDecoder
frames = [1,2,3]
_, specs = specs_from_frames(frames)
encoder = ChunkedEncoder(frame_to_chk = specs.frame_to_chk)
decoder = ChunkedDecoder(chk_to_frame=specs.chk_to_frame)
b = encoder(frames)
assert b == b'\x01\x00\x02\x00\x03\x00'
decoded_frames = decoder(b)
decoded_frames
```
[1, 2, 3]
## Implicit definition of iterator
```python
from recode import specs_from_frames, ChunkedEncoder, IterativeDecoder
frames = iter([[1.1,2.2],[3.3,4.4]])
frames, specs = specs_from_frames(frames)
encoder = ChunkedEncoder(frame_to_chk = specs.frame_to_chk)
decoder = ChunkedDecoder(chk_to_frame=specs.chk_to_frame)
b = encoder(frames)
decoded_frames = list(decoder(b))
assert decoded_frames == [(1.1,2.2),(3.3,4.4)]
```
# More
## Example of recode functionality to read and write audio files
In the below example we can see that the functionality of reading and writing audio to bytes is replicated in `recode`.
An example of a waveform audio file is first created using `mk_wf`.
Then, that example wave form is converted to bytes using `BytesIO` and `soundfile.write`.
Then, that same example wave form is converted to bytes using `recode` functionality.
Finally it can be seen through the assertions that `soundfile`'s read/write and `recode`'s encode/decode provide
the same functionality for audio files (aside from the header in `soundfile` as a result of the .wav format).
```python
import soundfile as sf
from io import BytesIO
from enum import Enum
import numpy as np
from recode import ChunkedDecoder, ChunkToFrame, ChunkedEncoder, StructCodecSpecs
class Kind(Enum):
random = 'random'
increasing = 'increasing'
def mk_wf(n_samples=2048, kind: Kind=Kind.random, **kwargs):
if kind == Kind.random:
dtype_str = kwargs.get('num_type', 'int16')
if dtype_str.startswith('int'):
low = kwargs.get('low', int(-2**15))
high = kwargs.get('high', int(2**15-1))
wf = np.random.randint(low=low, high=high, size=n_samples, dtype=dtype_str)
else:
raise TypeError("Don't know how to handle this case")
else:
raise TypeError("Don't know how to handle this case")
return wf
wf = mk_wf(25)
sr = 10000
file = BytesIO()
sf.write(file, wf, sr, format = 'WAV')
file.seek(0)
b = file.read()
specs = StructCodecSpecs('h')
encoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)
decoder = ChunkedDecoder(chk_size_bytes=specs.chk_size_bytes,
chk_to_frame=specs.chk_to_frame,
n_channels=specs.n_channels)
d = encoder(wf)
assert d == b[44:]
sf_read = sf.read(BytesIO(b), dtype='int16')
recode_read = decoder(d)
assert np.all(sf_read[0] == recode_read)
assert np.all(recode_read == wf)
assert np.all(sf_read[0] == wf)
```
Raw data
{
"_id": null,
"home_page": "https://github.com/otosense/recode",
"name": "recode",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "OtoSense",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/3f/d1/b13049736e218144ecb3b798aa1679914e65b01d2e119c2ff2ebe430a89f/recode-0.1.30.tar.gz",
"platform": "any",
"description": "# recode\nMake codecs for fixed size structured chunks serialization and deserialization of\nsequences, tabular data, and time-series.\n\nTo install:\t```pip install recode```\n\n[Docs here](https://otosense.github.io/recode/module_docs/recode.html)\n\nMake codecs for fixed size structured chunks serialization and deserialization of\nsequences, tabular data, and time-series.\n\nThe easiest and bigest bang for your buck is ``mk_codec``\n\n```python\n>>> from recode import mk_codec\n>>> encoder, decoder = mk_codec()\n```\n\n``encoder`` will encode a list (or any iterable) of numbers into bytes\n\n```python\n>>> b = encoder([0, -3, 3.14])\n>>> b\nb'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x08\\xc0\\x1f\\x85\\xebQ\\xb8\\x1e\\t@'\n\n```\n\n``decoder`` will decode those bytes to get you back your numbers\n\n```python\n>>> decoder(b)\n[0.0, -3.0, 3.14]\n```\n\nThere's only really one argument you need to know about in ``mk_codec``.\nThe first argument, called `chk_format`, which is a string of characters from\nthe \"Format\" column of the python \n[format characters](https://docs.python.org/3/library/struct.html#format-characters)\n\nThe one we've just been through is in fact\n\n```python\n>>> encoder, decoder = mk_codec('d')\n```\n\nThat is, it will expect that your data is a list of numbers, and they'll be encoded\nwith the 'd' format character, that is 8-bytes doubles.\nThat default is good because it gives you a lot of room, but if you knew that you\nwould only be dealing with 2-byte integers (as in most WAV audio waveforms),\nyou would have chosen `h`:\n\n```python\n>>> encoder, decoder = mk_codec('h')\n```\n\nWhat about those channels?\nWell, some times you need to encode/decode multi-channel streams, such as:\n\n```python\n>>> multi_channel_stream = [[3, -1], [4, -1], [5, -9]]\n```\n\nSay, for example, if you were dealing with stereo waveform\n(with the standard PCM_16 format), you'd do it this way:\n\n```python\n>>> encoder, decoder = mk_codec('hh')\n>>> pcm_bytes = encoder(multi_channel_stream)\n>>> pcm_bytes\nb'\\x03\\x00\\xff\\xff\\x04\\x00\\xff\\xff\\x05\\x00\\xf7\\xff'\n>>> decoder(pcm_bytes)\n[(3, -1), (4, -1), (5, -9)]\n```\n\n\nThe `n_channels` and `chk_size_bytes` arguments are there if you want to assert\nthat your number of channels and chunk size are what you expect.\nAgain, these are just for verification, because we know how easy it is to\nmisspecify the `chk_format`, and how hard it can be to notice that we did.\n\nIt is advised to use these in any production code, for the sanity of everyone!\n\n```python\n>>> mk_codec('hhh', n_channels=2)\nTraceback (most recent call last):\n ...\nAssertionError: You said there'd be 2 channels, but I inferred 3\n>>> mk_codec('hhh', chk_size_bytes=3)\nTraceback (most recent call last):\n ...\nAssertionError: The given chk_size_bytes 3 did not match the inferred (from chk_format) 6\n```\n\nFinally, so far we've done it this way:\n\n\n```python\n>>> encoder, decoder = mk_codec('hHifd')\n```\n\nBut see that what's actually returned is a NAMED tuple, which means that you can\ncan also get one object that will have `.encode` and `.decode` properties:\n\n```python\n>>> codec = mk_codec('hHifd')\n>>> to_encode = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]\n>>> encoded = codec.encode(to_encode)\n>>> decoded = codec.decode(encoded)\n>>> decoded\n[(1, 2, 3, 4.0, 5.0), (6, 7, 8, 9.0, 10.0)]\n```\n\nAnd you can checkout the properties of your encoder and decoder (they\nshould be the same)\n\n```python\n>>> codec.encode.chk_format\n'hHifd'\n>>> codec.encode.n_channels\n5\n>>> codec.encode.chk_size_bytes\n24\n```\n\n\n# Further functionality and under-the-hood peeps\n\nThis section shows various examples of recode and it how it can be used with:\n- Single channel numerical streams\n- Multi-channel numerical streams\n- DataFrames\n- Iterators\n\n\n```python\nfrom recode import (ChunkedEncoder, \n ChunkedDecoder, \n MetaEncoder, \n MetaDecoder, \n IterativeDecoder, \n StructCodecSpecs, \n specs_from_frames,\n frame_to_meta,\n meta_to_frame)\n```\n\n## Quick run through\n\nFirst define the frame you want to encode\n\n\n```python\nframe = [1,2,3]\n```\n\nNext define the specifications for that encoding\n\n\n```python\nspecs = StructCodecSpecs(chk_format='h')\nspecs\n```\n\n\n\n\n StructCodecSpecs(chk_format='h', n_channels=1, chk_size_bytes=2)\n\n\n\nNext define an encoder and encode your frame to bytes\n\n\n```python\nencoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)\nb = encoder(frame)\nb\n```\n\n\n\n\n b'\\x01\\x00\\x02\\x00\\x03\\x00'\n\n\n\nOnce you need your original frame again, define a decoder and decode your frame\n\n\n```python\ndecoder = ChunkedDecoder(specs.chk_to_frame)\ndecoded_frames = decoder(b)\ndecoded_frames\n```\n\n\n\n\n [1, 2, 3]\n\n\n\n## Step-by-step explanation\n\n### Define your StructCodecSpecs\n\nStructCodecSpecs is used to define the specs for making codecs for fixed size structured chunks serialization and deserialization of sequences, tabular data, and time-series. This definition is based on format strings of the [python struct module](https://docs.python.org/3/library/struct.html#format-strings).\n\nThere are two ways to define StructCodecSpecs, the first being to explicitly \ndefine it using the StructCodecSpecs class. \nStructCodecSpecs takes three arguments: `chk_format`, `n_channels`, \nand `chk_size_bytes`. \n\nOnly `chk_format` is required, \nas `n_channels` and `chk_size_bytes` can be determined based on `chk_format`. \nIf `n_channels` or `chk_size_bytes` are given, they will be used to \nassert that the values inferred from `chk_format` match.\n\nIf this is not the case, then do not provide an argument for `n_channels` \nand instead pass a string with the format character matching the data type for \neach channel in the frame to `chk_format`. \nFor example if the first channel \ncontains integers and the second contains floats, then `chk_format = 'hd'`.\n\n\n```python\nframe = [1,2,3]\nspecs = StructCodecSpecs(chk_format='h')\nspecs\n```\n\n\n\n\n StructCodecSpecs(chk_format='h', n_channels=1, chk_size_bytes=2)\n\n\n\nThe second way to define StructCodecSpecs is to use `specs_from_frames` which will implictly define StructCodecSpecs based on the frame that is going to be encoded/decoded. This function will return a tuple containing a reconstituted version of the iterator given as an input, and the defined StructCodecSpecs. The first element of the tuple can be ignored if the frame passed is not an iterator.\n\n\n```python\n_, specs = specs_from_frames(frame)\nspecs\n```\n\n\n\n\n StructCodecSpecs(chk_format='h', n_channels=1, chk_size_bytes=2)\n\n\n\nIf frame is an iterator, then redefine frame as the first argument of the tuple so the first element of frame is not lost for encoding.\n\n\n```python\nframe = iter([[1,2], [3,4]])\nframe, specs = specs_from_frames(frame)\nspecs, list(frame)\n```\n\n\n\n\n (StructCodecSpecs(chk_format='hh', n_channels=2, chk_size_bytes=4),\n [[1, 2], [3, 4]])\n\n\n\n### Define your Encoder\n\nYour Encoder will allow you to, you guessed it, encode your frames! There are two Encoders currently defined in recode: `ChunkedEncoder` and `MetaEncoder`.\n\n`ChunkedEncoder` should be your goto encoder for sequences, while `MetaEncoder` works best for tabular data (currently must be in the format of list of dicts).\n\n\n```python\nframe = [1,2,3]\n_, specs = specs_from_frames(frame)\nencoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)\nb = encoder(frame)\nb\n```\n\n b'\\x01\\x00\\x02\\x00\\x03\\x00'\n\n\n\nWhen using a `MetaEncoder`, an extra argument named frame_to_meta is required, which can be easily imported from recode!\n\n\n```python\nframe = [{'foo': 1, 'bar': 1}, {'foo': 2, 'bar': 2}, {'foo': 3, 'bar': 4}]\n_, specs = specs_from_frames(frame)\nencoder = MetaEncoder(frame_to_chk=specs.frame_to_chk, frame_to_meta=frame_to_meta)\nb = encoder(frame)\nb\n```\n\n b'\\x07\\x00foo.bar\\x01\\x00\\x01\\x00\\x02\\x00\\x02\\x00\\x03\\x00\\x04\\x00'\n\n\n\n### Define your Decoder\n\nNext up your Decoder will allow you to decode your encoded bytes. There are three Encoders currently defined in recode: `ChunkedDecoder`, `IterativeDecoder`, and `MetaDecoder`.\n\nEither `ChunkedDecoder` or `IterativeDecoder` will work well for sequences, with the only difference being that `IterativeDecoder` returns an iterator of decoded chunks while `ChunkedDecoder` returns the whole list of decoded chunks. `MetaDecoder` works best for tabular data encoded with `MetaEncoder`. \n\n\n```python\nframe = [1,2,3]\n_, specs = specs_from_frames(frame)\nencoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)\ndecoder = ChunkedDecoder(specs.chk_to_frame)\nb = encoder(frame)\ndecoder(b)\n```\n\n [1, 2, 3]\n\n\n\nAs is shown in the following example, an `IterativeDecoder` will return an unpack_iterator.\n\n\n```python\nframe = [[1,1],[2,2]]\n_, specs = specs_from_frames(frame)\nencoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)\ndecoder = IterativeDecoder(chk_to_frame=specs.chk_to_frame)\nb = encoder(frame)\niter_frames = decoder(b)\nprint(type(iter_frames))\nnext(iter_frames), next(iter_frames)\n```\n\n <class 'unpack_iterator'>\n ((1, 1), (2, 2))\n\n\n\nWhen using a `MetaDecoder`, an extra argument named meta_to_frame is required, which can be easily imported from recode!\n\n\n```python\nframe = [{'foo': 1, 'bar': 1}, {'foo': 2, 'bar': 2}, {'foo': 3, 'bar': 4}]\n_, specs = specs_from_frames(frame)\nencoder = MetaEncoder(frame_to_chk=specs.frame_to_chk, frame_to_meta=frame_to_meta)\ndecoder = MetaDecoder(chk_to_frame=specs.chk_to_frame, meta_to_frame=meta_to_frame)\nb = encoder(frame)\ndecoded_frames = decoder(b)\ndecoded_frames\n```\n\n [{'foo': 1, 'bar': 1}, {'foo': 2, 'bar': 2}, {'foo': 3, 'bar': 4}]\n\n\n\n## Encoding a waveform\n\n### Create a synthetic waveform using `hum`\n\n\n```python\nfrom hum.gen.sine_mix import freq_based_stationary_wf\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nDFLT_N_SAMPLES = 21 * 2048\nDFLT_SR = 44100\n```\n\n\n```python\nwf_mix = freq_based_stationary_wf(freqs=(200, 400, 600, 800), weights=None,\n n_samples = DFLT_N_SAMPLES, sr = DFLT_SR)\nplt.plot(wf_mix[:300]);\n```\n\n\n \n![png](notebooks/recode_demo_files/recode_demo_35_0.png)\n \n\n\n\n```python\nwf_mix\n```\n\n\n\n\n array([0. , 0.07114157, 0.14170624, ..., 0.49480205, 0.54853009,\n 0.5979443 ])\n\n\n\n### Encode the wf\n\n\n```python\nspecs = StructCodecSpecs('d')\nencoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)\ndecoder = ChunkedDecoder(chk_to_frame=specs.chk_to_frame)\nb = encoder(wf_mix)\nb[:100]\n```\n\n\n\n\n b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x1b:\\x11\\x8dU6\\xb2?\\xb8!\\xa2\\x15n#\\xc2?\\xe6I\\xedz\\x15\\x06\\xcb?L;\\xe6\\xfah\\xd8\\xd1?}\\x8c\\x9b\\x9a\\xf5\\x08\\xd6?D\\xcc\\xf1\\x897\\x0c\\xda?\\xe3\\xc5\\xbeb/\\xda\\xdd?DE\\xb6K\\xb6\\xb5\\xe0?\\x82|\\xfcz\\x90\\\\\\xe2?\\xae3:$\\x99\\xde\\xe3?\\x1e\\x95\\xbe\\xef$9\\xe5?\\x81\\xd7\\xdc\\xec'\n\n\n\n### Decode and compare to wf_mix\n\n\n```python\ndecoded_wf_mix = decoder(b)\nplt.plot(decoded_wf_mix[:300]);\n```\n\n\n \n![png](notebooks/recode_demo_files/recode_demo_40_0.png)\n \n\n\n\n```python\nnp.all(decoded_wf_mix == wf_mix)\n```\n\n\n\n\n True\n\n\n\n## Encoding a pandas dataframe\n\n\n```python\nimport pandas as pd\n```\n\n### Create/import your dataframe\n\n\n```python\ndf = pd.DataFrame(data = [[1,2,3],[4,5,6]], columns = ['foo', 'bar', 'set'])\ndf\n```\n\n\n\n\n<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>foo</th>\n <th>bar</th>\n <th>set</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>2</td>\n <td>3</td>\n </tr>\n <tr>\n <th>1</th>\n <td>4</td>\n <td>5</td>\n <td>6</td>\n </tr>\n </tbody>\n</table>\n</div>\n\n\n\n### Prep the dataframe for encoding\n\n\n```python\nframe = df.to_dict('records')\nframe\n```\n\n\n\n\n [{'foo': 1, 'bar': 2, 'set': 3}, {'foo': 4, 'bar': 5, 'set': 6}]\n\n\n\n### Encode the list of dicts\n\n\n```python\n_, specs = specs_from_frames(frame)\nencoder = MetaEncoder(frame_to_chk=specs.frame_to_chk, frame_to_meta=frame_to_meta)\ndecoder = MetaDecoder(chk_to_frame=specs.chk_to_frame, meta_to_frame=meta_to_frame)\nb = encoder(frame)\nb\n```\n\n\n\n\n b'\\x0b\\x00foo.bar.set\\x01\\x00\\x02\\x00\\x03\\x00\\x04\\x00\\x05\\x00\\x06\\x00'\n\n\n\n\n```python\ndecoded_frame = decoder(b)\ndecoded_frame\n```\n\n\n\n\n [{'foo': 1, 'bar': 2, 'set': 3}, {'foo': 4, 'bar': 5, 'set': 6}]\n\n\n\n\n```python\ndecoded_df = pd.DataFrame(decoded_frame)\nprint(np.all(decoded_df == df))\ndecoded_df\n```\n\n True\n\n\n\n\n\n<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>foo</th>\n <th>bar</th>\n <th>set</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>2</td>\n <td>3</td>\n </tr>\n <tr>\n <th>1</th>\n <td>4</td>\n <td>5</td>\n <td>6</td>\n </tr>\n </tbody>\n</table>\n</div>\n\n\n\n## Miscellaneous information\n\n### Byte order, Size, and Alignment\n\nThis table provides the characters associated with Byte order, Size, and Alignment. If one of these characters is not given as the first character of the format string, then `@` will be assumed. More information about Byte order, Size, and Alignment can be found [here](https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment).\n\n|Character|Byte order|Size|Alignment|\n|---------|----------|----|---------|\n|@|native|native|native|\n|=|native|standard|none|\n|<|little-endian|standard|none|\n|>|big-endian|standard|none|\n|!|network (=big-endian)|standard|none|\n\n\n# Full Examples\n\n## Single channel numerical stream\n\n```python\nfrom recode import StructCodecSpecs, ChunkedEncoder, ChunkedDecoder\nspecs = StructCodecSpecs(chk_format='h')\nencoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)\ndecoder = ChunkedDecoder(chk_to_frame=specs.chk_to_frame)\nframes = [1, 2, 3]\nb = encoder(frames)\nassert b == b'\\x01\\x00\\x02\\x00\\x03\\x00'\ndecoded_frames = decoder(b)\nassert decoded_frames == frames\n```\n\n\n## Multi-channel numerical stream\n```python\nfrom recode import StructCodecSpecs, ChunkedEncoder, ChunkedDecoder\nspecs = StructCodecSpecs(chk_format='@hh', n_channels = 2)\nencoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)\ndecoder = ChunkedDecoder(chk_to_frame=specs.chk_to_frame)\nframes = [(1, 2), (3, 4), (5, 6)]\nb = encoder(frames)\nassert b == b'\\x01\\x00\\x02\\x00\\x03\\x00\\x04\\x00\\x05\\x00\\x06\\x00'\ndecoded_frames = decoder(b)\nassert decoded_frames == frames\n```\n\n\n## Iterative decoder\n```python\nfrom recode import StructCodecSpecs, ChunkedEncoder, IterativeDecoder\nspecs = StructCodecSpecs(chk_format = 'hdhd')\nencoder = ChunkedEncoder(frame_to_chk = specs.frame_to_chk)\ndecoder = IterativeDecoder(chk_to_frame = specs.chk_to_frame)\nframes = [(1,1.1,1,1.1),(2,2.2,2,2.2),(3,3.3,3,3.3)]\nb = encoder(frames)\niter_frames = decoder(b)\nassert next(iter_frames) == frames[0]\nnext(iter_frames)\n``` \n\n\t(2, 2.2, 2, 2.2)\n\n\n\n## DataFrame (as list of dicts) using MetaEncoder/MetaDecoder\n```python\nfrom recode import StructCodecSpecs, MetaEncoder, MetaDecoder, frame_to_meta, meta_to_frame\ndata = [{'foo': 1.1, 'bar': 2.2}, {'foo': 513.23, 'bar': 456.1}, {'foo': 32.0, 'bar': 6.7}]\nspecs = StructCodecSpecs(chk_format='dd', n_channels = 2)\nencoder = MetaEncoder(frame_to_chk = specs.frame_to_chk, frame_to_meta = frame_to_meta)\ndecoder = MetaDecoder(chk_to_frame = specs.chk_to_frame, meta_to_frame = meta_to_frame)\nb = encoder(data)\nassert decoder(b) == data\n```\n\n\n## Implicitly define codec specs based on frames\n\n```python\nfrom recode import specs_from_frames, ChunkedEncoder, ChunkedDecoder\nframes = [1,2,3]\n_, specs = specs_from_frames(frames)\nencoder = ChunkedEncoder(frame_to_chk = specs.frame_to_chk)\ndecoder = ChunkedDecoder(chk_to_frame=specs.chk_to_frame)\nb = encoder(frames)\nassert b == b'\\x01\\x00\\x02\\x00\\x03\\x00'\ndecoded_frames = decoder(b)\ndecoded_frames\n```\n\t[1, 2, 3]\n\n\n## Implicit definition of iterator\n```python\nfrom recode import specs_from_frames, ChunkedEncoder, IterativeDecoder\nframes = iter([[1.1,2.2],[3.3,4.4]])\nframes, specs = specs_from_frames(frames)\nencoder = ChunkedEncoder(frame_to_chk = specs.frame_to_chk)\ndecoder = ChunkedDecoder(chk_to_frame=specs.chk_to_frame)\nb = encoder(frames)\ndecoded_frames = list(decoder(b))\nassert decoded_frames == [(1.1,2.2),(3.3,4.4)]\n```\n\n# More\n\n\n## Example of recode functionality to read and write audio files\n\nIn the below example we can see that the functionality of reading and writing audio to bytes is replicated in `recode`. \nAn example of a waveform audio file is first created using `mk_wf`. \nThen, that example wave form is converted to bytes using `BytesIO` and `soundfile.write`. \nThen, that same example wave form is converted to bytes using `recode` functionality. \nFinally it can be seen through the assertions that `soundfile`'s read/write and `recode`'s encode/decode provide \nthe same functionality for audio files (aside from the header in `soundfile` as a result of the .wav format).\n\n```python\nimport soundfile as sf\nfrom io import BytesIO\nfrom enum import Enum\nimport numpy as np\nfrom recode import ChunkedDecoder, ChunkToFrame, ChunkedEncoder, StructCodecSpecs\n\nclass Kind(Enum):\n random = 'random'\n increasing = 'increasing'\n\ndef mk_wf(n_samples=2048, kind: Kind=Kind.random, **kwargs):\n if kind == Kind.random:\n dtype_str = kwargs.get('num_type', 'int16')\n if dtype_str.startswith('int'):\n low = kwargs.get('low', int(-2**15))\n high = kwargs.get('high', int(2**15-1))\n wf = np.random.randint(low=low, high=high, size=n_samples, dtype=dtype_str)\n else:\n raise TypeError(\"Don't know how to handle this case\")\n else:\n raise TypeError(\"Don't know how to handle this case\")\n return wf\n\nwf = mk_wf(25)\nsr = 10000\n\nfile = BytesIO()\nsf.write(file, wf, sr, format = 'WAV')\nfile.seek(0)\nb = file.read()\n\nspecs = StructCodecSpecs('h')\nencoder = ChunkedEncoder(frame_to_chk=specs.frame_to_chk)\ndecoder = ChunkedDecoder(chk_size_bytes=specs.chk_size_bytes, \n chk_to_frame=specs.chk_to_frame, \n n_channels=specs.n_channels)\nd = encoder(wf)\n\nassert d == b[44:]\n\nsf_read = sf.read(BytesIO(b), dtype='int16')\nrecode_read = decoder(d)\n\nassert np.all(sf_read[0] == recode_read)\nassert np.all(recode_read == wf)\nassert np.all(sf_read[0] == wf)\n```",
"bugtrack_url": null,
"license": "apache-2.0",
"summary": "Tools to make codecs for time-series serialization and deserialization.",
"version": "0.1.30",
"project_urls": {
"Homepage": "https://github.com/otosense/recode"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3fd1b13049736e218144ecb3b798aa1679914e65b01d2e119c2ff2ebe430a89f",
"md5": "ee12669f27d35a1f30666928d7c131b8",
"sha256": "126b2439c046818a57b8e20c850dbf05ac018e5d4effbf8b8594d12909c45089"
},
"downloads": -1,
"filename": "recode-0.1.30.tar.gz",
"has_sig": false,
"md5_digest": "ee12669f27d35a1f30666928d7c131b8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 24004,
"upload_time": "2023-10-27T08:12:03",
"upload_time_iso_8601": "2023-10-27T08:12:03.835143Z",
"url": "https://files.pythonhosted.org/packages/3f/d1/b13049736e218144ecb3b798aa1679914e65b01d2e119c2ff2ebe430a89f/recode-0.1.30.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-27 08:12:03",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "otosense",
"github_project": "recode",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "recode"
}