rawutil

Name	rawutil
Version	2.8.1
home_page	https://github.com/Tyulis/rawutil
Summary	A pure-python module to read and write binary data
upload_time	2024-08-25 10:05:17
author	Tyulis
requires_python	>=3.4
license	MIT
keywords	structures struct binary bytes formats
requirements	No requirements were recorded.

# Rawutil

*A pure-python and lightweight module to read and write binary data*

## Introduction

Rawutil is a module aimed at reading and writing binary data in Python in the same way as the built-in `struct` module, but with more features.
Rawutil's interface is thus compatible with `struct`, with a few small exceptions and many additions.
It has no dependencies outside the Python standard library.

### What is already in struct

- Unpack and pack fixed structures from/to bytes (`pack`, `pack_into`, `unpack`, `unpack_from`, `iter_unpack`, `calcsize`)
- `Struct` objects that parse a structure once and for all, so that it can be reused several times

### What is different compared to struct

- Some rarely-used format characters are not in rawutil (`N`, `P` and `p` are not available, `n` is used for a different purpose)
- There is no consideration for native size and alignment, so the `@` character simply applies the system byte order with standard sizes and no alignment, just like `=`
- There are several differences in error handling that are described below

### What has been added to struct

- Reading and writing files and file-like objects
- New format characters, to handle padding, alignment, strings, ...
- Internal references in structures
- Loops in structures
- New features to handle variable byte order

## Usage

Rawutil exports more or less the same interface as `struct`. In all these functions, `structure` may be a simple format string or a `Struct` object.

### unpack

```python
unpack(structure, data, names=None, refdata=())
```
Unpacks the given `data` according to the `structure`, and returns the unpacked values as a list.

- `structure` is the structure of the data to unpack, as a format string or a `Struct` object
- `data` may be a bytes-like or a file-like object. If it is a file-like object, the data will be unpacked starting from the current position in the file, and will leave the cursor at the end of the data that has been read (effectively reading the data to unpack from the file).
- `names` may be a list of field names for a `namedtuple`, or a callable that takes all unpacked elements in order as arguments, like a `namedtuple` or a `dataclass`.
- `refdata` may be used to easily input external data into the structure, as `#n` references. This is described in the *References* section below.

Unlike `struct`, this function does not raise any error if the data is larger than the expected size of the structure.

Examples :

```python
>>> unpack("4B 3s 3s", b"\x01\x02\x03\x04foobar")
[1, 2, 3, 4, b'foo', b'bar']
>>> unpack("<4s #0I", b"ABCD\x10\x00\x00\x00\x20\x00\x00\x00", names=("string", "num1", "num2"), refdata=(2, ))
RawutilNameSpace(string=b'ABCD', num1=16, num2=32)
```
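For comparison, the standard `struct` module is strict about buffer length in `unpack`, which is the behaviour rawutil relaxes. The stdlib-only sketch below (not rawutil code; the helper names are hypothetical) shows that `struct.unpack` rejects trailing bytes while `struct.unpack_from` ignores them, which is the closest stdlib analogue of rawutil's `unpack`.

```python
import struct

FORMAT = "4B 3s 3s"                  # struct ignores whitespace between codes
DATA = b"\x01\x02\x03\x04foobar!!"   # two extra trailing bytes

def strict_unpack(fmt, data):
    """struct.unpack: the buffer must be exactly calcsize(fmt) bytes long."""
    try:
        return struct.unpack(fmt, data)
    except struct.error:
        return None

def lenient_unpack(fmt, data):
    """struct.unpack_from: trailing bytes beyond calcsize(fmt) are ignored."""
    return list(struct.unpack_from(fmt, data))

print(strict_unpack(FORMAT, DATA))   # None, because struct.error was raised
print(lenient_unpack(FORMAT, DATA))  # [1, 2, 3, 4, b'foo', b'bar']
```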

### unpack_from

```python
unpack_from(structure, data, offset=None, names=None, refdata=(), getptr=False)
```

Unpacks the given `data` according to the `structure`, starting from the given `offset`, and returns the unpacked values as a list.

This function works exactly like `unpack`, with two more optional arguments :

- `offset` can be used to specify a starting position to read. In a file-like object, the cursor is moved to the given absolute `offset`, then the data to unpack is read and the cursor is left at the end of the data that has been read. If this parameter is not set, it works like `unpack` and reads from the current position
- `getptr` can be set to True to return the final position in the data, after the unpacked data. The function will then return `(values, end_position)`. If left to False, it works like `unpack` and only returns the values.

Examples :

```python
>>> unpack_from("<4s #0I", b"ABCD\x10\x00\x00\x00\x20\x00\x00\x00", names=("string", "num1", "num2"), refdata=(2, ))
RawutilNameSpace(string=b'ABCD', num1=16, num2=32)
>>> values, endpos = unpack_from("<2I", b"ABCD\x10\x00\x00\x00\x20\x00\x00\x00EFGH", offset=4, getptr=True)
>>> values
[16, 32]
>>> endpos
12
```
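For fixed-size formats, `getptr=True` can be approximated with the stdlib by adding `struct.calcsize` to the starting offset. The helper `unpack_from_with_ptr` below is hypothetical and only works because the format has a fixed size; rawutil tracks the end position itself, which is what makes it work with variable-length elements too.

```python
import struct

def unpack_from_with_ptr(fmt, data, offset=0):
    """Hypothetical stand-in for rawutil's getptr=True, for fixed-size formats only.

    Returns (values, end_position), end_position being just past the data read.
    """
    values = list(struct.unpack_from(fmt, data, offset))
    return values, offset + struct.calcsize(fmt)

values, endpos = unpack_from_with_ptr(
    "<2I", b"ABCD\x10\x00\x00\x00\x20\x00\x00\x00EFGH", offset=4)
print(values, endpos)  # [16, 32] 12
```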

### iter_unpack

```python
iter_unpack(structure, data, names=None, refdata=())
```

Returns an iterator that unpacks the data according to the structure and returns the values as a list at each iteration.
The data length must be a multiple of the structure's size. If `names` is defined, each iteration returns a namedtuple, just like `unpack` and `unpack_from`. `refdata` also works the same way.

This function exists mostly to ensure compatibility with `struct`. Using iterators within structures is recommended instead, as they are faster and offer much more control.

Examples :
```python
>>> for a, b, c in iter_unpack("3c", b"abcdefghijkl"):
...     print(a.decode("ascii"), b.decode("ascii"), c.decode("ascii"))
...
a b c
d e f
g h i
j k l
```
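Since this function mirrors the stdlib, the same loop can be written with `struct.iter_unpack`, which yields one tuple per fixed-size chunk. This is a stdlib comparison, not rawutil code; note that `struct` yields tuples where rawutil yields lists.

```python
import struct

# Each chunk is a tuple of three 1-byte bytes objects, e.g. (b'a', b'b', b'c').
rows = [
    "".join(ch.decode("ascii") for ch in chunk)
    for chunk in struct.iter_unpack("3c", b"abcdefghijkl")
]
print(rows)  # ['abc', 'def', 'ghi', 'jkl']
```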

### pack

```python
pack(structure, *data, refdata=())
```

Packs the given `data` in the binary format defined by `structure`, and returns the packed data as a `bytes` object.
`refdata` is still there to insert external data in the structure using the `#n` references, and is a keyword-only argument.

Note that if the last element of `data` is a writable file-like object, the data is written into it instead of being returned. This behaviour is deprecated and kept only for backwards compatibility; to pack into a file, use `pack_file` instead.

Examples :
```python
>>> pack("<2In", 10, 100, b"String")
b'\n\x00\x00\x00d\x00\x00\x00String\x00'
>>> pack(">#0B #1I", 10, 100, 1000, 10000, 100000, refdata=(2, 3))
b"\nd\x00\x00\x03\xe8\x00\x00'\x10\x00\x01\x86\xa0"
>>> unpack(">2B3I", _)
[10, 100, 1000, 10000, 100000]
```
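The expected bytes above can be cross-checked with the stdlib. `struct` has no `n` code, so this sketch appends the null terminator by hand, and it expands the external references `#0`/`#1` with `refdata=(2, 3)` into plain repeat counts. It is a verification aid, not rawutil's implementation.

```python
import struct

# "<2In": two little-endian uint32 followed by a null-terminated string.
packed = struct.pack("<2I", 10, 100) + b"String" + b"\x00"
print(packed)  # b'\n\x00\x00\x00d\x00\x00\x00String\x00'

# ">#0B #1I" with refdata=(2, 3) is equivalent to ">2B3I".
packed_refs = struct.pack(">2B3I", 10, 100, 1000, 10000, 100000)
print(struct.unpack(">2B3I", packed_refs))  # (10, 100, 1000, 10000, 100000)
```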

### pack_into

```python
pack_into(structure, buffer, offset, *data, refdata=())
```

Packs the given `data` into the given `buffer` at the given `offset` according to the given `structure`. `refdata` works the same way as everywhere else.

- `buffer` must be a mutable bytes-like object (typically a `bytearray`). The data will be written directly into it at the given position
- `offset` specifies the position to write the data to. It is a required argument.

Examples :

```python
>>> b = bytearray(b"AB----GH")
>>> pack_into("4s", b, 2, b"CDEF")
>>> b
bytearray(b'ABCDEFGH')
```
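The stdlib `struct.pack_into` has the same argument order (format, buffer, offset, values), so the example above translates directly. This is shown only as a comparison; it writes into the `bytearray` in place and returns `None`.

```python
import struct

buf = bytearray(b"AB----GH")
struct.pack_into("4s", buf, 2, b"CDEF")  # overwrite 4 bytes starting at offset 2
print(buf)  # bytearray(b'ABCDEFGH')
```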

### pack_file

```python
pack_file(structure, file, *data, position=None, refdata=())
```

Packs the given `data` into the given `file` according to the given `structure`. `refdata` is still there for the external references data.

- `file` can be any binary writable file-like object.
- `position` can be set to pack the data at a specific position in the file. If it is left to `None`, the data will be packed at the current position in the file. In either case, the cursor will end up at the end of the packed data.

Examples :

```python
>>> import io
>>> import rawutil
>>> file = io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00")
>>> rawutil.pack_file("2B", file, 60, 61)  # Writes at the current position (0)
>>> rawutil.pack_file("c", file, b"A")     # Writes at the current position (now 2)
>>> rawutil.pack_file("2c", file, b"y", b"z", position=6)  # Writes at the given position (6)
>>> file.seek(0)
0
>>> file.read()
b'<=A\x00\x00\x00yz'
```
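The cursor behaviour described above can be sketched with `io.BytesIO` and `struct`: seek only when a position is given, then write, which leaves the cursor just after the packed data. `pack_file_sketch` is a hypothetical stand-in limited to `struct` formats, not rawutil's code.

```python
import io
import struct

def pack_file_sketch(fmt, file, *values, position=None):
    """Hypothetical stand-in for rawutil.pack_file, limited to struct formats."""
    if position is not None:
        file.seek(position)                # absolute position, like `position=`
    file.write(struct.pack(fmt, *values))  # cursor ends right after the data

file = io.BytesIO(bytes(8))                # eight null bytes
pack_file_sketch("2B", file, 60, 61)       # at 0
pack_file_sketch("c", file, b"A")          # at 2
pack_file_sketch("2c", file, b"y", b"z", position=6)
print(file.getvalue())  # b'<=A\x00\x00\x00yz'
```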

### calcsize

```python
calcsize(structure, refdata=())
```

Returns the size of the data represented by the given `structure`.

This function is kept to ensure compatibility with `struct`.
However, rawutil structures do not always have a fixed length, as they can use internal references and variable-length formats.
Hence `calcsize` only works on fixed-length structures, that is, structures that only use :

- Fixed-length format characters (basic types with set repeat count)
- External references (`#0` type references, if you provide their value in `refdata`)
- Iterators with fixed number of repeats (`2(…)` or `5[…]` will work)
- Alignments (structures with `a` and `|`). As long as everything else is fixed, alignments are too.

Trying to compute the size of a structure that includes any of the following elements raises a `FormatError` (basically, anything that depends on the data to read / write) :

- Variable-length format characters (namely `n` and `$`)
- `{…}` iterators, as they depend on the amount of data remaining.
- Internal references (any `/1` or `/p1` type references)
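To make these rules concrete, here is a hypothetical `fixed_size` function for flat formats only. It uses the sizes from the format table below and handles repeat counts, `s`/`X`, alignment with `a` and the `|` base marker. Unlike rawutil, it refuses external references instead of resolving them from `refdata`, and it does not handle substructures; it is a sketch of the idea, not rawutil's `calcsize`.

```python
import re

# Standard sizes from the format table (rawutil uses no native sizes).
SIZES = {"?": 1, "b": 1, "B": 1, "c": 1, "x": 1, "h": 2, "H": 2, "e": 2,
         "u": 3, "U": 3, "i": 4, "I": 4, "l": 4, "L": 4, "f": 4,
         "q": 8, "Q": 8, "d": 8, "F": 16}

# Optional count, then one code; byte order marks and spaces are skipped.
TOKEN = re.compile(r"(\d*)([A-Za-z?$|/#(\[{])")

def fixed_size(fmt):
    """Hypothetical calcsize for flat formats: fixed codes, s/X, a and |."""
    pos = base = 0
    for count, char in TOKEN.findall(fmt):
        n = int(count) if count else 1
        if char in SIZES:
            pos += n * SIZES[char]
        elif char in "sX":
            pos += n
        elif char == "a":
            pos += -(pos - base) % n   # pad to the next multiple of n since `|`
        elif char == "|":
            base = pos                 # later alignments are relative to here
        else:                          # n, $, references, substructures...
            raise ValueError(f"cannot compute a fixed size with {char!r}")
    return pos

print(fixed_size("QBBB 4a"))    # 12: 8 + 3 bytes, then 1 alignment byte
print(fixed_size("QB| BB 4a"))  # 13: 2 bytes since `|`, then 2 alignment bytes
```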

### Struct

```python
Struct(format, names=None, safe_references=True)
```

Struct objects pre-parse a format string once and for all.
A plain format string has to be parsed again every time it is used, so wrapping a structure that is used more than once in a Struct object saves time.
Any function that accepts a format string also accepts Struct objects.
A Struct object is initialized with a format string, and can take a `names` parameter, which may be a namedtuple or a list of names, so that data unpacked with this structure is returned in a more convenient namedtuple. It works exactly like the `names` parameter of `unpack` and its variants, but is set once and then used by default every time you unpack data with that structure.
The `safe_references` parameter, when set to `False`, enables some seemingly unsafe but sometimes desirable behaviours described in the *References* section.
You can retrieve the byte order with the `byteorder` attribute (`"little"` or `"big"`), and the format string (without byte order mark) with the `format` attribute.
You can also tell whether the structure has an assigned byte order with the `forcebyteorder` attribute.
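As a point of comparison, the stdlib equivalent of a pre-parsed structure with names is a `struct.Struct` plus a `namedtuple` applied by hand on every call. This sketch shows that pattern; `Header` and `read_header` are hypothetical names, and rawutil itself returns its own `RawutilNameSpace` type, as in the examples above.

```python
import struct
from collections import namedtuple

# Parse the format once, then wrap every result in the same namedtuple.
Header = namedtuple("Header", ["string", "num1", "num2"])
HEADER = struct.Struct("<4s 2I")

def read_header(data):
    return Header._make(HEADER.unpack_from(data))

print(read_header(b"ABCD\x10\x00\x00\x00\x20\x00\x00\x00"))
# Header(string=b'ABCD', num1=16, num2=32)
```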

For convenience, Struct also provides the module-level functions as methods for the structure it represents (without the `structure` argument) :

```python
unpack(self, data, names=None, refdata=(), byteorder=None)
unpack_from(self, data, offset=None, names=None, refdata=(), getptr=False, byteorder=None)
iter_unpack(self, data, names=None, refdata=(), byteorder=None)
pack(self, *data, refdata=(), byteorder=None)
pack_into(self, buffer, offset, *data, refdata=(), byteorder=None)
pack_file(self, file, *data, position=None, refdata=(), byteorder=None)
calcsize(self, refdata=None, tokens=None)
```

In these methods, you can override the structure's byte order for a single call with `byteorder="little"` or `byteorder="big"`.

It is also possible to add structures (it can add Struct and format strings transparently), and multiply a Struct object :

```python
>>> part1 = Struct("<4s")
>>> part2 = Struct("I /0(#0B #0b)")
>>> part3 = "I /0s #0a"
>>> part1 + part2 + part3
Struct("<4s I /1(#0B #0b) I /3s #1a")
>>> part2 * 3
Struct("<I /0(#0B #0b) I /2(#1B #1b) I /4(#2B #2b)")
```
As you can see, the references are automatically adjusted : every absolute reference in the resulting structure still points to the element it pointed to previously.
External references are renumbered too, and are assumed to follow each other in `refdata`.

Note that if the added structures have different byte order marks, the resulting structure will always retain the byte order of the left operand.
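The renumbering described above can be illustrated with a hypothetical `shift_references` helper that shifts top-level absolute references by the number of elements that come before the appended part, and every `#n` reference by the number of external references already used. References inside a substructure are local to it, so they are left alone; relative `/pN` references need no change. The offsets are passed in by hand here; this only shows the idea and is not how rawutil implements it.

```python
import re

# Absolute reference, external reference, or a substructure bracket.
_TOKEN = re.compile(r"/(\d+)|#(\d+)|[(\[{]|[)\]}]")

def shift_references(fmt, element_offset, external_offset):
    """Hypothetical: renumber references when `fmt` is appended to another format."""
    out, last, depth = [], 0, 0
    for m in _TOKEN.finditer(fmt):
        out.append(fmt[last:m.start()])
        tok = m.group(0)
        if m.group(1) is not None:      # /N: only top-level ones are shifted
            n = int(m.group(1))
            out.append(f"/{n + element_offset}" if depth == 0 else tok)
        elif m.group(2) is not None:    # #N: external refs are global
            out.append(f"#{int(m.group(2)) + external_offset}")
        else:                           # track substructure nesting
            depth += 1 if tok in "([{" else -1
            out.append(tok)
        last = m.end()
    out.append(fmt[last:])
    return "".join(out)

# "<4s" has 1 element; "I /0(#0B #0b)" adds 2 elements and 1 external ref.
print(shift_references("I /0(#0B #0b)", 1, 0))  # I /1(#0B #0b)
print(shift_references("I /0s #0a", 3, 1))      # I /3s #1a
```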

### Exceptions

Rawutil defines several exception types :

- `rawutil.FormatError` : Raised when the format string parsing fails, or if the structure is invalid
- `rawutil.OperationError` : Raised when operations on data fail
	- `rawutil.DataError` : Raised when data is at fault (e.g. when there is not enough data to unpack the entire format)


It also uses a few others :

- `OverflowError` : When the data is out of range for its format

## Format strings

In the same way as the `struct` module, binary data structures are defined with **format strings**.

### Byte order marks

The first character of the format string may be used to specify the byte order to read the data in.
Those are the same as in `struct`, except `@` that is equivalent to `=` instead of setting native sizes and alignments.

| Chr. | Description |
| ---- | ----------- |
| =    | System byte order (as defined by sys.byteorder) |
| @    | Equivalent to =, system byte order |
| >    | Big endian (most significant byte first) |
| <    | Little endian (least significant byte first) |
| !    | Network byte order (big endian as defined by RFC 1700) |

If no byte order is defined in a structure, it is set to system byte order by default.
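The effect of each byte order mark can be checked with the stdlib `struct` module, which uses the same `<`, `>`, `!` and `=` characters. The sketch avoids `@`, because in `struct` it also selects native sizes and alignment, whereas rawutil treats it like `=`.

```python
import struct
import sys

VALUE = 0x01020304
little = struct.pack("<I", VALUE)    # least significant byte first
big = struct.pack(">I", VALUE)       # most significant byte first
network = struct.pack("!I", VALUE)   # network order is big endian
system = struct.pack("=I", VALUE)    # follows sys.byteorder, standard size
```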

### Elements

Several format characters define various data types. Simple data types are described in the following table :

| Chr. | Type   | Size | Description |
| ---- | ------ | ---- | ----------- |
| ?    | bool   | 1    | Boolean value, 0 for False and any other value for True (packed as 0 and 1) |
| b    | int8   | 1    | 8 bits signed integer (7 bits + 1 sign bit) |
| B    | uint8  | 1    | 8 bits unsigned integer |
| h    | int16  | 2    | 16 bits signed integer |
| H    | uint16 | 2    | 16 bits unsigned integer |
| u    | int24  | 3    | 24 bits signed integer |
| U    | uint24 | 3    | 24 bits unsigned integer |
| i    | int32  | 4    | 32 bits signed integer |
| I    | uint32 | 4    | 32 bits unsigned integer |
| l    | int32  | 4    | 32 bits signed integer (same as `i`) |
| L    | uint32 | 4    | 32 bits unsigned integer (same as `I`) |
| q    | int64  | 8    | 64 bits signed integer |
| Q    | uint64 | 8    | 64 bits unsigned integer |
| e    | half   | 2    | IEEE 754 half-precision floating-point number |
| f    | float  | 4    | IEEE 754 single-precision floating-point number |
| d    | double | 8    | IEEE 754 double-precision floating-point number |
| F    | quad   | 16   | IEEE 754 quadruple-precision floating-point number |
| c    | char   | 1    | Character (returned as a 1-byte bytes object) |
| x    | void   | 1    | Convenience padding byte. Takes no data to pack (it simply inserts a null byte) nor returns anything. **Does not fail** when there is no more data to read. To fail in that case, just use a normal `c` |

A number before a simple format character indicates a repetition : `"4I"` means four 32-bit unsigned integers, and is equivalent to `"IIII"`.
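The 24-bit `u` / `U` types have no `struct` equivalent. A minimal stdlib sketch of what they represent uses `int.from_bytes` and `int.to_bytes` on exactly three bytes; the helper names are hypothetical. `int.to_bytes` raises `OverflowError` for out-of-range values, matching the exception listed above for rawutil.

```python
def unpack_int24(raw, byteorder="little", signed=False):
    """Hypothetical decoder for a 3-byte integer (rawutil's u / U)."""
    if len(raw) != 3:
        raise ValueError("a 24-bit integer needs exactly 3 bytes")
    return int.from_bytes(raw, byteorder, signed=signed)

def pack_int24(value, byteorder="little", signed=False):
    """Hypothetical encoder; raises OverflowError if value does not fit."""
    return value.to_bytes(3, byteorder, signed=signed)
```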

There also exist "special" format characters that define more complex types and behaviours :

| Chr. | Type   | Description |
| ---- | ------ | ----------- |
| s    | char[] | Fixed-length string. Represents a string of a given length, for example `"16s"` represents a 16-byte string. Returned as a single `bytes` object (as opposed to `c`, which only returns individual characters) |
| n    | string | Null-terminated string. To unpack, reads until a null byte is found and returns the result as a `bytes` object, without the null byte. To pack, writes the given bytes and adds a null byte at the end. |
| X    | hex    | Works like `s`, but returns the result as a hexadecimal string. |
| a    |        | Inserts null bytes / reads until the data length reaches the next multiple of the given number (for example, `"4a"` goes to the next multiple of 4). Does not return anything and does not take input data to pack. |
| $    | char[] | When unpacking, returns all remaining data as a bytes object. When packing, simply packs the given bytes object. Must be the last element of the structure. |

You can also set the base position for alignment with the `|` character. An alignment will then be performed according to the latest `|`.
For example, `"QBBB 4a"` represents 1 uint64, 3 bytes and one alignment byte to get to the next multiple of 4 (12), whereas `"QB| BB 4a"` will align according to the `|` and give 1 uint64, 3 bytes and 2 alignment bytes, to get to 4 bytes since the last `|`.

Note that `$` must be at the end of the structure. Any other element after a `$` element will cause a `FormatError`
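The `n` element from the table is the one variable-length string type with a terminator, so its reading logic is easy to show in plain Python: find the next null byte, return what precedes it, and continue after the null. The helpers below are hypothetical illustrations, not rawutil functions.

```python
def read_cstring(data, offset=0):
    """Read up to the next null byte; return (string, offset after the null)."""
    end = data.index(b"\x00", offset)   # ValueError if there is no terminator
    return data[offset:end], end + 1

def write_cstring(value):
    """Append the terminating null byte, as packing `n` does."""
    return bytes(value) + b"\x00"

first, pos = read_cstring(b"foo\x00bar\x00")
second, pos = read_cstring(b"foo\x00bar\x00", pos)
print(first, second, pos)  # b'foo' b'bar' 8
```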

## References

One of the biggest additions of rawutil is references.
With rawutil, it is possible to use a value previously read as a repeat count for another element, and to insert custom values in a structure at run-time.

There are 3 types of references.

### External references

An external reference is a reference to a value given at run-time, namely through the `refdata` argument of all rawutil functions.
In the format string, those are denoted by a `#n` element, with the index in `refdata` as `n`.
For example, in the structure `"#0B #1s"`, `#0` will be replaced by the element 0 of `refdata`, and `#1` by the element 1.

Example :
```python
>>> unpack("#0B #1s", b"\x01\x02\x03foobar", refdata=(3, 6))
[1, 2, 3, b'foobar']
```

In the case above, this is equivalent to using `"3B 6s"` as the structure. However, when the same structure has to be used several times with different repeat counts, it can be pre-compiled into a Struct object with external references, then reused with different values without re-parsing the structure each time.
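Conceptually, an external reference is a repeat count filled in at call time. The hypothetical `resolve_external` helper below substitutes `refdata` into the format text and then hands the result to `struct`; note that it rebuilds and re-parses the format on every call, which is exactly the cost a pre-compiled Struct with external references avoids.

```python
import re
import struct

def resolve_external(fmt, refdata):
    """Hypothetical: replace each #n with refdata[n] to get a plain format."""
    return re.sub(r"#(\d+)", lambda m: str(refdata[int(m.group(1))]), fmt)

fmt = resolve_external("#0B #1s", (3, 6))        # "3B 6s"
print(struct.unpack(fmt, b"\x01\x02\x03foobar"))  # (1, 2, 3, b'foobar')
```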

### Absolute references

Absolute references let you use a previously read value as a repeat count for another element further in the structure.
Those are denoted with `/N`, with the index of the referenced element in the structure as `N`.
For example, in the structure `"I /0s"`, the integer is used to tell the length of the string, and the reference allows to read the string with that length.
For absolute and relative references, a sub-structure counts for 1 element.

Example :
```python
>>> unpack("3B /0s /1s /2s", b"\x04\x03\x04spamhameggs")
[4, 3, 4, b'spam', b'ham', b'eggs']
```
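The `"I /0s"` pattern mentioned above is the classic length-prefixed string. In plain Python it means reading the length first, then using it to slice the string; the helper below is a hypothetical illustration that assumes little-endian order for the length.

```python
import struct

def read_length_prefixed(data, offset=0):
    """Plain-Python equivalent of "<I /0s": a uint32 length, then that many bytes."""
    (length,) = struct.unpack_from("<I", data, offset)
    start = offset + 4
    return [length, data[start:start + length]]

print(read_length_prefixed(b"\x04\x00\x00\x00spamXX"))  # [4, b'spam']
```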

### Relative references

Relative references are similar to absolute references, except that they are relative to their location in the structure.
They are denoted with `/pN`, where `N` is the number of elements to go back in the structure to find the referenced element.
It works a bit like negative list indices in Python : `/p1` gives the immediately previous element, `/p2` the one before, and so on.

Example :
```python
>>> unpack("B /p1s 2B /p2s /p2s", b"\x04spam\x03\x04hameggs")
[4, b'spam', 3, 4, b'ham', b'eggs']
```

This is especially useful when there is a variable number of elements before the referenced element, which makes absolute references impractical, or when the structure is very long and absolute indices become hard to keep track of.
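Both kinds of internal references can be illustrated with a toy interpreter that supports only `B`, `c` and `s`. An absolute `/N` indexes the values read so far from the start, while `/pN` counts back from the current element, like a negative list index. `mini_unpack` is hypothetical, has none of rawutil's safety checks, and skips any character it does not understand.

```python
import re

# One element: optional count (number, /N or /pN) followed by B, c or s.
TOKEN = re.compile(r"(\d+|/p\d+|/\d+)?([Bcs])")

def mini_unpack(fmt, data):
    """Toy unpacker for B, c and s with absolute (/N) and relative (/pN) counts."""
    values, pos = [], 0
    for count, char in TOKEN.findall(fmt):
        if not count:
            n = 1
        elif count.startswith("/p"):
            n = values[-int(count[2:])]   # go back N elements from here
        elif count.startswith("/"):
            n = values[int(count[1:])]    # index from the start of the format
        else:
            n = int(count)
        chunk = data[pos:pos + n]
        pos += n
        if char == "s":
            values.append(chunk)                             # one bytes element
        elif char == "B":
            values.extend(chunk)                             # n integers
        else:
            values.extend(chunk[i:i + 1] for i in range(n))  # n 1-byte bytes
    return values

print(mini_unpack("3B /0s /1s /2s", b"\x04\x03\x04spamhameggs"))
print(mini_unpack("B /p1s 2B /p2s /p2s", b"\x04spam\x03\x04hameggs"))
# both print [4, ..., b'spam', ..., b'ham', b'eggs'] as in the examples above
```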

### Reference error checking

References come with some error checking : errors are caught while parsing the format when possible. For instance, a reference that points to itself, to an element after itself, or before the beginning of the format is invalid, and raises a `FormatError`. Referencing an element inside or beyond a part with an indeterminate number of elements (typically, another reference) is quite unsafe, but might sometimes be useful. Those "unsafe behaviours" are disabled by default : to enable them, use `Struct()` with the argument `safe_references=False`.

```python
>>> # For instance, here we reference the last element of the first block, that itself uses a reference
>>> unpack("B /0B /p1c", b"\x02\xFF\x03ABC")
Traceback (most recent call last):
  ...
rawutil.FormatError: In format 'B /0B /p1c', in subformat 'B/0B/p1c', at position 4 : Unsafe reference index : relative reference references in or beyond an indeterminate amount of elements (typically a reference). If it is intended, use the parameter safe_references=False of the Struct() constructor
>>> Struct("B /0B /p1c", safe_references=False).unpack(b"\x02\xFF\x03ABC")
[2, 255, 3, b'A', b'B', b'C']
```

## Sub-structures

The other big addition in rawutil is substructure elements.
They can be used to isolate values in their own group instead of mixing them into the global scope, or to easily read a similar group of elements several times. They can of course be nested.

Note that a substructure always counts as a single element towards references, and that references are local to their group : a `/0` reference inside a substructure points to the first element *of that substructure*.

Alignments are also local to their substructure, thus will always align relative to the beginning of the substructure.

### Groups

A group is simply a group of values isolated in their own sub-list.
Those are defined between parentheses `(…)`.
The values in a group are then extracted in a sub-list, and must be in a sub-list when packed.

Example :
```python
>>> unpack("<I (3B) I", b"\xff\xff\xff\xff\x01\x02\x03\xff\xff\xff\xff")
[4294967295, [1, 2, 3], 4294967295]
>>> pack("<I (3B) I", 0xFFFFFFFF, (1, 2, 3), 0xFFFFFFFF)
b'\xff\xff\xff\xff\x01\x02\x03\xff\xff\xff\xff'
```

When a repeat count is set on a group (as a number or as a reference, both are always valid), the group is extracted several times, but into the same sub-list, as opposed to iterators, which are described below.

Example :
```python
>>> unpack("B 3(n)", b"\x0afoo\x00bar\x00foo2\x00")
[10, [b'foo', b'bar', b'foo2']]
>>> unpack("B /0(n)", b"\x03foo\x00bar\x00foo2\x00")
[3, [b'foo', b'bar', b'foo2']]
```
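To make the "same sub-list" behaviour concrete, here is a plain-Python equivalent of `"B /0(n)"`: the count is read first, then every repetition of the group is appended to one flat list. The helper is hypothetical and only mirrors this one structure.

```python
def unpack_counted_group(data):
    """Plain-Python equivalent of "B /0(n)": a count, then one flat sub-list."""
    count, pos = data[0], 1
    group = []
    for _ in range(count):
        end = data.index(b"\x00", pos)   # each n element ends at a null byte
        group.append(data[pos:end])
        pos = end + 1
    return [count, group]

print(unpack_counted_group(b"\x03foo\x00bar\x00foo2\x00"))
# [3, [b'foo', b'bar', b'foo2']]
```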

### Iterators

An iterator will extract its substructure as many times as it is told by its repeat count, in separate sub-lists.
It is defined between square brackets `[…]`

Example :
```python
>>> unpack("B /0[B /0s]", b"\x03\x03foo\x03bar\x06foobar")
[3, [[3, b'foo'], [3, b'bar'], [6, b'foobar']]]
>>> pack("B /0[B /0s]", 2, ((3, b"foo"), (3, b"bar")))
b'\x02\x03foo\x03bar'
```
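By contrast with groups, an iterator gives each repetition its own sub-list. This hypothetical plain-Python equivalent of `"B /0[B /0s]"` shows the resulting list of lists; each record's length byte is an internal reference local to that record.

```python
def unpack_records(data):
    """Plain-Python equivalent of "B /0[B /0s]": one sub-list per iteration."""
    count, pos = data[0], 1
    records = []
    for _ in range(count):
        length = data[pos]               # local /0 inside the substructure
        records.append([length, data[pos + 1:pos + 1 + length]])
        pos += 1 + length
    return [count, records]

print(unpack_records(b"\x03\x03foo\x03bar\x06foobar"))
# [3, [[3, b'foo'], [3, b'bar'], [6, b'foobar']]]
```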

### Unbound iterators

While `[]` iterators are more or less equivalent to `for i in range(count)`, unbound iterators are equivalent to a `while` loop.
This kind of iterator is defined between curly brackets `{…}`, and extracts its substructure into a list of lists just like `[]`, except that it extracts until there are no more data left to read.
Thus you must not give it any repeat count (doing so will throw a `FormatError`), and it must always be the last element of its structure (it also raises an exception otherwise).
The data to read must be an exact multiple of that substructure, otherwise it will throw an `OperationError` when attempting to unpack it.

Example :
```python
>>> unpack("4s {Bn}", b"TEST\x00foo\x00\x01bar\x00\x02foobar\x00")
[b'TEST', [[0, b'foo'], [1, b'bar'], [2, b'foobar']]]
>>> pack("4s {Hn4a}", b"TEST", ((1, b"foo"), (1295, b"bar")))
b'TEST\x01\x00foo\x00\x00\x00\x0f\x05bar\x00\x00\x00'
```
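The `while` analogy can be written out directly. This hypothetical plain-Python equivalent of `"4s {Bn}"` reads a 4-byte tag, then keeps reading `(B, n)` entries until the data runs out. A truncated last entry makes `bytes.index` raise `ValueError` here, loosely playing the role of rawutil's `OperationError`.

```python
def unpack_until_end(data):
    """Plain-Python equivalent of "4s {Bn}": a 4-byte tag, then (B, n) entries
    repeated until no data is left."""
    tag, pos = data[:4], 4
    entries = []
    while pos < len(data):
        value = data[pos]
        end = data.index(b"\x00", pos + 1)   # ValueError on a truncated entry
        entries.append([value, data[pos + 1:end]])
        pos = end + 1
    return [tag, entries]

print(unpack_until_end(b"TEST\x00foo\x00\x01bar\x00\x02foobar\x00"))
# [b'TEST', [[0, b'foo'], [1, b'bar'], [2, b'foobar']]]
```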

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Tyulis/rawutil",
    "name": "rawutil",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.4",
    "maintainer_email": null,
    "keywords": "structures struct binary bytes formats",
    "author": "Tyulis",
    "author_email": "tyulis@laposte.net",
    "download_url": "https://files.pythonhosted.org/packages/9b/4b/f6310566f70a8c3bcbb1ec69bcbb3926c5e3e5ab11bea3d5f86517b13d2b/rawutil-2.8.1.tar.gz",
    "platform": null,
    "description": "# Rawutil\n\n*A pure-python and lightweight module to read and write binary data*\n\n## Introduction\n\nRawutil is a module aimed at reading and writing binary data in python in the same way as the built-in `struct` module, but with more features.\nThe rawutil's interface is thus compatible with `struct`, with a few small exceptions, and many things added.\nIt does not have any non-builtin dependency.\n\n### What is already in struct\n\n- Unpack and pack fixed structures from/to bytes (`pack`, `pack_into`, `unpack`, `unpack_from`, `iter_unpack`, `calcsize`)\n- `Struct` objects that allow to parse one and for all a structure that may be used several times\n\n### What is different compared to struct\n\n- Some rarely-used format characters are not in rawutil (`N`, `P` and `p` are not available, `n` is used for a different purpose)\n- There is no consideration for native size and alignment, thus the `@` characters simply applies system byte order with standard sizes and no alignment, just like `=`\n- There are several differences in error handling that are described below\n\n### What has been added to struct\n\n- Reading and writing files and file-like objects\n- New format characters, to handle padding, alignment, strings, ...\n- Internal references in structures\n- Loops in structures\n- New features to handle variable byte order\n\n## Usage\n\nRawutil exports more or less the same interface as `struct`. In all those functions, `structure` may be a simple format string or a `Struct` object.\n\n### unpack\n\n```python\nunpack(structure, data, names=None, refdata=())\n```\nUnpacks the given `data` according to the `structure`, and returns the unpacked values as a list.\n\n- `structure` is the structure of the data to unpack, as a format string or a `Struct` object\n- `data` may be a bytes-like or a file-like object. 
If it is a file-like object, the data will be unpacked starting from the current position in the file, and will leave the cursor at the end of the data that has been read (effectively reading the data to unpack from the file).\n- `names` may be a list of field names for a `namedtuple`, or a callable that takes all unpacked elements in order as arguments, like a `namedtuple` or a `dataclass`.\n- `refdata` may be used to easily input external data into the structure, as `#n` references. This will be described in the References part below\n\nUnlike `struct`, this function does not raises any error if the data is larger than the structure expected size.\n\nExamples :\n\n```python\n>>> unpack(\"4B 3s 3s\", b\"\\x01\\x02\\x03\\x04foobar\")\n(1, 2, 3, 4, b\"foo\", b\"bar\")\n>>> unpack(\"<4s #0I\", b\"ABCD\\x10\\x00\\x00\\x00\\x20\\x00\\x00\\x00\", names=(\"string\", \"num1\", \"num2\"), refdata=(2, ))\nRawutilNameSpace(string=b'ABCD', num1=16, num2=32)\n```\n\n### unpack_from\n\n```python\nunpack_from(structure, data, offset=None, names=None, refdata=(), getptr=False)\n```\n\nUnpacks the given `data` according to the `structure` starting from the given `position`, and returns the unpacked values as a list\n\nThis function works exactly like `unpack`, with two more optional arguments :\n\n- `offset` can be used to specify a starting position to read. In a file-like object, the cursor is moved to the given absolute `offset`, then the data to unpack is read and the cursor is left at the end of the data that has been read. If this parameter is not set, it works like `unpack` and reads from the current position\n- `getptr` can be set to True to return the final position in the data, after the unpacked data. The function will then return `(values, end_position)`. 
If left to False, it works like `unpack` and only returns the values.\n\nExamples :\n\n```python\n>>> unpack_from(\"<4s #0I\", b\"ABCD\\x10\\x00\\x00\\x00\\x20\\x00\\x00\\x00\", names=(\"string\", \"num1\", \"num2\"), refdata=(2, ))\nRawutilNameSpace(string=b'ABCD', num1=16, num2=32)\n>>> values, endpos = unpack_from(\"<2I\", b\"ABCD\\x10\\x00\\x00\\x00\\x20\\x00\\x00\\x00EFGH\", offset=4, getptr=True)\n>>> values\n[16, 32]\n>>> endpos\n12\n```\n\n### iter_unpack\n\n```python\niter_unpack(structure, data, names=None, refdata=())\n```\n\nReturns an iterator that will unpack according to the structure and return the values as a list at each iteration.\nThe data must be of a multiple of the structure\u2019s length. If `names` is defined, each iteration will return a namedtuple, most like `unpack` and `unpack_from`. `refdata` also works the same.\n\nThis function is present mostly to ensure compatibility with `struct`. It is rather recommended to use iterators in structures, that are faster and offer much more control.\n\nExamples :\n```python\n>>> for a, b, c in iter_unpack(\"3c\", b\"abcdefghijkl\"):\n...     print(a.decode(\"ascii\"), b.decode(\"ascii\"), c.decode(\"ascii\"))\n...\na b c\nd e f\ng h i\nj k l\n```\n\n### pack\n\n```python\npack(self, *data, refdata=())\n```\n\nPacks the given `data` in the binary format defined by `structure`, and returns the packed data as a `bytes` object.\n`refdata` is still there to insert external data in the structure using the `#n` references, and is a named argument only.\n\nNote that if the last element of `data` is a writable file-like object, the data will be written into it instead of being returned. 
This behaviour is deprecated and kept only for backwards-compatibility, to pack into a file you should rather use `pack_file`.\n\nExamples :\n```python\n>>> pack(\"<2In\", 10, 100, b\"String\")\nb'\\n\\x00\\x00\\x00\\n\\x00\\x00\\x00String\\x00'\n>>> pack(\">#0B #1I\", 10, 100, 1000, 10000, 100000, refdata=(2, 3))\nb\"\\nd\\x00\\x00\\x03\\xe8\\x00\\x00'\\x10\\x00\\x01\\x86\\xa0\"\n>>> unpack(\">2B3I\", _)\n[10, 100, 1000, 10000, 100000]\n```\n\n### pack_into\n\n```python\npack_into(structure, buffer, offset, *data, refdata=())\n```\n\nPacks the given `data` into the given `buffer` at the given `offset` according to the given `structure`. Refdata still has the same usage as everywhere else.\n\n- `buffer` must be a mutable bytes-like object (typically a `bytearray`). The data will be written directly into it at the given position\n- `offset` specifies the position to write the data to. It is a required argument.\n\nExamples :\n\n```python\n>>> b = bytearray(b\"AB----GH\")\n>>> pack_into(\"4s\", b, 2, b\"CDEF\")\n>>> b\nbytearray(b'ABCDEFGH')\n```\n\n### pack_file\n\n```python\npack_file(structure, file, *data, position=None, refdata=())\n```\n\nPacks the given `data` into the given `file` according to the given `structure`. `refdata` is still there for the external references data.\n\n- `file` can be any binary writable file-like object.\n- `position` can be set to pack the data at a specific position in the file. If it is left to `None`, the data will be packed at the current position in the file. 
In either case, the cursor will end up at the end of the packed data.\n\nExamples :\n\n```python\n>>> file = io.BytesIO(b\"\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\")\n>>> rawutil.pack_file(\"2B\", file, 60, 61)  # Writes at the current position (0)\n>>> rawutil.pack_file(\"c\", file, b\"A\")     # Writes at the current position (now 2)\n>>> rawutil.pack_file(\"2c\", file, b\"y\", b\"z\", position=6)  # Writes at the given position (6)\n>>> file.seek(0)\n>>> file.read()\nb'<=A\\x00\\x00\\x00yz'\n```\n\n### calcsize\n\n```python\ncalcsize(structure, refdata=())\n```\n\nReturns the size of the data represented by the given `structure`.\n\nThis function is kept to ensure compatibility with `struct`.\nHowever, rawutil structure are not always of a fixed length, as they use internal references and variable length formats.\nHence `calcsize` only works on fixed-length structures, thus structures that only use :\n\n- Fixed-length format characters (basic types with set repeat count)\n- External references (`#0` type references, if you provide their value in `refdata`)\n- Iterators with fixed number of repeats (`2(\u2026)` or `5[\u2026]` will work)\n- Alignments (structures with `a` and `|`). 
As long as everything else is fixed, alignments are too.\n\nTrying to compute the size of a structure that includes any of those elements will raise an `FormatError` (basically, anything that depends on the data to read / write) :\n\n- Variable-length format characters (namely `n` and `$`)\n- `{\u2026}` iterators, as they depend on the amount of data remaining.\n- Internal references (any `/1` or `/p1` types references)\n\n### Struct\n\n```python\nStruct(format, names=None, safe_references=True)\n```\n\nStruct objects allow to pre-parse format strings once and for all.\nIndeed, using only format strings will force to parse them every time you use them.\nIf a structure is used more than once, it will thus save time to wrap it in a Struct object.\nYou can also set the element names once, they will then be used by default every time you unpack data with that structure.\nAny function that accepts a format string also accepts Struct objects.\nA Struct object is initialized with a format string, and can take a `names` parameter that may be a namedtuple or a list of names, that allows to return data unpacked with this structure in a more convenient namedtuple. 
The `safe_references` parameter, when set to `False`, enables some seemingly unsafe but sometimes desirable behaviours described in the *References* section.\nThe `names` parameter works exactly like that of `unpack` and its variants, but without having to specify it each time.\nYou can retrieve the byte order with the `byteorder` attribute (either `\"little\"` or `\"big\"`), and the format string (without the byte order mark) with the `format` attribute.\nYou can also tell whether the structure has an assigned byte order with the `forcebyteorder` attribute.\n\nFor convenience, Struct also defines the module-level functions as methods that operate on the structure it represents (hence without the `structure` argument) :\n\n```python\nunpack(self, data, names=None, refdata=(), byteorder=None)\nunpack_from(self, data, offset=None, names=None, refdata=(), getptr=False, byteorder=None)\niter_unpack(self, data, names=None, refdata=(), byteorder=None)\npack(self, *data, refdata=(), byteorder=None)\npack_into(self, buffer, offset, *data, refdata=(), byteorder=None)\npack_file(self, file, *data, position=None, refdata=(), byteorder=None)\ncalcsize(self, refdata=None, tokens=None)\n```\n\nIn these methods, you can override the structure's byte order for a single call with `byteorder=\"little\"` or `byteorder=\"big\"`.\n\nStruct objects can also be added together (Struct objects and format strings can be mixed transparently) and multiplied by an integer :\n\n```python\n>>> part1 = Struct(\"<4s\")\n>>> part2 = Struct(\"I /0(#0B #0b)\")\n>>> part3 = \"I /0s #0a\"\n>>> part1 + part2 + part3\nStruct(\"<4s I /1(#0B #0b) I /3s #1a\")\n>>> part2 * 3\nStruct(\"<I /0(#0B #0b) I /2(#1B #1b) I /4(#2B #2b)\")\n```\n\nAs you can see, the references are automatically fixed : all absolute references in the resulting structure point to the element they pointed to previously.\nExternal references are renumbered too, so that they follow each other in sequence in `refdata`.\n\nNote that if the added structures have different byte order marks, 
the resulting structure will always retain the byte order of the left operand.\n\n### Exceptions\n\nRawutil defines several exception types :\n\n- `rawutil.FormatError` : Raised when the format string parsing fails, or if the structure is invalid\n- `rawutil.OperationError` : Raised when operations on data fail\n\t- `rawutil.DataError` : Raised when data is at fault (e.g. when there is not enough data to unpack the entire format)\n\n\nIt also uses some built-in exceptions\u00a0:\n\n- `OverflowError` : Raised when a value is out of range for its format\n\n## Format strings\n\nIn the same way as the `struct` module, binary data structures are defined with **format strings**.\n\n### Byte order marks\n\nThe first character of the format string may be used to specify the byte order in which the data is read or written.\nThose are the same as in `struct`, except `@`, which is equivalent to `=` instead of selecting native sizes and alignment.\n\n| Chr. | Description |\n| ---- | ----------- |\n| =    | System byte order (as defined by `sys.byteorder`) |\n| @    | Equivalent to `=`, system byte order |\n| >    | Big endian (most significant byte first) |\n| <    | Little endian (least significant byte first) |\n| !    | Network byte order (big endian as defined by RFC 1700) |\n\nIf no byte order is defined in a structure, it is set to system byte order by default.\n\n### Elements\n\nSeveral format characters are available to define various data types. Simple data types are described in the following table :\n\n| Chr. | Type   | Size | Description |\n| ---- | ------ | ---- | ----------- |\n| ?    
| bool   | 1    | Boolean value, 0 for False and any other value for True (packed as 0 and 1) |\n| b    | int8   | 1    | 8-bit signed integer (7 bits + 1 sign bit) |\n| B    | uint8  | 1    | 8-bit unsigned integer |\n| h    | int16  | 2    | 16-bit signed integer |\n| H    | uint16 | 2    | 16-bit unsigned integer |\n| u    | int24  | 3    | 24-bit signed integer |\n| U    | uint24 | 3    | 24-bit unsigned integer |\n| i    | int32  | 4    | 32-bit signed integer |\n| I    | uint32 | 4    | 32-bit unsigned integer |\n| l    | int32  | 4    | 32-bit signed integer (same as `i`) |\n| L    | uint32 | 4    | 32-bit unsigned integer (same as `I`) |\n| q    | int64  | 8    | 64-bit signed integer |\n| Q    | uint64 | 8    | 64-bit unsigned integer |\n| e    | half   | 2    | IEEE 754 half-precision floating-point number |\n| f    | float  | 4    | IEEE 754 single-precision floating-point number |\n| d    | double | 8    | IEEE 754 double-precision floating-point number |\n| F    | quad   | 16   | IEEE 754 quadruple-precision floating-point number |\n| c    | char   | 1    | Character (returned as a 1-byte `bytes` object) |\n| x    | void   | 1    | Convenience padding byte. Takes no data to pack (it simply inserts a null byte) and returns nothing when unpacking. **Does not fail** when there is no more data to read. To fail in that case, just use a normal `c`. |\n\nA number before a simple format character indicates a repetition : `\"4I\"` means four 32-bit unsigned integers, and is equivalent to `\"IIII\"`.\n\nThere are also \"special\" format characters that define more complex types and behaviours :\n\n| Chr. | Type   | Description |\n| ---- | ------ | ----------- |\n| s    | char[] | Fixed-length string. Represents a string of a given length, for example `\"16s\"` represents a 16-byte string. Returned as a single `bytes` object (unlike `c`, which only returns individual characters) |\n| n    | string | Null-terminated string. 
When unpacking, reads until a null byte is found and returns the result as a `bytes` object, without the null byte. When packing, writes the given bytes followed by a null byte. |\n| X    | hex    | Works like `s`, but returns the result as a hexadecimal string. |\n| a    |        | Inserts null bytes (or skips bytes when unpacking) until the data length reaches the next multiple of the given number (for example, `\"4a\"` goes to the next multiple of 4). Does not return anything and does not take input data to pack. |\n| $    | char[] | When unpacking, returns all remaining data as a `bytes` object. When packing, simply packs the given `bytes` object. Must be the last element of the structure. |\n\nYou can also set the base position for alignment with the `|` character. Subsequent alignments are then computed relative to the position of the latest `|`.\nFor example, `\"QBBB 4a\"` represents 1 uint64, 3 bytes and one alignment byte to get to the next multiple of 4 (11 bytes of data padded to 12), whereas `\"QB| BB 4a\"` aligns relative to the `|` and gives 1 uint64, 3 bytes and 2 alignment bytes, to reach 4 bytes past the `|` (13 bytes in total).\n\nNote that `$` must be at the end of the structure. 
Any other element after a `$` element will cause a `FormatError`.\n\n## References\n\nOne of the biggest additions in rawutil is references.\nWith rawutil, it is possible to use a value previously read as a repeat count for another element, and to insert custom values in a structure at run-time.\n\nThere are 3 types of references.\n\n### External references\n\nAn external reference is a reference to a value given at run-time, namely through the `refdata` argument of all rawutil functions.\nIn the format string, those are denoted by a `#n` element, with the index in `refdata` as `n`.\nFor example, in the structure `\"#0B #1s\"`, `#0` will be replaced by element 0 of `refdata`, and `#1` by element 1.\n\nExample :\n```python\n>>> unpack(\"#0B #1s\", b\"\\x01\\x02\\x03foobar\", refdata=(3, 6))\n[1, 2, 3, b'foobar']\n```\n\nIn the case above, this is equivalent to using `\"3B 6s\"` as the structure. But when the same structure has to be used several times with different repeat counts, you can pre-compile it into a Struct object with external references, then reuse that object with different values without re-parsing the structure each time.\n\n### Absolute references\n\nAbsolute references allow a previously read value to be used as the repeat count of another element further in the structure.\nThose are denoted with `/N`, with the index of the referenced element in the structure as `N`.\nFor example, in the structure `\"I /0s\"`, the integer gives the length of the string, and the reference reads the string with that length.\nFor absolute and relative references, a sub-structure counts as a single element.\n\nExample :\n```python\n>>> unpack(\"3B /0s /1s /2s\", b\"\\x04\\x03\\x04spamhameggs\")\n[4, 3, 4, b'spam', b'ham', b'eggs']\n```\n\n### Relative references\n\nRelative references are similar to absolute references, except that they are relative to their location in the structure.\nThey are denoted 
with `/pN`, where `N` is the number of elements to go back in the structure to find the referenced element.\nIt works a bit like negative list indices in Python : `/p1` gives the immediately previous element, `/p2` the one before, and so on.\n\nExample :\n```python\n>>> unpack(\"B /p1s 2B /p2s /p2s\", b\"\\x04spam\\x03\\x04hameggs\")\n[4, b'spam', 3, 4, b'ham', b'eggs']\n```\n\nThis is especially useful when a variable number of elements precede the referenced element, which makes absolute references impractical, or when the structure is very long and absolute indices become hard to keep track of.\n\n### Reference error checking\n\nReferences come with some error checking : errors are caught while parsing the format when possible. For instance, a reference that points to itself, to an element after itself, or before the beginning of the format is invalid. Those errors raise a `FormatError`. Referencing an element inside or beyond a part with an indeterminate number of elements (typically, another reference) is quite unsafe, but it can sometimes be useful. Those \"unsafe behaviours\" are disabled by default : use `Struct()` with the argument `safe_references=False` to enable them.\n\n```python\n>>> # For instance, here /p1 references the last element of the /0B block, whose length itself comes from a reference\n>>> unpack(\"B /0B /p1c\", b\"\\x02\\xFF\\x03ABC\")\n...\nrawutil.FormatError: In format 'B /0B /p1c', in subformat 'B/0B/p1c', at position 4 : Unsafe reference index : relative reference references in or beyond an indeterminate amount of elements (typically a reference). 
If it is intended, use the parameter safe_references=False of the Struct() constructor\n>>> Struct(\"B /0B /p1c\", safe_references=False).unpack(b\"\\x02\\xFF\\x03ABC\")\n[2, 255, 3, b'A', b'B', b'C']\n```\n\n## Sub-structures\n\nThe other big addition in rawutil is substructure elements.\nThose can be used to isolate values in their own group instead of having them diluted in the global scope, or to easily read the same group of elements several times. They can of course be nested.\n\nNote that a substructure always counts as a single element towards references, and that references are local to their group : a `/0` reference inside a substructure will point to the first element *of that substructure*.\n\nAlignments are also local to their substructure : they are always computed relative to the beginning of the substructure.\n\n### Groups\n\nA group is simply a group of values isolated in their own sub-list.\nGroups are defined between parentheses `(\u2026)`.\nThe values in a group are extracted into a sub-list when unpacking, and must be given as a sub-list when packing.\n\nExample :\n```python\n>>> unpack(\"<I (3B) I\", b\"\\xff\\xff\\xff\\xff\\x01\\x02\\x03\\xff\\xff\\xff\\xff\")\n[4294967295, [1, 2, 3], 4294967295]\n>>> pack(\"<I (3B) I\", 0xFFFFFFFF, (1, 2, 3), 0xFFFFFFFF)\nb'\\xff\\xff\\xff\\xff\\x01\\x02\\x03\\xff\\xff\\xff\\xff'\n```\n\nWhen a group is given a repeat count (either a number or a reference, both are always valid), its content is extracted that many times, but into the same sub-list, unlike the iterators described below.\n\nExample :\n```python\n>>> unpack(\"B 3(n)\", b\"\\x0afoo\\x00bar\\x00foo2\\x00\")\n[10, [b'foo', b'bar', b'foo2']]\n>>> unpack(\"B /0(n)\", b\"\\x03foo\\x00bar\\x00foo2\\x00\")\n[3, [b'foo', b'bar', b'foo2']]\n```\n\n### Iterators\n\nAn iterator extracts its substructure as many times as its repeat count says, each time into a separate sub-list.\nIt is defined between square brackets `[\u2026]`.\n\nExample :\n```python\n>>> 
unpack(\"B /0[B /0s]\", b\"\\x03\\x03foo\\x03bar\\x06foobar\")\n[3, [[3, b'foo'], [3, b'bar'], [6, b'foobar']]]\n>>> pack(\"B /0[B /0s]\", 2, ((3, b\"foo\"), (3, b\"bar\")))\nb'\\x02\\x03foo\\x03bar'\n```\n\n### Unbound iterators\n\nWhile `[]` iterators are more or less equivalent to a `for i in range(count)` loop, unbound iterators are equivalent to a `while` loop.\nThis kind of iterator is defined between curly brackets `{\u2026}`, and extracts its substructure into a list of lists just like `[]`, except that it keeps extracting until there is no data left to read.\nThus you must not give it any repeat count (doing so will throw a `FormatError`), and it must always be the last element of its structure (otherwise an exception is raised as well).\nThe remaining data must consist of a whole number of repetitions of the substructure, otherwise an `OperationError` is raised when attempting to unpack it.\n\nExample :\n```python\n>>> unpack(\"4s {Bn}\", b\"TEST\\x00foo\\x00\\x01bar\\x00\\x02foobar\\x00\")\n[b'TEST', [[0, b'foo'], [1, b'bar'], [2, b'foobar']]]\n>>> pack(\"<4s {Hn4a}\", b\"TEST\", ((1, b\"foo\"), (1295, b\"bar\")))\nb'TEST\\x01\\x00foo\\x00\\x00\\x00\\x0f\\x05bar\\x00\\x00\\x00'\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A pure-python module to read and write binary data",
    "version": "2.8.1",
    "project_urls": {
        "Homepage": "https://github.com/Tyulis/rawutil"
    },
    "split_keywords": [
        "structures",
        "struct",
        "binary",
        "bytes",
        "formats"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dd089e789f9c94078ff982137d58d6ed41266a7a5e55da8472e69fc9699a30e1",
                "md5": "6127dd99a63f4dfe06642bd9adc5196d",
                "sha256": "b7dacaafebb89c77bf6c6a01a3415c2188004663cc82064c0b02d24204cf17ff"
            },
            "downloads": -1,
            "filename": "rawutil-2.8.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6127dd99a63f4dfe06642bd9adc5196d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.4",
            "size": 21065,
            "upload_time": "2024-08-25T10:05:16",
            "upload_time_iso_8601": "2024-08-25T10:05:16.120824Z",
            "url": "https://files.pythonhosted.org/packages/dd/08/9e789f9c94078ff982137d58d6ed41266a7a5e55da8472e69fc9699a30e1/rawutil-2.8.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9b4bf6310566f70a8c3bcbb1ec69bcbb3926c5e3e5ab11bea3d5f86517b13d2b",
                "md5": "7182291e821969b59aaaa9c9e1c642d2",
                "sha256": "73466cf8803bcedbc6c2fab020cf87280e61ca7a76e9eb35260ddc29ed845cb2"
            },
            "downloads": -1,
            "filename": "rawutil-2.8.1.tar.gz",
            "has_sig": false,
            "md5_digest": "7182291e821969b59aaaa9c9e1c642d2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.4",
            "size": 23658,
            "upload_time": "2024-08-25T10:05:17",
            "upload_time_iso_8601": "2024-08-25T10:05:17.555540Z",
            "url": "https://files.pythonhosted.org/packages/9b/4b/f6310566f70a8c3bcbb1ec69bcbb3926c5e3e5ab11bea3d5f86517b13d2b/rawutil-2.8.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-25 10:05:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Tyulis",
    "github_project": "rawutil",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "rawutil"
}

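The `pack_file` example in the README above can be approximated with the standard library alone, which makes the cursor handling explicit. This is a minimal sketch under stated assumptions: `pack_file_sketch` is a hypothetical helper (not part of rawutil). It only accepts format strings that `struct` understands, and it simply seeks to `position` when one is given. That way the cursor ends up after the packed data, as the README describes.

```python
import io
import struct


def pack_file_sketch(fmt, file, *values, position=None):
    """Hypothetical stand-in for rawutil.pack_file, limited to struct formats.

    Packs `values` with `fmt` and writes them at `position`, or at the
    current cursor if `position` is None. Either way the cursor ends up
    right after the written data.
    """
    if position is not None:
        file.seek(position)
    file.write(struct.pack(fmt, *values))


file = io.BytesIO(b"\x00" * 8)
pack_file_sketch("2B", file, 60, 61)                  # writes at 0, cursor -> 2
pack_file_sketch("c", file, b"A")                     # writes at 2, cursor -> 3
pack_file_sketch("2c", file, b"y", b"z", position=6)  # writes at 6, cursor -> 8
file.seek(0)
print(file.read())  # b'<=A\x00\x00\x00yz'
```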
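The `calcsize` rules in the README above rest on one observation. External references are known before any data is read, so they can be substituted into the format. Internal references and `n`, `$` or `{…}` depend on the data. The sketch below illustrates that idea with a hypothetical `calcsize_sketch` helper built on `struct.calcsize`. It is not rawutil's implementation, and it only handles characters that `struct` also knows, so there is no support for `u`, `U`, `F`, `X`, `a`, `|` or substructures.

```python
import re
import struct


def calcsize_sketch(fmt, refdata=()):
    """Hypothetical helper: size of a fixed-length format whose only references are external.

    Each #n is replaced by refdata[n], then struct.calcsize computes the
    size with standard sizes and no alignment ("<"). Internal references
    (/N, /pN) and data-dependent characters (n, $, {...}) are rejected.
    """
    body = fmt.lstrip("@=<>!")  # the byte order does not change the size
    if re.search(r"[/n${}]", body):
        raise ValueError("size depends on the data: not a fixed-length structure")
    resolved = re.sub(r"#(\d+)", lambda m: str(refdata[int(m.group(1))]), body)
    return struct.calcsize("<" + resolved)


print(calcsize_sketch("#0B #1s", refdata=(3, 6)))  # 9, same as "3B 6s"
print(calcsize_sketch(">4s I"))                    # 8
```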
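The `a` and `|` alignment rules described in the README above reduce to modular arithmetic. This sketch assumes that padding is computed from the distance to the latest `|`, or to the start of the structure or substructure when there is none. Under that assumption, the hypothetical `alignment_padding` helper below reproduces the `"QBBB 4a"` and `"QB| BB 4a"` figures. It also covers the substructure-local case of `{Hn4a}`, where the first repetition starts right after the 4-byte `TEST` header.

```python
def alignment_padding(offset, alignment, base=0):
    """Hypothetical helper: padding bytes an `<alignment>a` element would add.

    `offset` is the current position in the data and `base` the position of
    the latest `|` (or of the substructure start). The padding brings
    (offset - base) up to the next multiple of `alignment`, or adds nothing
    if it is already aligned.
    """
    return -(offset - base) % alignment


# "QBBB 4a": 8 + 3 = 11 bytes of data, padded to 12
print(alignment_padding(11, 4))          # 1
# "QB| BB 4a": the `|` sits at offset 9, data ends at 11 (2 bytes past it)
print(alignment_padding(11, 4, base=9))  # 2
# "{Hn4a}" after "4s": H + b"foo\x00" ends at 10, substructure began at 4
print(alignment_padding(10, 4, base=4))  # 2
```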
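This toy interpreter makes the difference between `/N` and `/pN` concrete. It models references as indices into the list of values unpacked so far, an assumption that happens to match the two reference examples in the README above. `unpack_sketch` is hypothetical: it takes pre-tokenized input instead of a format string and supports only `B` and `s`. It also ignores substructures, error checking and `safe_references`.

```python
import struct


def unpack_sketch(tokens, data):
    """Hypothetical toy interpreter for a tiny subset of rawutil-style formats.

    Each token is (count, char), where char is "B" (uint8) or "s"
    (fixed-length bytes), and count is an int, ("abs", N) for an absolute
    reference /N, or ("rel", N) for a relative reference /pN.
    References are resolved against the values unpacked so far.
    """
    values = []
    offset = 0
    for count, char in tokens:
        if isinstance(count, tuple):
            kind, index = count
            # /N indexes from the start; /pN counts back from the current element
            count = values[index] if kind == "abs" else values[len(values) - index]
        if char == "B":
            values.extend(struct.unpack_from(f"{count}B", data, offset))
        elif char == "s":
            values.append(struct.unpack_from(f"{count}s", data, offset)[0])
        else:
            raise ValueError(f"unsupported format character {char!r}")
        offset += count  # both "B" and "s" consume one byte per unit of count
    return values


# Equivalent of "3B /0s /1s /2s"
absolute = unpack_sketch(
    [(3, "B"), (("abs", 0), "s"), (("abs", 1), "s"), (("abs", 2), "s")],
    b"\x04\x03\x04spamhameggs",
)
# Equivalent of "B /p1s 2B /p2s /p2s"
relative = unpack_sketch(
    [(1, "B"), (("rel", 1), "s"), (2, "B"), (("rel", 2), "s"), (("rel", 2), "s")],
    b"\x04spam\x03\x04hameggs",
)
print(absolute)  # [4, 3, 4, b'spam', b'ham', b'eggs']
print(relative)  # [4, b'spam', 3, 4, b'ham', b'eggs']
```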
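The README above compares unbound `{…}` iterators to a `while` loop, and that comparison can be spelled out by hand. The hypothetical `unpack_unbound_sketch` below decodes the `{Bn}` part of a `"4s {Bn}"` structure with plain Python and `struct`. It produces one sub-list per repetition and stops when the data is exhausted. Unlike rawutil, which documents an `OperationError` for incomplete data, this sketch raises `ValueError` when a string terminator is missing.

```python
import struct


def unpack_unbound_sketch(data, offset=0):
    """Hypothetical hand-written equivalent of the "{Bn}" part of "4s {Bn}".

    Repeats "B" (uint8) then "n" (null-terminated string) while data
    remains, collecting one [number, string] sub-list per repetition.
    """
    records = []
    while offset < len(data):
        (number,) = struct.unpack_from("B", data, offset)
        offset += 1
        end = data.index(b"\x00", offset)  # ValueError if the terminator is missing
        records.append([number, data[offset:end]])
        offset = end + 1  # skip the null byte
    return records


data = b"TEST\x00foo\x00\x01bar\x00\x02foobar\x00"
result = [data[:4], unpack_unbound_sketch(data, 4)]
print(result)  # [b'TEST', [[0, b'foo'], [1, b'bar'], [2, b'foobar']]]
```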
Tyulis