<div align="center">
# python-xz
Pure Python implementation of the XZ file format with random access support
_Leveraging the lzma module for fast (de)compression_
[![GitHub build status](https://img.shields.io/github/actions/workflow/status/rogdham/python-xz/build.yml?branch=master)](https://github.com/rogdham/python-xz/actions?query=branch:master)
[![Release on PyPI](https://img.shields.io/pypi/v/python-xz)](https://pypi.org/project/python-xz/)
[![Code coverage](https://img.shields.io/badge/coverage-100%25-brightgreen)](https://github.com/rogdham/python-xz/search?q=fail+under&type=Code)
[![Mypy type checker](https://img.shields.io/badge/type_checker-mypy-informational)](https://mypy.readthedocs.io/)
[![MIT License](https://img.shields.io/pypi/l/python-xz)](https://github.com/Rogdham/python-xz/blob/master/LICENSE.txt)
---
[📖 Documentation](https://github.com/rogdham/python-xz/#usage) | [📃 Changelog](./CHANGELOG.md)
</div>
---
An XZ file can be composed of several streams and blocks. This allows for fast random
access when reading, but Python's builtin `lzma` module does not support it (it would
needlessly read all previous blocks).
<div align="center">
| | [lzma] | [lzmaffi] | python-xz |
| :---------------: | :---------------: | :------------------: | :------------------: |
| module type | builtin | cffi (C extension) | pure Python |
| 📄 **read** | | | |
| random access | ❌ no<sup>1</sup> | ✔️ yes<sup>2</sup> | ✔️ yes<sup>2</sup> |
| several blocks | ✔️ yes | ✔️✔️ yes<sup>3</sup> | ✔️✔️ yes<sup>3</sup> |
| several streams | ✔️ yes | ✔️ yes | ✔️✔️ yes<sup>4</sup> |
| stream padding | ❌ no<sup>5</sup> | ✔️ yes | ✔️ yes |
| 📝 **write** | | | |
| `w` mode | ✔️ yes | ✔️ yes | ✔️ yes |
| `x` mode | ✔️ yes | ❌ no | ✔️ yes |
| `a` mode | ✔️ new stream | ✔️ new stream | ⏳ planned |
| `r+`/`w+`/… modes | ❌ no | ❌ no | ✔️ yes |
| several blocks | ❌ no | ❌ no | ✔️ yes |
| several streams | ❌ no<sup>6</sup> | ❌ no<sup>6</sup> | ✔️ yes |
| stream padding | ❌ no | ❌ no | ⏳ planned |
</div>
<details>
<summary>Notes</summary>
1. Reading from a position will read the file from the very beginning
2. Reading from a position will read the file from the beginning of the block
3. Block positions available with the `block_boundaries` attribute
4. Stream positions available with the `stream_boundaries` attribute
5. Related [issue](https://github.com/python/cpython/issues/88300)
6. Possible by manually closing and re-opening in append mode
</details>
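As a minimal sketch of note 6, the following uses only the builtin `lzma` module to produce a file with two streams by closing it and re-opening it in append mode. The file name and contents are made up for illustration.

```python
import lzma
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'two-streams.xz')

# First stream: a regular write.
with lzma.open(path, 'wb') as fout:
    fout.write(b'first stream\n')

# Second stream: re-opening in append mode starts a new stream.
with lzma.open(path, 'ab') as fout:
    fout.write(b'second stream\n')

# Reading concatenates the content of both streams transparently.
with lzma.open(path, 'rb') as fin:
    content = fin.read()

print(content)
```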
[lzma]: https://docs.python.org/3/library/lzma.html
[lzmaffi]: https://github.com/r3m0t/backports.lzma
---
## Install
Install `python-xz` with pip:
```sh
$ python -m pip install python-xz
```
_An unofficial package for conda is [also available][conda package], see [issue #5][#5]
for more information._
[conda package]: https://anaconda.org/conda-forge/python-xz
[#5]: https://github.com/Rogdham/python-xz/issues/5
## Usage
The API is similar to [lzma]: you can use either `xz.open` or `xz.XZFile`.
### Read mode
```python
>>> import xz
>>> with xz.open('example.xz') as fin:
...     fin.read(18)
...     fin.stream_boundaries  # 2 streams
...     fin.block_boundaries  # 4 blocks in first stream, 2 blocks in second stream
...     fin.seek(1000)
...     fin.read(31)
...
b'Hello, world! \xf0\x9f\x91\x8b'
[0, 2000]
[0, 500, 1000, 1500, 2000, 3000]
1000
b'\xe2\x9c\xa8 Random access is fast! \xf0\x9f\x9a\x80'
```
Opening in text mode works as well, but note that seek arguments and boundaries are
still expressed in bytes (just like with `lzma.open`).
```python
>>> with xz.open('example.xz', 'rt') as fin:
...     fin.read(15)
...     fin.stream_boundaries
...     fin.block_boundaries
...     fin.seek(1000)
...     fin.read(26)
...
'Hello, world! 👋'
[0, 2000]
[0, 500, 1000, 1500, 2000, 3000]
1000
'✨ Random access is fast! 🚀'
```
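The difference between character counts and byte counts explains why the two examples above read 18 bytes in binary mode but 15 characters in text mode. The sketch below illustrates this with a hypothetical helper, `byte_offset`, which is not part of `python-xz` and assumes UTF-8 encoded content.

```python
def byte_offset(text: str, char_index: int, encoding: str = 'utf-8') -> int:
    """Return how many bytes encode the first char_index characters of text."""
    return len(text[:char_index].encode(encoding))


greeting = 'Hello, world! 👋'
sparkle = '✨ Random access is fast! 🚀'

# Characters read in text mode versus bytes read in binary mode.
print(len(greeting), byte_offset(greeting, len(greeting)))
print(len(sparkle), byte_offset(sparkle, len(sparkle)))
```

The emoji take 3 or 4 bytes each in UTF-8 while counting as a single character, so positions passed to `seek` generally cannot be derived from character counts alone.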
### Write mode
Writing is only supported from the end of the file. It is, however, possible to truncate
the file first. Note that truncating is only supported on block boundaries.
```python
>>> import xz
>>> with xz.open('test.xz', 'w') as fout:
...     fout.write(b'Hello, world!\n')
...     fout.write(b'This sentence is still in the previous block\n')
...     fout.change_block()
...     fout.write(b'But this one is in its own!\n')
...
14
45
28
```
Advanced usage:
- Modes like `r+`/`w+`/`x+` open the file for both reading and writing at the same time;
  however, in the current implementation, a block with writing in progress is
  automatically closed when data is read from it.
- The `check`, `preset` and `filters` arguments to `xz.open` and `xz.XZFile` configure
  the default values for new streams and blocks.
- Change block with the `change_block` method (the `preset` and `filters` attributes can
  be changed beforehand to apply to the new block).
- Change stream with the `change_stream` method (the `check` attribute can be changed
  beforehand to apply to the new stream).
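The last two points can be sketched as follows. The helper `write_with_settings` is hypothetical, and the sketch assumes that the `check` attribute accepts the same `CHECK_*` constants as the builtin `lzma` module and that `preset` accepts the usual 0 to 9 levels; only the `preset` and `check` attributes and the `change_block` and `change_stream` methods come from the description above.

```python
import lzma


def write_with_settings(fout, fast_data: bytes, dense_data: bytes, checked_data: bytes) -> None:
    """Write three pieces of data, changing settings between them.

    fout is expected to be a file opened for writing with xz.open.
    """
    # First block: written with whatever defaults the file was opened with.
    fout.write(fast_data)

    # Change the preset beforehand so that it applies to the new block.
    fout.preset = 9  # assumption: same preset levels as the lzma module
    fout.change_block()
    fout.write(dense_data)

    # Change the check beforehand so that it applies to the new stream.
    fout.check = lzma.CHECK_SHA256  # assumption: same constants as the lzma module
    fout.change_stream()
    fout.write(checked_data)
```

With `python-xz` installed, this could be called as `write_with_settings(xz.open('out.xz', 'w'), a, b, c)` inside a `with` statement.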
---
## FAQ
### How does random access work?
XZ files are made of a number of streams, and each stream is composed of a number of
blocks. This can be seen with `xz --list`:
```sh
$ xz --list file.xz
Strms Blocks Compressed Uncompressed Ratio Check Filename
1 13 16.8 MiB 297.9 MiB 0.056 CRC64 file.xz
```
To read data from the middle of the 10th block, we decompress the 10th block from its
start until we reach the middle (dropping that decompressed data), then return the
decompressed data from that point.
Choosing a good block size is a tradeoff between seek time during random access and
compression ratio.
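The lookup described above can be sketched with the builtin `lzma` module alone. This is a simplified analogy, not how `python-xz` is implemented: each chunk is compressed as an independent XZ stream held in memory, and the list of start offsets plays the role of `block_boundaries`. The names `build_chunks` and `read_at` are made up for this example.

```python
import bisect
import lzma


def build_chunks(data: bytes, chunk_size: int):
    """Compress data in independent chunks and record each chunk's start offset."""
    boundaries = []  # uncompressed offset at which each chunk starts
    chunks = []      # compressed bytes of each chunk
    for start in range(0, len(data), chunk_size):
        boundaries.append(start)
        chunks.append(lzma.compress(data[start:start + chunk_size]))
    return boundaries, chunks


def read_at(boundaries, chunks, offset: int, size: int) -> bytes:
    """Read size bytes starting at offset (offset >= 0), decompressing only the chunks needed."""
    if not chunks:
        return b''
    out = bytearray()
    # Find the last chunk that starts at or before the requested offset.
    index = bisect.bisect_right(boundaries, offset) - 1
    while size > 0 and index < len(chunks):
        block = lzma.decompress(chunks[index])
        # Drop the decompressed data located before the requested position.
        skip = offset - boundaries[index]
        piece = block[skip:skip + size]
        out += piece
        offset += len(piece)
        size -= len(piece)
        index += 1
    return bytes(out)


data = bytes(range(256)) * 20  # 5120 bytes of sample data
boundaries, chunks = build_chunks(data, chunk_size=1000)
print(boundaries)
print(read_at(boundaries, chunks, 2500, 10) == data[2500:2510])
```

Smaller chunks mean less data to decompress and discard per seek, at the cost of a worse compression ratio, which is the tradeoff mentioned above.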
### How can I create XZ files optimized for random-access?
You can open the file for writing and use the `change_block` method to create several
blocks.
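A minimal sketch of that approach is shown below. The helper `write_in_blocks` and its default block size are assumptions made for illustration; it only relies on the `write` and `change_block` methods of a file opened for writing.

```python
def write_in_blocks(fout, data: bytes, block_size: int = 16 * 1024 * 1024) -> int:
    """Write data to fout, starting a new block every block_size bytes.

    fout is expected to be a file opened for writing with xz.open.
    Returns the number of blocks written by this call.
    """
    blocks = 0
    for start in range(0, len(data), block_size):
        if start:
            # Close the current block before writing the next slice.
            fout.change_block()
        fout.write(data[start:start + block_size])
        blocks += 1
    return blocks
```

With `python-xz` installed, a call could look like `write_in_blocks(fout, data)` inside `with xz.open('file.xz', 'w') as fout:`.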
Other tools can create XZ files with several blocks as well:
- [XZ Utils](https://tukaani.org/xz/) needs to be called with flags:
```sh
$ xz -T0 file # threading mode
$ xz --block-size 16M file # same size for all blocks
$ xz --block-list 16M,32M,8M,42M file # specific size for each block
```
- [PIXZ](https://github.com/vasi/pixz) creates files with several blocks by default:
```sh
$ pixz file
```
### Python version support
As a general rule, all Python versions that are both [released and still officially
supported][python-versions] are supported by `python-xz` and tested against (both
CPython and PyPy implementations).
If you have other use cases or find issues with some Python versions, feel free to
[open a ticket](https://github.com/Rogdham/python-xz/issues/new)!
[python-versions]: https://devguide.python.org/versions/#versions
Raw data
{
"_id": null,
"home_page": "https://github.com/rogdham/python-xz",
"name": "python-xz",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "xz lzma compression decompression",
"author": "Rogdham",
"author_email": "contact@rogdham.net",
"download_url": "https://files.pythonhosted.org/packages/2a/53/0eb6460a6854483271e0f67c4b680d02b6486056c3d597ceda224ee3cee7/python-xz-0.5.0.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "MIT",
"summary": "Pure Python implementation of the XZ file format with random access support",
"version": "0.5.0",
"project_urls": {
"Homepage": "https://github.com/rogdham/python-xz",
"Source": "https://github.com/rogdham/python-xz"
},
"split_keywords": [
"xz",
"lzma",
"compression",
"decompression"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b4cd9c8cb6ad5431ba565f0ae39372cf745c047263c99654baba2781d4885edc",
"md5": "ff01c2a115c5c2bf95f3d09a272c0a94",
"sha256": "b32a3fa2653cf92c4088b10827995c5ab2c3a74319e5e54c143e3e914a24385f"
},
"downloads": -1,
"filename": "python_xz-0.5.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ff01c2a115c5c2bf95f3d09a272c0a94",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 20245,
"upload_time": "2023-02-27T19:17:37",
"upload_time_iso_8601": "2023-02-27T19:17:37.622953Z",
"url": "https://files.pythonhosted.org/packages/b4/cd/9c8cb6ad5431ba565f0ae39372cf745c047263c99654baba2781d4885edc/python_xz-0.5.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2a530eb6460a6854483271e0f67c4b680d02b6486056c3d597ceda224ee3cee7",
"md5": "3ca8e737f10ca542eaa7d17377860518",
"sha256": "a188f0436e811455f1bda61dce9dbe6f0fc1430334bff9f5afd0e668bb354774"
},
"downloads": -1,
"filename": "python-xz-0.5.0.tar.gz",
"has_sig": false,
"md5_digest": "3ca8e737f10ca542eaa7d17377860518",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 70473,
"upload_time": "2023-02-27T19:17:39",
"upload_time_iso_8601": "2023-02-27T19:17:39.437490Z",
"url": "https://files.pythonhosted.org/packages/2a/53/0eb6460a6854483271e0f67c4b680d02b6486056c3d597ceda224ee3cee7/python-xz-0.5.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-02-27 19:17:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "rogdham",
"github_project": "python-xz",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"tox": true,
"lcname": "python-xz"
}