- Name: sarfile
- Version: 0.1.7
- Home page: https://github.com/codekansas/sarfile
- Summary: A Python library for reading and writing SAR files.
- Upload time: 2024-03-05 19:25:15
- Author: Benjamin Bolte
- Requires Python: >=3.10

# sarfile

Like tarfile, but streamable.

## What is this?

This repository implements a "streaming archive" file format for collecting multiple files into one. It is similar to the TAR format, but it stores the metadata for every file in the archive in a single contiguous block at the beginning of the file. This layout has several advantages:

1. Much faster startup times for large archives (we read the entire header into memory in one go)
2. Much friendlier to remote file systems (only one network request rather than a bunch), in combination with `smart_open`
3. Fast random access

The file size is the same as an uncompressed TAR file.

The downside is that once a SAR file has been written, it cannot be changed. Future versions of the format may support this, but for now the recommended flow is to first generate a TAR file, then convert it using the built-in `sarpack` command line tool or the `sarfile.pack_tar` Python API.

Also, the file format only exists in this repository, although it's very simple to implement (see the `_header.py` documentation and the `sarfile` object for how to load items).
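
To make the header-first idea concrete, here is a minimal sketch of a toy archive that stores an index up front and then seeks directly to a member. It is not the actual SAR layout (see `_header.py` for that); the length-prefixed JSON header, `write_archive`, and `ToyArchive` are all hypothetical names chosen for illustration.

```python
import io
import json
import struct
from typing import BinaryIO

# Toy layout (an assumption for illustration, NOT the real SAR format):
#   [8-byte little-endian header length][JSON header][payload bytes...]
# The JSON header maps each name to [offset, size] within the payload area.


def write_archive(out: BinaryIO, files: dict[str, bytes]) -> None:
    """Write the index first, then every payload back to back."""
    index: dict[str, list[int]] = {}
    offset = 0
    for name, data in files.items():
        index[name] = [offset, len(data)]
        offset += len(data)
    header = json.dumps(index).encode("utf-8")
    out.write(struct.pack("<Q", len(header)))
    out.write(header)
    for data in files.values():
        out.write(data)


class ToyArchive:
    """Reads the whole index once, then serves members by seeking."""

    def __init__(self, fp: BinaryIO) -> None:
        # Assumes fp is positioned at the start of the archive.
        self._fp = fp
        (header_len,) = struct.unpack("<Q", fp.read(8))
        self._index = json.loads(fp.read(header_len))
        self._data_start = 8 + header_len

    @property
    def names(self) -> list[str]:
        return list(self._index)

    def read(self, name: str) -> bytes:
        # Random access: jump straight to the member, no scanning.
        offset, size = self._index[name]
        self._fp.seek(self._data_start + offset)
        return self._fp.read(size)


buffer = io.BytesIO()
write_archive(buffer, {"a.txt": b"hello", "b.txt": b"world!"})
buffer.seek(0)
archive = ToyArchive(buffer)
print(archive.names)
print(archive.read("b.txt"))
```

Because the index must be complete before any payload is written, a layout like this is awkward to append to, which matches the write-once limitation described above.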

## Getting Started

Install the package using pip:

```bash
pip install sarfile
```

Next, simply import the module:

```python
import sarfile
```

You can convert a tarfile to a sarfile using the Python API:

```python
sarfile.pack_tar(out="myfile.sar", tar="myfile.tar")
```
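
The conversion assumes a TAR file already exists. As a rough sketch of the full flow, the snippet below builds a small uncompressed TAR with the standard-library `tarfile` module and then calls `sarfile.pack_tar` with the same keyword arguments shown above. The `build_tar` helper, the temporary directory, and the optional import guard are illustrative choices, not part of the library.

```python
import io
import tarfile
import tempfile
from pathlib import Path

try:
    import sarfile  # guarded so the TAR step can be tried on its own
except ImportError:
    sarfile = None


def build_tar(tar_path: Path, files: dict[str, bytes]) -> None:
    """Write an uncompressed TAR containing the given in-memory files."""
    with tarfile.open(tar_path, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


work_dir = Path(tempfile.mkdtemp())
tar_path = work_dir / "myfile.tar"
build_tar(tar_path, {"myfile.txt": b"hello from the archive\n"})

if sarfile is not None:
    # Same call as in the example above, with paths passed as strings.
    sarfile.pack_tar(out=str(work_dir / "myfile.sar"), tar=str(tar_path))
```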

Alternatively, you can use the built-in command line tool:

```bash
sarpack myfile.sar myfile.tar
```

Finally, the file can be used in your Python script:

```python
f = sarfile.open("myfile.sar")
print(f.names)
with f["myfile.txt"] as myfile:
    print(myfile.read())
```

If you have installed `smart_open`, then you can also read from S3 as follows:

```python
# Pass an S3 URI instead of a local path (the bucket name here is a placeholder).
f = sarfile.open("s3://my-bucket/myfile.sar")
print(f.names)
with f["myfile.txt"] as myfile:
    print(myfile.read())
```

The above code is much faster than reading a TAR file from S3, because we read the entire header into memory in one network request, rather than having to make a network request for each file in the archive. On subsequent accesses we also only download the part of the file we want to read.
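
To see why listing a TAR file is costly on a remote file system, the sketch below counts how many separate reads the standard-library `tarfile` module issues just to enumerate member names. An in-memory buffer stands in for the remote object, and each `read()` call stands in for a network request; `CountingBytesIO` and the helper functions are hypothetical names for this illustration, and the exact count depends on the Python version.

```python
import io
import tarfile


class CountingBytesIO(io.BytesIO):
    """In-memory file that counts read() calls, standing in for network requests."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.read_calls = 0

    def read(self, size: int = -1) -> bytes:
        self.read_calls += 1
        return super().read(size)


def build_tar_bytes(num_members: int) -> bytes:
    """Build an uncompressed TAR with num_members small text files."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for i in range(num_members):
            payload = f"contents of file {i}".encode()
            info = tarfile.TarInfo(name=f"file_{i}.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def count_reads_to_list_tar(data: bytes) -> tuple[list[str], int]:
    """List member names and report how many reads that required."""
    source = CountingBytesIO(data)
    with tarfile.open(fileobj=source, mode="r:") as tar:
        names = tar.getnames()
    return names, source.read_calls


names, reads = count_reads_to_list_tar(build_tar_bytes(50))
print(f"{len(names)} members listed using {reads} read calls")
```

TAR interleaves each member's header with its data, so the number of reads grows with the number of members. A format with a single leading header needs only a constant number of reads to learn every name and offset.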

## Requirements

This package is tested against Python 3.10. Although not required, it is a good idea to install `smart_open` to support reading from S3 or other remote file systems, and `tqdm` to show a progress bar when packing large files.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/codekansas/sarfile",
    "name": "sarfile",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "",
    "author": "Benjamin Bolte",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/0b/5d/f6385ebdaea7535316761a2e522de1b8f5265d7159583625230b2b4bf3fe/sarfile-0.1.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A Python library for reading and writing SAR files.",
    "version": "0.1.7",
    "project_urls": {
        "Homepage": "https://github.com/codekansas/sarfile"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fc8fb883391e12ac34b70a4ea6e095e6d885950c697dd90c71a77501fe2db54e",
                "md5": "c81e0cf59d99088e42b6e65be0f6c159",
                "sha256": "b29400ef0c9bead03296815d042ca7008feeb5b229c0b50bf4708926aad3a6ec"
            },
            "downloads": -1,
            "filename": "sarfile-0.1.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c81e0cf59d99088e42b6e65be0f6c159",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 11015,
            "upload_time": "2024-03-05T19:25:14",
            "upload_time_iso_8601": "2024-03-05T19:25:14.298028Z",
            "url": "https://files.pythonhosted.org/packages/fc/8f/b883391e12ac34b70a4ea6e095e6d885950c697dd90c71a77501fe2db54e/sarfile-0.1.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0b5df6385ebdaea7535316761a2e522de1b8f5265d7159583625230b2b4bf3fe",
                "md5": "b1e7f8a0c92e2cae47140a1267d0b756",
                "sha256": "ba01122793717e63233825b741165b8ff5256db344d03d695513c243b3fb3ae6"
            },
            "downloads": -1,
            "filename": "sarfile-0.1.7.tar.gz",
            "has_sig": false,
            "md5_digest": "b1e7f8a0c92e2cae47140a1267d0b756",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 12448,
            "upload_time": "2024-03-05T19:25:15",
            "upload_time_iso_8601": "2024-03-05T19:25:15.490875Z",
            "url": "https://files.pythonhosted.org/packages/0b/5d/f6385ebdaea7535316761a2e522de1b8f5265d7159583625230b2b4bf3fe/sarfile-0.1.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-05 19:25:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "codekansas",
    "github_project": "sarfile",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "sarfile"
}
        