| Field           | Value                                                           |
| --------------- | --------------------------------------------------------------- |
| Name            | rwkit                                                           |
| Version         | 2.0.0                                                           |
| Home page       | https://github.com/neural-tools/rwkit                           |
| Summary         | Simplified reading & writing files with support for compression |
| Upload time     | 2024-08-31 16:59:01                                             |
| Maintainer      | None                                                            |
| Docs URL        | None                                                            |
| Author          | David Adametz                                                   |
| Requires Python | <4.0,>=3.8                                                      |
| License         | Apache-2.0                                                      |
| Keywords        | io, compression, json, jsonl, yaml                              |

# rwkit

`rwkit` is a Python package that simplifies reading and writing several file formats: plain text, JSON, JSONL and YAML. It handles compression transparently and can process large files in chunks.

## Features

-   Easy-to-use functions for reading and writing text, JSON, JSONL and YAML files.
-   Transparent compression support: bz2, gzip, tar, tar.bz2, tar.gz, tar.xz, xz, zip, zstd.
-   Generator functions for processing large files in chunks.

## Installation

Install `rwkit` using pip:

```bash
pip install rwkit
```

### Optional Dependencies

`rwkit` comes with optional features that you can install based on your needs:

```bash
# The quotes keep shells such as zsh from interpreting the square brackets
pip install "rwkit[zstd]"  # For Zstandard compression support
pip install "rwkit[yaml]"  # For YAML file handling
pip install "rwkit[all]"   # For all optional features
```

## Quick Start

Here are some examples to get you started:

### Reading and Writing Text Files

Using a single string:

```python
import rwkit as rw


# Sample text
text = "Hello, rwkit!"

# Write a string
rw.write_text("file.txt", text)

# Append another string
rw.write_text("file.txt", "\nNice to meet you.", mode="a")

# Read file
loaded_text = rw.read_text("file.txt")

# repr() shows the embedded newline as an escape sequence
print(repr(loaded_text))
# Output: 'Hello, rwkit!\nNice to meet you.'
```

Using lines (a list of strings):

```python
import rwkit as rw


# Sample
lines = ["Hello, rwkit!", "Nice to meet you."]

# Write lines, each element on its own line (separated by '\n')
rw.write_lines("file.txt", lines)

# Append one more line (a single string is written as one line)
rw.write_lines("file.txt", "What a beautiful day.", mode="a")

# Read file (transparently splits on '\n')
loaded_lines = rw.read_lines("file.txt")

print(loaded_lines)
# Output: ['Hello, rwkit!', 'Nice to meet you.', 'What a beautiful day.']
```
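
The append in the example above only produces three separate lines if every written element is terminated by a newline. The following standard-library sketch shows that convention; it is an illustration under that assumption, not `rwkit`'s actual implementation, and `write_lines_sketch` is a hypothetical name.

```python
import os
import tempfile


def write_lines_sketch(path, lines, mode="w"):
    """Write every element followed by a newline (hypothetical helper)."""
    if isinstance(lines, str):  # treat a bare string as a single line
        lines = [lines]
    with open(path, mode, encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)


path = os.path.join(tempfile.mkdtemp(), "file.txt")
write_lines_sketch(path, ["Hello, rwkit!", "Nice to meet you."])
write_lines_sketch(path, "What a beautiful day.", mode="a")

# Reading back: split on newlines, dropping the terminators
with open(path, encoding="utf-8") as f:
    loaded_lines = f.read().splitlines()

print(loaded_lines)
```

Because each line carries its own terminator, an appended line starts on a fresh line without the caller adding `"\n"` manually.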

### Reading and Writing JSON Files

Using a single object:

```python
import rwkit as rw


# Sample data
data = {"name": "Alice", "age": 25}

# Write data to a JSON file
rw.write_json("file.json", data)

# Read data
loaded_data = rw.read_json("file.json")

print(loaded_data)
# Output: {'name': 'Alice', 'age': 25}
```

### Reading and Writing JSONL (= JSON Lines) Files

Using multiple objects, each on their own line. This format is especially useful for large files that are processed in chunks (see also below).

```python
import rwkit as rw


# Sample data
data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
]

# Write data to a JSONL file
rw.write_jsonl("file.jsonl", data)

# Read data
loaded_data = rw.read_jsonl("file.jsonl")

print(loaded_data)
# Output: [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]
```
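
For orientation, a JSON Lines file stores one JSON document per line. The sketch below reproduces that layout with only the standard library. It illustrates the file format rather than how `rwkit` is implemented, and the helper names `dump_jsonl` and `load_jsonl` are hypothetical.

```python
import json
import os
import tempfile


def dump_jsonl(path, records):
    """Write each record as one JSON document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def load_jsonl(path):
    """Parse every non-empty line as a separate JSON document."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]
path = os.path.join(tempfile.mkdtemp(), "file.jsonl")
dump_jsonl(path, records)

# Inspect the raw file: two lines, one object each
with open(path, encoding="utf-8") as f:
    raw = f.read()
print(raw)

loaded = load_jsonl(path)
```

Since every line is a complete document, a reader can stop after any line, which is what makes the format convenient for chunked processing.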

### Reading and Writing YAML Files

Note: Requires the `pyyaml` package (see the `rwkit[yaml]` extra under Optional Dependencies).

```python
import rwkit as rw


# Sample data
data = {"name": "Alice", "age": 25}

# Write to a YAML file
rw.write_yaml("file.yaml", data)

# Read a YAML file
loaded_data = rw.read_yaml("file.yaml")

print(loaded_data)
# Output: {'name': 'Alice', 'age': 25}
```

## Compression

`rwkit` supports various compression formats via the `compression` argument. The default is `compression='infer'`, which infers the format from the filename extension:

```python
import rwkit as rw


# Sample text
text = "Hello, rwkit!"

# Write to a gzip compressed text file, inferred from the filename extension
rw.write_text("file.txt.gz", text)

# Read a gzip compressed text file
loaded_text = rw.read_text("file.txt.gz")

print(loaded_text)
# Output: 'Hello, rwkit!'
```

Alternatively, specify `compression` explicitly (the table below lists all available
options):

```python
import rwkit as rw


# Sample text
text = "Hello, rwkit!"

# Write to a gzip compressed text file, explicitly specified
rw.write_text("file.txt.gz", text, compression="gzip")

# Read a gzip compressed text file, explicitly specified
loaded_text = rw.read_text("file.txt.gz", compression="gzip")

print(loaded_text)
# Output: 'Hello, rwkit!'
```
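
Conceptually, an explicit `compression` value selects which opener handles the file. The sketch below shows that idea with standard-library modules only. It is a simplified illustration, not `rwkit`'s code: the `OPENERS` table and `open_text` helper are hypothetical and cover just a few single-file formats.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Hypothetical mapping from a compression name to a stdlib opener
OPENERS = {
    None: open,
    "bz2": bz2.open,
    "gzip": gzip.open,
    "xz": lzma.open,
}


def open_text(path, mode="r", compression=None):
    """Open `path` in text mode with the opener registered for `compression`."""
    opener = OPENERS[compression]
    # Appending "t" selects text mode, which the compressed openers need explicitly
    return opener(path, mode + "t", encoding="utf-8")


path = os.path.join(tempfile.mkdtemp(), "file.txt.gz")
with open_text(path, "w", compression="gzip") as f:
    f.write("Hello, rwkit!")

with open_text(path, "r", compression="gzip") as f:
    loaded_text = f.read()

# The file on disk starts with the gzip magic bytes
with open(path, "rb") as f:
    magic = f.read(2)

print(loaded_text, magic)
```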

When `compression='infer'`, the following rules apply:

| File extension    | Inferred compression |
| ----------------- | -------------------- |
| `.tar`            | `tar`                |
| `.tar.bz2`        | `tar.bz2`            |
| `.tar.gz`         | `tar.gz`             |
| `.tar.xz`         | `tar.xz`             |
| `.bz2`            | `bz2`                |
| `.gz`             | `gzip`               |
| `.xz`             | `xz`                 |
| `.zip`            | `zip`                |
| `.zst`            | `zstd`               |
| [everything else] | None                 |
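
The rules in the table can be expressed as a suffix lookup. The following is a minimal sketch, assuming plain suffix matching; `SUFFIX_TO_COMPRESSION` and `infer_compression` are hypothetical names, and the real inference in `rwkit` may differ in details such as case handling.

```python
# Hypothetical lookup table mirroring the rules above.
# Longer suffixes come first so that ".tar.gz" wins over ".gz".
SUFFIX_TO_COMPRESSION = [
    (".tar.bz2", "tar.bz2"),
    (".tar.gz", "tar.gz"),
    (".tar.xz", "tar.xz"),
    (".tar", "tar"),
    (".bz2", "bz2"),
    (".gz", "gzip"),
    (".xz", "xz"),
    (".zip", "zip"),
    (".zst", "zstd"),
]


def infer_compression(filename):
    """Return the compression name for `filename`, or None if no suffix matches."""
    name = str(filename)
    for suffix, compression in SUFFIX_TO_COMPRESSION:
        if name.endswith(suffix):
            return compression
    return None


for example in ["file.txt.gz", "archive.tar.gz", "data.jsonl.zst", "file.txt"]:
    print(example, "->", infer_compression(example))
```

Checking the compound `.tar.*` suffixes before the single ones matters: otherwise `archive.tar.gz` would be classified as plain `gzip`.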

## Reading Large Files in Chunks

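Both text and JSONL files can be read in chunks by passing the `chunksize` argument. The
reader then yields the file's contents `chunksize` items at a time instead of returning
everything at once. This also works in combination with `compression`.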

```python
import rwkit as rw


# Assume a large text file, optionally compressed
for chunk in rw.read_lines("file.txt", chunksize=3):
    print(chunk)
    # Output: ['Hello, rwkit!', 'Nice to meet you.', 'What a beautiful day.']
    # ...

# The same works for jsonl files
for chunk in rw.read_jsonl("file.jsonl", chunksize=3):
    print(chunk)
    # Output: [{'name': 'Alice'}, {'name': 'Bob'}, {'name': 'Charlie'}]
    # ...
```
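
To show what chunked reading amounts to, here is a standard-library sketch that yields lists of lines from a gzip-compressed file. It is an illustration of the technique, not `rwkit`'s implementation; `iter_line_chunks` and its `compressed` flag are hypothetical.

```python
import gzip
import itertools
import os
import tempfile


def iter_line_chunks(path, chunksize, compressed=False):
    """Yield lists of up to `chunksize` lines, with trailing newlines removed."""
    opener = gzip.open if compressed else open
    with opener(path, "rt", encoding="utf-8") as f:
        stripped = (line.rstrip("\n") for line in f)
        while True:
            # islice pulls at most `chunksize` lines from the open file
            chunk = list(itertools.islice(stripped, chunksize))
            if not chunk:
                return
            yield chunk


# Build a small compressed sample file with seven lines
path = os.path.join(tempfile.mkdtemp(), "file.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write("\n".join(f"line {i}" for i in range(7)))

chunks = list(iter_line_chunks(path, chunksize=3, compressed=True))
print(chunks)
```

In this sketch only one chunk is held in memory at a time, and the final chunk is shorter when the number of lines is not a multiple of `chunksize`.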

## License

`rwkit` is released under the Apache License, Version 2.0. See the LICENSE file for details.


            
