# rwkit
`rwkit` is a Python package that simplifies reading and writing common file formats: plain text, JSON, JSONL and YAML. It handles compression transparently and can process large files in chunks.
## Features
- Easy-to-use functions for reading and writing text, JSON, JSONL and YAML files.
- Transparent compression support: bz2, gzip, tar, tar.bz2, tar.gz, tar.xz, xz, zip, zstd.
- Generator functions for processing large files in chunks.
## Installation
Install `rwkit` using pip:
```bash
pip install rwkit
```
### Optional Dependencies
`rwkit` comes with optional features that you can install based on your needs:
```bash
# Quotes keep shells such as zsh from interpreting the square brackets
pip install "rwkit[zstd]"  # For Zstandard compression support
pip install "rwkit[yaml]"  # For YAML file handling
pip install "rwkit[all]"   # For all optional features
```
## Quick Start
Here are some examples to get you started:
### Reading and Writing Text Files
Using a single string:
```python
import rwkit as rw
# Sample text
text = "Hello, rwkit!"
# Write a string
rw.write_text("file.txt", text)
# Append another string
rw.write_text("file.txt", "\nNice to meet you.", mode="a")
# Read file
loaded_text = rw.read_text("file.txt")
print(repr(loaded_text))
# Output: 'Hello, rwkit!\nNice to meet you.'
```
Using lines (a list of strings):
```python
import rwkit as rw
# Sample
lines = ["Hello, rwkit!", "Nice to meet you."]
# Write lines, each element on its own line (separated by '\n')
rw.write_lines("file.txt", lines)
# Append another line
rw.write_lines("file.txt", "What a beautiful day.", mode="a")
# Read file (transparently splits on '\n')
loaded_lines = rw.read_lines("file.txt")
print(loaded_lines)
# Output: ['Hello, rwkit!', 'Nice to meet you.', 'What a beautiful day.']
```
### Reading and Writing JSON Files
Using a single object:
```python
import rwkit as rw
# Sample data
data = {"name": "Alice", "age": 25}
# Write data to a JSON file
rw.write_json("file.json", data)
# Read data
loaded_data = rw.read_json("file.json")
print(loaded_data)
# Output: {'name': 'Alice', 'age': 25}
```
### Reading and Writing JSONL (= JSON Lines) Files
A JSONL file stores multiple objects, each on its own line. This format is especially useful for large files that are processed in chunks (see Reading Large Files in Chunks below).
```python
import rwkit as rw
# Sample data
data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
]
# Write data to a JSONL file
rw.write_jsonl("file.jsonl", data)
# Read data
loaded_data = rw.read_jsonl("file.jsonl")
print(loaded_data)
# Output: [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]
```
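For readers unfamiliar with the format, the following standard-library sketch shows what a JSONL file contains on disk: one JSON document per line. It does not use `rwkit` and makes no claim about how `write_jsonl` is implemented; the helpers `dump_jsonl` and `load_jsonl` are hypothetical names used only for illustration.

```python
import json
import os
import tempfile


def dump_jsonl(path, records):
    """Write one JSON document per line (hypothetical helper, not rwkit's API)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def load_jsonl(path):
    """Parse every non-empty line as its own JSON document."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.jsonl")
    dump_jsonl(path, data)
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    roundtrip = load_jsonl(path)

print(raw, end="")
# Output:
# {"name": "Alice", "age": 25}
# {"name": "Bob", "age": 30}
```

Because every line is a complete document, a reader can stop after any line, which is what makes the format convenient for chunked processing.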
### Reading and Writing YAML Files
Note: Requires the `pyyaml` package (see Optional Dependencies above).
```python
import rwkit as rw
# Sample data
data = {"name": "Alice", "age": 25}
# Write to a YAML file
rw.write_yaml("file.yaml", data)
# Read a YAML file
loaded_data = rw.read_yaml("file.yaml")
print(loaded_data)
# Output: {'name': 'Alice', 'age': 25}
```
## Compression
`rwkit` supports various compression formats via the `compression` argument. The default, `compression='infer'`, infers the format from the filename extension:
```python
import rwkit as rw
# Sample text
text = "Hello, rwkit!"
# Write to a gzip compressed text file, inferred from the filename extension
rw.write_text("file.txt.gz", text)
# Read a gzip compressed text file
loaded_text = rw.read_text("file.txt.gz")
print(repr(loaded_text))
# Output: 'Hello, rwkit!'
```
Alternatively, specify `compression` explicitly (all available options are listed in the table below):
```python
import rwkit as rw
# Sample text
text = "Hello, rwkit!"
# Write to a gzip compressed text file, explicitly specified
rw.write_text("file.txt.gz", text, compression="gzip")
# Read a gzip compressed text file, explicitly specified
loaded_text = rw.read_text("file.txt.gz", compression="gzip")
print(repr(loaded_text))
# Output: 'Hello, rwkit!'
```
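For comparison, here is the same gzip round trip written with only Python's standard library. This is a sketch of the boilerplate that a single `compression` argument replaces, not a description of `rwkit`'s internals; the temporary directory and the magic-byte check are additions for illustration.

```python
import gzip
import os
import tempfile

text = "Hello, rwkit!"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt.gz")

    # Write text through a gzip stream ("wt" = write, text mode)
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text)

    # Read it back ("rt" = read, text mode)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        loaded_text = f.read()

    # gzip files start with the two magic bytes 0x1f 0x8b
    with open(path, "rb") as f:
        magic = f.read(2)

print(repr(loaded_text))
# Output: 'Hello, rwkit!'
print(magic == b"\x1f\x8b")
# Output: True
```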
When `compression='infer'`, the following rules apply:
| File extension | Inferred compression |
| ----------------- | -------------------- |
| `.tar` | `tar` |
| `.tar.bz2` | `tar.bz2` |
| `.tar.gz` | `tar.gz` |
| `.tar.xz` | `tar.xz` |
| `.bz2` | `bz2` |
| `.gz` | `gzip` |
| `.xz` | `xz` |
| `.zip` | `zip` |
| `.zst` | `zstd` |
| [everything else] | None |
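
The table above can be read as a suffix lookup. The sketch below is a hypothetical re-implementation for illustration only, not `rwkit`'s actual code: the name `infer_compression`, the ordering of the rules, and the case-sensitive matching are all assumptions.

```python
# Hypothetical re-implementation of the table above; not rwkit's actual code.
# Longer suffixes come first so that ".tar.gz" is not mistaken for ".gz".
SUFFIX_TO_COMPRESSION = [
    (".tar.bz2", "tar.bz2"),
    (".tar.gz", "tar.gz"),
    (".tar.xz", "tar.xz"),
    (".tar", "tar"),
    (".bz2", "bz2"),
    (".gz", "gzip"),
    (".xz", "xz"),
    (".zip", "zip"),
    (".zst", "zstd"),
]


def infer_compression(filename):
    """Return the compression name for a filename, or None if no suffix matches."""
    name = str(filename)
    for suffix, compression in SUFFIX_TO_COMPRESSION:
        if name.endswith(suffix):  # case-sensitive match (an assumption)
            return compression
    return None


print(infer_compression("file.txt.gz"))   # gzip
print(infer_compression("data.tar.gz"))   # tar.gz
print(infer_compression("file.txt"))      # None
```

Checking the `.tar.*` suffixes before the plain ones matters: with the order reversed, `data.tar.gz` would match `.gz` first.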
## Reading Large Files in Chunks
Text and JSONL files can be read in chunks by passing the `chunksize` argument to `read_lines` or `read_jsonl`; each iteration then yields a list of items, as the example shows. This also works in combination with `compression`.
```python
import rwkit as rw
# Assume a large text file, optionally compressed
for chunk in rw.read_lines("file.txt", chunksize=3):
    print(chunk)
    # Output: ['Hello, rwkit!', 'Nice to meet you.', 'What a beautiful day.']
    # ...

# The same works for JSONL files
for chunk in rw.read_jsonl("file.jsonl", chunksize=3):
    print(chunk)
    # Output: [{'name': 'Alice'}, {'name': 'Bob'}, {'name': 'Charlie'}]
    # ...
```
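To make the idea of chunked reading concrete, here is a minimal standard-library sketch of a generator that yields lists of lines without loading the whole file. It is not `rwkit`'s implementation; `iter_line_chunks` is a hypothetical helper, and the shorter final chunk is the behavior of this sketch, not a statement about `rwkit`.

```python
import os
import tempfile
from itertools import islice


def iter_line_chunks(path, chunksize):
    """Yield lists of up to `chunksize` lines (hypothetical helper, not rwkit's API)."""
    with open(path, encoding="utf-8") as f:
        # Lazily strip the trailing newline from each line
        stripped = (line.rstrip("\n") for line in f)
        while True:
            chunk = list(islice(stripped, chunksize))
            if not chunk:
                return
            yield chunk


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(f"line {i}" for i in range(1, 8)))
    chunks = list(iter_line_chunks(path, chunksize=3))

for chunk in chunks:
    print(chunk)
# Output:
# ['line 1', 'line 2', 'line 3']
# ['line 4', 'line 5', 'line 6']
# ['line 7']
```

Only one chunk is held in memory at a time while iterating, which is the point of chunked reading for large files.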
## License
`rwkit` is released under the Apache License, Version 2.0. See the LICENSE file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/neural-tools/rwkit",
"name": "rwkit",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": "io, compression, json, jsonl, yaml",
"author": "David Adametz",
"author_email": "20501597+david-adametz@users.noreply.github.com",
"download_url": "https://files.pythonhosted.org/packages/e9/a4/63ce23029cbb938f51aed3ec7b67871849497e79a7f6098eeb26b5545581/rwkit-2.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Simplified reading & writing files with support for compression",
"version": "2.0.0",
"project_urls": {
"Homepage": "https://github.com/neural-tools/rwkit",
"Repository": "https://github.com/neural-tools/rwkit"
},
"split_keywords": [
"io",
" compression",
" json",
" jsonl",
" yaml"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "062ace0a79b2d16aa01c3a687c8f703a9ae4388d738fd6cf82f008dad0fd3241",
"md5": "d830ccb07c18a37a5c48cd67785daa39",
"sha256": "79ca7053ba906a75b034894b70647057832ab478410ec58602af3d61ffa478b9"
},
"downloads": -1,
"filename": "rwkit-2.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d830ccb07c18a37a5c48cd67785daa39",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 15127,
"upload_time": "2024-08-31T16:59:00",
"upload_time_iso_8601": "2024-08-31T16:59:00.690863Z",
"url": "https://files.pythonhosted.org/packages/06/2a/ce0a79b2d16aa01c3a687c8f703a9ae4388d738fd6cf82f008dad0fd3241/rwkit-2.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e9a463ce23029cbb938f51aed3ec7b67871849497e79a7f6098eeb26b5545581",
"md5": "309369d5c5470bc5ed88d03a99300a9a",
"sha256": "0c56550f18a4158ed2d4d84702264954f47476ac69100d2d99dd38a980d80bba"
},
"downloads": -1,
"filename": "rwkit-2.0.0.tar.gz",
"has_sig": false,
"md5_digest": "309369d5c5470bc5ed88d03a99300a9a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 12857,
"upload_time": "2024-08-31T16:59:01",
"upload_time_iso_8601": "2024-08-31T16:59:01.989624Z",
"url": "https://files.pythonhosted.org/packages/e9/a4/63ce23029cbb938f51aed3ec7b67871849497e79a7f6098eeb26b5545581/rwkit-2.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-31 16:59:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "neural-tools",
"github_project": "rwkit",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "rwkit"
}