Name | io-bench |
Version | 0.1.0 |
home_page | None |
Summary | IO Bench is a library designed to benchmark the performance of standard flat file formats and partitioning schemes. |
upload_time | 2024-08-21 01:43:11 |
maintainer | None |
docs_url | None |
author | Aaron Stopher |
requires_python | >=3.6 |
license | Copyright (c) 2018 The Python Packaging Authority Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
keywords | arrow, avro, feather, parquet, polars, utils, performance counter, benchmark |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
<!-- [![PyPI version](https://badge.fury.io/py/io_bench.svg)](https://badge.fury.io/py/io_bench) -->
[![Documentation Status](https://img.shields.io/badge/docs-online-brightgreen)](https://aastopher.github.io/io_bench/)
[![codecov](https://codecov.io/gh/aastopher/io_bench/graph/badge.svg?token=79V7VRZWV0)](https://codecov.io/gh/aastopher/io_bench)
[![DeepSource](https://app.deepsource.com/gh/aastopher/io_bench.svg/?label=active+issues&show_trend=true&token=3NT8mR1AQRLW9zDNKWQ8vgFl)](https://app.deepsource.com/gh/aastopher/io_bench/)
# IOBench Quick Start Guide
## Generating Sample Data
To generate sample data, initialize the `IOBench` object with the path to the source CSV file and call the `generate_sample` method:
```python
from io_bench import IOBench
bench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars', 'parquet_arrow', 'parquet_fast', 'feather', 'feather_arrow'])
bench.generate_sample(records=100000) # default value
```
**NOTE:** `source_file` behavior is contextual. If you provide a desired name for a sample file and then call `generate_sample`, the file is created at that path. Otherwise, a valid path to an existing file must be provided.
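The contextual behavior described in the note can be guarded with a small check before benchmarking. The sketch below is not part of `io_bench`; `needs_sample` is a hypothetical helper that uses only the standard library to decide whether `generate_sample` would have to be called first.

```python
from pathlib import Path


def needs_sample(source_file: str) -> bool:
    """Return True when no file exists yet at source_file.

    Hypothetical helper (not part of io_bench): if it returns True,
    the path is only a desired name, so a sample has to be generated
    before benchmarking.
    """
    return not Path(source_file).is_file()


if __name__ == "__main__":
    path = "./data/source_100K.csv"
    if needs_sample(path):
        print(f"{path} does not exist yet; call bench.generate_sample() first")
    else:
        print(f"{path} exists; it can be used as the source file directly")
```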
## Converting Data to Partitioned Formats
Convert the generated CSV data to partitioned formats (Avro, Parquet, Feather). If `rows` is not defined, the data is partitioned automatically using default chunk sizes.
```python
bench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})
```
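To get a feel for what a `rows` setting implies, the number of partition files can be estimated from the record count. This is a minimal sketch under the assumption that each value in `rows` is the maximum number of rows per partition file; `partition_count` is a hypothetical helper, not an `io_bench` function.

```python
import math


def partition_count(total_rows: int, rows_per_partition: int) -> int:
    """Estimate how many partition files a row-based split produces.

    Assumption: each partition holds at most rows_per_partition rows.
    """
    if total_rows < 0 or rows_per_partition <= 0:
        raise ValueError("total_rows must be >= 0 and rows_per_partition > 0")
    return math.ceil(total_rows / rows_per_partition)


if __name__ == "__main__":
    rows = {"avro": 500000, "parquet": 3000000, "feather": 1600000}
    for fmt, size in rows.items():
        # Estimated partition files for a 100,000-record source file.
        print(fmt, partition_count(100_000, size))
```

Under that assumption, a 100,000-record sample fits in a single partition per format with these settings, so a larger source file would be needed to exercise multi-file reads.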
## Running Benchmarks
**NOTE:** Partitioning is stateful per bench object. If `partition` is not called manually, it is called automatically on the first run only, assuming a valid source file exists.
### Without Column Selection
Run benchmarks without column selection:
```python
benchmarks_no_select = bench.run(suffix='_no_select')
```
### With Column Selection
Run benchmarks with column selection:
```python
columns = ['Region', 'Country', 'Total Cost']
benchmarks_column_select = bench.run(columns=columns, suffix='_column_select')
```
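The `runs` argument suggests that each parser is timed repeatedly. The sketch below illustrates that general pattern using only the standard library. It is not the library's implementation, and `time_runs`, `summarize`, and the stand-in workload are all hypothetical.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(func: Callable[[], object], runs: int = 20) -> List[float]:
    """Call func `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce raw timings to a few summary statistics."""
    return {
        "min": min(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would read a partitioned file here.
    timings = time_runs(lambda: sum(range(100_000)), runs=5)
    print(summarize(timings))
```

Repeating the measurement and reporting a spread rather than a single number helps separate a parser's typical cost from one-off effects such as a cold file cache.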
## Generating Reports
Combine results and generate the final report:
```python
all_benchmarks = benchmarks_no_select + benchmarks_column_select
bench.report(all_benchmarks, report_dir='./result')
```
## Full Example
Here is a full example of using `IOBench`:
```python
from io_bench import IOBench

def main() -> None:
    # Initialize the IOBench object with runs and parsers
    bench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars'])

    # Generate sample data - (optional)
    bench.generate_sample()

    # Convert the source file to partitioned formats - (optional)
    bench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})

    # Run benchmarks without column selection
    benchmarks_no_select = bench.run(suffix='_no_select')

    # Run benchmarks with column selection
    columns = ['Region', 'Country', 'Total Cost']
    benchmarks_column_select = bench.run(columns=columns, suffix='_column_select')

    # Combine results and generate the final report
    all_benchmarks = benchmarks_no_select + benchmarks_column_select
    bench.report(all_benchmarks, report_dir='./result')

if __name__ == "__main__":
    main()
```
Raw data
{
"_id": null,
"home_page": null,
"name": "io-bench",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "arrow, avro, feather, parquet, polars, utils, performance counter, benchmark",
"author": "Aaron Stopher",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/99/6a/767e68dce50e8bd0c458a7356d732257ddcd14aef4f6b62b9bf39a98836c/io_bench-0.1.0.tar.gz",
"platform": null,
"description": "<!-- [![PyPI version](https://badge.fury.io/py/io_bench.svg)](https://badge.fury.io/py/io_bench) -->\n[![Documentation Status](https://img.shields.io/badge/docs-online-brightgreen)](https://aastopher.github.io/io_bench/)\n[![codecov](https://codecov.io/gh/aastopher/io_bench/graph/badge.svg?token=79V7VRZWV0)](https://codecov.io/gh/aastopher/io_bench)\n[![DeepSource](https://app.deepsource.com/gh/aastopher/io_bench.svg/?label=active+issues&show_trend=true&token=3NT8mR1AQRLW9zDNKWQ8vgFl)](https://app.deepsource.com/gh/aastopher/io_bench/)\n\n# IOBench Quick Start Guide\n\n## Generating Sample Data\nTo generate sample data, initialize the `IOBench` object with the path to the source CSV file and call the `generate_sample` method:\n\n```python\nfrom io_bench import IOBench\n\nbench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars', 'parquet_arrow', 'parquet_fast', 'feather', 'feather_arrow'])\nbench.generate_sample(records=100000) # default value\n```\n**NOTE:** `source_file` behavior is contextual; providing a desired name for a sample file then calling `generate_sample` will create the file. Otherwise a valid path to an existing file must be provided.\n\n## Converting Data to Partitioned Formats\nConvert the generated CSV data to partitioned formats (Avro, Parquet, Feather) will automatically partition on default column selection chunks if not defined.\n\n```python\nbench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})\n```\n\n## Running Benchmarks\nNOTE: Partition is stateful per bench object. 
If partition is not called manually it will automatically be called on the first run only assuming a valid source file exists.\n### Without Column Selection\nRun benchmarks without column selection:\n\n```python\nbenchmarks_no_select = bench.run(suffix='_no_select')\n```\n\n### With Column Selection\nRun benchmarks with column selection:\n\n```python\ncolumns = ['Region', 'Country', 'Total Cost']\nbenchmarks_column_select = bench.run(columns=columns, suffix='_column_select')\n```\n\n## Generating Reports\nCombine results and generate the final report:\n\n```python\nall_benchmarks = benchmarks_no_select + benchmarks_column_select\nio_bench.report(all_benchmarks, report_dir='./result')\n```\n\n## Full Example\n\nHere is a full example of using `IOBench`:\n\n```python\nfrom io_bench import IOBench\n\ndef main() -> None:\n # Initialize the IOBench object with runs and parsers\n bench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars'])\n\n # Generate sample data - (optional)\n bench.generate_sample()\n\n # Convert the source file to partitioned formats - (optional)\n bench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})\n\n # Run benchmarks without column selection\n benchmarks_no_select = bench.run(suffix='_no_select')\n\n # Run benchmarks with column selection\n columns = ['Region', 'Country', 'Total Cost']\n benchmarks_column_select = bench.run(columns=columns, suffix='_column_select')\n\n # Combine results and generate the final report\n all_benchmarks = benchmarks_no_select + benchmarks_column_select\n bench.report(all_benchmarks, report_dir='./result')\n\nif __name__ == \"__main__\":\n main()\n```\n",
"bugtrack_url": null,
"license": "Copyright (c) 2018 The Python Packaging Authority Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
"summary": "IO Bench is a library designed to benchmark the performance of standard flat file formats and partitioning schemes.",
"version": "0.1.0",
"project_urls": {
"Bug Reports": "https://github.com/aastopher/io_bench/issues",
"Documentation": "https://aastopher.github.io/io_bench/",
"Homepage": "https://github.com/aastopher/io_bench"
},
"split_keywords": [
"arrow",
" avro",
" feather",
" parquet",
" polars",
" utils",
" performance counter",
" benchmark"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5e3adf0a5f7ad190f0cdb3e75cb445bb863211b0373c29d7870ca2cd22eed26a",
"md5": "1dd454c83640345a1a08995c87d2555f",
"sha256": "f1ccf1c3e7e8d13619846aeb90ad8abc3c69cf196039cef1c10747661e75c647"
},
"downloads": -1,
"filename": "io_bench-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1dd454c83640345a1a08995c87d2555f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 11548,
"upload_time": "2024-08-21T01:43:10",
"upload_time_iso_8601": "2024-08-21T01:43:10.695643Z",
"url": "https://files.pythonhosted.org/packages/5e/3a/df0a5f7ad190f0cdb3e75cb445bb863211b0373c29d7870ca2cd22eed26a/io_bench-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "996a767e68dce50e8bd0c458a7356d732257ddcd14aef4f6b62b9bf39a98836c",
"md5": "8a2e0c72f5014eca756f92fd9cdeced3",
"sha256": "7d603a4385b09a001a784f9e7a7eb6e5ede9bc144c92640ce8ebd5e199e0d55b"
},
"downloads": -1,
"filename": "io_bench-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "8a2e0c72f5014eca756f92fd9cdeced3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 14391,
"upload_time": "2024-08-21T01:43:11",
"upload_time_iso_8601": "2024-08-21T01:43:11.998217Z",
"url": "https://files.pythonhosted.org/packages/99/6a/767e68dce50e8bd0c458a7356d732257ddcd14aef4f6b62b9bf39a98836c/io_bench-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-21 01:43:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "aastopher",
"github_project": "io_bench",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "io-bench"
}