# parquet-tools
![Run Unittest](https://github.com/ktrueda/parquet-tools/workflows/Run%20Unittest/badge.svg)
![Run CLI test](https://github.com/ktrueda/parquet-tools/workflows/Run%20CLI%20test/badge.svg)
This is a pip-installable alternative to the original [parquet-tools](https://github.com/apache/parquet-mr).
In other words, parquet-tools is a CLI tool built on [Apache Arrow](https://github.com/apache/arrow).
It shows the content and schema of Parquet files on local disk or on Amazon S3.
It is not compatible with the original parquet-tools.
## Features
- Read Parquet data (local file or file on S3)
- Read Parquet metadata/schema (local file or file on S3)
## Installation
```bash
$ pip install parquet-tools
```
## Usage
```bash
$ parquet-tools --help
usage: parquet-tools [-h] {show,csv,inspect} ...

parquet CLI tools

positional arguments:
  {show,csv,inspect}
    show              Show human readble format. see `show -h`
    csv               Cat csv style. see `csv -h`
    inspect           Inspect parquet file. see `inspect -h`

optional arguments:
  -h, --help          show this help message and exit
```
## Usage Examples
#### Show local parquet file
```bash
$ parquet-tools show test.parquet
+-------+-------+---------+
|   one | two   | three   |
|-------+-------+---------|
|  -1   | foo   | True    |
| nan   | bar   | False   |
|   2.5 | baz   | True    |
+-------+-------+---------+
```
#### Show parquet file on S3
```bash
$ parquet-tools show s3://bucket-name/prefix/*
+-------+-------+---------+
|   one | two   | three   |
|-------+-------+---------|
|  -1   | foo   | True    |
| nan   | bar   | False   |
|   2.5 | baz   | True    |
+-------+-------+---------+
```
#### Inspect parquet file schema
```bash
$ parquet-tools inspect /path/to/parquet
```
<details>
<summary>Inspect output</summary>
```
############ file meta data ############
created_by: parquet-cpp version 1.5.1-SNAPSHOT
num_columns: 3
num_rows: 3
num_row_groups: 1
format_version: 1.0
serialized_size: 2226
############ Columns ############
one
two
three
############ Column(one) ############
name: one
path: one
max_definition_level: 1
max_repetition_level: 0
physical_type: DOUBLE
logical_type: None
converted_type (legacy): NONE
############ Column(two) ############
name: two
path: two
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
############ Column(three) ############
name: three
path: three
max_definition_level: 1
max_repetition_level: 0
physical_type: BOOLEAN
logical_type: None
converted_type (legacy): NONE
```
</details>
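
Because the inspect output is made of plain `key: value` lines, a single field can be pulled out with standard tools. The sketch below is only an illustration: `inspect_field` is a hypothetical helper name, not part of parquet-tools, and it assumes the output layout shown above (fields separated by `: `).

```bash
# Hypothetical helper (not part of parquet-tools).
# Prints the value of the first "key: value" line whose key matches.
# Assumes the `parquet-tools inspect` layout shown above.
# usage: inspect_field KEY < inspect-output.txt
inspect_field() {
  awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

# Example (not run here):
#   parquet-tools inspect /path/to/parquet | inspect_field num_rows
```

Only the first match is printed, so keys that repeat per column (such as `name` or `physical_type`) return the value for the first column listed.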
#### Output a parquet file as CSV and transform it with [csvq](https://github.com/mithrandie/csvq)
```bash
$ parquet-tools csv s3://bucket-name/test.parquet | csvq "select one, three where three"
+-------+-------+
| one   | three |
+-------+-------+
| -1.0  | True  |
| 2.5   | True  |
+-------+-------+
```
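
If csvq is not available, a similar row filter can be approximated with `awk`. This is a minimal sketch, not a replacement for csvq: `keep_true_rows` is a hypothetical helper name, and it assumes the CSV has a header row, contains no quoted commas, and writes boolean values as `True`/`False` as in the output above. Unlike the csvq query, it keeps every column.

```bash
# Hypothetical helper (not part of parquet-tools or csvq).
# Keeps the header plus every row whose named column equals "True".
# Assumes simple CSV: header row present, no quoted commas.
# usage: keep_true_rows COLUMN_NAME < input.csv
keep_true_rows() {
  awk -F',' -v col="$1" '
    NR == 1 {
      # Locate the requested column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print
      next
    }
    # Print data rows where the selected column is exactly "True".
    $idx == "True"
  '
}

# Example (not run here):
#   parquet-tools csv test.parquet | keep_true_rows three
```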
Raw data
{
"_id": null,
"home_page": "https://github.com/ktrueda/parquet-tools",
"name": "parquet-tools",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "parquet-tools,parquet",
"author": "Kentaro Ueda",
"author_email": "kentaro.ueda.kentaro@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/56/ca/2f676fd43f4a020a5a96544b8622a4668bdb0a76ae3f6c28cc2aecbe2f03/parquet_tools-0.2.15.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Easy install parquet-tools",
"version": "0.2.15",
"project_urls": {
"Homepage": "https://github.com/ktrueda/parquet-tools",
"Repository": "https://github.com/ktrueda/parquet-tools"
},
"split_keywords": [
"parquet-tools",
"parquet"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7af7e014fb36bc98f23182d87d8fe31eef3450d073fd8be398a93c66202ee507",
"md5": "852e6041290eaac0aee3253184ca7773",
"sha256": "6b4efbc51a82f2e91312524394dacedb88367d2614a22052e31cce7de1f5cd36"
},
"downloads": -1,
"filename": "parquet_tools-0.2.15-py3-none-any.whl",
"has_sig": false,
"md5_digest": "852e6041290eaac0aee3253184ca7773",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 31814,
"upload_time": "2024-01-02T11:12:00",
"upload_time_iso_8601": "2024-01-02T11:12:00.934516Z",
"url": "https://files.pythonhosted.org/packages/7a/f7/e014fb36bc98f23182d87d8fe31eef3450d073fd8be398a93c66202ee507/parquet_tools-0.2.15-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "56ca2f676fd43f4a020a5a96544b8622a4668bdb0a76ae3f6c28cc2aecbe2f03",
"md5": "6cf0decd1de232fa8be0e48d8e198f9a",
"sha256": "c2ba80e7d400997cca4af953015d2b39c547f985c0fe9173348f67f07d7751ec"
},
"downloads": -1,
"filename": "parquet_tools-0.2.15.tar.gz",
"has_sig": false,
"md5_digest": "6cf0decd1de232fa8be0e48d8e198f9a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 28199,
"upload_time": "2024-01-02T11:12:03",
"upload_time_iso_8601": "2024-01-02T11:12:03.211422Z",
"url": "https://files.pythonhosted.org/packages/56/ca/2f676fd43f4a020a5a96544b8622a4668bdb0a76ae3f6c28cc2aecbe2f03/parquet_tools-0.2.15.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-02 11:12:03",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ktrueda",
"github_project": "parquet-tools",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "parquet-tools"
}