parquet-tools


- Name: parquet-tools
- Version: 0.2.15
- Home page: https://github.com/ktrueda/parquet-tools
- Summary: Easy install parquet-tools
- Upload time: 2024-01-02 11:12:03
- Author: Kentaro Ueda
- Requires Python: >=3.8
- License: MIT
- Keywords: parquet-tools, parquet
- Requirements: none recorded

# parquet-tools

![Run Unittest](https://github.com/ktrueda/parquet-tools/workflows/Run%20Unittest/badge.svg)
![Run CLI test](https://github.com/ktrueda/parquet-tools/workflows/Run%20CLI%20test/badge.svg)

This is a pip-installable version of [parquet-tools](https://github.com/apache/parquet-mr), implemented as a command-line tool on top of [Apache Arrow](https://github.com/apache/arrow).
It can show the content and schema of Parquet files stored on local disk or on Amazon S3.
Note that it is not compatible with the original parquet-tools.

## Features

- Read Parquet data (local file or file on S3)
- Read Parquet metadata/schema (local file or file on S3)

## Installation

```bash
$ pip install parquet-tools
```
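The package metadata lists Python 3.8 or newer as a requirement. A minimal sketch, assuming a POSIX-style system with `python3` on `PATH`, that checks this before installing into an isolated virtual environment. Both function names and the default `.venv` directory are hypothetical choices, not part of parquet-tools:

```bash
# Succeeds (exit 0) only if python3 is at least 3.8, the minimum
# version listed in the package metadata.
python_ok() {
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'
}

# Hypothetical helper: install parquet-tools into a virtual environment
# (default directory: .venv) once the version check passes.
install_parquet_tools() {
  local venv_dir="${1:-.venv}"
  python_ok || { echo "parquet-tools needs Python >= 3.8" >&2; return 1; }
  # bin/ is the POSIX venv layout; Windows uses Scripts\ instead.
  python3 -m venv "$venv_dir" &&
    "$venv_dir/bin/pip" install parquet-tools &&
    "$venv_dir/bin/parquet-tools" --help
}
```

After `install_parquet_tools`, the command is available as `.venv/bin/parquet-tools`, or as plain `parquet-tools` once the environment is activated with `. .venv/bin/activate`.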

## Usage

```bash
$ parquet-tools --help
usage: parquet-tools [-h] {show,csv,inspect} ...

parquet CLI tools

positional arguments:
  {show,csv,inspect}
    show              Show human readble format. see `show -h`
    csv               Cat csv style. see `csv -h`
    inspect           Inspect parquet file. see `inspect -h`

optional arguments:
  -h, --help          show this help message and exit
```
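Each subcommand has its own help text, as the "see `show -h`" hints indicate. A small sketch that prints all three in one go; `PARQUET_TOOLS` is a hypothetical override variable (defaulting to `parquet-tools`) that makes it easy to dry-run the loop:

```bash
# Print the help text of every subcommand listed by `parquet-tools --help`.
# PARQUET_TOOLS is a hypothetical override; it defaults to the real command.
all_subcommand_help() {
  local sub
  for sub in show csv inspect; do
    printf '===== %s =====\n' "$sub"
    "${PARQUET_TOOLS:-parquet-tools}" "$sub" -h
  done
}
```

For example, `all_subcommand_help | less` pages through all three help texts, while `PARQUET_TOOLS=echo all_subcommand_help` only prints the commands it would run.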

## Usage Examples

#### Show local parquet file

```bash
$ parquet-tools show test.parquet
+-------+-------+---------+
|   one | two   | three   |
|-------+-------+---------|
|  -1   | foo   | True    |
| nan   | bar   | False   |
|   2.5 | baz   | True    |
+-------+-------+---------+
```
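The `test.parquet` file used in these examples is not shipped with this README. A sketch that writes a file with the same column names and values, assuming `pyarrow` is importable from `python3` (parquet-tools is built on Apache Arrow, so installing it should normally bring pyarrow along, but treat that as an assumption). `make_sample_parquet` is a hypothetical helper name:

```bash
# make_sample_parquet PATH: write a 3-row file with columns one/two/three,
# mirroring the values shown above. Assumes pyarrow is installed.
make_sample_parquet() {
  python3 - "$1" <<'EOF'
import sys

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "one": [-1.0, None, 2.5],        # None is stored as a null DOUBLE
    "two": ["foo", "bar", "baz"],    # strings become BYTE_ARRAY / UTF8
    "three": [True, False, True],
})
pq.write_table(table, sys.argv[1])
EOF
}
```

Running `make_sample_parquet test.parquet && parquet-tools show test.parquet` should then display a table like the one above, though the `created_by` field reported by `inspect` will reflect your local pyarrow version.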

#### Show parquet file on S3

```bash
$ parquet-tools show s3://bucket-name/prefix/*
+-------+-------+---------+
|   one | two   | three   |
|-------+-------+---------|
|  -1   | foo   | True    |
| nan   | bar   | False   |
|   2.5 | baz   | True    |
+-------+-------+---------+
```
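When the S3 path contains a wildcard, quoting it is safer: zsh, for example, aborts with "no matches found" when an unquoted pattern matches no local files, so parquet-tools never runs. A minimal wrapper sketch, assuming AWS credentials are already configured; `show_s3_prefix` and the `PARQUET_TOOLS` override are hypothetical names:

```bash
# show_s3_prefix BUCKET PREFIX: show every object under s3://BUCKET/PREFIX/.
# The double quotes pass "*" to parquet-tools literally instead of letting
# the local shell try to expand it.
show_s3_prefix() {
  local bucket="$1" prefix="${2%/}"   # tolerate a trailing slash on PREFIX
  "${PARQUET_TOOLS:-parquet-tools}" show "s3://${bucket}/${prefix}/*"
}
```

With this, `show_s3_prefix bucket-name prefix` runs the same command as the example above.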


#### Inspect parquet file schema

```bash
$ parquet-tools inspect /path/to/parquet
```

<details>

<summary>Inspect output</summary>

```
############ file meta data ############
created_by: parquet-cpp version 1.5.1-SNAPSHOT
num_columns: 3
num_rows: 3
num_row_groups: 1
format_version: 1.0
serialized_size: 2226


############ Columns ############
one
two
three

############ Column(one) ############
name: one
path: one
max_definition_level: 1
max_repetition_level: 0
physical_type: DOUBLE
logical_type: None
converted_type (legacy): NONE

############ Column(two) ############
name: two
path: two
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(three) ############
name: three
path: three
max_definition_level: 1
max_repetition_level: 0
physical_type: BOOLEAN
logical_type: None
converted_type (legacy): NONE
```
</details>
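Because the `inspect` output is plain text, it can be post-processed with standard tools. A sketch, assuming the `name:` and `physical_type:` line layout shown above stays the same across versions, that reduces it to one tab-separated `name`/`physical_type` line per column (`column_types` is a hypothetical name):

```bash
# column_types: read `parquet-tools inspect` output on stdin and print
# "<column name>\t<physical type>" for each column section.
column_types() {
  awk -F': ' '
    $1 == "name"          { name = $2 }             # remember current column
    $1 == "physical_type" { print name "\t" $2 }    # emit it with its type
  '
}
```

Piping the output above through it (`parquet-tools inspect /path/to/parquet | column_types`) yields `one DOUBLE`, `two BYTE_ARRAY` and `three BOOLEAN`, one per line and tab-separated.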

#### Output a Parquet file as CSV and query it with [csvq](https://github.com/mithrandie/csvq)

```bash
$ parquet-tools csv s3://bucket-name/test.parquet | csvq "select one, three where three"
+-------+-------+
|  one  | three |
+-------+-------+
| -1.0  | True  |
| 2.5   | True  |
+-------+-------+
```
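If csvq is not available, a simple filter like this one can be approximated with awk. This sketch assumes the `csv` subcommand writes a header row followed by unquoted, comma-separated fields with booleans spelled `True`/`False`, as the output above suggests; quoted fields containing commas would break it. `where_true` is a hypothetical helper:

```bash
# where_true COLUMN: copy the CSV header, then only the rows whose COLUMN
# field is exactly "True". Naive comma split (no quoted-field support).
where_true() {
  awk -F',' -v want="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == want) idx = i   # locate COLUMN
      print
      next
    }
    idx && $idx == "True"
  '
}
```

For example, `parquet-tools csv s3://bucket-name/test.parquet | where_true three | cut -d, -f1,3` keeps only the `one` and `three` columns of the matching rows. Unlike csvq, this does no type handling, so it is only a rough stand-in.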


            
