parquet-viewer

- Name: parquet-viewer
- Version: 0.1.3
- Home page: https://github.com/Ashlo/ParquetViewer
- Summary: A powerful command-line tool for viewing Parquet files
- Author: Ashutosh Bele
- Requires Python: >=3.6
- Uploaded: 2024-10-29 07:58:33
# Parquet Viewer

A powerful command-line tool for viewing, analyzing, and manipulating Parquet files with ease.

## Features

- 📊 View Parquet files in various table formats
- 📤 Export to different formats (CSV, Excel, JSON, HTML)
- 📈 Display dataset statistics and summaries
- 🔍 Filter and sort data
- 📉 Analyze correlations and missing values
- 🎲 Sample data randomly
- 💾 Memory-efficient handling of large files
- 🎨 Multiple display format options

## Installation

```bash
pip install parquet-viewer
```

## Usage

### Basic Commands

#### View Parquet File
```bash
# Basic viewing
pqview view data.parquet

# Customize display
pqview view data.parquet --max-rows 20 --format github
pqview view data.parquet -n 50 -f pretty --no-stats
```

#### Export to Other Formats
```bash
# Export to CSV
pqview export data.parquet output.csv

# Export to other formats
pqview export data.parquet output.xlsx --format excel
pqview export data.parquet output.json --format json
pqview export data.parquet output.html --format html
```
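The filter command is documented as using pandas query syntax, which suggests the tool is built on pandas. If so, each export target probably maps onto one of pandas' `DataFrame.to_*` writers. The sketch below assumes that; the `export_frame` helper, the `WRITERS` table and the writer options (`index=False`, `orient="records"`) are hypothetical choices, not the tool's confirmed behaviour. Excel output additionally needs an engine such as openpyxl.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from an export format name to a pandas writer.
# Assumption: the real tool may pass different options to each writer.
WRITERS = {
    "csv": lambda df, path: df.to_csv(path, index=False),
    "excel": lambda df, path: df.to_excel(path, index=False),  # needs openpyxl
    "json": lambda df, path: df.to_json(path, orient="records"),
    "html": lambda df, path: df.to_html(path, index=False),
}


def export_frame(df: pd.DataFrame, output: str, fmt: str = "csv") -> Path:
    """Write df to output with the writer registered for fmt."""
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported export format: {fmt!r}") from None
    writer(df, output)
    return Path(output)
```

In the tool itself the frame would first be loaded with something like `pd.read_parquet("data.parquet")`, which needs a Parquet engine such as pyarrow.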

### Analysis Commands

#### Summary Statistics
```bash
# Show summary statistics for numerical columns
pqview stats data.parquet
```
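Assuming the tool delegates to pandas, "summary statistics for numerical columns" most likely corresponds to `DataFrame.describe()`, which reports count, mean, standard deviation, minimum, quartiles and maximum. A minimal sketch; the `summary_stats` name is hypothetical.

```python
import pandas as pd


def summary_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Return describe()-style statistics for the numeric columns of df."""
    # describe() already skips non-numeric columns when a frame mixes types;
    # select_dtypes makes the numeric-only choice explicit.
    numeric = df.select_dtypes(include="number")
    return numeric.describe()
```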

#### Value Counts
```bash
# Show value counts for a specific column
pqview counts data.parquet column_name
```
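Value counts for a single column map naturally onto pandas `Series.value_counts()`, which lists each distinct value with its frequency, most common first. A sketch assuming a pandas backend; the `column_counts` helper and the choice to count missing values (`dropna=False`) are assumptions.

```python
import pandas as pd


def column_counts(df: pd.DataFrame, column: str) -> pd.Series:
    """Count occurrences of each distinct value in one column."""
    if column not in df.columns:
        raise KeyError(f"column not found: {column!r}")
    # Sorted by frequency, most common first; dropna=False keeps missing
    # values as their own bucket (whether the tool does this is an assumption).
    return df[column].value_counts(dropna=False)
```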

#### Missing Values Analysis
```bash
# Show statistics about missing values
pqview missing data.parquet
```
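Missing-value statistics are typically a per-column count of null entries plus their share of all rows. The sketch below computes both with pandas `isna()`, assuming a pandas backend; the `missing_report` name and the percentage column are illustrative, not the tool's confirmed output.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and percentage of missing (NaN/None/NaT) values."""
    missing = df.isna().sum()
    # Guard against division by zero on an empty frame.
    percent = missing / len(df) * 100 if len(df) else missing * 0.0
    return pd.DataFrame({"missing": missing, "percent": percent})
```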

#### Correlation Analysis
```bash
# Show correlation matrix
pqview correlations data.parquet

# Use a different correlation method
pqview correlations data.parquet --method spearman
```
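The `--method spearman` option suggests the tool passes the method through to pandas `DataFrame.corr()`, which accepts `"pearson"` (linear), `"spearman"` (rank-based) and `"kendall"` (rank-based, requires SciPy). A sketch under that assumption; `correlation_matrix` is a hypothetical name. Spearman scores any strictly monotonic relationship as 1 or -1, while Pearson only does so for linear ones.

```python
import pandas as pd


def correlation_matrix(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Pairwise correlations between the numeric columns of df."""
    # pandas supports "pearson", "spearman" and "kendall" (kendall needs SciPy).
    if method not in {"pearson", "spearman", "kendall"}:
        raise ValueError(f"unknown correlation method: {method!r}")
    # Drop non-numeric columns first so this works across pandas versions.
    return df.select_dtypes(include="number").corr(method=method)
```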

### Data Manipulation Commands

#### Filter Data
```bash
# Filter data using pandas query syntax
pqview filter data.parquet "age > 25 and department == 'IT'"
```
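Because the filter expression uses pandas query syntax, the same string can be tried directly with `DataFrame.query()`. A sketch of the equivalent call; the `filter_frame` wrapper is hypothetical.

```python
import pandas as pd


def filter_frame(df: pd.DataFrame, expression: str) -> pd.DataFrame:
    """Keep the rows for which a pandas query expression is true."""
    # query() evaluates the string as a boolean expression over column names;
    # string literals need quotes, e.g. department == 'IT'.
    return df.query(expression)
```

On the shell, wrapping the expression in double quotes (as in the example above) leaves the single quotes around `'IT'` intact for pandas.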

#### Sort Data
```bash
# Sort by a single column
pqview sort data.parquet "salary"

# Sort by multiple columns
pqview sort data.parquet "department,salary" --descending
```
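A comma-separated sort key such as `"department,salary"` corresponds to passing a list of columns to pandas `DataFrame.sort_values()`. The sketch assumes a pandas backend and that `--descending` applies one direction to every key; the `sort_frame` helper is hypothetical.

```python
import pandas as pd


def sort_frame(df: pd.DataFrame, columns: str, descending: bool = False) -> pd.DataFrame:
    """Sort df by a comma-separated list of column names."""
    # "department,salary" -> ["department", "salary"]; whitespace is trimmed.
    keys = [name.strip() for name in columns.split(",") if name.strip()]
    # A single flag sets the direction for all keys.
    return df.sort_values(by=keys, ascending=not descending)
```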

#### Sample Data
```bash
# Sample a specific number of rows
pqview sample data.parquet --rows 100

# Sample by fraction
pqview sample data.parquet --fraction 0.1 --seed 42
```
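The `--rows`, `--fraction` and `--seed` options line up with the `n`, `frac` and `random_state` parameters of pandas `DataFrame.sample()`; a fixed seed makes the sample reproducible. A sketch assuming a pandas backend; the `sample_frame` helper and the capping of `rows` at the frame length are assumptions.

```python
from typing import Optional

import pandas as pd


def sample_frame(
    df: pd.DataFrame,
    rows: Optional[int] = None,
    fraction: Optional[float] = None,
    seed: Optional[int] = None,
) -> pd.DataFrame:
    """Randomly sample either a row count or a fraction of df."""
    if (rows is None) == (fraction is None):
        raise ValueError("pass exactly one of rows or fraction")
    if rows is not None:
        # Sampling more rows than exist fails without replacement, so cap it.
        return df.sample(n=min(rows, len(df)), random_state=seed)
    return df.sample(frac=fraction, random_state=seed)
```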

## Display Formats

The `view` command's `--format` (`-f`) option accepts the following table formats:

| Format  | Description |
|---------|-------------|
| grid    | ASCII grid table |
| pipe    | Markdown-compatible table |
| orgtbl  | Org-mode table |
| github  | GitHub-flavored Markdown table |
| pretty  | Pretty-printed table |
| html    | HTML table |
| latex   | LaTeX table |

## Export Formats

Supported export formats:
- CSV
- Excel
- JSON
- HTML

## File Size Limits

By default, the tool enforces a 5 MB file size limit to prevent memory issues. The limit can be adjusted in the configuration.
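A size limit like this is usually checked before the file is read. The sketch below shows one way such a guard could work; it is not the tool's code. The helper name `check_file_size` is hypothetical, and reading "5 MB" as 5 × 1024 × 1024 bytes is an assumption, since the README does not say which definition applies or where the limit is configured.

```python
import os

# Assumption: 5 MB taken as 5 * 1024 * 1024 bytes.
DEFAULT_MAX_BYTES = 5 * 1024 * 1024


def check_file_size(path: str, max_bytes: int = DEFAULT_MAX_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds max_bytes."""
    size = os.path.getsize(path)  # raises FileNotFoundError if path is missing
    if size > max_bytes:
        raise ValueError(
            f"{path} is {size / 1024 / 1024:.1f} MB, above the "
            f"{max_bytes / 1024 / 1024:.1f} MB limit"
        )
    return size
```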

## Error Handling

The tool provides clear error messages for common issues:
- File not found
- Invalid file format
- Memory limitations
- Invalid query syntax
- Data type conversion errors

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License

## Author

Ashutosh Bele

## Changelog

### v0.1.0
- Initial release
- Basic viewing and export functionality
- Statistical analysis features
- Data manipulation capabilities

            
