parquetconv


Name: parquetconv
Version: 0.2.1
Summary: A command-line tool for converting between Parquet and CSV file formats
Upload time: 2025-08-25 20:58:42
Requires Python: >=3.9
License: MIT
Keywords: conversion, csv, data, pandas, parquet
# ParquetConv

A command-line tool for converting between Parquet and CSV file formats using pandas.

## Features

- **Automatic format detection**: Detects whether the input file is Parquet or CSV
- **Bidirectional conversion**: Convert Parquet to CSV or CSV to Parquet
- **Flexible output naming**: Auto-generates output filenames or allows custom naming
- **Error handling**: Reports missing files, unsupported formats, read/write failures, and invalid content with informative messages
- **Force conversion**: The `--force` option converts a file even when its format cannot be detected with certainty

## Installation

### Option 1: Install from PyPI (Recommended)

```bash
pip install parquetconv
```

After installation, you can use the `parquetconv` command directly from anywhere in your terminal.

### Option 2: Install from source

Clone the repository and install:

```bash
git clone https://github.com/ToyokoLabs/parquetconv.git
cd parquetconv
pip install -e .
```

### Option 3: Development setup with uv

The project uses `uv` for dependency management. From the root of the cloned repository, install the dependencies with:

```bash
uv sync
```

## Usage

### After pip installation

Convert a Parquet file to CSV:
```bash
parquetconv input.parquet
```

Convert a CSV file to Parquet:
```bash
parquetconv input.csv
```

### From source or development

```bash
python -m parquetconv.cli input.parquet
python -m parquetconv.cli input.csv
```

### Advanced Usage

Specify a custom output filename:
```bash
parquetconv input.parquet -o custom_output.csv
parquetconv input.csv -o custom_output.parquet
```

Force conversion (useful when file format detection is uncertain):
```bash
parquetconv input_file --force
```

### Command Line Options

- `input_file`: Path to the input file (required)
- `-o, --output`: Custom output file path (optional)
- `--force`: Force conversion even if file format detection is uncertain
- `-h, --help`: Show help message
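
The options above map naturally onto Python's standard `argparse` module. The following is a minimal sketch of such an interface, not the project's actual source: the function name `build_parser` and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="parquetconv",
        description="Convert between Parquet and CSV file formats.",
    )
    # Required positional argument: the file to convert.
    parser.add_argument("input_file", help="Path to the input file")
    # Optional output path; None means "derive the name from the input".
    parser.add_argument("-o", "--output", default=None, help="Custom output file path")
    # Boolean switch, False unless --force is given.
    parser.add_argument(
        "--force",
        action="store_true",
        help="Force conversion even if file format detection is uncertain",
    )
    return parser


args = build_parser().parse_args(["data.csv", "-o", "processed_data.parquet"])
print(args.input_file, args.output, args.force)
# prints: data.csv processed_data.parquet False
```

`argparse` adds `-h, --help` automatically, so it needs no explicit definition.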

## Examples

```bash
# Convert Parquet to CSV with auto-generated filename
parquetconv data.parquet
# Output: data.csv

# Convert CSV to Parquet with custom filename
parquetconv data.csv -o processed_data.parquet

# Convert with force flag
parquetconv unknown_file --force

# Get help
parquetconv --help
```

## Requirements

- Python 3.9+
- pandas >= 2.3.2
- pyarrow >= 21.0.0

## How It Works

1. **File Detection**: The tool first checks the file extension, then attempts to read the file to determine its format
2. **Format Conversion**: Uses pandas to read the input file and write it out in the other format
3. **Output Generation**: If no output path is given, derives the output filename from the input name with the matching extension (for example, `data.parquet` becomes `data.csv`)
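
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the function names are hypothetical, and the content check uses the `PAR1` magic bytes that open every Parquet file as a stand-in for however the tool inspects file content. The `convert` function needs pandas and pyarrow installed; the other two helpers use only the standard library.

```python
from pathlib import Path
from typing import Optional

PARQUET_MAGIC = b"PAR1"  # Parquet files begin (and end) with these four bytes


def detect_format(path: Path) -> Optional[str]:
    """Return "parquet", "csv", or None when the format is unclear."""
    suffix = path.suffix.lower()
    if suffix == ".parquet":
        return "parquet"
    if suffix == ".csv":
        return "csv"
    # Unknown extension: fall back to inspecting the first bytes of the file.
    with path.open("rb") as handle:
        head = handle.read(len(PARQUET_MAGIC))
    return "parquet" if head == PARQUET_MAGIC else None


def default_output_path(input_path: Path, source_format: str) -> Path:
    """Swap the extension for the one matching the target format."""
    target_suffix = ".csv" if source_format == "parquet" else ".parquet"
    return input_path.with_suffix(target_suffix)


def convert(input_path: Path, output_path: Optional[Path] = None) -> Path:
    """Convert Parquet to CSV or CSV to Parquet and return the output path."""
    import pandas as pd  # imported lazily so the helpers above work without it

    source_format = detect_format(input_path)
    if source_format is None:
        raise ValueError(f"Cannot determine the format of {input_path}")
    output_path = output_path or default_output_path(input_path, source_format)
    if source_format == "parquet":
        pd.read_parquet(input_path).to_csv(output_path, index=False)
    else:
        pd.read_csv(input_path).to_parquet(output_path, index=False)
    return output_path


print(default_output_path(Path("data.parquet"), "parquet"))
# prints: data.csv
```

Passing `index=False` keeps the pandas row index out of the output, so a round trip does not pick up an extra unnamed column.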

## Error Handling

The tool provides clear error messages for:
- Missing input files
- Unsupported file formats
- Read/write errors during conversion
- Invalid file content
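
One way to turn these failure cases into messages is to wrap the conversion step and translate exceptions into text on standard error plus an exit code. The sketch below is an assumption about how this could be structured: the function name `run_conversion`, the message wording, and the exit codes are all hypothetical and do not come from the project.

```python
import sys
from pathlib import Path
from typing import Callable


def run_conversion(input_file: str, convert: Callable[[Path], Path]) -> int:
    """Run `convert` and translate failures into messages and exit codes."""
    path = Path(input_file)
    # Missing input file: report it before attempting any read.
    if not path.is_file():
        print(f"Error: input file not found: {path}", file=sys.stderr)
        return 1
    try:
        output = convert(path)
    except ValueError as exc:
        # Assumed to cover unsupported formats and invalid file content.
        print(f"Error: {exc}", file=sys.stderr)
        return 2
    except OSError as exc:
        # Read/write failures such as permission errors or a full disk.
        print(f"Error: could not read or write file: {exc}", file=sys.stderr)
        return 3
    print(f"Wrote {output}")
    return 0


# Usage with a stand-in converter that only computes the output name.
exit_code = run_conversion("missing.parquet", lambda p: p.with_suffix(".csv"))
print(exit_code)
# prints: 1 (assuming no file named missing.parquet exists in the working directory)
```

Returning an exit code instead of raising lets a command-line entry point pass the result straight to `sys.exit`.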

## Development

To contribute to the project:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests (if available)
5. Submit a pull request

## License

This project is open source and available under the MIT License, as declared in the package metadata.

## Author

**Sebastian Bassi** - [sebastian@toyoko.io](mailto:sebastian@toyoko.io)

## Repository

- **Homepage**: https://github.com/ToyokoLabs/parquetconv
- **Repository**: https://github.com/ToyokoLabs/parquetconv
- **Issues**: https://github.com/ToyokoLabs/parquetconv/issues

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "parquetconv",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "conversion, csv, data, pandas, parquet",
    "author": null,
    "author_email": "Sebastian Bassi <sebastian@toyoko.io>",
    "download_url": "https://files.pythonhosted.org/packages/22/64/44b2a1e1803dd93420928db6234927af9907d6389b8d07c322b6c1913d71/parquetconv-0.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A command-line tool for converting between Parquet and CSV file formats",
    "version": "0.2.1",
    "project_urls": {
        "Homepage": "https://github.com/ToyokoLabs/parquetconv",
        "Issues": "https://github.com/ToyokoLabs/parquetconv/issues",
        "Repository": "https://github.com/ToyokoLabs/parquetconv"
    },
    "split_keywords": [
        "conversion",
        " csv",
        " data",
        " pandas",
        " parquet"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b1e324a2273a85d44ba57342db7ad079635043d541982d416ffdc71570a392bd",
                "md5": "5c37d30c2a3f9babb0aa2b3ed22c5fc3",
                "sha256": "123b4e05ab2956ed77a919a64045227b82bbba11e622f0cc590c4735c72456f2"
            },
            "downloads": -1,
            "filename": "parquetconv-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5c37d30c2a3f9babb0aa2b3ed22c5fc3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 16608,
            "upload_time": "2025-08-25T20:58:41",
            "upload_time_iso_8601": "2025-08-25T20:58:41.773915Z",
            "url": "https://files.pythonhosted.org/packages/b1/e3/24a2273a85d44ba57342db7ad079635043d541982d416ffdc71570a392bd/parquetconv-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "226444b2a1e1803dd93420928db6234927af9907d6389b8d07c322b6c1913d71",
                "md5": "0fadc36c57dbfe0c98f80e371e34b39c",
                "sha256": "b48f03ff42de9636949f2d6552c79f7fdba1c08959d526ec6ec6b0738c549b6d"
            },
            "downloads": -1,
            "filename": "parquetconv-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "0fadc36c57dbfe0c98f80e371e34b39c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 48106,
            "upload_time": "2025-08-25T20:58:42",
            "upload_time_iso_8601": "2025-08-25T20:58:42.998981Z",
            "url": "https://files.pythonhosted.org/packages/22/64/44b2a1e1803dd93420928db6234927af9907d6389b8d07c322b6c1913d71/parquetconv-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-25 20:58:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ToyokoLabs",
    "github_project": "parquetconv",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "parquetconv"
}
        