cdef-utils


- Name: cdef-utils
- Version: 1.0.0
- Summary: Utility for converting CSV to Parquet files
- Upload time: 2024-10-05 10:47:34
- Requires Python: >=3.9
- Keywords: csv, data processing, parquet
# cdef-utils

cdef-utils is a Python package that converts CSV and Parquet files to a standardized Parquet format, tailored to processing register data. It provides utilities for batch-processing files, generating summaries, and detecting the encoding of CSV files.

## Features

- Convert CSV and Parquet files to a standardized Parquet format
- Automatic encoding detection for CSV files
- Batch processing of multiple files
- Generation of summary reports
- Progress tracking and resumable processing
- Rich console output with logging
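
To illustrate what automatic encoding detection for CSV files can look like, here is a minimal sketch using only the standard library. It is not the package's implementation: the function name `detect_encoding`, the candidate list, and the sample size are assumptions made for this example.

```python
import codecs
from pathlib import Path

# Candidate encodings, tried in order. This list is an assumption for
# illustration; the package may use a different strategy or library.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def detect_encoding(path: Path, sample_size: int = 65536) -> str:
    """Return the first candidate encoding that decodes a sample of the file."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    # If the sample was cut short by sample_size, a multi-byte character may
    # be split at the end, so only treat the data as final for small files.
    is_final = len(sample) < sample_size
    for encoding in CANDIDATE_ENCODINGS:
        decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
        try:
            decoder.decode(sample, final=is_final)
        except UnicodeDecodeError:
            continue
        return encoding
    # latin-1 accepts every byte value, so this is only reached if the
    # candidate list is changed to exclude it.
    raise ValueError(f"none of {CANDIDATE_ENCODINGS} can decode {path}")
```

Because `latin-1` accepts any byte sequence, it acts as a last-resort fallback here; a successful decode does not prove that the guess is the encoding the file was written in.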

## Installation

To install cdef-utils, you can use pip:

```
pip install cdef-utils
```

## Usage

You can use cdef-utils as a command-line tool:

```
python -m cdef_utils /path/to/input/directory --summary_file output_summary.json
```

### Arguments

- `input_directory`: Path to the directory containing CSV and Parquet files to process
- `--summary_file`: (Optional) Path to save the summary JSON file (default: "register_summary.json")
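
The two documented arguments map naturally onto an `argparse` parser. The following is a sketch of how such a command line could be declared, not the package's actual source; the function name `build_parser` and the help texts are assumptions, while the argument names and the default come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(
        prog="cdef_utils",
        description="Convert CSV and Parquet files to a standardized Parquet format.",
    )
    # Required positional argument: the directory to scan for input files.
    parser.add_argument(
        "input_directory",
        help="Directory containing CSV and Parquet files to process",
    )
    # Optional flag with the documented default value.
    parser.add_argument(
        "--summary_file",
        default="register_summary.json",
        help="Path to save the summary JSON file",
    )
    return parser


# Parsing an explicit argument list instead of sys.argv:
args = build_parser().parse_args(["/path/to/input/directory"])
print(args.input_directory, args.summary_file)
```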

## Output

The script will:

1. Convert all CSV and Parquet files in the input directory to Parquet format
2. Save the converted files in a structured directory layout under the `registers` subdirectory of the output directory, `/path/to/your/fixed/output/directory/registers`
3. Generate a summary JSON file with details about each processed register
4. Display a summary table in the console
5. Log processing details and any errors
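
The steps above, combined with resumable processing, can be sketched as a loop that skips files already recorded in the summary. This is an illustration under stated assumptions rather than the package's code: `process_directory`, the `convert` callback, the summary keys, and the `registers/<name>/<name>.parquet` layout are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def process_directory(
    input_dir: Path,
    output_dir: Path,
    convert: Callable[[Path, Path], None],
    summary_file: Path,
) -> dict:
    """Convert each CSV/Parquet file once and record it in a JSON summary."""
    summary_path = Path(summary_file)
    # Resume support: reuse entries from an earlier run if a summary exists.
    summary = json.loads(summary_path.read_text()) if summary_path.exists() else {}
    registers_dir = Path(output_dir) / "registers"

    for source in sorted(Path(input_dir).iterdir()):
        if not source.is_file() or source.suffix.lower() not in {".csv", ".parquet"}:
            continue
        if summary.get(source.name, {}).get("status") == "ok":
            continue  # converted in a previous run, skip it

        # Hypothetical layout: one subdirectory per register.
        target = registers_dir / source.stem / f"{source.stem}.parquet"
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            convert(source, target)
            summary[source.name] = {"output": str(target), "status": "ok"}
        except Exception as exc:  # record the failure and continue with the rest
            summary[source.name] = {"output": None, "status": f"error: {exc}"}
        # Write after every file so an interrupted run can pick up where it stopped.
        summary_path.write_text(json.dumps(summary, indent=2))

    return summary
```

The conversion itself is passed in as `convert(source, target)` so the sketch stays independent of any particular library; with polars, that callback could read the file with `read_csv` or `read_parquet` and save it with `write_parquet`.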

## Requirements

- Python 3.9+
- polars
- rich

## Configuration

- The `OUTPUT_DIRECTORY` is set to `/path/to/your/fixed/output/directory` in the script. Modify this path as needed.
- Logging is configured to save logs in a `logs` directory in the current working directory.
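
For readers who want to reproduce the logging setup in their own scripts, the sketch below writes log records to a file inside a `logs` directory using the standard `logging` module. The function name `configure_logging`, the log file name, and the message format are assumptions; the package's own configuration may differ, for example by also attaching a rich console handler.

```python
import logging
from pathlib import Path


def configure_logging(log_dir: Path = Path("logs"), name: str = "cdef_utils") -> logging.Logger:
    """Return a logger that writes to <log_dir>/<name>.log."""
    log_path = Path(log_dir)
    # Create the logs directory relative to the current working directory.
    log_path.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Avoid adding a second file handler if the function is called again.
    if not logger.handlers:
        handler = logging.FileHandler(log_path / f"{name}.log", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging().info("message")` then appends a timestamped line to `logs/cdef_utils.log`.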

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "cdef-utils",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "csv, data processing, parquet",
    "author": null,
    "author_email": "Tobias Kragholm <tkragholm@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/3a/c6/84ef6087e4f52e588f93df787fda05afdf0b9662844faed14cce1233585f/cdef_utils-1.0.0.tar.gz",
    "platform": null,
    "description": "# cdef-utils\n\ncdef-utils is a Python package designed to convert CSV and Parquet files to a standardized Parquet format, specifically tailored for processing register data. It provides utilities for batch processing files, generating summaries, and handling various encoding issues.\n\n## Features\n\n- Convert CSV and Parquet files to a standardized Parquet format\n- Automatic encoding detection for CSV files\n- Batch processing of multiple files\n- Generation of summary reports\n- Progress tracking and resumable processing\n- Rich console output with logging\n\n## Installation\n\nTo install cdef-utils, you can use pip:\n\n```\npip install cdef-utils\n```\n\n## Usage\n\nYou can use cdef-utils as a command-line tool:\n\n```\npython -m cdef_utils /path/to/input/directory --summary_file output_summary.json\n```\n\n### Arguments\n\n- `input_directory`: Path to the directory containing CSV and Parquet files to process\n- `--summary_file`: (Optional) Path to save the summary JSON file (default: \"register_summary.json\")\n\n## Output\n\nThe script will:\n\n1. Convert all CSV and Parquet files in the input directory to Parquet format\n2. Save the converted files in a structured directory format under `/path/to/your/fixed/output/directory/registers`\n3. Generate a summary JSON file with details about each processed register\n4. Display a summary table in the console\n5. Log processing details and any errors\n\n## Requirements\n\n- Python 3.7+\n- polars\n- rich\n\n## Configuration\n\n- The `OUTPUT_DIRECTORY` is set to `/path/to/your/fixed/output/directory` in the script. Modify this path as needed.\n- Logging is configured to save logs in a `logs` directory in the current working directory.\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nThis project is licensed under the MIT License.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Utility for converting CSV to Parquet files",
    "version": "1.0.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/tkragholm/cdef-utils/issues",
        "Homepage": "https://github.com/tkragholm/cdef-utils"
    },
    "split_keywords": [
        "csv",
        " data processing",
        " parquet"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "448cc7f10ab920c3467c71976927ff6de12b07b4f73ffcf9157c338e0588b0ba",
                "md5": "d19aad489333219bee2ce24684583c9f",
                "sha256": "80cccbb7042d885bad59d0ad48f847a84bead84b09a8aa84303c0f3481fbc7f8"
            },
            "downloads": -1,
            "filename": "cdef_utils-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d19aad489333219bee2ce24684583c9f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 5992,
            "upload_time": "2024-10-05T10:47:32",
            "upload_time_iso_8601": "2024-10-05T10:47:32.919007Z",
            "url": "https://files.pythonhosted.org/packages/44/8c/c7f10ab920c3467c71976927ff6de12b07b4f73ffcf9157c338e0588b0ba/cdef_utils-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3ac684ef6087e4f52e588f93df787fda05afdf0b9662844faed14cce1233585f",
                "md5": "9952e55f975639c50bbfd911150668e3",
                "sha256": "698ee3dd23e2aeca35aa41f4e3794653e5402b855d5dc383eae7ccd2e0afb20b"
            },
            "downloads": -1,
            "filename": "cdef_utils-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "9952e55f975639c50bbfd911150668e3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 33979,
            "upload_time": "2024-10-05T10:47:34",
            "upload_time_iso_8601": "2024-10-05T10:47:34.255237Z",
            "url": "https://files.pythonhosted.org/packages/3a/c6/84ef6087e4f52e588f93df787fda05afdf0b9662844faed14cce1233585f/cdef_utils-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-05 10:47:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tkragholm",
    "github_project": "cdef-utils",
    "github_not_found": true,
    "lcname": "cdef-utils"
}
        