csv-gp


- Name: csv-gp
- Version: 0.2.1
- Summary: CSV GP allows you to pinpoint these common issues with a CSV file, as well as export just the parsable lines from a file.
- Upload time: 2024-09-16 09:18:23
- Requires Python: >=3.8
- Keywords: rust, csv
# CSV GP: Diagnose all your CSV issues

CSVs are a ubiquitous format for data transfer, and they are commonly [riddled with issues](https://donatstudios.com/Falsehoods-Programmers-Believe-About-CSVs). Most CSV libraries abort with an unhelpful error. CSV GP instead lets you pinpoint these common issues in a CSV file and export just the parsable lines from it.

## Installation

CSV GP can be used in three ways.

### Standalone binary

1. [Install Rust](https://www.rust-lang.org/tools/install)
2. Clone the repo and navigate into it
3. Run `cargo install --path csv_gp`
4. The `csv-gp` command is now available; see `csv-gp --help` for usage

### Rust library

Add the following to the `[dependencies]` section of your `Cargo.toml`:

`csv-gp = { git = "https://github.com/xelixdev/csv-gp", rev = "<optional git tag>" }`

### Python library

#### From package manager

The library is available on PyPI at https://pypi.org/project/csv-gp/, so you can install it with:

`pip install csv-gp`
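
To confirm that the install worked, one option is to ask Python's packaging metadata for the installed version. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of CSV GP.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "csv-gp") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version()
    print("csv-gp is not installed" if version is None else f"csv-gp {version}")
```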

#### Compiling from source

1. [Install Rust](https://www.rust-lang.org/tools/install)
2. Install maturin (`pip install maturin`)
3. Clone the repo
4. Run `make all`
5. Run `cd csv_gp_python && maturin develop`

## Usage

### Rust standalone binary

After installing the binary, run `csv-gp $FILE` to print a diagnosis of the file. The command also provides options to change the delimiter and the encoding of the file; see `csv-gp -h` for details.

Another option is `--correct-rows-path`, which exports only the correct rows to the provided path.
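
If you want to drive the binary from a script, the invocation above can be wrapped with `subprocess`. The following is a rough sketch, not part of CSV GP: the helper names `build_command` and `run_diagnosis` are hypothetical, and it assumes `--correct-rows-path` takes its value as a separate argument. Because the exit-code behaviour is not documented here, the sketch does not treat a non-zero exit status as an error.

```python
import subprocess
from typing import List, Optional


def build_command(csv_path: str, correct_rows_path: Optional[str] = None) -> List[str]:
    """Build the argument list for a `csv-gp` invocation."""
    command = ["csv-gp", csv_path]
    if correct_rows_path is not None:
        # Assumption: the flag and its value are passed as two arguments.
        command += ["--correct-rows-path", correct_rows_path]
    return command


def run_diagnosis(csv_path: str, correct_rows_path: Optional[str] = None) -> str:
    """Run `csv-gp` (which must be on PATH) and return what it printed."""
    result = subprocess.run(
        build_command(csv_path, correct_rows_path),
        capture_output=True,
        text=True,
        check=False,  # exit-code semantics are not documented, so do not raise
    )
    return result.stdout
```

Calling `run_diagnosis("data.csv", "clean.csv")` would then return the printed diagnosis and, if the flag behaves as described above, leave the correct rows in `clean.csv`.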

### Python library

The Python library exposes two main functions, `check_file` and `get_rows`.

The `check_file` function takes the path to a file, the delimiter, and the encoding, and returns an instance of the `CSVDetails` class, which provides details about the file. See the [type stubs](https://github.com/xelixdev/csv-gp/blob/0f77c62841509c134a3bbe06ec178426e9c5aa10/csv_gp_python/csv_gp.pyi) for the exact signature and for all available attributes with their names and types.
If the `valid_rows_output_path` argument is provided to the function, only the correct rows are exported to that path.
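
A minimal sketch of calling `check_file` from Python, based on the description above. The positional order (path, delimiter, encoding) follows the prose, and passing `valid_rows_output_path` by keyword is an assumption; the wrapper name `diagnose`, its default values, and the injectable `check_file` parameter are hypothetical and exist only for illustration. Check the stub file for the exact signature.

```python
from typing import Any, Callable, Optional


def diagnose(
    path: str,
    delimiter: str = ",",
    encoding: str = "utf-8",
    valid_rows_output_path: Optional[str] = None,
    check_file: Optional[Callable[..., Any]] = None,
) -> Any:
    """Call csv_gp.check_file and return the CSVDetails object it produces."""
    if check_file is None:
        # Imported lazily so the wrapper can be tried out with a stand-in function.
        from csv_gp import check_file
    if valid_rows_output_path is None:
        return check_file(path, delimiter, encoding)
    # Assumption: the output path is accepted as a keyword argument.
    return check_file(
        path, delimiter, encoding, valid_rows_output_path=valid_rows_output_path
    )
```

With the package installed, `details = diagnose("data.csv")` would return the `CSVDetails` instance; its attributes are listed in the stub file.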

The `get_rows` function likewise takes the path to a file, the delimiter, and the encoding, plus a list of row numbers. It returns the parsed cells for the given rows. See the file linked above for the exact types of the parameters and the return value.
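
A similar sketch for `get_rows`, again assuming the positional order given in the prose (path, delimiter, encoding, row numbers). The wrapper name `fetch_rows`, its defaults, and the injectable `get_rows` parameter are hypothetical; the exact container types for the row numbers and the result are defined in the stub file, so the sketch passes them through untouched.

```python
from typing import Any, Callable, Iterable, Optional


def fetch_rows(
    path: str,
    row_numbers: Iterable[int],
    delimiter: str = ",",
    encoding: str = "utf-8",
    get_rows: Optional[Callable[..., Any]] = None,
) -> Any:
    """Call csv_gp.get_rows and return the parsed cells for the requested rows."""
    if get_rows is None:
        # Imported lazily so the wrapper can be tried out with a stand-in function.
        from csv_gp import get_rows
    # Assumption: the row numbers are the fourth positional argument.
    return get_rows(path, delimiter, encoding, row_numbers)
```

One possible workflow is to run `check_file` first, pick row numbers of interest from the returned `CSVDetails`, and then pass them to `fetch_rows` to inspect the parsed cells.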

## Releasing a new version of the Python lib

1. Update version numbers in `csv_gp_python/Cargo.toml` and `csv_gp/Cargo.toml`
2. Merge this change into main
3. Create a new release on GitHub, creating a tag in the form `vX.Y.Z`
4. The 'Publish' pipeline should begin running, and the new version will be published
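
Before creating the release, it can help to check that the tag matches the `vX.Y.Z` form from step 3. The snippet below is a small sketch, not part of the repository: the helper name `is_release_tag` is hypothetical, and it assumes that X, Y, and Z are plain decimal numbers with no pre-release suffix.

```python
import re

# Assumption: a release tag is "v" followed by three dot-separated integers.
TAG_PATTERN = re.compile(r"v\d+\.\d+\.\d+")


def is_release_tag(tag: str) -> bool:
    """Return True if the whole tag has the form vX.Y.Z."""
    return TAG_PATTERN.fullmatch(tag) is not None
```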

## Running tests

### Running Rust tests

Run `cargo test`.

### Running Python tests

Follow the instructions on compiling from source. Then you can run `pytest`.


            

        