csv-diff


| Field | Value |
| --- | --- |
| Name | csv-diff |
| Version | 1.1 |
| Home page | https://github.com/simonw/csv-diff |
| Summary | Python CLI tool and library for diffing CSV and JSON files |
| Upload time | 2021-02-23 01:15:21 |
| Author | Simon Willison |
| License | Apache License, Version 2.0 |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# csv-diff

[![PyPI](https://img.shields.io/pypi/v/csv-diff.svg)](https://pypi.org/project/csv-diff/)
[![Changelog](https://img.shields.io/github/v/release/simonw/csv-diff?include_prereleases&label=changelog)](https://github.com/simonw/csv-diff/releases)
[![Tests](https://github.com/simonw/csv-diff/workflows/Test/badge.svg)](https://github.com/simonw/csv-diff/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/csv-diff/blob/main/LICENSE)

Tool for viewing the difference between two CSV, TSV or JSON files. See [Generating a commit log for San Francisco’s official list of trees](https://simonwillison.net/2019/Mar/13/tree-history/) (and the [sf-tree-history repo commit log](https://github.com/simonw/sf-tree-history/commits)) for background information on this project.

## Installation

    pip install csv-diff

## Usage

Consider two CSV files:

`one.csv`

    id,name,age
    1,Cleo,4
    2,Pancakes,2

`two.csv`

    id,name,age
    1,Cleo,5
    3,Bailey,1

`csv-diff` can show a human-readable summary of differences between the files:

    $ csv-diff one.csv two.csv --key=id
    1 row changed, 1 row added, 1 row removed

    1 row changed

      Row 1
        age: "4" => "5"

    1 row added

      id: 3
      name: Bailey
      age: 1

    1 row removed

      id: 2
      name: Pancakes
      age: 2

The `--key=id` option means that the `id` column should be treated as the unique key, to identify which records have changed.
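
To illustrate what treating a column as the unique key means, here is a minimal sketch using only the standard library. It is not csv-diff's own implementation, and the helper name `index_by_key` is made up for this example. Each file is indexed by its `id` value, then the two sets of keys are compared:

```python
import csv
import io

def index_by_key(text, key):
    """Parse CSV text and return a dict mapping each key value to its row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

one = index_by_key("id,name,age\n1,Cleo,4\n2,Pancakes,2\n", "id")
two = index_by_key("id,name,age\n1,Cleo,5\n3,Bailey,1\n", "id")

added = sorted(two.keys() - one.keys())    # keys only in the second file
removed = sorted(one.keys() - two.keys())  # keys only in the first file
# keys in both files whose rows differ
changed = sorted(k for k in one.keys() & two.keys() if one[k] != two[k])

print(added, removed, changed)  # ['3'] ['2'] ['1']
```

Without a key there is no way to tell that the row for Cleo was edited rather than removed and replaced by a different row.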

The tool will automatically detect whether your files are comma- or tab-separated. You can override this detection and force a specific format using `--format=tsv` or `--format=csv`.
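
As an illustration of how delimiter detection can work, the sketch below uses `csv.Sniffer` from the Python standard library. It shows the general technique only and makes no claim about how csv-diff does it internally. The function name `guess_delimiter` and the comma fallback are assumptions made for this example:

```python
import csv

def guess_delimiter(sample):
    """Guess whether the sample is comma- or tab-separated, defaulting to comma."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=",\t").delimiter
    except csv.Error:
        # Sniffing failed, so fall back to a plain comma
        return ","

comma_sample = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
tab_sample = "id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n"

print(repr(guess_delimiter(comma_sample)))
print(repr(guess_delimiter(tab_sample)))
```

Sniffing is a heuristic, which is why an explicit `--format` option is useful for files it guesses wrong.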

You can also feed it JSON files, provided each file contains a JSON array of objects that all share the same keys. Use `--format=json` if your input files are JSON.
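
To show the expected JSON shape, this short sketch converts the contents of `one.csv` into an array of objects using only the standard library. It is an illustration of the input format, not part of csv-diff. Note that values read this way are all strings:

```python
import csv
import io
import json

csv_text = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"

# Each CSV row becomes one object; the header row supplies the keys
rows = list(csv.DictReader(io.StringIO(csv_text)))
json_text = json.dumps(rows, indent=4)
print(json_text)

# Every object has the same set of keys, as the JSON input format requires
same_keys = len({tuple(sorted(row)) for row in rows}) == 1
```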

Use `--show-unchanged` to also list the unchanged values of every row that has at least one change:

    $ csv-diff one.csv two.csv --key=id --show-unchanged
    1 row changed

      id: 1
        age: "4" => "5"

        Unchanged:
          name: "Cleo"

You can use the `--json` option to get a machine-readable difference:

    $ csv-diff one.csv two.csv --key=id --json
    {
        "added": [
            {
                "id": "3",
                "name": "Bailey",
                "age": "1"
            }
        ],
        "removed": [
            {
                "id": "2",
                "name": "Pancakes",
                "age": "2"
            }
        ],
        "changed": [
            {
                "key": "1",
                "changes": {
                    "age": [
                        "4",
                        "5"
                    ]
                }
            }
        ],
        "columns_added": [],
        "columns_removed": []
    }
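
The JSON structure is easy to consume from a script. The sketch below parses the output shown above, pasted in as a string, and turns it into one line per affected row. The helper name `describe` and its output wording are invented for this example:

```python
import json

# The --json output from the example above, pasted in as a string
output = """
{
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": []
}
"""

diff = json.loads(output)

def describe(diff):
    """Return one short line per added, removed, or changed row."""
    lines = []
    for row in diff["added"]:
        lines.append("added id={}".format(row["id"]))
    for row in diff["removed"]:
        lines.append("removed id={}".format(row["id"]))
    for item in diff["changed"]:
        # Each change maps a column name to an [old, new] pair
        for column, (old, new) in item["changes"].items():
            lines.append(
                "changed id={} {}: {} -> {}".format(item["key"], column, old, new)
            )
    return lines

for line in describe(diff):
    print(line)
```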

## As a Python library

You can also import the Python library into your own code like so:

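    from csv_diff import load_csv, compare

    # Load each file into rows keyed on the "id" column, then compare them
    with open("one.csv", newline="") as one, open("two.csv", newline="") as two:
        diff = compare(load_csv(one, key="id"), load_csv(two, key="id"))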

`diff` will now contain the same data structure as the output in the `--json` example above.

If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.
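
A minimal sketch of that behaviour, assuming rows are plain dictionaries: only columns present in both versions of a row are compared. This illustrates the rule described above rather than reproducing the library's code. The function name `row_changes` and the extra `weight` column are made up for the example:

```python
def row_changes(old_row, new_row):
    """Return {column: [old, new]} for columns present in both rows that differ."""
    common = old_row.keys() & new_row.keys()
    return {
        column: [old_row[column], new_row[column]]
        for column in sorted(common)
        if old_row[column] != new_row[column]
    }

old = {"id": "1", "name": "Cleo", "age": "4"}
# "weight" exists only in the new file, so it is not reported as a row change
new = {"id": "1", "name": "Cleo", "age": "5", "weight": "12"}

print(row_changes(old, new))  # {'age': ['4', '5']}
```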



            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/simonw/csv-diff",
    "name": "csv-diff",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Simon Willison",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/df/3e/1873856cd1cdb2b373f592e4985d5de99e485daff425c37d6916c89adb3c/csv-diff-1.1.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "Apache License, Version 2.0",
    "summary": "Python CLI tool and library for diffing CSV and JSON files",
    "version": "1.1",
    "project_urls": {
        "Homepage": "https://github.com/simonw/csv-diff"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bbcf53d23ae469f2727a5bdb7d6442573084136d2d713129e420cb554ff5506c",
                "md5": "5d2bb522ddb9d5354d82f4e1fdf446d2",
                "sha256": "f41447fb69165c4b4ae04bb0081884162502ff932e3471e78c97b053dc7ce0ef"
            },
            "downloads": -1,
            "filename": "csv_diff-1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5d2bb522ddb9d5354d82f4e1fdf446d2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 12554,
            "upload_time": "2021-02-23T01:15:19",
            "upload_time_iso_8601": "2021-02-23T01:15:19.756228Z",
            "url": "https://files.pythonhosted.org/packages/bb/cf/53d23ae469f2727a5bdb7d6442573084136d2d713129e420cb554ff5506c/csv_diff-1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "df3e1873856cd1cdb2b373f592e4985d5de99e485daff425c37d6916c89adb3c",
                "md5": "2c9cf505b45db4cb3c586c84cfef6f2e",
                "sha256": "ff94117992c67dd8bc4f917374f4dc198945ff85b1d67a11bd148b635f576bea"
            },
            "downloads": -1,
            "filename": "csv-diff-1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "2c9cf505b45db4cb3c586c84cfef6f2e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7301,
            "upload_time": "2021-02-23T01:15:21",
            "upload_time_iso_8601": "2021-02-23T01:15:21.250780Z",
            "url": "https://files.pythonhosted.org/packages/df/3e/1873856cd1cdb2b373f592e4985d5de99e485daff425c37d6916c89adb3c/csv-diff-1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-02-23 01:15:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "simonw",
    "github_project": "csv-diff",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "csv-diff"
}
        