# csv-diff

- Version: 1.2
- Summary: Python CLI tool and library for diffing CSV and JSON files
- Author: Simon Willison
- License: Apache License, Version 2.0
- Home page: https://github.com/simonw/csv-diff
- Upload time: 2024-09-06 05:21:20

[![PyPI](https://img.shields.io/pypi/v/csv-diff.svg)](https://pypi.org/project/csv-diff/)
[![Changelog](https://img.shields.io/github/v/release/simonw/csv-diff?include_prereleases&label=changelog)](https://github.com/simonw/csv-diff/releases)
[![Tests](https://github.com/simonw/csv-diff/workflows/Test/badge.svg)](https://github.com/simonw/csv-diff/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/csv-diff/blob/main/LICENSE)

Tool for viewing the difference between two CSV, TSV or JSON files. See [Generating a commit log for San Francisco’s official list of trees](https://simonwillison.net/2019/Mar/13/tree-history/) (and the [sf-tree-history repo commit log](https://github.com/simonw/sf-tree-history/commits)) for background information on this project.

## Installation

    pip install csv-diff

## Usage

Consider two CSV files:

`one.csv`

    id,name,age
    1,Cleo,4
    2,Pancakes,2

`two.csv`

    id,name,age
    1,Cleo,5
    3,Bailey,1

`csv-diff` can show a human-readable summary of differences between the files:

    $ csv-diff one.csv two.csv --key=id
    1 row changed, 1 row added, 1 row removed

    1 row changed

      Row 1
        age: "4" => "5"

    1 row added

      id: 3
      name: Bailey
      age: 1

    1 row removed

      id: 2
      name: Pancakes
      age: 2

The `--key=id` option means that the `id` column should be treated as the unique key, to identify which records have changed.
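
To make the idea of a unique key concrete, here is a minimal sketch of a keyed comparison written with only the standard library. It is not the `csv-diff` implementation; `load_keyed` and `keyed_diff` are hypothetical helper names chosen for illustration, and the two CSV files from above are inlined as strings.

```python
import csv
import io

# The two example files from above, inlined so the sketch is self-contained
ONE = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
TWO = "id,name,age\n1,Cleo,5\n3,Bailey,1\n"


def load_keyed(text, key):
    """Read CSV text into a dict mapping each row's key value to the row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def keyed_diff(previous, current):
    """Classify rows as added, removed or changed by comparing keys."""
    added = [current[k] for k in current if k not in previous]
    removed = [previous[k] for k in previous if k not in current]
    changed = {}
    for k in previous:
        if k in current and previous[k] != current[k]:
            # Record [old, new] for every field whose value differs
            changed[k] = {
                field: [previous[k][field], current[k].get(field)]
                for field in previous[k]
                if previous[k][field] != current[k].get(field)
            }
    return added, removed, changed


added, removed, changed = keyed_diff(load_keyed(ONE, "id"), load_keyed(TWO, "id"))
print(added)    # the Bailey row
print(removed)  # the Pancakes row
print(changed)  # {'1': {'age': ['4', '5']}}
```

Without a key, there would be no way to tell that the row for Cleo was edited rather than removed and re-added.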

The tool will automatically detect if your files are comma- or tab-separated. You can override this automatic detection and force the tool to use a specific format using `--format=tsv` or `--format=csv`.
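
As an illustration of how delimiter detection can work, the sketch below uses `csv.Sniffer` from the Python standard library. Whether `csv-diff` detects formats exactly this way is an assumption; `detect_format` is a hypothetical helper, not part of the tool.

```python
import csv


def detect_format(sample):
    """Guess whether a text sample is comma- or tab-separated."""
    # Restrict the sniffer to the two delimiters we care about
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return "tsv" if dialect.delimiter == "\t" else "csv"


print(detect_format("id,name,age\n1,Cleo,4\n2,Pancakes,2\n"))
print(detect_format("id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n"))
```

Sniffing is a heuristic, so passing `--format` explicitly is the safer choice for unusual files.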

You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use `--format=json` if your input files are JSON.
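
The expected JSON shape can be checked before running the tool. The following is a rough sketch, assuming a hypothetical helper called `load_json_rows`; it only illustrates the "array of objects with the same keys" requirement and is not how `csv-diff` itself validates input.

```python
import json

# A JSON array of objects, each with the same keys
SAMPLE = '[{"id": 1, "name": "Cleo", "age": 4}, {"id": 2, "name": "Pancakes", "age": 2}]'


def load_json_rows(text, key):
    """Parse a JSON array of objects and index the objects by `key`."""
    rows = json.loads(text)
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise ValueError("expected a JSON array of objects")
    # Every object must share the keys of the first one
    expected = set(rows[0]) if rows else set()
    for row in rows:
        if set(row) != expected:
            raise ValueError("every object must have the same keys")
    return {row[key]: row for row in rows}


rows = load_json_rows(SAMPLE, "id")
print(rows[1]["name"])  # Cleo
```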

Use `--show-unchanged` to include full details of the unchanged values for rows with at least one change in the diff output:

    $ csv-diff one.csv two.csv --key=id --show-unchanged
    1 row changed

      id: 1
        age: "4" => "5"

        Unchanged:
          name: "Cleo"

### JSON output

You can use the `--json` option to get a machine-readable difference:

    $ csv-diff one.csv two.csv --key=id --json
    {
        "added": [
            {
                "id": "3",
                "name": "Bailey",
                "age": "1"
            }
        ],
        "removed": [
            {
                "id": "2",
                "name": "Pancakes",
                "age": "2"
            }
        ],
        "changed": [
            {
                "key": "1",
                "changes": {
                    "age": [
                        "4",
                        "5"
                    ]
                }
            }
        ],
        "columns_added": [],
        "columns_removed": []
    }
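
The JSON output is straightforward to consume from a script. Here is a small sketch that parses the structure shown above and rebuilds the one-line summary; `summarize` is a hypothetical helper, and the output is inlined as a string rather than read from a real `csv-diff` process.

```python
import json

# Copy of the --json output shown above
OUTPUT = """
{
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": []
}
"""


def summarize(diff):
    """Build a summary such as '1 row changed, 1 row added'."""
    parts = []
    for label in ("changed", "added", "removed"):
        count = len(diff[label])
        if count:
            parts.append("{} row{} {}".format(count, "" if count == 1 else "s", label))
    return ", ".join(parts)


diff = json.loads(OUTPUT)
print(summarize(diff))  # 1 row changed, 1 row added, 1 row removed

# Each entry in "changed" maps a column name to an [old, new] pair
for change in diff["changed"]:
    for column, (old, new) in change["changes"].items():
        print(change["key"], column, old, "=>", new)
```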

### Adding templated extras

You can specify additional keys to be displayed in the human-readable format using the `--extra` option:

    --extra name "Python format string with {id} for variables"

For example, to output a link to `https://news.ycombinator.com/latest?id={id}` for each item with an ID, you could use this:

```bash
csv-diff one.csv two.csv --key=id \
  --extra latest "https://news.ycombinator.com/latest?id={id}"
```
These extras display something like this:
```
1 row changed

  id: 41459472
    points: "24" => "25"
    numComments: "5" => "6"
  extras:
    latest: https://news.ycombinator.com/latest?id=41459472
```
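
Because the template is a Python format string, its behaviour can be previewed with `str.format`. This sketch assumes the row's column values are passed as keyword arguments, which matches the `{id}` example above; the `row` dictionary is made-up sample data.

```python
# Sample row; the column names mirror the output shown above
row = {"id": "41459472", "points": "25", "numComments": "6"}

# One entry per --extra option: name -> format string
extras = {"latest": "https://news.ycombinator.com/latest?id={id}"}

# Fill each template with the row's column values
rendered = {name: template.format(**row) for name, template in extras.items()}
print(rendered["latest"])
# https://news.ycombinator.com/latest?id=41459472
```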

## As a Python library

You can also import the Python library into your own code like so:

    from csv_diff import load_csv, compare

    # Open both files, load their rows keyed on "id", then compare them
    with open("one.csv") as one, open("two.csv") as two:
        diff = compare(load_csv(one, key="id"), load_csv(two, key="id"))

`diff` will now contain the same data structure as the output in the `--json` example above.

If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.
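
One way to read that rule is that row-level changes are computed only over the columns both files share, while column additions and removals are reported separately. The sketch below illustrates this interpretation in plain Python; `row_changes` and `column_changes` are hypothetical helpers, not functions exported by `csv_diff`.

```python
def column_changes(old_columns, new_columns):
    """Return (added, removed) column names, preserving their order."""
    added = [c for c in new_columns if c not in old_columns]
    removed = [c for c in old_columns if c not in new_columns]
    return added, removed


def row_changes(old_row, new_row):
    """Compare two versions of a row using only the columns they share."""
    shared = [c for c in old_row if c in new_row]
    return {c: [old_row[c], new_row[c]] for c in shared if old_row[c] != new_row[c]}


old = {"id": "1", "name": "Cleo", "age": "4"}
new = {"id": "1", "name": "Cleo", "age": "5", "weight": "9"}

print(column_changes(list(old), list(new)))  # (['weight'], [])
print(row_changes(old, new))                 # {'age': ['4', '5']}
```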

## As a Docker container

### Build the image

    $ docker build -t csvdiff .

### Run the container

    $ docker run --rm -v $(pwd):/files csvdiff

Suppose the current directory contains two CSV files, `one.csv` and `two.csv`:

    $ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv
    
## Alternatives

- [csvdiff](https://github.com/aswinkarthik/csvdiff) is a "fast diff tool for comparing CSV files" - you may get better results from this than from `csv-diff` against larger files.

            
