# csv-diff
[![PyPI](https://img.shields.io/pypi/v/csv-diff.svg)](https://pypi.org/project/csv-diff/)
[![Changelog](https://img.shields.io/github/v/release/simonw/csv-diff?include_prereleases&label=changelog)](https://github.com/simonw/csv-diff/releases)
[![Tests](https://github.com/simonw/csv-diff/workflows/Test/badge.svg)](https://github.com/simonw/csv-diff/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/csv-diff/blob/main/LICENSE)

Tool for viewing the difference between two CSV, TSV or JSON files. See [Generating a commit log for San Francisco’s official list of trees](https://simonwillison.net/2019/Mar/13/tree-history/) (and the [sf-tree-history repo commit log](https://github.com/simonw/sf-tree-history/commits)) for background information on this project.
## Installation

    pip install csv-diff
## Usage
Consider two CSV files:

`one.csv`

    id,name,age
    1,Cleo,4
    2,Pancakes,2

`two.csv`

    id,name,age
    1,Cleo,5
    3,Bailey,1

`csv-diff` can show a human-readable summary of differences between the files:

    $ csv-diff one.csv two.csv --key=id
    1 row changed, 1 row added, 1 row removed

    1 row changed

      Row 1
        age: "4" => "5"

    1 row added

      id: 3
      name: Bailey
      age: 1

    1 row removed

      id: 2
      name: Pancakes
      age: 2
The `--key=id` option means that the `id` column should be treated as the unique key, to identify which records have changed.
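To illustrate what treating a column as the unique key means, here is a minimal sketch of a keyed comparison using only the standard library. It is not csv-diff's actual implementation; the helper names `index_by_key` and `keyed_diff` are hypothetical and exist only for this example.

```python
import csv
import io

# Contents of one.csv and two.csv from the example above
ONE = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
TWO = "id,name,age\n1,Cleo,5\n3,Bailey,1\n"


def index_by_key(text, key):
    """Map each row's key value to the row (a dict of column -> string)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def keyed_diff(previous, current):
    """Classify rows as added, removed or changed by comparing key values."""
    added = [current[k] for k in current if k not in previous]
    removed = [previous[k] for k in previous if k not in current]
    changed = []
    for k in previous:
        if k not in current:
            continue
        # Record [old, new] for every shared column whose value differs
        changes = {
            column: [previous[k][column], current[k][column]]
            for column in previous[k]
            if column in current[k] and previous[k][column] != current[k][column]
        }
        if changes:
            changed.append({"key": k, "changes": changes})
    return {"added": added, "removed": removed, "changed": changed}


result = keyed_diff(index_by_key(ONE, "id"), index_by_key(TWO, "id"))
print(result["changed"])  # [{'key': '1', 'changes': {'age': ['4', '5']}}]
```

Without a key, rows 2 and 3 could not be told apart from an edit to a single row; the key is what lets the comparison report one removal and one addition.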
The tool automatically detects whether your files are comma- or tab-separated. You can override this detection and force a specific format using `--format=tsv` or `--format=csv`.
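One common way to detect a delimiter in Python is the standard library's `csv.Sniffer`. The sketch below shows that technique on the sample data; it is an assumption about how such detection can work, not a statement of what csv-diff does internally, and `detect_delimiter` is a hypothetical helper.

```python
import csv


def detect_delimiter(sample):
    """Guess whether a text sample is comma- or tab-separated."""
    # Restrict the candidates so other punctuation is never picked
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return dialect.delimiter


CSV_SAMPLE = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
TSV_SAMPLE = CSV_SAMPLE.replace(",", "\t")

comma = detect_delimiter(CSV_SAMPLE)
tab = detect_delimiter(TSV_SAMPLE)
print(repr(comma), repr(tab))
```

Sniffing is a heuristic, which is why an explicit `--format` option is useful when a file is small or irregular.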
You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use `--format=json` if your input files are JSON.
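The expected JSON shape, an array of objects that all share the same keys, can be checked with a few lines of standard-library Python. This is only a sketch of that constraint; `load_json_rows` is a hypothetical helper and not part of csv-diff's API.

```python
import json


def load_json_rows(text, key):
    """Parse a JSON array of objects and index the objects by `key`."""
    rows = json.loads(text)
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise ValueError("expected a JSON array of objects")
    # Every object must expose the same set of keys as the first one
    columns = set(rows[0]) if rows else set()
    for row in rows:
        if set(row) != columns:
            raise ValueError("every object must have the same keys")
    return {row[key]: row for row in rows}


# JSON equivalent of one.csv (values chosen for illustration)
ONE_JSON = '[{"id": 1, "name": "Cleo", "age": 4}, {"id": 2, "name": "Pancakes", "age": 2}]'

rows = load_json_rows(ONE_JSON, key="id")
print(sorted(rows))  # [1, 2]
```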
Use `--show-unchanged` to include full details of the unchanged values for rows with at least one change in the diff output:

    $ csv-diff one.csv two.csv --key=id --show-unchanged
    1 row changed

      id: 1
        age: "4" => "5"

        Unchanged:
          name: "Cleo"
You can use the `--json` option to get a machine-readable difference:

    $ csv-diff one.csv two.csv --key=id --json
    {
        "added": [
            {
                "id": "3",
                "name": "Bailey",
                "age": "1"
            }
        ],
        "removed": [
            {
                "id": "2",
                "name": "Pancakes",
                "age": "2"
            }
        ],
        "changed": [
            {
                "key": "1",
                "changes": {
                    "age": [
                        "4",
                        "5"
                    ]
                }
            }
        ],
        "columns_added": [],
        "columns_removed": []
    }
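The machine-readable output is easy to consume from a script. The following sketch parses the JSON shown in the `--json` example and rebuilds a one-line summary from it; the `summarize` helper is hypothetical, and the JSON is embedded as a string here rather than captured from the command.

```python
import json

# The --json output from the example, embedded for illustration
DIFF_JSON = """
{
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": []
}
"""


def summarize(diff):
    """Build a short summary line from the diff structure."""
    parts = []
    for label in ("changed", "added", "removed"):
        count = len(diff[label])
        if count:
            noun = "row" if count == 1 else "rows"
            parts.append(f"{count} {noun} {label}")
    return ", ".join(parts)


diff = json.loads(DIFF_JSON)
summary = summarize(diff)
print(summary)  # 1 row changed, 1 row added, 1 row removed

# Each entry in "changes" maps a column name to [old value, new value]
for change in diff["changed"]:
    for column, (before, after) in change["changes"].items():
        print(f'{change["key"]}: {column} {before!r} -> {after!r}')
```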
## As a Python library
You can also import the Python library into your own code like so:

    from csv_diff import load_csv, compare
    diff = compare(
        load_csv(open("one.csv"), key="id"),
        load_csv(open("two.csv"), key="id")
    )
`diff` will now contain the same data structure as the output in the `--json` example above.
If the two files have different columns, the added or removed columns are listed under `columns_added` and `columns_removed` and are ignored when calculating changes made to specific rows.
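The effect of ignoring added or removed columns can be sketched as follows: only columns present in both files take part in the row comparison. This is an illustration of the behavior described above, not the library's code; the helper names and the extra `weight` column are hypothetical.

```python
def column_changes(previous_columns, current_columns):
    """Return (added, removed, shared) column names, preserving order."""
    added = [c for c in current_columns if c not in previous_columns]
    removed = [c for c in previous_columns if c not in current_columns]
    shared = [c for c in previous_columns if c in current_columns]
    return added, removed, shared


def row_changes(before, after, shared):
    """Compare two versions of a row using only the shared columns."""
    return {c: [before[c], after[c]] for c in shared if before[c] != after[c]}


# Row 1 before and after; "weight" is a made-up column added in the new file
before = {"id": "1", "name": "Cleo", "age": "4"}
after = {"id": "1", "name": "Cleo", "age": "5", "weight": "12"}

added, removed, shared = column_changes(list(before), list(after))
changes = row_changes(before, after, shared)
print(added, removed, changes)  # ['weight'] [] {'age': ['4', '5']}
```

The new `weight` value does not appear in the row-level changes; it is reported once, at the column level.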
Raw data
{
"_id": null,
"home_page": "https://github.com/simonw/csv-diff",
"name": "csv-diff",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Simon Willison",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/df/3e/1873856cd1cdb2b373f592e4985d5de99e485daff425c37d6916c89adb3c/csv-diff-1.1.tar.gz",
"platform": "",
"description": "# csv-diff\n\n[![PyPI](https://img.shields.io/pypi/v/csv-diff.svg)](https://pypi.org/project/csv-diff/)\n[![Changelog](https://img.shields.io/github/v/release/simonw/csv-diff?include_prereleases&label=changelog)](https://github.com/simonw/csv-diff/releases)\n[![Tests](https://github.com/simonw/csv-diff/workflows/Test/badge.svg)](https://github.com/simonw/csv-diff/actions?query=workflow%3ATest)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/csv-diff/blob/main/LICENSE)\n\nTool for viewing the difference between two CSV, TSV or JSON files. See [Generating a commit log for San Francisco\u2019s official list of trees](https://simonwillison.net/2019/Mar/13/tree-history/) (and the [sf-tree-history repo commit log](https://github.com/simonw/sf-tree-history/commits)) for background information on this project.\n\n## Installation\n\n pip install csv-diff\n\n## Usage\n\nConsider two CSV files:\n\n`one.csv`\n\n id,name,age\n 1,Cleo,4\n 2,Pancakes,2\n\n`two.csv`\n\n id,name,age\n 1,Cleo,5\n 3,Bailey,1\n\n`csv-diff` can show a human-readable summary of differences between the files:\n\n $ csv-diff one.csv two.csv --key=id\n 1 row changed, 1 row added, 1 row removed\n\n 1 row changed\n\n Row 1\n age: \"4\" => \"5\"\n\n 1 row added\n\n id: 3\n name: Bailey\n age: 1\n\n 1 row removed\n\n id: 2\n name: Pancakes\n age: 2\n\nThe `--key=id` option means that the `id` column should be treated as the unique key, to identify which records have changed.\n\nThe tool will automatically detect if your files are comma- or tab-separated. You can over-ride this automatic detection and force the tool to use a specific format using `--format=tsv` or `--format=csv`.\n\nYou can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. 
Use `--format=json` if your input files are JSON.\n\nUse `--show-unchanged` to include full details of the unchanged values for rows with at least one change in the diff output:\n\n % csv-diff one.csv two.csv --key=id --show-unchanged\n 1 row changed\n\n id: 1\n age: \"4\" => \"5\"\n\n Unchanged:\n name: \"Cleo\"\n\nYou can use the `--json` option to get a machine-readable difference:\n\n $ csv-diff one.csv two.csv --key=id --json\n {\n \"added\": [\n {\n \"id\": \"3\",\n \"name\": \"Bailey\",\n \"age\": \"1\"\n }\n ],\n \"removed\": [\n {\n \"id\": \"2\",\n \"name\": \"Pancakes\",\n \"age\": \"2\"\n }\n ],\n \"changed\": [\n {\n \"key\": \"1\",\n \"changes\": {\n \"age\": [\n \"4\",\n \"5\"\n ]\n }\n }\n ],\n \"columns_added\": [],\n \"columns_removed\": []\n }\n\n## As a Python library\n\nYou can also import the Python library into your own code like so:\n\n from csv_diff import load_csv, compare\n diff = compare(\n load_csv(open(\"one.csv\"), key=\"id\"),\n load_csv(open(\"two.csv\"), key=\"id\")\n )\n\n`diff` will now contain the same data structure as the output in the `--json` example above.\n\nIf the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.\n\n\n",
"bugtrack_url": null,
"license": "Apache License, Version 2.0",
"summary": "Python CLI tool and library for diffing CSV and JSON files",
"version": "1.1",
"project_urls": {
"Homepage": "https://github.com/simonw/csv-diff"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bbcf53d23ae469f2727a5bdb7d6442573084136d2d713129e420cb554ff5506c",
"md5": "5d2bb522ddb9d5354d82f4e1fdf446d2",
"sha256": "f41447fb69165c4b4ae04bb0081884162502ff932e3471e78c97b053dc7ce0ef"
},
"downloads": -1,
"filename": "csv_diff-1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5d2bb522ddb9d5354d82f4e1fdf446d2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 12554,
"upload_time": "2021-02-23T01:15:19",
"upload_time_iso_8601": "2021-02-23T01:15:19.756228Z",
"url": "https://files.pythonhosted.org/packages/bb/cf/53d23ae469f2727a5bdb7d6442573084136d2d713129e420cb554ff5506c/csv_diff-1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "df3e1873856cd1cdb2b373f592e4985d5de99e485daff425c37d6916c89adb3c",
"md5": "2c9cf505b45db4cb3c586c84cfef6f2e",
"sha256": "ff94117992c67dd8bc4f917374f4dc198945ff85b1d67a11bd148b635f576bea"
},
"downloads": -1,
"filename": "csv-diff-1.1.tar.gz",
"has_sig": false,
"md5_digest": "2c9cf505b45db4cb3c586c84cfef6f2e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7301,
"upload_time": "2021-02-23T01:15:21",
"upload_time_iso_8601": "2021-02-23T01:15:21.250780Z",
"url": "https://files.pythonhosted.org/packages/df/3e/1873856cd1cdb2b373f592e4985d5de99e485daff425c37d6916c89adb3c/csv-diff-1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2021-02-23 01:15:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simonw",
"github_project": "csv-diff",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "csv-diff"
}