# csv-diff
[![PyPI](https://img.shields.io/pypi/v/csv-diff.svg)](https://pypi.org/project/csv-diff/)
[![Changelog](https://img.shields.io/github/v/release/simonw/csv-diff?include_prereleases&label=changelog)](https://github.com/simonw/csv-diff/releases)
[![Tests](https://github.com/simonw/csv-diff/workflows/Test/badge.svg)](https://github.com/simonw/csv-diff/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/csv-diff/blob/main/LICENSE)
Tool for viewing the difference between two CSV, TSV or JSON files. See [Generating a commit log for San Francisco’s official list of trees](https://simonwillison.net/2019/Mar/13/tree-history/) (and the [sf-tree-history repo commit log](https://github.com/simonw/sf-tree-history/commits)) for background information on this project.
## Installation

    pip install csv-diff

## Usage
Consider two CSV files:

`one.csv`

    id,name,age
    1,Cleo,4
    2,Pancakes,2

`two.csv`

    id,name,age
    1,Cleo,5
    3,Bailey,1

`csv-diff` can show a human-readable summary of differences between the files:

    $ csv-diff one.csv two.csv --key=id
    1 row changed, 1 row added, 1 row removed

    1 row changed

      Row 1
        age: "4" => "5"

    1 row added

      id: 3
      name: Bailey
      age: 1

    1 row removed

      id: 2
      name: Pancakes
      age: 2

The `--key=id` option means that the `id` column should be treated as the unique key, to identify which records have changed.
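
To illustrate what keying on a column means, here is a minimal conceptual sketch in plain Python. It is not csv-diff's own implementation: the `index_by_key` and `classify` helpers are hypothetical names, and the rows are copied from the two example files above.

```python
import csv
import io

# Contents of one.csv and two.csv from the example above.
ONE = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
TWO = "id,name,age\n1,Cleo,5\n3,Bailey,1\n"


def index_by_key(text, key):
    """Map each row's key value to the full row (hypothetical helper)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def classify(previous, current):
    """Return sorted lists of added, removed and changed keys (hypothetical helper)."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        k for k in previous.keys() & current.keys() if previous[k] != current[k]
    )
    return added, removed, changed


added, removed, changed = classify(index_by_key(ONE, "id"), index_by_key(TWO, "id"))
print(added, removed, changed)
```

Because rows are matched by `id` rather than by position, row 3 counts as added and row 2 as removed, even though they occupy the same line in each file.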
The tool automatically detects whether your files are comma- or tab-separated. You can override this detection and force a specific format using `--format=tsv` or `--format=csv`.
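
One way this kind of delimiter detection can be done is with `csv.Sniffer` from the Python standard library. The sketch below is an illustration only; whether csv-diff detects formats in exactly this way is an assumption, and `guess_delimiter` is a hypothetical helper name.

```python
import csv


def guess_delimiter(sample):
    """Guess whether a text sample is comma- or tab-separated (hypothetical helper)."""
    # Restrict the sniffer to the two delimiters discussed above.
    return csv.Sniffer().sniff(sample, delimiters=",\t").delimiter


comma_sample = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
tab_sample = "id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n"

print(repr(guess_delimiter(comma_sample)))
print(repr(guess_delimiter(tab_sample)))
```

Sniffing is a heuristic, which is why an explicit `--format` option is useful when a file is ambiguous.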
You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use `--format=json` if your input files are JSON.
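
As a rough illustration of the expected JSON shape, the following sketch converts the rows of `one.csv` into a JSON array of objects that all share the same keys. It uses only the standard library; the variable names are placeholders.

```python
import csv
import io
import json

# Contents of one.csv from the example above.
csv_text = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"

# Each CSV row becomes one JSON object keyed by the header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))
json_text = json.dumps(rows, indent=2)
print(json_text)
```

Saving this text to a file such as `one.json` should give an input of the shape described above, suitable for use with `--format=json`.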
Use `--show-unchanged` to include full details of the unchanged values for rows with at least one change in the diff output:

    $ csv-diff one.csv two.csv --key=id --show-unchanged
    1 row changed

      id: 1
        age: "4" => "5"

        Unchanged:
          name: "Cleo"

### JSON output
You can use the `--json` option to get a machine-readable difference:

    $ csv-diff one.csv two.csv --key=id --json
    {
        "added": [
            {
                "id": "3",
                "name": "Bailey",
                "age": "1"
            }
        ],
        "removed": [
            {
                "id": "2",
                "name": "Pancakes",
                "age": "2"
            }
        ],
        "changed": [
            {
                "key": "1",
                "changes": {
                    "age": [
                        "4",
                        "5"
                    ]
                }
            }
        ],
        "columns_added": [],
        "columns_removed": []
    }

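
A script that consumes this JSON might look something like the sketch below. The `summarize` helper is hypothetical and not part of csv-diff; the input is the JSON output shown above, embedded as a string so the example is self-contained.

```python
import json

# The --json output from the example above.
diff_json = """
{
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": []
}
"""
diff = json.loads(diff_json)


def summarize(diff):
    """Build a one-line summary from the diff structure (hypothetical helper)."""
    parts = []
    for field in ("changed", "added", "removed"):
        count = len(diff[field])
        if count:
            plural = "" if count == 1 else "s"
            parts.append(f"{count} row{plural} {field}")
    return ", ".join(parts)


print(summarize(diff))

# Each changed column maps to a two-item list: [old value, new value].
for item in diff["changed"]:
    for column, (old, new) in item["changes"].items():
        print(f'{item["key"]}: {column} {old!r} => {new!r}')
```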
### Adding templated extras
You can specify additional keys to be displayed in the human-readable format using the `--extra` option:

    --extra name "Python format string with {id} for variables"

For example, to output a link to `https://news.ycombinator.com/latest?id={id}` for each item with an ID, you could use this:
```bash
csv-diff one.csv two.csv --key=id \
    --extra latest "https://news.ycombinator.com/latest?id={id}"
```
These extras display something like this:
```
1 row changed

  id: 41459472
    points: "24" => "25"
    numComments: "5" => "6"
    extras:
      latest: https://news.ycombinator.com/latest?id=41459472
```
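
Since the template is described as a Python format string, the substitution can be pictured with `str.format`. This is a sketch of the idea rather than the tool's actual code; the `row` values are taken from the sample output above.

```python
# A row as it might appear after the change shown above (values are strings).
row = {"id": "41459472", "points": "25", "numComments": "6"}

# The template passed to --extra, with {id} as a placeholder for a column value.
template = "https://news.ycombinator.com/latest?id={id}"

# Fill the placeholders from the row's columns.
extras = {"latest": template.format(**row)}
print(extras["latest"])
```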
## As a Python library
You can also import the Python library into your own code like so:

    from csv_diff import load_csv, compare
    diff = compare(
        load_csv(open("one.csv"), key="id"),
        load_csv(open("two.csv"), key="id")
    )

`diff` will now contain the same data structure as the output in the `--json` example above.
If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.
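
The behaviour described above can be pictured with the following plain-Python sketch. It is an assumption-laden illustration, not the library's code: `compare_shared_columns` is a hypothetical function, and the `weight` column is invented for the example.

```python
# Rows indexed by key; "weight" is a made-up column that exists only in the new data.
previous = {"1": {"id": "1", "name": "Cleo", "age": "4"}}
current = {"1": {"id": "1", "name": "Cleo", "age": "5", "weight": "12"}}


def compare_shared_columns(previous, current):
    """Report row changes using only columns present in both inputs."""
    prev_cols = set().union(*(row.keys() for row in previous.values()))
    cur_cols = set().union(*(row.keys() for row in current.values()))
    shared = prev_cols & cur_cols

    changed = []
    for key in sorted(previous.keys() & current.keys()):
        # Added or removed columns never appear here, so they cannot
        # register as a change to an individual row.
        changes = {
            col: [previous[key][col], current[key][col]]
            for col in sorted(shared)
            if previous[key][col] != current[key][col]
        }
        if changes:
            changed.append({"key": key, "changes": changes})

    return {
        "changed": changed,
        "columns_added": sorted(cur_cols - prev_cols),
        "columns_removed": sorted(prev_cols - cur_cols),
    }


result = compare_shared_columns(previous, current)
print(result)
```

In this sketch the new `weight` column is reported once under `columns_added` and does not show up in the per-row `changes`.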
## As a Docker container
### Build the image

    $ docker build -t csvdiff .

### Run the container

    $ docker run --rm -v $(pwd):/files csvdiff

If the current directory contains two CSV files, `one.csv` and `two.csv`, pass their names as arguments:

    $ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv

## Alternatives
- [csvdiff](https://github.com/aswinkarthik/csvdiff) is a "fast diff tool for comparing CSV files" - you may get better results from this than from `csv-diff` against larger files.