[![Build Status](https://travis-ci.org/paulfitz/daff.svg?branch=master)](https://travis-ci.org/paulfitz/daff)
[![NPM version](https://badge.fury.io/js/daff.svg)](http://badge.fury.io/js/daff)
[![Gem Version](https://badge.fury.io/rb/daff.svg)](http://badge.fury.io/rb/daff)
[![PyPI version](https://badge.fury.io/py/daff.svg)](http://badge.fury.io/py/daff)
[![PHP version](https://badge.fury.io/ph/paulfitz%2Fdaff-php.svg)](http://badge.fury.io/ph/paulfitz%2Fdaff-php)
[![Bower version](https://badge.fury.io/bo/daff.svg)](http://badge.fury.io/bo/daff)
![Badge count](http://img.shields.io/:badges-7/7-33aa33.svg)
daff: data diff
===============
This is a library for comparing tables, producing a summary of their
differences, and using such a summary as a patch file. It is
optimized for comparing tables that share a common origin, in other
words, multiple versions of the "same" table.
For a live demo, see:
> http://paulfitz.github.com/daff/
Install the library for your favorite language:
````sh
npm install daff -g # node/javascript
pip install daff # python
gem install daff # ruby
composer require paulfitz/daff-php # php
install.packages('daff') # R wrapper by Edwin de Jonge
bower install daff # web/javascript
````
Other translations are available here:
> https://github.com/paulfitz/daff/releases
Or use the library to view CSV diffs on GitHub via a Chrome extension:
> https://github.com/theodi/csvhub
The diff format used by `daff` is specified here:
> http://paulfitz.github.io/daff-doc/spec.html
This library is a stripped-down version of the coopy toolbox (see
http://share.find.coop). To compare tables from different origins,
or with automatically generated IDs, or other complications, check out
the coopy toolbox.
The program
-----------
You can run `daff`/`daff.py`/`daff.rb` as a utility program:
````
$ daff
daff can produce and apply tabular diffs.
Call as:
daff a.csv b.csv
daff [--color] [--no-color] [--output OUTPUT.csv] a.csv b.csv
daff [--output OUTPUT.html] a.csv b.csv
daff [--www] a.csv b.csv
daff parent.csv a.csv b.csv
daff --input-format sqlite a.db b.db
daff patch [--inplace] a.csv patch.csv
daff merge [--inplace] parent.csv a.csv b.csv
daff trim [--output OUTPUT.csv] source.csv
daff render [--output OUTPUT.html] diff.csv
daff copy in.csv out.tsv
daff in.csv
daff git
daff version
The --inplace option to patch and merge will result in modification of a.csv.
If you need more control, here is the full list of flags:
daff diff [--output OUTPUT.csv] [--context NUM] [--all] [--act ACT] a.csv b.csv
--act ACT: show only a certain kind of change (update, insert, delete, column)
--all: do not prune unchanged rows or columns
--all-rows: do not prune unchanged rows
--all-columns: do not prune unchanged columns
--color: highlight changes with terminal colors (default in terminals)
--context NUM: show NUM rows of context (0=none)
--context-columns NUM: show NUM columns of context (0=none)
--fail-if-diff: return status is 0 if equal, 1 if different, 2 if problem
--id: specify column to use as primary key (repeat for multi-column key)
--ignore: specify column to ignore completely (can repeat)
--index: include row/columns numbers from original tables
--input-format [csv|tsv|ssv|psv|json|sqlite]: set format to expect for input
--eol [crlf|lf|cr|auto]: separator between rows of csv output.
--no-color: make sure terminal colors are not used
--ordered: assume row order is meaningful (default for CSV)
--output-format [csv|tsv|ssv|psv|json|copy|html]: set format for output
--padding [dense|sparse|smart]: set padding method for aligning columns
--table NAME: compare the named table, used with SQL sources. If name changes, use 'n1:n2'
--unordered: assume row order is meaningless (default for json formats)
-w / --ignore-whitespace: ignore changes in leading/trailing whitespace
-i / --ignore-case: ignore differences in case
daff render [--output OUTPUT.html] [--css CSS.css] [--fragment] [--plain] diff.csv
--css CSS.css: generate a suitable css file to go with the html
--fragment: generate just a html fragment rather than a page
--plain: do not use fancy utf8 characters to make arrows prettier
--unquote: do not quote html characters in html diffs
--www: send output to a browser
````
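The `--fail-if-diff` flag listed above makes `daff` usable as a check in scripts. Here is a minimal Node sketch, assuming a `daff` executable is installed and on the `PATH`. The helper names `describeStatus` and `compareFiles` are made up for this illustration and are not part of `daff`:

```js
var spawnSync = require('child_process').spawnSync;

// Hypothetical helper: map the exit status documented for --fail-if-diff
// (0 = equal, 1 = different, 2 = problem) to a readable label.
function describeStatus(status) {
  if (status === 0) return 'equal';
  if (status === 1) return 'different';
  return 'problem';
}

// Hypothetical helper: run the daff command-line tool on two files.
// Assumes a `daff` executable can be found on the PATH.
function compareFiles(fileA, fileB) {
  var result = spawnSync('daff', ['diff', '--fail-if-diff', fileA, fileB], { encoding: 'utf8' });
  // result.status is null if the process could not be started,
  // which describeStatus reports as 'problem'.
  return { outcome: describeStatus(result.status), diff: result.stdout };
}

console.log(describeStatus(1)); // prints "different"
```

Calling `compareFiles('a.csv', 'b.csv')` would then return the outcome label together with whatever diff text the tool printed.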
Formats supported are CSV, TSV, SQLite (with `--input-format sqlite` or
the `.sqlite` extension), and ndjson.
Using with git
--------------
Run `daff git csv` to install daff as a diff and merge handler
for `*.csv` files in your repository. Run `daff git` for instructions
on doing this manually. Your CSV diffs and merges will get smarter,
since git will suddenly understand about rows and columns, not just lines:
![Example CSV diff](http://paulfitz.github.io/daff-doc/images/daff_vs_diff.png)
The library
-----------
You can use `daff` as a library from any supported language. We take
JavaScript as the example here. To use `daff` on a webpage,
first include `daff.js`:
```html
<script src="daff.js"></script>
```
Or if using node outside the browser:
```js
var daff = require('daff');
```
For concreteness, assume we have two versions of a table,
`data1` and `data2`:
```js
var data1 = [
['Country','Capital'],
['Ireland','Dublin'],
['France','Paris'],
['Spain','Barcelona']
];
var data2 = [
['Country','Code','Capital'],
['Ireland','ie','Dublin'],
['France','fr','Paris'],
['Spain','es','Madrid'],
['Germany','de','Berlin']
];
```
To make those tables accessible to the library, we wrap them
in `daff.TableView`:
```js
var table1 = new daff.TableView(data1);
var table2 = new daff.TableView(data2);
```
We can now compute the alignment between the rows and columns
in the two tables:
```js
var alignment = daff.compareTables(table1,table2).align();
```
To produce a diff from the alignment, we first need a table
for the output:
```js
var data_diff = [];
var table_diff = new daff.TableView(data_diff);
```
Using default options for the diff:
```js
var flags = new daff.CompareFlags();
var highlighter = new daff.TableDiff(alignment,flags);
highlighter.hilite(table_diff);
```
The diff is now in `data_diff`, in highlighter format. See the
specification here:
> http://paulfitz.github.io/daff-doc/spec.html
```js
[ [ '!', '', '+++', '' ],
[ '@@', 'Country', 'Code', 'Capital' ],
[ '+', 'Ireland', 'ie', 'Dublin' ],
[ '+', 'France', 'fr', 'Paris' ],
[ '->', 'Spain', 'es', 'Barcelona->Madrid' ],
[ '+++', 'Germany', 'de', 'Berlin' ] ]
```
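Since the result is a plain array of rows, it can be inspected with ordinary JavaScript. The sketch below tallies rows by the tag in the first column and pulls out the modified cells. The helpers `countActions` and `listUpdates` are hypothetical (not part of `daff`), and the code assumes that the tag of an update row (here `->`) is also the separator used inside its changed cells. Check the specification before relying on that:

```js
// Hypothetical helpers for reading a highlighter-format diff; not part of daff.

// Count rows per action tag (the first column of each row).
function countActions(diffRows) {
  var counts = {};
  diffRows.forEach(function (row) {
    counts[row[0]] = (counts[row[0]] || 0) + 1;
  });
  return counts;
}

// For update rows, split "old->new" cells on the row's own tag.
// Assumption: the tag in the first column doubles as the cell separator.
function listUpdates(diffRows) {
  var header = null;
  var updates = [];
  diffRows.forEach(function (row) {
    var tag = row[0];
    if (tag === '@@') { header = row; return; }  // remember column names
    if (tag.indexOf('->') === -1) return;         // not an update row
    for (var i = 1; i < row.length; i++) {
      var cell = String(row[i]);
      var at = cell.indexOf(tag);
      if (at === -1) continue;                    // unchanged cell
      updates.push({
        column: header ? header[i] : i,
        from: cell.slice(0, at),
        to: cell.slice(at + tag.length)
      });
    }
  });
  return updates;
}

// The diff shown above, repeated so this sketch runs on its own.
var data_diff = [
  ['!', '', '+++', ''],
  ['@@', 'Country', 'Code', 'Capital'],
  ['+', 'Ireland', 'ie', 'Dublin'],
  ['+', 'France', 'fr', 'Paris'],
  ['->', 'Spain', 'es', 'Barcelona->Madrid'],
  ['+++', 'Germany', 'de', 'Berlin']
];

console.log(countActions(data_diff));
console.log(listUpdates(data_diff));
```

For this diff, `listUpdates` reports a single change, in the `Capital` column, from `Barcelona` to `Madrid`.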
For visualization, you may want to convert this to an HTML table
with appropriate classes on cells so you can color-code inserts,
deletes, updates, etc. You can do this with:
```js
var diff2html = new daff.DiffRender();
diff2html.render(table_diff);
var table_diff_html = diff2html.html();
```
For 3-way differences (that is, comparing two tables given knowledge
of a common ancestor) use `daff.compareTables3` (give ancestor
table as the first argument).
Here is how to apply the difference computed earlier (`table_diff`) as a patch:
```js
var patcher = new daff.HighlightPatch(table1,table_diff);
patcher.apply();
// table1 should now equal table2
```
For other languages, you should find sample code in
the packages on the [Releases](https://github.com/paulfitz/daff/releases) page.
Supported languages
-------------------
The `daff` library is written in [Haxe](http://haxe.org/), which
can be translated reasonably well into at least the following languages:
* Javascript
* Python
* Java
* C#
* C++
* Ruby (using an [unofficial haxe target](https://github.com/paulfitz/haxe) developed for `daff`)
* PHP
Some translations are done for you on the
[Releases](https://github.com/paulfitz/daff/releases) page.
To make another translation, or to compile from source,
first follow the [Haxe language introduction](https://haxe.org/documentation/introduction/language-introduction.html) for the
language you care about. At the time of writing, if you are on OSX, you should
install haxe using `brew install haxe`. Then do one of:
```
make js
make php
make py
make java
make cs
make cpp
```
For each language, the `daff` library expects to be handed an interface to tables you create, rather than creating them
itself. This is to avoid inefficient copies from one format to another. You'll find a `SimpleTable` class you can use if
you find this awkward.
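As a rough illustration of what "an interface to tables you create" can look like, the sketch below wraps an existing array of rows behind cell accessors, so reads and writes go straight to the caller's data without copying. The class name `RowArrayTable` and its method names are hypothetical and chosen for this example only. The methods `daff` actually expects are described in the API documentation:

```js
// Hypothetical sketch, not daff's own class: a thin view over an array of rows.
// Cells are addressed as (x, y) = (column, row); nothing is copied.
function RowArrayTable(rows) {
  this.rows = rows; // keep a reference to the caller's data
}
RowArrayTable.prototype.getWidth = function () {
  return this.rows.length > 0 ? this.rows[0].length : 0;
};
RowArrayTable.prototype.getHeight = function () {
  return this.rows.length;
};
RowArrayTable.prototype.getCell = function (x, y) {
  return this.rows[y][x];
};
RowArrayTable.prototype.setCell = function (x, y, value) {
  this.rows[y][x] = value;
};

var rows = [
  ['Country', 'Capital'],
  ['Ireland', 'Dublin'],
  ['Spain', 'Barcelona']
];
var view = new RowArrayTable(rows);
view.setCell(1, 2, 'Madrid'); // writes through to `rows`
console.log(rows[2][1]);      // prints "Madrid"
```

Because the wrapper only holds a reference, a large table stays in whatever structure the application already uses.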
Other possibilities:
* There's a daff wrapper for R written by [Edwin de Jonge](https://github.com/edwindj), see https://github.com/edwindj/daff and http://cran.r-project.org/web/packages/daff
* There's a hand-written ruby port by [James Smith](https://github.com/Floppy), see https://github.com/theodi/coopy-ruby
API documentation
-----------------
* You can browse the `daff` classes at http://paulfitz.github.io/daff-doc/
Sponsors
--------
<img src="http://datacommons.coop/images/the_zen_of_venn.png" alt="the zen of venn" height="100">
The <a href="https://datacommons.coop">Data Commons Co-op</a>, "perhaps the geekiest of all cooperative organizations on the planet," has given great moral support during the development of `daff`.
Donate a multiple of `42.42` in your currency to let them know you care: <a href="https://datacommons.coop/donate/">https://datacommons.coop/donate/</a>.
Reading material
----------------
* https://specs.frictionlessdata.io/tabular-diff : a specification of the diff format we use.
* http://theodi.org/blog/csvhub-github-diffs-for-csv-files : using this library with github.
* https://github.com/ropensci/unconf/issues/19 : a thread about diffing data in which daff shows up in at least four guises (see if you can spot them all).
* http://theodi.org/blog/adapting-git-simple-data : using this library with gitlab.
* http://okfnlabs.org/blog/2013/08/08/diffing-and-patching-data.html : a summary of where the library came from.
* http://blog.okfn.org/2013/07/02/git-and-github-for-data/ : a post about storing small data in git/github.
* http://blog.ouseful.info/2013/08/27/diff-or-chop-github-csv-data-files-and-openrefine/ : counterpoint - a post discussing tracked-changes rather than diffs.
* http://blog.byronjsmith.com/makefile-shortcuts.html : a tutorial on using `make` for data, with daff in the mix. "Since git considers changes on a per-line basis,
looking at diffs of comma-delimited and tab-delimited files can get obnoxious. The program daff fixes this problem."
License
-------
daff is distributed under the MIT License.
Raw data
{
"_id": null,
"home_page": "https://github.com/paulfitz/daff",
"name": "daff",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "data diff patch",
"author": "Paul Fitzpatrick",
"author_email": "paul@robotrebuilt.com",
"download_url": "https://files.pythonhosted.org/packages/0e/fc/82796c10545f3df9882566c79debac28b664e3a3a08fdb493ac3cc418709/daff-1.3.46.tar.gz",
"platform": "",
"bugtrack_url": null,
"license": "MIT",
"summary": "Diff and patch tables",
"version": "1.3.46",
"project_urls": {
"Homepage": "https://github.com/paulfitz/daff"
},
"split_keywords": [
"data",
"diff",
"patch"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0efc82796c10545f3df9882566c79debac28b664e3a3a08fdb493ac3cc418709",
"md5": "da35c0b3055ce2bef6f9daf5acf88652",
"sha256": "22d0da9fd6a3275b54c926a9c97b180f9258aad65113ea18f3fec52cbadcd818"
},
"downloads": -1,
"filename": "daff-1.3.46.tar.gz",
"has_sig": false,
"md5_digest": "da35c0b3055ce2bef6f9daf5acf88652",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 149820,
"upload_time": "2020-08-05T11:21:28",
"upload_time_iso_8601": "2020-08-05T11:21:28.588255Z",
"url": "https://files.pythonhosted.org/packages/0e/fc/82796c10545f3df9882566c79debac28b664e3a3a08fdb493ac3cc418709/daff-1.3.46.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2020-08-05 11:21:28",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "paulfitz",
"github_project": "daff",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"lcname": "daff"
}