# csvw
[![Build Status](https://github.com/cldf/csvw/workflows/tests/badge.svg)](https://github.com/cldf/csvw/actions?query=workflow%3Atests)
[![PyPI](https://img.shields.io/pypi/v/csvw.svg)](https://pypi.org/project/csvw)
[![Documentation Status](https://readthedocs.org/projects/csvw/badge/?version=latest)](https://csvw.readthedocs.io/en/latest/?badge=latest)
This package provides
- a Python API to read and write relational, tabular data according to the [CSV on the Web](https://csvw.org/) specification and
- command-line tools for reading and validating CSVW data.
## Links
- GitHub: https://github.com/cldf/csvw
- PyPI: https://pypi.org/project/csvw
- Issue Tracker: https://github.com/cldf/csvw/issues
## Installation
This package runs under Python >=3.8. Use pip to install:
```bash
$ pip install csvw
```
## CLI
### `csvw2json`
Converting CSVW data [to JSON](https://www.w3.org/TR/csv2json/)
```shell
$ csvw2json tests/fixtures/zipped-metadata.json
{
    "tables": [
        {
            "url": "tests/fixtures/zipped.csv",
            "row": [
                {
                    "url": "tests/fixtures/zipped.csv#row=2",
                    "rownum": 1,
                    "describes": [
                        {
                            "ID": "abc",
                            "Value": "the value"
                        }
                    ]
                },
                {
                    "url": "tests/fixtures/zipped.csv#row=3",
                    "rownum": 2,
                    "describes": [
                        {
                            "ID": "cde",
                            "Value": "another one"
                        }
                    ]
                }
            ]
        }
    ]
}
```
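The JSON produced by `csvw2json` nests each row's cell values under `tables`, then `row`, then `describes`. As a minimal sketch using only the standard library, the structure can be flattened into plain records like this. The function name `iter_records` is hypothetical, and the sample is an abbreviated copy of the output shown above:

```python
import json

# Abbreviated copy of the csvw2json output shown above.
SAMPLE = """
{
    "tables": [
        {
            "url": "tests/fixtures/zipped.csv",
            "row": [
                {"rownum": 1, "describes": [{"ID": "abc", "Value": "the value"}]},
                {"rownum": 2, "describes": [{"ID": "cde", "Value": "another one"}]}
            ]
        }
    ]
}
"""


def iter_records(doc):
    """Yield (table URL, row number, cell dict) for every described row."""
    for table in doc.get("tables", []):
        for row in table.get("row", []):
            for record in row.get("describes", []):
                yield table.get("url"), row.get("rownum"), record


records = list(iter_records(json.loads(SAMPLE)))
for url, rownum, record in records:
    print(url, rownum, record)
```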
### `csvwvalidate`
Validating CSVW data
```shell
$ csvwvalidate tests/fixtures/zipped-metadata.json
OK
```
### `csvwdescribe`
Describing tabular-data files with CSVW metadata
```shell
$ csvwdescribe --delimiter "|" tests/fixtures/frictionless-data.csv
{
    "@context": "http://www.w3.org/ns/csvw",
    "dc:conformsTo": "data-package",
    "tables": [
        {
            "dialect": {
                "delimiter": "|"
            },
            "tableSchema": {
                "columns": [
                    {
                        "datatype": "string",
                        "name": "FK"
                    },
                    {
                        "datatype": "integer",
                        "name": "Year"
                    },
                    {
                        "datatype": "string",
                        "name": "Location name"
                    },
                    {
                        "datatype": "string",
                        "name": "Value"
                    },
                    {
                        "datatype": "string",
                        "name": "binary"
                    },
                    {
                        "datatype": "string",
                        "name": "anyURI"
                    },
                    {
                        "datatype": "string",
                        "name": "email"
                    },
                    {
                        "datatype": "string",
                        "name": "boolean"
                    },
                    {
                        "datatype": {
                            "dc:format": "application/json",
                            "base": "json"
                        },
                        "name": "array"
                    },
                    {
                        "datatype": {
                            "dc:format": "application/json",
                            "base": "json"
                        },
                        "name": "geojson"
                    }
                ]
            },
            "url": "tests/fixtures/frictionless-data.csv"
        }
    ]
}
```
## Python API
Find the Python API documentation at [csvw.readthedocs.io](https://csvw.readthedocs.io/en/latest/).
A quick example for using `csvw` from Python code:
```python
import json
from csvw import CSVW
data = CSVW('https://raw.githubusercontent.com/cldf/csvw/master/tests/fixtures/test.tsv')
print(json.dumps(data.to_json(minimal=True), indent=4))
# Output:
[
    {
        "province": "Hello",
        "territory": "world",
        "precinct": "1"
    }
]
```
## Known limitations
- We read **all** data which is specified as UTF-8 encoded using the
  [`utf-8-sig` codec](https://docs.python.org/3/library/codecs.html#module-encodings.utf_8_sig).
  Thus, if such data starts with `U+FEFF`, this will be interpreted as a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark)
  and skipped.
- Low-level CSV parsing is delegated to the `csv` module in Python's standard library. Thus, if a `commentPrefix`
  is specified in a `Dialect` instance, rows where the first value starts
  with `commentPrefix` are skipped, **even if the value was quoted**.
- Also, when minimal quoting is specified, cell content containing `escapechar` may not be round-tripped as expected.
  This affects dialects that specify `escapechar`, or a `csvw.Dialect` with `quoteChar` but `doubleQuote==False`.
  The cause is inconsistent `csv` behaviour
  across Python versions (see https://bugs.python.org/issue44861).
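The first two limitations can be reproduced with the standard library alone. The sketch below is not `csvw`'s own code. It only shows the underlying codec and `csv` behaviour that these limitations follow from. The sample bytes and the `#` prefix are made up for illustration.

```python
import csv
import io

# 1. BOM handling: `utf-8-sig` drops a leading U+FEFF, plain `utf-8` keeps it.
raw = b"\xef\xbb\xbfID,Value\nabc,the value\n"
with_sig = raw.decode("utf-8-sig")
plain = raw.decode("utf-8")
print(repr(with_sig[:2]))  # 'ID'
print(repr(plain[:3]))     # '\ufeffID'

# 2. Comment detection: once `csv` has parsed a row, the quotes are gone, so a
#    quoted value starting with the prefix looks the same as an unquoted one.
text = '"#quoted, not a comment",1\n#real comment,2\n'
rows = list(csv.reader(io.StringIO(text)))
starts_with_prefix = [row[0].startswith("#") for row in rows]
print(rows)
print(starts_with_prefix)  # [True, True]
```

A check on the parsed first value therefore cannot tell the two rows apart, which is why quoted values are skipped as well.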
## CSVW conformance
While we use the CSVW specification as a guideline, this package does not (and
probably never will) implement the full extent of this spec.
- When CSV files with a header are read, columns are not matched in order with
  column descriptions in the `tableSchema`, but instead are matched based on the
  CSV column header and the column descriptions' `name` and `titles` attributes.
  This allows for more flexibility, because columns in the CSV file may be
  re-ordered without invalidating the metadata. Stricter matching can be forced
  by specifying `"header": false` and `"skipRows": 1` in the table's dialect
  description.
However, `csvw.CSVW` works correctly for
- 269 out of 270 [JSON tests](https://w3c.github.io/csvw/tests/#manifest-json),
- 280 out of 282 [validation tests](https://w3c.github.io/csvw/tests/#manifest-validation),
- 10 out of 18 [non-normative tests](https://w3c.github.io/csvw/tests/#manifest-nonnorm)
from the [CSVW Test suites](https://w3c.github.io/csvw/tests/).
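The header-based column matching described above can be pictured roughly as follows. This is a simplified sketch, not the package's implementation: `match_columns` and the sample column descriptions are hypothetical, and `titles` is treated as a plain list of strings, whereas CSVW also allows language-tagged titles.

```python
import csv
import io

# Hypothetical column descriptions, as they might appear in a `tableSchema`.
columns = [
    {"name": "ID", "titles": ["identifier"]},
    {"name": "Value", "titles": ["value", "val"]},
]


def match_columns(header, columns):
    """Map each CSV header cell to the name of the matching column description.

    A header cell matches when it equals the description's `name` or one of
    its `titles`; unmatched cells map to None.
    """
    mapping = {}
    for cell in header:
        mapping[cell] = None
        for col in columns:
            if cell == col["name"] or cell in col.get("titles", []):
                mapping[cell] = col["name"]
                break
    return mapping


# The CSV columns are in a different order than the descriptions.
reader = csv.reader(io.StringIO("val,ID\nthe value,abc\n"))
header = next(reader)
mapping = match_columns(header, columns)
rows = [{mapping[h]: v for h, v in zip(header, row)} for row in reader]
print(mapping)
print(rows)
```

Because matching goes by header text rather than position, swapping the two CSV columns leaves the resulting rows unchanged.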
## Compatibility with [Frictionless Data Specs](https://specs.frictionlessdata.io/)
A CSVW-described dataset is basically equivalent to a Frictionless DataPackage where all
[Data Resources](https://specs.frictionlessdata.io/data-resource/) are [Tabular Data](https://specs.frictionlessdata.io/tabular-data-resource/).
Thus, the `csvw` package provides some conversion functionality. To
"read CSVW data from a Data Package", there's the `csvw.TableGroup.from_frictionless_datapackage` method:
```python
from csvw import TableGroup
tg = TableGroup.from_frictionless_datapackage('PATH/TO/datapackage.json')
```
To convert the metadata, the `TableGroup` can then be serialized:
```python
tg.to_file('csvw-metadata.json')
```
Note that the CSVW metadata file must be written to the Data Package's directory
to make sure relative paths to data resources work.
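One way to follow this rule is to derive the output path from the location of `datapackage.json`. A minimal sketch: the helper name `csvw_metadata_path` and the example path are hypothetical, and the commented-out line assumes the `tg` object from the snippet above.

```python
from pathlib import Path


def csvw_metadata_path(datapackage, filename="csvw-metadata.json"):
    """Return a path for the CSVW metadata file next to the Data Package descriptor."""
    return Path(datapackage).parent / filename


target = csvw_metadata_path("PATH/TO/datapackage.json")
print(target.as_posix())  # PATH/TO/csvw-metadata.json

# With the `TableGroup` created above, the file could then be written with:
# tg.to_file(str(target))
```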
This functionality - together with the schema inference capabilities
of [`frictionless describe`](https://framework.frictionlessdata.io/docs/guides/describing-data/) - provides
a convenient way to bootstrap CSVW metadata for a set of "raw" CSV
files, implemented in the [`csvwdescribe` command described above](#csvwdescribe).
## See also
- https://www.w3.org/2013/csvw/wiki/Main_Page
- https://csvw.org
- https://github.com/CLARIAH/COW
- https://github.com/CLARIAH/ruminator
- https://github.com/bloomberg/pycsvw
- https://specs.frictionlessdata.io/table-schema/
- https://github.com/theodi/csvlint.rb
- https://github.com/ruby-rdf/rdf-tabular
- https://github.com/rdf-ext/rdf-parser-csvw
- https://github.com/Robsteranium/csvwr
## License
This package is distributed under the [Apache 2.0 license](https://opensource.org/licenses/Apache-2.0).