# csvw

- Name: csvw
- Version: 3.3.0
- Home page: https://github.com/cldf/csvw
- Summary: Python library to work with CSVW described tabular data
- Upload time: 2024-01-18 11:11:22
- Author: Robert Forkel
- Requires Python: >=3.8
- License: Apache 2.0
- Keywords: csv, w3c, tabular-data

[![Build Status](https://github.com/cldf/csvw/workflows/tests/badge.svg)](https://github.com/cldf/csvw/actions?query=workflow%3Atests)
[![PyPI](https://img.shields.io/pypi/v/csvw.svg)](https://pypi.org/project/csvw)
[![Documentation Status](https://readthedocs.org/projects/csvw/badge/?version=latest)](https://csvw.readthedocs.io/en/latest/?badge=latest)


This package provides
- a Python API to read and write relational, tabular data according to the [CSV on the Web](https://csvw.org/) specification and 
- command-line tools for reading and validating CSVW data.


## Links

- GitHub: https://github.com/cldf/csvw
- PyPI: https://pypi.org/project/csvw
- Issue Tracker: https://github.com/cldf/csvw/issues


## Installation

This package requires Python >=3.8. Use pip to install it:

```bash
$ pip install csvw
```


## CLI

### `csvw2json`

Converting CSVW data [to JSON](https://www.w3.org/TR/csv2json/)

```shell
$ csvw2json tests/fixtures/zipped-metadata.json 
{
    "tables": [
        {
            "url": "tests/fixtures/zipped.csv",
            "row": [
                {
                    "url": "tests/fixtures/zipped.csv#row=2",
                    "rownum": 1,
                    "describes": [
                        {
                            "ID": "abc",
                            "Value": "the value"
                        }
                    ]
                },
                {
                    "url": "tests/fixtures/zipped.csv#row=3",
                    "rownum": 2,
                    "describes": [
                        {
                            "ID": "cde",
                            "Value": "another one"
                        }
                    ]
                }
            ]
        }
    ]
}
```
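The output above nests cell values under `tables`, `row` and `describes`. As a minimal sketch, assuming the output was captured as a string, such a document could be flattened into plain records with the standard library alone. The helper name `iter_records` is hypothetical and not part of `csvw`.

```python
import json


def iter_records(doc):
    """Yield (row URL, record) pairs from the nested JSON structure.

    Hypothetical helper for illustration; not part of the csvw package.
    """
    for table in doc.get("tables", []):
        for row in table.get("row", []):
            for record in row.get("describes", []):
                yield row["url"], record


# Compact copy of the output shown above.
output = """
{"tables": [{"url": "tests/fixtures/zipped.csv", "row": [
    {"url": "tests/fixtures/zipped.csv#row=2", "rownum": 1,
     "describes": [{"ID": "abc", "Value": "the value"}]},
    {"url": "tests/fixtures/zipped.csv#row=3", "rownum": 2,
     "describes": [{"ID": "cde", "Value": "another one"}]}
]}]}
"""

records = list(iter_records(json.loads(output)))
for url, record in records:
    print(url, record["ID"], record["Value"])
```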

### `csvwvalidate`

Validating CSVW data

```shell
$ csvwvalidate tests/fixtures/zipped-metadata.json 
OK
```

### `csvwdescribe`

Describing tabular-data files with CSVW metadata

```shell
$ csvwdescribe --delimiter "|" tests/fixtures/frictionless-data.csv
{
    "@context": "http://www.w3.org/ns/csvw",
    "dc:conformsTo": "data-package",
    "tables": [
        {
            "dialect": {
                "delimiter": "|"
            },
            "tableSchema": {
                "columns": [
                    {
                        "datatype": "string",
                        "name": "FK"
                    },
                    {
                        "datatype": "integer",
                        "name": "Year"
                    },
                    {
                        "datatype": "string",
                        "name": "Location name"
                    },
                    {
                        "datatype": "string",
                        "name": "Value"
                    },
                    {
                        "datatype": "string",
                        "name": "binary"
                    },
                    {
                        "datatype": "string",
                        "name": "anyURI"
                    },
                    {
                        "datatype": "string",
                        "name": "email"
                    },
                    {
                        "datatype": "string",
                        "name": "boolean"
                    },
                    {
                        "datatype": {
                            "dc:format": "application/json",
                            "base": "json"
                        },
                        "name": "array"
                    },
                    {
                        "datatype": {
                            "dc:format": "application/json",
                            "base": "json"
                        },
                        "name": "geojson"
                    }
                ]
            },
            "url": "tests/fixtures/frictionless-data.csv"
        }
    ]
}
```
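To review what was inferred, the generated metadata can be summarized per column. The following is a sketch using only the standard library; the function name `column_datatypes` is hypothetical, and the metadata dictionary is an abbreviated copy of the output above.

```python
def column_datatypes(metadata):
    """Map column names to datatype names for all tables in CSVW metadata.

    Hypothetical helper for illustration; not part of the csvw package.
    Columns with the same name in different tables overwrite each other.
    """
    result = {}
    for table in metadata.get("tables", []):
        for col in table.get("tableSchema", {}).get("columns", []):
            datatype = col.get("datatype", "string")
            # A datatype is either a plain name or an object with a "base".
            if isinstance(datatype, dict):
                datatype = datatype.get("base", "string")
            result[col["name"]] = datatype
    return result


# Abbreviated copy of the metadata shown above.
metadata = {
    "tables": [{
        "dialect": {"delimiter": "|"},
        "tableSchema": {"columns": [
            {"datatype": "string", "name": "FK"},
            {"datatype": "integer", "name": "Year"},
            {"datatype": {"dc:format": "application/json", "base": "json"},
             "name": "geojson"},
        ]},
        "url": "tests/fixtures/frictionless-data.csv",
    }]
}

print(column_datatypes(metadata))
```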


## Python API

Find the Python API documentation at [csvw.readthedocs.io](https://csvw.readthedocs.io/en/latest/).

A quick example for using `csvw` from Python code:

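```python
import json
from csvw import CSVW

# Load a TSV file from a URL (here: a test fixture in the csvw repository).
data = CSVW('https://raw.githubusercontent.com/cldf/csvw/master/tests/fixtures/test.tsv')
# Convert the data to JSON and pretty-print the result.
print(json.dumps(data.to_json(minimal=True), indent=4))
# Output:
# [
#     {
#         "province": "Hello",
#         "territory": "world",
#         "precinct": "1"
#     }
# ]
```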
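For a file without metadata, the result above is one object per row, keyed by column header, with all values left as strings. The shape can be approximated with the standard library only, as in the sketch below. This is not how `csvw` produces the result, and the TSV content is an assumption reconstructed from the output shown above.

```python
import csv
import io
import json

# Assumed content of the TSV file, reconstructed from the output above.
tsv = "province\tterritory\tprecinct\nHello\tworld\t1\n"

# Without metadata no datatypes are known, so every cell stays a string.
reader = csv.DictReader(io.StringIO(tsv), delimiter="\t")
rows = [dict(row) for row in reader]
print(json.dumps(rows, indent=4))
```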


## Known limitations

- We read **all** data which is specified as UTF-8 encoded using the
  [`utf-8-sig` codec](https://docs.python.org/3/library/codecs.html#module-encodings.utf_8_sig).
  Thus, if such data starts with `U+FEFF`, this character is interpreted as a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark)
  and skipped.
- Low-level CSV parsing is delegated to the `csv` module in Python's standard library. Thus, if a `commentPrefix`
  is specified in a `Dialect` instance, rows where the first value starts
  with `commentPrefix` are skipped, **even if the value was quoted**.
- Also, when minimal quoting is specified, cell content containing `escapechar` may not be
  round-tripped as expected (when specifying `escapechar` or a `csvw.Dialect` with `quoteChar`
  but `doubleQuote==False`). This is due to inconsistent `csv` behaviour
  across Python versions (see https://bugs.python.org/issue44861).
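The first two limitations can be reproduced with the standard library alone. The sketch below shows how `utf-8-sig` drops a leading `U+FEFF`, and why quoting cannot protect a value from comment detection once a row has been parsed. The `skip_comments` filter is a hypothetical stand-in for comment handling applied after parsing, not the actual `csvw` code.

```python
import csv
import io

# 1. BOM handling: "utf-8-sig" drops a leading U+FEFF, plain "utf-8" keeps it.
raw = "\ufeffID,Value\n".encode("utf-8")
with_bom = raw.decode("utf-8")
without_bom = raw.decode("utf-8-sig")
print(repr(with_bom[0]), repr(without_bom[0]))

# 2. The csv module returns plain strings, so a quoted and an unquoted
#    value look the same to any code that runs after parsing.
rows = list(csv.reader(io.StringIO('"#not a comment",1\n#comment,2\n')))
first_values = [row[0] for row in rows]
print(first_values)


def skip_comments(rows, comment_prefix="#"):
    """Drop rows whose first value starts with the comment prefix.

    Hypothetical filter for illustration only.
    """
    return [row for row in rows
            if not (row and row[0].startswith(comment_prefix))]


# Both rows are dropped, including the one whose first value was quoted.
print(skip_comments(rows))
```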


## CSVW conformance

While we use the CSVW specification as a guideline, this package does not (and 
probably never will) implement the full extent of this spec.

- When CSV files with a header are read, columns are not matched in order with
  column descriptions in the `tableSchema`, but instead are matched based on the
  CSV column header and the column descriptions' `name` and `titles` attributes.
  This allows for more flexibility, because columns in the CSV file may be
  re-ordered without invalidating the metadata. A stricter matching can be forced
  by specifying `"header": false` and `"skipRows": 1` in the table's dialect
  description.
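The header-based matching described above can be illustrated with a simplified sketch. It is not the library's implementation: the function name `match_columns` is hypothetical, and `titles` is assumed to be a string or a list of strings, whereas CSVW also allows language maps.

```python
def match_columns(header, columns):
    """Map each CSV header cell to the index of the column description
    whose `name` or `titles` matches it (None if nothing matches).

    Simplified, hypothetical illustration; not the csvw implementation.
    """
    mapping = {}
    for cell in header:
        mapping[cell] = None
        for index, col in enumerate(columns):
            titles = col.get("titles", [])
            if isinstance(titles, str):
                titles = [titles]
            if cell == col.get("name") or cell in titles:
                mapping[cell] = index
                break
    return mapping


columns = [
    {"name": "ID", "datatype": "string"},
    {"name": "year", "titles": ["Year", "Jahr"], "datatype": "integer"},
]

# The CSV header is in a different order than the column descriptions,
# but every cell is still matched to the right description.
matched = match_columns(["Year", "ID"], columns)
print(matched)

# Dialect properties that force matching by position instead, as
# described above: the header row is skipped rather than interpreted.
strict_dialect = {"header": False, "skipRows": 1}
```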

However, `csvw.CSVW` works correctly for
- 269 out of 270 [JSON tests](https://w3c.github.io/csvw/tests/#manifest-json),
- 280 out of 282 [validation tests](https://w3c.github.io/csvw/tests/#manifest-validation),
- 10 out of 18 [non-normative tests](https://w3c.github.io/csvw/tests/#manifest-nonnorm)

from the [CSVW Test suites](https://w3c.github.io/csvw/tests/).


## Compatibility with [Frictionless Data Specs](https://specs.frictionlessdata.io/)

A CSVW-described dataset is basically equivalent to a Frictionless DataPackage where all 
[Data Resources](https://specs.frictionlessdata.io/data-resource/) are [Tabular Data](https://specs.frictionlessdata.io/tabular-data-resource/).
Thus, the `csvw` package provides some conversion functionality. To
"read CSVW data from a Data Package", there's the `csvw.TableGroup.from_frictionless_datapackage` method:
```python
from csvw import TableGroup
tg = TableGroup.from_frictionless_datapackage('PATH/TO/datapackage.json')
```
To convert the metadata, the `TableGroup` can then be serialized:
```python
tg.to_file('csvw-metadata.json')
```

Note that the CSVW metadata file must be written to the Data Package's directory
to make sure relative paths to data resources work.
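One way to satisfy this requirement is to derive the target path from the location of the Data Package descriptor. The following is a sketch combining the two calls shown above; the helper names `csvw_metadata_path` and `convert_datapackage` are hypothetical, and the default file name is an arbitrary choice.

```python
from pathlib import Path


def csvw_metadata_path(datapackage_path, name="csvw-metadata.json"):
    """Return a path for the CSVW metadata next to the Data Package descriptor."""
    return Path(datapackage_path).parent / name


def convert_datapackage(datapackage_path):
    """Write CSVW metadata for a Data Package into the package's directory.

    Hypothetical helper; requires the csvw package to be installed.
    """
    # Imported here so that the path helper works without csvw installed.
    from csvw import TableGroup

    tg = TableGroup.from_frictionless_datapackage(str(datapackage_path))
    target = csvw_metadata_path(datapackage_path)
    tg.to_file(str(target))
    return target


print(csvw_metadata_path("PATH/TO/datapackage.json"))
```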

This functionality - together with the schema inference capabilities
of [`frictionless describe`](https://framework.frictionlessdata.io/docs/guides/describing-data/) - provides
a convenient way to bootstrap CSVW metadata for a set of "raw" CSV
files, implemented in the [`csvwdescribe` command described above](#csvwdescribe).


## See also

- https://www.w3.org/2013/csvw/wiki/Main_Page
- https://csvw.org
- https://github.com/CLARIAH/COW
- https://github.com/CLARIAH/ruminator
- https://github.com/bloomberg/pycsvw
- https://specs.frictionlessdata.io/table-schema/
- https://github.com/theodi/csvlint.rb
- https://github.com/ruby-rdf/rdf-tabular
- https://github.com/rdf-ext/rdf-parser-csvw
- https://github.com/Robsteranium/csvwr


## License

This package is distributed under the [Apache 2.0 license](https://opensource.org/licenses/Apache-2.0).



            
