## CSV on the Web (CoW)
> CoW is a tool that converts CSV files into Linked Data. Specifically, CoW is an integrated CSV-to-RDF converter that uses the W3C [CSVW](https://www.w3.org/TR/tabular-data-primer/) standard for rich semantic table specifications and produces [nanopublications](http://nanopub.org/) as its output RDF model.
### Features
- Expressive CSVW-compatible schemas based on the [Jinja](https://github.com/pallets/jinja) template engine.
- Highly efficient implementation leveraging multithreaded and multicore architectures.
- Available as a [Docker image](#docker-image), [command line interface (CLI) tool](#command-line-interface-cli), and [library](#library).
### Documentation and support
For user documentation, see the [basic introduction video](https://t.co/SDWC3NhWZf) and the [GitHub wiki](https://github.com/clariah/cow/wiki/). [Technical documentation](#technical-documentation) is referenced below. If you encounter an issue, please [report](https://github.com/CLARIAH/COW/issues/new/choose) it. Pull requests are welcome.
## Quick Start Guide
There are two ways to run CoW: the quickest is via Docker, the more flexible via pip.
### Docker Image
Several data science tools, including CoW, are available via a [Docker image](https://github.com/CLARIAH/datalegendtools).
#### Install
First, install the Docker virtualisation engine on your computer; instructions can be found on the [official Docker website](https://docs.docker.com/get-docker). Then pull the image with the following command:
```
# docker pull wxwilcke/datalegend
```
Here, the `#` symbol represents the prompt of a terminal with administrative privileges and is not part of the command.
After the image has successfully been downloaded (or 'pulled'), the container can be run as follows:
```
# docker run --rm -p 3000:3000 -it wxwilcke/datalegend
```
The virtual system can now be accessed by opening [http://localhost:3000/wetty](http://localhost:3000/wetty) in your preferred browser and logging in with username **datalegend** and password **datalegend**.
For detailed instructions on this Docker image, see [DataLegend Playground](https://github.com/CLARIAH/datalegendtools). For instructions on how to use the tool, see [usage](#usage) below.
### Command Line Interface (CLI)
The command line interface is the recommended way to use CoW for most users.
#### Install
> Check whether a recent version of Python is installed on your device. For Windows/macOS we recommend installing Python via the [official distribution page](https://www.python.org/downloads/).
The recommended method of installing CoW on your system is `pip3`:
```
pip3 install cow-csvw
```
You can upgrade your currently installed version with:
```
pip3 install cow-csvw --upgrade
```
Possible installation issues:
- Permission issues: you can get around them by installing CoW in user space: `pip3 install cow-csvw --user`.
- Command not found: make sure your user binary directory (typically something like `/Users/user/Library/Python/3.7/bin` on macOS or `/home/user/.local/bin` on Linux) is in your `PATH` (on macOS: `/etc/paths`).
- Please [report your unlisted issue](https://github.com/CLARIAH/CoW/issues/new).
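If you are unsure which directory pip used for console scripts on your machine, Python's standard `sysconfig` module can tell you. A minimal sketch (assuming CPython's standard per-user install schemes; macOS framework builds use a slightly different scheme):

```python
import os
import sysconfig

# Pick the per-user install scheme for this platform
# (assumption: CPython's standard "posix_user"/"nt_user" schemes)
scheme = "nt_user" if os.name == "nt" else "posix_user"
scripts_dir = sysconfig.get_path("scripts", scheme=scheme)

print("pip --user puts console scripts in:", scripts_dir)
print("already on PATH:",
      scripts_dir in os.environ.get("PATH", "").split(os.pathsep))
```

If the last line prints `False`, add the reported directory to your `PATH`.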
#### Usage
Start the graphical interface by entering the following command:
```
cow_tool
```
Select a CSV file and click `build` to generate a file named `myfile.csv-metadata.json` (a JSON schema file) with your mappings. Optionally edit this file, then click `convert` to convert the CSV file to RDF. The output should be a `myfile.csv.nq` RDF file (N-Quads by default).
#### Command Line Interface
A straightforward CSV-to-RDF conversion takes two commands. First, build the schema:
```
cow_tool_cli build myfile.csv
```
This creates a file named `myfile.csv-metadata.json` (a JSON schema file). Next, run the conversion:
```
cow_tool_cli convert myfile.csv
```
This command outputs a `myfile.csv.nq` RDF file (N-Quads by default).
You don't need to worry about the JSON file unless you want to change the metadata schema. To control the base URI namespace, the URIs used in predicates, virtual columns, and so on, edit the `myfile.csv-metadata.json` file and/or use CoW's command line options. For instance, you can control the output RDF serialization (e.g. with `--format turtle`). Have a look at the [options](#options) below, the examples in the [GitHub wiki](https://github.com/CLARIAH/CoW/wiki), and the [technical documentation](http://csvw-converter.readthedocs.io/en/latest/).
##### Options
Run `cow_tool_cli --help` for a complete list of options:
```
usage: cow_tool_cli [-h] [--dataset DATASET] [--delimiter DELIMITER]
                    [--quotechar QUOTECHAR] [--encoding ENCODING]
                    [--processes PROCESSES] [--chunksize CHUNKSIZE]
                    [--base BASE]
                    [--format [{xml,n3,turtle,nt,pretty-xml,trix,trig,nquads}]]
                    [--gzip] [--version]
                    {convert,build} file [file ...]

Not nearly CSVW compliant schema builder and RDF converter

positional arguments:
  {convert,build}       Use the schema of the `file` specified to convert it
                        to RDF, or build a schema from scratch.
  file                  Path(s) of the file(s) that should be used for
                        building or converting. Must be a CSV file.

optional arguments:
  -h, --help            show this help message and exit
  --dataset DATASET     A short name (slug) for the name of the dataset (will
                        use input file name if not specified)
  --delimiter DELIMITER
                        The delimiter used in the CSV file(s)
  --quotechar QUOTECHAR
                        The character used as quotation character in the CSV
                        file(s)
  --encoding ENCODING   The character encoding used in the CSV file(s)
  --processes PROCESSES
                        The number of processes the converter should use
  --chunksize CHUNKSIZE
                        The number of rows processed at each time
  --base BASE           The base for URIs generated with the schema (only
                        relevant when `build`ing a schema)
  --gzip                Compress the output file using gzip
  --format [{xml,n3,turtle,nt,pretty-xml,trix,trig,nquads}], -f [{xml,n3,turtle,nt,pretty-xml,trix,trig,nquads}]
                        RDF serialization format
  --version             show program's version number and exit
```
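These options can be combined freely. The sketch below (with `myfile.csv` as a placeholder) drives the two documented subcommands from Python via `subprocess`, building a schema with a custom base URI and then converting to gzipped Turtle; it only invokes the tool if it is actually installed:

```python
import shutil
import subprocess

# Placeholder input file; replace with your own CSV
csvfile = "myfile.csv"

# Build a schema with a custom base URI and delimiter,
# then convert to gzipped Turtle (options documented above)
build_cmd = ["cow_tool_cli", "build", csvfile,
             "--base", "http://example.org/my-dataset",
             "--delimiter", ";"]
convert_cmd = ["cow_tool_cli", "convert", csvfile,
               "--format", "turtle", "--gzip"]

# Only run the commands if CoW is installed on this machine
if shutil.which("cow_tool_cli"):
    subprocess.run(build_cmd, check=True)
    subprocess.run(convert_cmd, check=True)
```

The same flags work identically when typed directly in a terminal.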
### Library
Once installed, CoW can be used as a library as follows:
```
from cow_csvw.csvw_tool import COW
import os

# Placeholders: directory and name of the CSV file to convert
path = '/path/to/data'
filename = 'myfile.csv'

# Build a schema (produces myfile.csv-metadata.json) ...
COW(mode='build', files=[os.path.join(path, filename)], dataset='My dataset',
    delimiter=';', quotechar='"')

# ... then convert the CSV to RDF using that schema
COW(mode='convert', files=[os.path.join(path, filename)], dataset='My dataset',
    delimiter=';', quotechar='"', processes=4, chunksize=100,
    base='http://example.org/my-dataset', format='turtle', gzipped=False)
```
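`path` and `filename` above are placeholders. For a quick experiment, a small semicolon-delimited input file (matching the `delimiter=';'` used in the calls above) can be generated with the standard library; the data here is purely illustrative:

```python
import csv
import os
import tempfile

# Toy data: a header row plus two records
rows = [["name", "birth_year"],
        ["Ada Lovelace", "1815"],
        ["Alan Turing", "1912"]]

# Write a semicolon-delimited CSV into a temporary directory
path = tempfile.mkdtemp()
filename = "people.csv"
with open(os.path.join(path, filename), "w", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter=";", quotechar='"').writerows(rows)
```

The resulting `path` and `filename` can be passed straight to the `COW(...)` calls.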
## Further Information
### Examples
The [GitHub wiki](https://github.com/CLARIAH/COW/wiki) provides more hands-on examples of transforming CSVs into Linked Data.
### Technical documentation
Technical documentation for CoW is maintained in this GitHub repository (under `docs/`) and published through [Read the Docs](http://readthedocs.org) at <http://csvw-converter.readthedocs.io/en/latest/>.
To build the documentation from source, change into the `docs` directory and run `make html`. This produces an HTML version of the documentation in the `_build/html` directory.
### License
MIT License (see [license.txt](license.txt))
### Acknowledgements
**Authors:** Albert Meroño-Peñuela, Roderick van der Weerdt, Rinke Hoekstra, Kathrin Dentler, Auke Rijpma, Richard Zijdeman, Melvin Roest, Xander Wilcke
**Copyright:** Vrije Universiteit Amsterdam, Utrecht University, International Institute of Social History
CoW is developed and maintained by the [CLARIAH project](https://www.clariah.nl) and funded by NWO.