triplify-csv


- Name: triplify-csv
- Version: 0.5.1
- Home page: https://github.com/AAtley/triplify_csv
- Summary: A tool to generate triples from CSV files according to a configuration file.
- Upload time: 2023-07-24 14:51:27
- Author: Adrian Atley
- Requires Python: >=3.7,<4.0
- License: BSD-3-Clause
triplify\_csv
=============

Features
========

Generates RDF triples or quads in Turtle or N-Quads syntax from one or more CSV files and a configuration file.

Installation
============

triplify\_csv can be installed from PyPI using 'pip':

```
pip install triplify_csv
```

Usage
=====

triplify\_csv installs as both a Python package and a command-line interface (CLI) tool.

**Example of using the package**

	from TriplifyCsv import Rml, CsvOptions
	
	# config mapping files are .ttl files
	configfile = 'myconfig.ttl'
	
	# csv files should be .csv files
	csvfile1, csvfile2 = 'mycsv1.csv', 'mycsv2.csv'
	
	# output file must have either a .ttl extension for turtle triples
	# or a .nq extension for quads
	outputfile = 'mytriples.ttl'
	
	rml = Rml()
	
	# default date format of dates in your CSV files is '%Y-%m-%d'
	# default csv delimiter is ','
	# override the defaults by setting options
	options = CsvOptions(dateformat='%d/%m/%Y', delimiter='|')
	
	# load one rml and one or more csvs
	rml.loadFile(configfile, [csvfile1,csvfile2], options)
	 
	rml.create_triples()
	
	# "nquads" for named graphs need a .nq extension
	# here we are generating triples so .ttl for turtle syntax
	rml.write_file(outputfile, format="ttl")



**Example of CLI use - help text**

To display the full help text for the options, enter the following at the command line:

```
triplify_csv --help
```


**Example of CLI use - making triples** The same conversion as the package example above, run as a CLI call instead ...

```
triplify_csv -m 'myconfig.ttl' -c 'mycsv1.csv' -c 'mycsv2.csv' -o 'mytriples.ttl'
```
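
When the CLI is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags shown in this README (`-m`, `-c`, `-o`, and the `-f` flag from the JSON-LD example further down); the helper name `build_command` is hypothetical and not part of triplify\_csv.

```python
import shlex
from typing import List, Optional, Sequence


def build_command(mapping: str, csv_files: Sequence[str], output: str,
                  fmt: Optional[str] = None) -> List[str]:
    """Assemble the argument list for a triplify_csv CLI call (hypothetical helper)."""
    cmd = ["triplify_csv", "-m", mapping]
    for path in csv_files:
        cmd += ["-c", path]  # -c is repeated once per CSV file
    cmd += ["-o", output]
    if fmt is not None:
        cmd += ["-f", fmt]  # optional serialisation format, e.g. "json-ld"
    return cmd


cmd = build_command("myconfig.ttl", ["mycsv1.csv", "mycsv2.csv"], "mytriples.ttl")
print(shlex.join(cmd))

# To run it for real (requires triplify_csv on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.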

**How to make your configuration file**

The configuration file contains a set of mappings that tell triplify\_csv how to build the subjects, predicates and objects (IRIs or literal values) of your triples or quads from the data in one or more CSV files. These mappings are themselves RDF triples written in Turtle syntax. The terms that can be used are a subset of the terms defined in the R2RML standard.

R2RML was not designed for this purpose. R2RML is '.. a language for expressing customized mappings from relational databases to RDF datasets.' (see [https://www.w3.org/TR/r2rml/](https://www.w3.org/TR/r2rml/)). triplify\_csv uses a subset of R2RML to express customised mappings from CSV files to RDF datasets. Where R2RML uses 'rr:logicalTable' to refer to a database table, triplify\_csv reads it as the name (without '.csv') of the corresponding CSV file. 'rr:sqlQuery', the R2RML term for expressing mappings from database queries to RDF, isn't supported by triplify\_csv. Also, there is no need to support 'rr:sqlVersion'.
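
The table-name convention can be illustrated with a short sketch. It assumes that the table name is simply the CSV file name with its extension removed, and that the SQL-style double quotes used in R2RML examples (such as `"\"Contacts\""`) are stripped before comparison. Both helpers are hypothetical illustrations, not triplify\_csv's own code.

```python
from pathlib import Path


def table_name_for(csv_path: str) -> str:
    """Return the file name without its '.csv' extension."""
    return Path(csv_path).stem


def unquote_table_name(name: str) -> str:
    """Strip the SQL-style double quotes around an R2RML identifier."""
    return name.strip('"')


# Map each assumed table name to the CSV file that would back it.
tables = {table_name_for(p): p for p in ["data/Contacts.csv", "mycsv1.csv"]}

# The value of rr:tableName after Turtle parsing is '"Contacts"'.
print(tables[unquote_table_name('"Contacts"')])
```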

For a complete list of the parts of the R2RML language that are supported, see the examples in the /tests folder and refer to the R2RML test cases document ([https://www.w3.org/TR/rdb2rdf-test-cases/](https://www.w3.org/TR/rdb2rdf-test-cases/)). As of version 0.3.0 the supported test cases are:

- R2RMLTC0003c - Using 'rr:template' within objectmaps to build up string literals for object values
- R2RMLTC0007a - Typing resources by relying on rdf:type predicate
- R2RMLTC0007b - Assigning triples to Named Graphs
- R2RMLTC0007c - One column mapping, using rr:class
- R2RMLTC0007d - One column mapping, specifying an rr:predicateObjectMap with rdf:type
- R2RMLTC0007e - One column mapping, using rr:graphMap and rr:class
- R2RMLTC0007f - One column mapping, using rr:graphMap and specifying an rr:predicateObjectMap with rdf:type
- R2RMLTC0007g - Assigning triples to the default graph
- R2RMLTC0007h - Assigning triples to a non-IRI named graph
- R2RMLTC0008a - Generation of triples to a target graph by using rr:graphMap and rr:template
- R2RMLTC0008b - Generation of triples referencing object map
- R2RMLTC0008c - Generation of triples by using multiple predicateMaps within a rr:predicateObjectMap
- R2RMLTC0009a - Generation of triples from foreign key relations
	- Additional tests cover triples created from foreign key relations, including self-joins that represent things like a hierarchy within the same "table"/CSV
- R2RMLTC0015a - Generation of language tags for plain literals from a CSV 'table' with language information
	- note: this test uses a separate CSV file for each language and differs from the original test case (in the [rdb2rdf-test-cases page](https://www.w3.org/TR/rdb2rdf-test-cases/)), which uses 'rr:sqlQuery' to select tags in each language from a single table.
- R2RMLTC0016a to R2RMLTC0016d - Setting data types, as in these tests, for string, integer, real, float, date, timestamp and boolean
	- note: because the subset of R2RML used here does not refer to a database, the data type cannot be derived from an SQL column. The user must instead use 'explicitly typed literals' (rr:datatype) as described in section [7.6 Typed Literals](https://www.w3.org/TR/r2rml/#typed-literals) of the R2RML standard.
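
For the date type in particular, the `dateformat` option from the package example presumably decides how CSV values are parsed before they become typed literals. The sketch below shows that idea with the standard library only; the function names are hypothetical and the rendering of the literal is an illustration, not triplify\_csv's actual code path.

```python
from datetime import datetime


def to_xsd_date(value: str, dateformat: str = "%Y-%m-%d") -> str:
    """Parse a CSV date string and return the ISO 8601 form used by xsd:date."""
    return datetime.strptime(value, dateformat).date().isoformat()


def typed_literal(lexical: str, datatype: str) -> str:
    """Render a Turtle typed literal from a lexical form and a datatype name."""
    return f'"{lexical}"^^{datatype}'


# A value written as day/month/year, matching dateformat='%d/%m/%Y'.
print(typed_literal(to_xsd_date("24/07/2023", "%d/%m/%Y"), "xsd:date"))
```

A value that does not match the configured format raises `ValueError` in this sketch, which is one reason to set the format explicitly when the CSV does not use year-month-day order.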

Copyright © 2015 W3C® (MIT, ERCIM, Keio, Beihang). This software or document includes material copied from or derived from 'R2RML: RDB to RDF Mapping Language' [http://www.w3.org/TR/2012/REC-r2rml-20120927/](http://www.w3.org/TR/2012/REC-r2rml-20120927/) and 'R2RML and Direct Mapping Test Cases' [http://www.w3.org/TR/2012/NOTE-rdb2rdf-test-cases-20120814/](http://www.w3.org/TR/2012/NOTE-rdb2rdf-test-cases-20120814/)

**Simple config file example** Suppose you have a CSV file containing details of contacts (example CSV below) and you want to generate RDF data from it using FOAF. The R2RML config file might look like this ...

	@prefix rr: <http://www.w3.org/ns/r2rml#> .
	@prefix foaf: <http://xmlns.com/foaf/0.1/> .
	@prefix ex: <http://example.com/> .
	@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
	@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
	@base <http://example.com/base/> .
	
	<TriplesMap1> a rr:TriplesMap;
	rr:logicalTable [ rr:tableName "\"Contacts\"" ];
	
	rr:subjectMap [ rr:template "http://example.com/Contact/{\"ID\"}/{\"Name\"}";
	 rr:class foaf:Person;
	];
	
	rr:predicateObjectMap [ rr:predicate ex:id ;
	 rr:objectMap [ rr:column "\"ID\"" ;  ] ;
	];
	
	rr:predicateObjectMap [ rr:predicate foaf:name ;
	 rr:objectMap [ rr:column "\"Name\"" ; ] ;
	];
	
	rr:predicateObjectMap [ rr:predicate foaf:interest ;
	  rr:objectMap [ rr:column "\"Interest\"" ; ] ;
	];
	
	.



Create a CSV file called 'Contacts.csv' using commas as delimiters between the following values (shown here in a table) ...

ID  | Name | Interest
:---- | :---- | :--------
10 | John Smith | https://en.m.wikipedia.org/wiki/Tennis
20 | Joe Bloggs | https://en.m.wikipedia.org/wiki/Golf
30 | Mr Bun | https://en.m.wikipedia.org/wiki/Spam_(food) 
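
If you would rather create the file from a script, the following sketch writes the same values with Python's standard `csv` module. It assumes that the first row holds the column names referenced by 'rr:column' in the config file, and it writes to the current working directory.

```python
import csv

# Header row followed by the three contacts from the table above.
rows = [
    ["ID", "Name", "Interest"],
    ["10", "John Smith", "https://en.m.wikipedia.org/wiki/Tennis"],
    ["20", "Joe Bloggs", "https://en.m.wikipedia.org/wiki/Golf"],
    ["30", "Mr Bun", "https://en.m.wikipedia.org/wiki/Spam_(food)"],
]

# newline="" lets the csv module control line endings itself.
with open("Contacts.csv", "w", newline="", encoding="utf-8") as handle:
    csv.writer(handle).writerows(rows)
```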


Now, with triplify\_csv installed, save the R2RML config file as 'contactsmap.ttl' and the CSV file as 'Contacts.csv', then write the generated triples to a file called 'contactstriples.ttl' (for example) with the following command ...

```
triplify_csv -m 'contactsmap.ttl' -c 'Contacts.csv' -o 'contactstriples.ttl'
```

The resulting triples in turtle syntax in the 'contactstriples.ttl' file would look like this ...


	@prefix ex: <http://example.com/> .
	@prefix foaf: <http://xmlns.com/foaf/0.1/> .
	@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
	
	<http://example.com/Contact/10/John%20Smith> a foaf:Person ;
	    ex:id 10 ;
	    foaf:interest "https://en.m.wikipedia.org/wiki/Tennis" ;
	    foaf:name "John Smith" .
	
	<http://example.com/Contact/20/Joe%20Bloggs> a foaf:Person ;
	    ex:id 20 ;
	    foaf:interest "https://en.m.wikipedia.org/wiki/Golf" ;
	    foaf:name "Joe Bloggs" .
	
	<http://example.com/Contact/30/Mr%20Bun> a foaf:Person ;
	    ex:id 30 ;
	    foaf:interest "https://en.m.wikipedia.org/wiki/Spam_(food)" ;
	    foaf:name "Mr Bun" .
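
The subject IRIs above come from the 'rr:template' in the config file: each `{"Column"}` placeholder is replaced by the row's value, with characters such as spaces percent-encoded. The sketch below reproduces that step using the standard library; it illustrates the R2RML template idea and is not triplify\_csv's own implementation. Note that the `\"` escapes in the Turtle file are plain `"` characters once the file has been parsed.

```python
import re
from urllib.parse import quote

# Matches {"Column"} as well as {Column}; the capture group is the column name.
PLACEHOLDER = re.compile(r'\{"?([^"{}]+)"?\}')


def expand_template(template: str, row: dict) -> str:
    """Replace each placeholder with the percent-encoded value from the row."""
    return PLACEHOLDER.sub(
        lambda match: quote(str(row[match.group(1)]), safe=""), template
    )


template = 'http://example.com/Contact/{"ID"}/{"Name"}'
row = {"ID": "10", "Name": "John Smith"}
print(expand_template(template, row))
```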

If you wanted this serialised to JSON-LD format instead, you could use the following command ...

```
triplify_csv -m 'contactsmap.ttl' -c 'Contacts.csv' -o 'contactstriples.json' -f 'json-ld'
```

triplify_csv uses rdflib and can output to all the serialisation formats that [rdflib](https://pypi.org/project/rdflib/) provides. (See also the 'format' parameter [here](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.Graph.serialize).)
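
When scripting several conversions, a small lookup from output file extension to format name keeps the two in step. The sketch below is a hypothetical helper: the format names are ones that rdflib registers as serialisers, while the extension table (including treating '.json' as JSON-LD, as in the command above) is an assumption of this example and not documented triplify\_csv behaviour.

```python
from pathlib import Path

# Assumed mapping from file extension to an rdflib serialiser name.
FORMAT_BY_EXTENSION = {
    ".ttl": "turtle",
    ".nq": "nquads",
    ".nt": "nt",
    ".jsonld": "json-ld",
    ".json": "json-ld",
}


def guess_format(output_path: str, default: str = "turtle") -> str:
    """Pick a serialisation format name from the output file's extension."""
    return FORMAT_BY_EXTENSION.get(Path(output_path).suffix.lower(), default)


for name in ["contactstriples.ttl", "contactsquads.nq", "contactstriples.json"]:
    print(name, "->", guess_format(name))
```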

            
