:Name: rdflib-hdt
:Version: 3.1
:Summary: A Store back-end for rdflib to allow for reading and querying HDT documents
:Upload time: 2023-06-02 12:22:02
:License: MIT License
:Keywords: rdflib, hdt, rdf, semantic web, search

|rdflib-htd logo|
|Build Status| |PyPI version|
A Store back-end for `rdflib <https://github.com/RDFLib>`_ to allow for reading and querying HDT documents.
`Online Documentation <https://rdflib.dev/rdflib-hdt/>`_
Requirements
============
* Python *version 3.6.4 or higher*
* `pip <https://pip.pypa.io/en/stable/>`_
* **gcc/clang** with **c++11 support**
* **Python Development headers**

  You should have the ``Python.h`` header available on your system.
  For example, for Python 3.6, install the ``python3.6-dev`` package on Debian/Ubuntu systems.

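One way to confirm that the development headers are visible to the interpreter you plan to install into is to look for ``Python.h`` in its include directory. The snippet below is a minimal sketch using only the standard library. The helper name ``has_python_headers`` is illustrative and not part of ``rdflib-hdt``, and some distributions may place the headers elsewhere.

.. code-block:: python

    import os
    import sysconfig


    def has_python_headers() -> bool:
        """Return True if Python.h exists in this interpreter's include directory."""
        # sysconfig reports where this interpreter expects its C headers to live
        include_dir = sysconfig.get_paths().get("include")
        if not include_dir:
            return False
        return os.path.isfile(os.path.join(include_dir, "Python.h"))


    if has_python_headers():
        print("Python.h found")
    else:
        print("Python.h not found: install the Python development headers")

Run it with the same interpreter (or virtual environment) that will build the package, since each interpreter has its own include directory.
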
Installation
============
Installation using `pipenv <https://github.com/pypa/pipenv>`_ or a `virtualenv <https://virtualenv.pypa.io/en/stable/>`_ is **strongly advised!**
PyPI installation (recommended)
-------------------------------

.. code-block:: bash

    # you can install using pip
    pip install rdflib-hdt

    # or you can use pipenv
    pipenv install rdflib-hdt

Manual installation
-------------------
**Requirement:** `pipenv <https://github.com/pypa/pipenv>`_

.. code-block:: bash

    git clone https://github.com/Callidon/pyHDT
    cd pyHDT/
    ./install.sh

Getting started
===============
You can use the ``rdflib-hdt`` library in two modes: as an rdflib Graph or as a raw HDT document.
Graph usage (recommended)
-------------------------
.. code-block:: python

    from rdflib import Graph
    from rdflib_hdt import HDTStore

    # Load an HDT file. Missing indexes are generated automatically.
    # To reuse an existing index file, place it in the same directory as the HDT file.
    store = HDTStore("test.hdt")

    # Display some metadata about the HDT document itself
    print(f"Number of RDF triples: {len(store)}")
    print(f"Number of subjects: {store.nb_subjects}")
    print(f"Number of predicates: {store.nb_predicates}")
    print(f"Number of objects: {store.nb_objects}")
    print(f"Number of shared subject-object: {store.nb_shared}")

    # Wrap the store in an rdflib Graph to use the regular rdflib API
    graph = Graph(store=store)

Using the RDFlib API, you can also `execute SPARQL queries <https://rdflib.readthedocs.io/en/stable/intro_to_sparql.html>`_ over an HDT document.
If you do so, we recommend that you first call the ``optimize_sparql`` function, which optimizes
the RDFlib SPARQL query engine for HDT documents.

.. code-block:: python

    from rdflib import Graph
    from rdflib_hdt import HDTStore, optimize_sparql

    # Calling this function optimizes the RDFlib SPARQL engine for HDT documents
    optimize_sparql()

    graph = Graph(store=HDTStore("test.hdt"))

    # You can execute SPARQL queries using the regular RDFlib API
    qres = graph.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?friend WHERE {
        ?a foaf:knows ?b.
        ?a foaf:name ?name.
        ?b foaf:name ?friend.
    }""")

    for row in qres:
        print(f"{row.name} knows {row.friend}")

HDT Document usage
------------------
.. code-block:: python

    from rdflib.namespace import FOAF
    from rdflib_hdt import HDTDocument

    # Load an HDT file. Missing indexes are generated automatically.
    # To reuse an existing index file, place it in the same directory as the HDT file.
    document = HDTDocument("test.hdt")

    # Display some metadata about the HDT document itself
    print(f"Number of RDF triples: {document.total_triples}")
    print(f"Number of subjects: {document.nb_subjects}")
    print(f"Number of predicates: {document.nb_predicates}")
    print(f"Number of objects: {document.nb_objects}")
    print(f"Number of shared subject-object: {document.nb_shared}")

    # Fetch all triples that match { ?s foaf:name ?o }
    # Use None to indicate a variable
    triples, cardinality = document.search_triples((None, FOAF("name"), None))

    print(f"Cardinality of (?s foaf:name ?o): {cardinality}")
    for s, p, o in triples:
        print(s, p, o)

    # The search also supports limit and offset
    triples, cardinality = document.search_triples((None, FOAF("name"), None), limit=10, offset=100)
    # etc ...

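The ``limit`` and ``offset`` parameters lend themselves to paging through large result sets. The following is a minimal sketch of that idea and not part of the library: ``iter_pages`` and ``fake_search`` are hypothetical names, and the helper only assumes a callable that accepts ``(pattern, limit=..., offset=...)`` and returns an ``(iterable, cardinality)`` pair, as ``search_triples`` is shown doing above. A stand-in search function is used so the example runs without an HDT file.

.. code-block:: python

    from typing import Callable, Iterator, List


    def iter_pages(search: Callable, pattern, page_size: int = 100) -> Iterator[List[tuple]]:
        """Yield successive pages of results from a limit/offset search function."""
        if page_size <= 0:
            raise ValueError("page_size must be a positive integer")
        offset = 0
        while True:
            triples, _cardinality = search(pattern, limit=page_size, offset=offset)
            page = list(triples)
            if not page:
                break
            yield page
            # A short page means the results are exhausted
            if len(page) < page_size:
                break
            offset += len(page)


    # Stand-in for a real search function (hypothetical, for illustration only)
    def fake_search(pattern, limit=0, offset=0):
        data = [(f"s{i}", "p", f"o{i}") for i in range(25)]
        return iter(data[offset:offset + limit]), len(data)


    pages = list(iter_pages(fake_search, (None, "p", None), page_size=10))
    print([len(page) for page in pages])  # [10, 10, 5]

With a real document, the same helper could be called as ``iter_pages(document.search_triples, (None, FOAF("name"), None), page_size=10)``, assuming the method behaves as described in this section.
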
An HDT document also provides support for evaluating joins over a set of triple patterns.

.. code-block:: python

    from rdflib import Variable
    from rdflib.namespace import FOAF
    from rdflib_hdt import HDTDocument

    document = HDTDocument("test.hdt")

    # Find the names of two entities that know each other
    tp_a = (Variable("a"), FOAF("knows"), Variable("b"))
    tp_b = (Variable("a"), FOAF("name"), Variable("name"))
    tp_c = (Variable("b"), FOAF("name"), Variable("friend"))
    query = {tp_a, tp_b, tp_c}

    iterator = document.search_join(query)
    print(f"Estimated join cardinality: {len(iterator)}")

    # Join results are produced as ResultRow, like in the RDFlib SPARQL API
    for row in iterator:
        print(f"{row.name} knows {row.friend}")

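To make the semantics of such a join concrete, the sketch below evaluates the same three patterns over a small in-memory list of triples using a naive nested-loop join. It only illustrates what a join over triple patterns computes and is not how ``search_join`` is implemented. The names ``match`` and ``naive_join``, the ``?``-prefixed variable convention, and the sample data are all assumptions made for this example.

.. code-block:: python

    def match(pattern, triple, bindings):
        """Extend bindings so that pattern matches triple, or return None on conflict."""
        extended = dict(bindings)
        for term, value in zip(pattern, triple):
            if isinstance(term, str) and term.startswith("?"):
                # Variable: bind it, or check it against an earlier binding
                if extended.setdefault(term, value) != value:
                    return None
            elif term != value:
                # Constant: must be equal to the triple's term
                return None
        return extended


    def naive_join(patterns, triples):
        """Evaluate a conjunction of triple patterns with a nested-loop join."""
        solutions = [{}]
        for pattern in patterns:
            next_solutions = []
            for bindings in solutions:
                for triple in triples:
                    extended = match(pattern, triple, bindings)
                    if extended is not None:
                        next_solutions.append(extended)
            solutions = next_solutions
        return solutions


    triples = [
        ("alice", "knows", "bob"),
        ("alice", "name", "Alice"),
        ("bob", "name", "Bob"),
    ]
    patterns = [
        ("?a", "knows", "?b"),
        ("?a", "name", "?name"),
        ("?b", "name", "?friend"),
    ]

    for row in naive_join(patterns, triples):
        print(f"{row['?name']} knows {row['?friend']}")  # Alice knows Bob

Each solution is a mapping from variable names to terms, which plays the same role as a result row in the join example above.
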
Handling non UTF-8 strings in Python
====================================
If the HDT document has been encoded with a non UTF-8 encoding, the previous code won't work correctly and will raise a ``UnicodeDecodeError``.
For more details on how strings are converted between C++ and Python, see the `pybind11 documentation <https://pybind11.readthedocs.io/en/stable/advanced/cast/strings.html>`_.

To handle this, the HDT document API provides a bytes-returning counterpart for each of the following methods:

* ``search_triples_bytes(...)`` returns an iterator of triples as ``(py::bytes, py::bytes, py::bytes)``
* ``search_join_bytes(...)`` returns an iterator of sets of solution mappings as ``py::set(py::bytes, py::bytes)``
* ``convert_tripleid_bytes(...)`` returns a triple as ``(py::bytes, py::bytes, py::bytes)``
* ``convert_id_bytes(...)`` returns a ``py::bytes``

**Parameters and documentation are the same as for the standard versions.**

.. code-block:: python

    from rdflib_hdt import HDTDocument

    document = HDTDocument("test.hdt")
    it = document.search_triples_bytes("", "", "")

    for s, p, o in it:
        print(s, p, o)  # prints b'...', b'...', b'...'
        # now decode the terms, or handle any error
        try:
            s, p, o = s.decode('UTF-8'), p.decode('UTF-8'), o.decode('UTF-8')
        except UnicodeDecodeError as err:
            # try other codecs, ignore the error, etc.
            pass

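The "try other codecs" step can be factored into a small helper. This is a minimal sketch using only the standard library. ``decode_term`` and its default codec list are assumptions for illustration and not part of ``rdflib-hdt``; choose the codecs that match how your HDT file was actually produced.

.. code-block:: python

    def decode_term(raw: bytes, encodings=("utf-8", "latin-1")) -> str:
        """Decode raw bytes by trying each codec in order.

        If every codec fails, fall back to UTF-8 with replacement characters
        so that the caller always gets a str back.
        """
        for encoding in encodings:
            try:
                return raw.decode(encoding)
            except UnicodeDecodeError:
                continue
        return raw.decode("utf-8", errors="replace")


    print(decode_term("café".encode("utf-8")))    # café
    print(decode_term("café".encode("latin-1")))  # café

Because ``latin-1`` maps every byte to a character, it never raises, so place it last in the list. It can also silently produce the wrong characters for data that was written in a different encoding.
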
.. |Build Status| image:: https://github.com/RDFLib/rdflib-hdt/workflows/Python%20tests/badge.svg
   :target: https://github.com/RDFLib/rdflib-hdt/actions?query=workflow%3A%22Python+tests%22
.. |PyPI version| image:: https://badge.fury.io/py/rdflib-hdt.svg
   :target: https://badge.fury.io/py/rdflib-hdt
.. |rdflib-htd logo| image:: https://raw.githubusercontent.com/RDFLib/rdflib-hdt/master/docs/source/_static/rdflib-hdt-250.png
   :target: https://rdflib.dev/rdflib-hdt/

Raw data
{
"_id": null,
"home_page": "",
"name": "rdflib-hdt",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "rdflib,hdt,rdf,semantic web,search",
"author": "",
"author_email": "Thomas Minier <tminier01@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ea/11/f83aedf9517a20fe10fbd2e1ccd147760322b8cce36eac9efedb8b3783be/rdflib_hdt-3.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "A Store back-end for rdflib to allow for reading and querying HDT documents",
"version": "3.1",
"project_urls": {
"homepage": "https://rdflib.dev/rdflib-hdt",
"repository": "https://github.com/RDFLib/rdflib-hdt.git"
},
"split_keywords": [
"rdflib",
"hdt",
"rdf",
"semantic web",
"search"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3b04d584ae3e684d8522ce77baf1c155bb35ad2ca0462d5e1b6b675f8c3b3c87",
"md5": "cb6993ccd6f278349d2e2d188c701fc4",
"sha256": "5efa586ae8934b4c968c4f7b9ec14864b08efd6b9a3342f2463982172f6519ed"
},
"downloads": -1,
"filename": "rdflib_hdt-3.1-py3.7-linux-x86_64.egg",
"has_sig": false,
"md5_digest": "cb6993ccd6f278349d2e2d188c701fc4",
"packagetype": "bdist_egg",
"python_version": "3.1",
"requires_python": null,
"size": 7374098,
"upload_time": "2023-06-02T12:21:59",
"upload_time_iso_8601": "2023-06-02T12:21:59.662810Z",
"url": "https://files.pythonhosted.org/packages/3b/04/d584ae3e684d8522ce77baf1c155bb35ad2ca0462d5e1b6b675f8c3b3c87/rdflib_hdt-3.1-py3.7-linux-x86_64.egg",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ea11f83aedf9517a20fe10fbd2e1ccd147760322b8cce36eac9efedb8b3783be",
"md5": "f7e619415939ff0ee56d6405160fef0f",
"sha256": "0db95fe58e276fe58668cae6ef94dd7b7b30bf1e9f89dd06d3941208a17fcc44"
},
"downloads": -1,
"filename": "rdflib_hdt-3.1.tar.gz",
"has_sig": false,
"md5_digest": "f7e619415939ff0ee56d6405160fef0f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 235883,
"upload_time": "2023-06-02T12:22:02",
"upload_time_iso_8601": "2023-06-02T12:22:02.609459Z",
"url": "https://files.pythonhosted.org/packages/ea/11/f83aedf9517a20fe10fbd2e1ccd147760322b8cce36eac9efedb8b3783be/rdflib_hdt-3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-02 12:22:02",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "RDFLib",
"github_project": "rdflib-hdt",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "rdflib-hdt"
}