morph-kgc


- Name: morph-kgc
- Version: 2.7.0
- Summary: Powerful [R2]RML engine to create RDF knowledge graphs from heterogeneous data sources.
- Upload time: 2024-04-01 15:16:50
- Requires Python: >=3.9
- Keywords: data integration, knowledge graph, morph-kgc, r2rml, rdf, rml, rml-star

<p align="center">
<img src="https://github.com/morph-kgc/morph-kgc/blob/main/logo/logo.png" height="100" alt="morph">
</p>

[![License](https://img.shields.io/pypi/l/morph-kgc.svg)](https://github.com/morph-kgc/morph-kgc/blob/main/LICENSE)
[![DOI](https://zenodo.org/badge/311956260.svg?style=flat)](https://zenodo.org/badge/latestdoi/311956260)
[![Latest PyPI version](https://img.shields.io/pypi/v/morph-kgc?style=flat)](https://pypi.python.org/pypi/morph-kgc)
[![Python Version](https://img.shields.io/pypi/pyversions/morph-kgc.svg)](https://pypi.python.org/pypi/morph-kgc)
[![PyPI status](https://img.shields.io/pypi/status/morph-kgc)](https://pypi.python.org/pypi/morph-kgc)
[![build](https://github.com/morph-kgc/morph-kgc/actions/workflows/ci.yml/badge.svg)](https://github.com/morph-kgc/morph-kgc/actions/workflows/ci.yml)
[![Documentation Status](https://readthedocs.org/projects/morph-kgc/badge/?version=latest)](https://morph-kgc.readthedocs.io/en/latest/?badge=latest)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ByFx_NOEfTZeaJ1Wtw3UwTH3H3-Sye2O?usp=sharing)

**Morph-KGC** is an engine that constructs **[RDF](https://www.w3.org/TR/rdf11-concepts/)** knowledge graphs from heterogeneous data sources with the **[R2RML](https://www.w3.org/TR/r2rml/)** and **[RML](https://w3id.org/rml/core/spec)** mapping languages. Morph-KGC is built on top of [pandas](https://pandas.pydata.org/) and leverages *mapping partitions* to significantly reduce execution time and memory consumption when processing large data sources.
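
To give an intuition for mapping partitions, the toy sketch below groups mapping rules by their constant predicate: two rules with different predicates can never produce the same triple, so duplicates only need to be removed within a group, and each group can be materialized and released from memory on its own. This is a hypothetical, simplified illustration (the rule dictionaries and the `partition_by_predicate` and `materialize_partitioned` names are made up for this example), not Morph-KGC's actual partitioning algorithm, which is described in the SWJ paper.

```python
from collections import defaultdict

# Hypothetical, simplified mapping rules: each rule has a constant predicate
# and the (subject, predicate, object) triples it would produce.
# Real [R2]RML triples maps are much richer than this.
RULES = [
    {"predicate": "ex:name", "triples": [("ex:p1", "ex:name", '"Ana"'), ("ex:p2", "ex:name", '"Luis"')]},
    {"predicate": "ex:name", "triples": [("ex:p1", "ex:name", '"Ana"')]},  # overlaps with the first rule
    {"predicate": "ex:age", "triples": [("ex:p1", "ex:age", '"30"')]},
]


def partition_by_predicate(rules):
    """Group rules by constant predicate; triples from different groups can never be equal."""
    groups = defaultdict(list)
    for rule in rules:
        groups[rule["predicate"]].append(rule)
    return dict(groups)


def materialize_partitioned(rules):
    """Remove duplicates within each partition only, then emit that partition's triples."""
    for group in partition_by_predicate(rules).values():
        seen = set()  # only the current partition's triples are kept in memory
        for rule in group:
            for triple in rule["triples"]:
                if triple not in seen:
                    seen.add(triple)
                    yield triple


graph = list(materialize_partitioned(RULES))
print(len(graph))  # 3: the duplicated ex:name triple is emitted once
```

Because no triple can appear in two partitions, the per-partition `seen` set is enough to guarantee a duplicate-free result without ever holding the whole graph in one set.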

## Features :sparkles:

- Supports the **[R2RML](https://www.w3.org/TR/r2rml/)** and **[RML](https://w3id.org/rml/core/spec)** mapping languages.
- User-friendly mappings with **[YARRRML](https://rml.io/yarrrml/spec/)**.
- Transformation functions with **[RML-FNML](https://w3id.org/rml/fnml/spec)**, including **Python user-defined functions**.
- [RDF-star](https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html) generation with **[RML-star](https://w3id.org/rml/star/spec)**.
- **[RML views](https://oa.upm.es/73463/1/_2023___ESWC__RML_Tabular_Views.pdf)** over tabular data sources and [JSON](https://www.json.org) files.
- Integration with **[RDFLib](https://rdflib.readthedocs.io)**, **[Oxigraph](https://pyoxigraph.readthedocs.io/en/latest/)** and **[Kafka](https://kafka-python.readthedocs.io)**.
- **Optimized** to materialize large knowledge graphs.
- **Remote** data and mapping files.
- Input data formats:
    - **Relational databases**: **[MySQL](https://www.mysql.com/)**, **[PostgreSQL](https://www.postgresql.org/)**, **[Oracle](https://www.oracle.com/database/)**, **[Microsoft SQL Server](https://www.microsoft.com/sql-server)**, **[MariaDB](https://mariadb.org/)**, **[SQLite](https://www.sqlite.org)**.
    - **Tabular files**: **[CSV](https://en.wikipedia.org/wiki/Comma-separated_values)**, **[TSV](https://en.wikipedia.org/wiki/Tab-separated_values)**, **[Excel](https://www.microsoft.com/en-us/microsoft-365/excel)**, **[Parquet](https://parquet.apache.org/documentation/latest/)**, **[Feather](https://arrow.apache.org/docs/python/feather.html)**, **[ORC](https://orc.apache.org/)**, **[Stata](https://www.stata.com/)**, **[SAS](https://www.sas.com)**, **[SPSS](https://www.ibm.com/analytics/spss-statistics-software)**, **[ODS](https://en.wikipedia.org/wiki/OpenDocument)**.
    - **Hierarchical files**: **[JSON](https://www.json.org)**, **[XML](https://www.w3.org/TR/xml/)**.
    - **In-memory data structures**: **[Python Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)**, **[DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)**.
    - **Cloud data lake solutions**: **[Databricks](https://www.databricks.com/)**.

## Documentation :bookmark_tabs:

**[Read the documentation](https://morph-kgc.readthedocs.io/en/latest/documentation/)**.

## Tutorial :woman_teacher:

Learn quickly with the tutorial in **[Google Colaboratory](https://colab.research.google.com/drive/1ByFx_NOEfTZeaJ1Wtw3UwTH3H3-Sye2O?usp=sharing)**!

## Getting Started :rocket:

**[PyPI](https://pypi.org/project/morph-kgc/)** is the fastest way to install Morph-KGC:
```bash
pip install morph-kgc
```

We recommend installing Morph-KGC in a **[virtual environment](https://docs.python.org/3/library/venv.html#)**.

To run the engine from the **command line**, execute:
```bash
python3 -m morph_kgc config.ini
```

Check the **[documentation](https://morph-kgc.readthedocs.io/en/latest/documentation/#configuration)** to learn how to write the configuration **INI file**. An example INI file is available **[here](https://github.com/morph-kgc/morph-kgc/blob/main/examples/configuration-file/default_config.ini)**.
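
The configuration file can also be generated programmatically. The sketch below uses Python's standard `configparser` module to write a `config.ini` with one section per data source. The `[DataSource1]` section and the `mappings` and `db_url` options are taken from the library example in this README; the `write_config` helper is hypothetical. It assumes Morph-KGC accepts the `key = value` lines that `configparser` writes (standard INI parsing treats `=` and `:` alike); see the documentation for the full list of supported options.

```python
import configparser
from pathlib import Path


def write_config(path, sources):
    """Hypothetical helper: write one INI section per data source."""
    # interpolation=None writes values such as %-encoded passwords verbatim
    config = configparser.ConfigParser(interpolation=None)
    for section, options in sources.items():
        config[section] = options
    with open(path, "w", encoding="utf-8") as f:
        config.write(f)


# Section and option names come from the README's library example.
# Assumption: Morph-KGC reads "key = value" lines, the format configparser writes.
sources = {
    "DataSource1": {
        "mappings": "/path/to/mapping/mapping_file.rml.ttl",
        "db_url": "mysql+pymysql://user:password@localhost:3306/db_name",
    },
}

write_config("config.ini", sources)
print(Path("config.ini").read_text(encoding="utf-8"))
```

The resulting file can then be passed to the engine with `python3 -m morph_kgc config.ini`. Adding another data source is just another top-level key in `sources`.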

It is also possible to run Morph-KGC as a **library** with **[RDFLib](https://rdflib.readthedocs.io)**, **[Oxigraph](https://pyoxigraph.readthedocs.io/en/latest/)** and **[Kafka](https://kafka-python.readthedocs.io)**:
```python
import morph_kgc

# generate the triples and load them into an RDFLib graph
g_rdflib = morph_kgc.materialize('/path/to/config.ini')
# work with the RDFLib graph
q_res = g_rdflib.query('SELECT DISTINCT ?classes WHERE { ?s a ?classes }')

# generate the triples and load them into an Oxigraph store
g_oxigraph = morph_kgc.materialize_oxigraph('/path/to/config.ini')
# work with Oxigraph
q_res = g_oxigraph.query('SELECT DISTINCT ?classes WHERE { ?s a ?classes }')

# the methods above also accept the config as a string
config = """
            [DataSource1]
            mappings: /path/to/mapping/mapping_file.rml.ttl
            db_url: mysql+pymysql://user:password@localhost:3306/db_name
         """
g_rdflib = morph_kgc.materialize(config)
```
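
The SPARQL query used in the example above, `SELECT DISTINCT ?classes WHERE { ?s a ?classes }`, returns every distinct class that at least one resource is typed with (`a` abbreviates `rdf:type`). As a rough illustration of what it computes, here is the same logic in plain Python over a hypothetical list of triples, without RDFLib or Oxigraph; the `example.com` IRIs and the `distinct_classes` function are made up for this sketch.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical triples standing in for a materialized knowledge graph.
triples = [
    ("http://example.com/p1", RDF_TYPE, "http://example.com/Person"),
    ("http://example.com/p2", RDF_TYPE, "http://example.com/Person"),
    ("http://example.com/o1", RDF_TYPE, "http://example.com/Organization"),
    ("http://example.com/p1", "http://example.com/worksFor", "http://example.com/o1"),
]


def distinct_classes(graph):
    """Objects of rdf:type triples, without duplicates, in first-seen order."""
    classes = []
    seen = set()
    for _subject, predicate, obj in graph:
        if predicate == RDF_TYPE and obj not in seen:
            seen.add(obj)
            classes.append(obj)
    return classes


print(distinct_classes(triples))
# ['http://example.com/Person', 'http://example.com/Organization']
```

SPARQL does not guarantee any result order without `ORDER BY`; the sketch simply keeps the order in which classes first appear.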

## License :unlock:

Morph-KGC is available under the **[Apache License 2.0](https://github.com/morph-kgc/morph-kgc/blob/main/LICENSE)**.

## Author & Contact :mailbox_with_mail:

- **[Julián Arenas-Guerrero](https://github.com/arenas-guerrero-julian/) - [julian.arenas.guerrero@upm.es](mailto:julian.arenas.guerrero@upm.es)**

*[Ontology Engineering Group](https://oeg.fi.upm.es)*, *[Universidad Politécnica de Madrid](https://www.upm.es/internacional)*.

## Citing :speech_balloon:

If you use Morph-KGC in your work, please cite the **[SWJ paper](https://www.doi.org/10.3233/SW-223135)**:

```bib
@article{arenas2024morph,
  title     = {{Morph-KGC: Scalable knowledge graph materialization with mapping partitions}},
  author    = {Arenas-Guerrero, Julián and Chaves-Fraga, David and Toledo, Jhon and Pérez, María S. and Corcho, Oscar},
  journal   = {Semantic Web},
  publisher = {IOS Press},
  issn      = {2210-4968},
  year      = {2024},
  doi       = {10.3233/SW-223135},
  volume    = {15},
  number    = {1},
  pages     = {1-20}
}
```

## Sponsor :shield:

<p align="center">
<img src="https://github.com/morph-kgc/morph-kgc-docs/blob/main/docs/assets/BASF.png" height="100" alt="BASF">
</p>

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "morph-kgc",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "Data Integration, Knowledge Graph, Morph-KGC, R2RML, RDF, RML, RML-star",
    "author": null,
    "author_email": "Juli\u00e1n Arenas-Guerrero <julian.arenas.guerrero@upm.es>",
    "download_url": "https://files.pythonhosted.org/packages/ae/e8/1a4d93643215c375351b02242ac44a2741c2ff68d4af4859862ce8281c73/morph_kgc-2.7.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Powerful [R2]RML engine to create RDF knowledge graphs from heterogeneous data sources.",
    "version": "2.7.0",
    "project_urls": {
        "CI": "https://github.com/morph-kgc/morph-kgc/actions",
        "Documentation": "https://morph-kgc.readthedocs.io",
        "History": "https://github.com/morph-kgc/morph-kgc/releases",
        "Homepage": "https://morph-kgc.readthedocs.io",
        "Source": "https://github.com/morph-kgc/morph-kgc",
        "Tracker": "https://github.com/morph-kgc/morph-kgc/issues"
    },
    "split_keywords": [
        "data integration",
        " knowledge graph",
        " morph-kgc",
        " r2rml",
        " rdf",
        " rml",
        " rml-star"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "35241c042fc0a4b9c0bf6d7fe267e9f1f51124f1eb7f8e5d0e90064672df7093",
                "md5": "2764d83b9aecfe5761e529145afaecc3",
                "sha256": "241c9526f41cce20310f5ad45c2af4841f330f2c7dd6ada347077bbccae32963"
            },
            "downloads": -1,
            "filename": "morph_kgc-2.7.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2764d83b9aecfe5761e529145afaecc3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 54094,
            "upload_time": "2024-04-01T15:16:47",
            "upload_time_iso_8601": "2024-04-01T15:16:47.459084Z",
            "url": "https://files.pythonhosted.org/packages/35/24/1c042fc0a4b9c0bf6d7fe267e9f1f51124f1eb7f8e5d0e90064672df7093/morph_kgc-2.7.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aee81a4d93643215c375351b02242ac44a2741c2ff68d4af4859862ce8281c73",
                "md5": "0a97065d412341dcbce080b6b2fdb71d",
                "sha256": "11c317a02075ec211b5de39eb9a0d5b8d82ad3686fcfe12cc0a9cd4583888a58"
            },
            "downloads": -1,
            "filename": "morph_kgc-2.7.0.tar.gz",
            "has_sig": false,
            "md5_digest": "0a97065d412341dcbce080b6b2fdb71d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 219846,
            "upload_time": "2024-04-01T15:16:50",
            "upload_time_iso_8601": "2024-04-01T15:16:50.013969Z",
            "url": "https://files.pythonhosted.org/packages/ae/e8/1a4d93643215c375351b02242ac44a2741c2ff68d4af4859862ce8281c73/morph_kgc-2.7.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-01 15:16:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "morph-kgc",
    "github_project": "morph-kgc",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "morph-kgc"
}
        