<p align="center">
<img src="https://raw.githubusercontent.com/morph-kgc/morph-kgc/main/logo/logo.png" height="100" alt="morph">
</p>
[![License](https://img.shields.io/pypi/l/morph-kgc.svg)](https://github.com/morph-kgc/morph-kgc/blob/main/LICENSE)
[![DOI](https://zenodo.org/badge/311956260.svg?style=flat)](https://zenodo.org/badge/latestdoi/311956260)
[![Latest PyPI version](https://img.shields.io/pypi/v/morph-kgc?style=flat)](https://pypi.python.org/pypi/morph-kgc)
[![Python Version](https://img.shields.io/pypi/pyversions/morph-kgc.svg)](https://pypi.python.org/pypi/morph-kgc)
[![PyPI status](https://img.shields.io/pypi/status/morph-kgc)](https://pypi.python.org/pypi/morph-kgc)
[![build](https://github.com/morph-kgc/morph-kgc/actions/workflows/ci.yml/badge.svg)](https://github.com/morph-kgc/morph-kgc/actions/workflows/ci.yml)
[![Documentation Status](https://readthedocs.org/projects/morph-kgc/badge/?version=latest)](https://morph-kgc.readthedocs.io/en/stable/?badge=latest)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ByFx_NOEfTZeaJ1Wtw3UwTH3H3-Sye2O?usp=sharing)
**Morph-KGC** is an engine that constructs **[RDF](https://www.w3.org/TR/rdf11-concepts/)** knowledge graphs from heterogeneous data sources with the **[R2RML](https://www.w3.org/TR/r2rml/)** and **[RML](https://w3id.org/rml/core/spec)** mapping languages. Morph-KGC is built on top of [pandas](https://pandas.pydata.org/) and it leverages *mapping partitions* to significantly reduce execution times and memory consumption for large data sources.
## Features :sparkles:
- User-friendly mappings with **[YARRRML](https://rml.io/yarrrml/spec/)**.
- Transformation functions with **[RML-FNML](https://w3id.org/rml/fnml/spec)**, including **Python user-defined functions**.
- [RDF-star](https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html) generation with **[RML-star](https://w3id.org/rml/star/spec)**.
- **[RML views](https://2023.eswc-conferences.org/wp-content/uploads/2023/05/paper_Arenas-Guerrero_2023_Boosting.pdf)** over tabular data sources and [JSON](https://www.json.org) files.
- Integration with **[RDFLib](https://rdflib.readthedocs.io)**, **[Oxigraph](https://pyoxigraph.readthedocs.io/en)** and [Kafka](https://kafka-python.readthedocs.io).
- **Optimized** to materialize large knowledge graphs.
- **Remote** data and mapping files.
- Input data formats:
    - **Relational databases**: [MySQL](https://www.mysql.com/), [PostgreSQL](https://www.postgresql.org/), [Oracle](https://www.oracle.com/database/), [Microsoft SQL Server](https://www.microsoft.com/sql-server), [MariaDB](https://mariadb.org/), [SQLite](https://www.sqlite.org).
    - **Tabular files**: [CSV](https://en.wikipedia.org/wiki/Comma-separated_values), [TSV](https://en.wikipedia.org/wiki/Tab-separated_values), [Excel](https://www.microsoft.com/en-us/microsoft-365/excel), [Parquet](https://parquet.apache.org/documentation), [Feather](https://arrow.apache.org/docs/python/feather.html), [ORC](https://orc.apache.org/), [Stata](https://www.stata.com/), [SAS](https://www.sas.com), [SPSS](https://www.ibm.com/analytics/spss-statistics-software), [ODS](https://en.wikipedia.org/wiki/OpenDocument).
    - **Hierarchical files**: [JSON](https://www.json.org), [XML](https://www.w3.org/TR/xml/).
    - **In-memory data structures**: [Python Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries), [DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
    - **Cloud data lake solutions**: [Databricks](https://www.databricks.com/).
    - **Property graph databases**: [Neo4j](https://neo4j.com/), [Kùzu](https://kuzudb.com).
## Documentation :bookmark_tabs:
**[Read the documentation](https://morph-kgc.readthedocs.io/en/stable/documentation/)**.
## Tutorial :woman_teacher:
Learn quickly with the tutorial in **[Google Colaboratory](https://colab.research.google.com/drive/1ByFx_NOEfTZeaJ1Wtw3UwTH3H3-Sye2O?usp=sharing)**!
## Getting Started :rocket:
**[PyPI](https://pypi.org/project/morph-kgc/)** is the fastest way to install Morph-KGC:
```bash
pip install morph-kgc
```
We recommend installing Morph-KGC in a [virtual environment](https://docs.python.org/3/library/venv.html#).
To run the engine from the **command line**, execute the following:
```bash
python3 -m morph_kgc config.ini
```
Check the **[documentation](https://morph-kgc.readthedocs.io/en/stable/documentation/#configuration)** to see how to write the configuration **INI file**. An example INI file is available **[here](https://github.com/morph-kgc/morph-kgc/blob/main/examples/configuration-file/default_config.ini)**.
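If you prefer to generate the INI file from a script, the sketch below writes one with Python's standard `configparser`. It is only an illustration: the helper name `write_config` is hypothetical and not part of Morph-KGC, the section name and the `mappings` and `db_url` keys are taken from the example in this README, and the paths and credentials are placeholders. Note that `configparser` writes `key = value` pairs, which standard INI parsers read the same way as `key: value`.

```python
import configparser
import tempfile
from pathlib import Path


def write_config(path, mappings, db_url=None, section="DataSource1"):
    """Write a minimal INI file with one data source section (hypothetical helper)."""
    # interpolation=None keeps '%' characters in URLs or passwords untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser[section] = {"mappings": mappings}
    if db_url is not None:
        parser[section]["db_url"] = db_url
    with open(path, "w", encoding="utf-8") as f:
        parser.write(f)
    return Path(path)


# placeholder values copied from the README example
config_path = write_config(
    Path(tempfile.mkdtemp()) / "config.ini",
    mappings="/path/to/mapping/mapping_file.rml.ttl",
    db_url="mysql+pymysql://user:password@localhost:3306/db_name",
)
print(config_path.read_text(encoding="utf-8"))
```

The resulting file can then be passed to the command above in place of `config.ini`.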
It is also possible to run Morph-KGC as a **library** with **[RDFLib](https://rdflib.readthedocs.io)** and **[Oxigraph](https://pyoxigraph.readthedocs.io/en)**:
```python
import morph_kgc
# generate the triples and load them to an RDFLib graph
g_rdflib = morph_kgc.materialize('/path/to/config.ini')
# work with the RDFLib graph
q_res = g_rdflib.query('SELECT DISTINCT ?classes WHERE { ?s a ?classes }')
# generate the triples and load them to Oxigraph
g_oxigraph = morph_kgc.materialize_oxigraph('/path/to/config.ini')
# work with Oxigraph
q_res = g_oxigraph.query('SELECT DISTINCT ?classes WHERE { ?s a ?classes }')
# the methods above also accept the config as a string
config = """
[DataSource1]
mappings: /path/to/mapping/mapping_file.rml.ttl
db_url: mysql+pymysql://user:password@localhost:3306/db_name
"""
g_rdflib = morph_kgc.materialize(config)
```
## License :unlock:
Morph-KGC is available under the **[Apache License 2.0](https://github.com/morph-kgc/morph-kgc/blob/main/LICENSE)**.
## Author & Contact :mailbox_with_mail:
- **[Julián Arenas-Guerrero](https://github.com/arenas-guerrero-julian/) - [julian.arenas.guerrero@upm.es](mailto:julian.arenas.guerrero@upm.es)**
*[Ontology Engineering Group](https://oeg.fi.upm.es)*, *[Universidad Politécnica de Madrid](https://www.upm.es/internacional)*.
## Citing :speech_balloon:
If you use Morph-KGC in your work, please cite the **[SoftwareX](https://www.sciencedirect.com/science/article/pii/S2352711024000803)** or **[SWJ](https://www.doi.org/10.3233/SW-223135)** papers:
```bib
@article{arenas2024rmlfnml,
title = {{An RML-FNML module for Python user-defined functions in Morph-KGC}},
author = {Julián Arenas-Guerrero and Paola Espinoza-Arias and José Antonio Bernabé-Diaz and Prashant Deshmukh and José Luis Sánchez-Fernández and Oscar Corcho},
journal = {SoftwareX},
year = {2024},
volume = {26},
pages = {101709},
issn = {2352-7110},
publisher = {Elsevier},
doi = {10.1016/j.softx.2024.101709}
}
@article{arenas2024morph,
title = {{Morph-KGC: Scalable knowledge graph materialization with mapping partitions}},
author = {Arenas-Guerrero, Julián and Chaves-Fraga, David and Toledo, Jhon and Pérez, María S. and Corcho, Oscar},
journal = {Semantic Web},
year = {2024},
volume = {15},
number = {1},
pages = {1-20},
issn = {2210-4968},
publisher = {IOS Press},
doi = {10.3233/SW-223135}
}
```
## Sponsor :shield:
<p align="center">
<img src="https://github.com/morph-kgc/morph-kgc-docs/blob/main/docs/assets/BASF.png" height="100" alt="BASF">
</p>