# SimpleKG 🧠
**SimpleKG** is a Python package for generating knowledge graphs from Hebrew text using large language models such as OpenAI's GPT-4o. It extracts entities and relationships, aligns them with ontologies such as SKOS, and produces visual knowledge representations.
## 🚀 Features
- **Hebrew Text Processing**: Specialized for Hebrew text analysis and knowledge extraction
- **Entity & Relation Extraction**: Automatically identifies concepts and their relationships
- **Ontology Integration**: Supports SKOS and other ontology standards
- **Interactive Visualizations**: Generates beautiful HTML visualizations of knowledge graphs
- **Command Line Interface**: Easy-to-use CLI for batch processing
- **Python API**: Flexible programmatic interface for integration
- **Multiple Output Formats**: JSON, HTML visualizations, and more
## 📦 Installation
Install SimpleKG using pip (requires Python 3.8 or later):
```bash
pip install simplekg
```
## 🔧 Quick Start
### Command Line Interface
```bash
# Process a Hebrew text file
simplekg -i input.txt -o output/ --model "openai/gpt-4o" --api-key YOUR_API_KEY
# Use environment variable for API key
export OPENAI_API_KEY="your-api-key"
simplekg -i input.txt -o output/
# Get help
simplekg --help
```
### Python API
```python
from simplekg import KnowledgeGraphGenerator
# Initialize the generator
kggen = KnowledgeGraphGenerator(
    model="openai/gpt-4o",
    api_key="your-api-key"
)
# Process Hebrew text
text = "טקסט עברי לעיבוד..."  # "Hebrew text for processing..."
kggen.extractConcepts(text=text)
kggen.extractRelations()
kggen.groupConcepts()
# Apply ontology
kggen.relations2ontology(["SKOS"])
# Generate visualization
kggen.visualize("output.html")
# Save as JSON
kggen.dump_graph("graph.json")
```
## 🛠️ API Reference
### KnowledgeGraphGenerator
The main class for knowledge graph generation.
**Parameters:**
- `model` (str): Language model to use (default: "openai/gpt-4o")
- `api_key` (str): API key for the language model
- `temperature` (float): Model temperature (default: 0.0)
- `api_base` (str): Custom API base URL (optional)
- `chunk_size` (int): Text chunk size for processing (default: 0)
**Key Methods:**
- `extractConcepts(text, chunk_size=0, verbose=False)`: Extract concepts from text
- `extractRelations(verbose=False)`: Extract relationships between concepts
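- `groupConcepts(verbose=False)`: Group similar entities into concepts with canonical and alternate labels; also accepts `chunk_size`, `threshold`, and `max_iterations` (see the Workflow section)
- `relations2ontology(ontologies)`: Align relations to one or more ontologies, e.g. `["SKOS"]` or `["CIDOC_CRM", "DUBLIN_CORE"]`
- `graph2Ontology(ontology)`: Convert the graph into a chosen ontology (e.g. `"SKOS"`, or `"MIX"` for multiple ontologies) before visualization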
- `visualize(output_file)`: Generate HTML visualization
- `dump_graph(output_file)`: Save graph as JSON
## 📊 Supported Ontologies
- **SKOS** (Simple Knowledge Organization System)
- **Dublin Core** (DCMI Metadata Terms)
- **CIDOC CRM** (selected properties)
- More ontologies are planned (see the Roadmap)
## 🎯 Use Cases
- **Academic Research**: Process Hebrew academic papers and texts
- **Digital Humanities**: Analyze Hebrew literature and historical documents
- **Knowledge Management**: Create knowledge bases from Hebrew content
- **Content Analysis**: Understand relationships in Hebrew texts
- **Educational Tools**: Build learning resources from Hebrew materials
## 🔑 Environment Setup
Set up your API key as an environment variable:
```bash
# Add to your shell profile (.bashrc, .zshrc, etc.)
export OPENAI_API_KEY="your-openai-api-key"
```
## 📁 Output Files
SimpleKG generates several types of output:
- **JSON Graph**: Complete graph data structure
- **HTML Visualization**: Interactive web-based visualization
- **Ontology Files**: Structured ontology representations
## 🤝 Contributing
Contributions are welcome! Please feel free to open a merge request on GitLab.
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 📚 Citation
If you use SimpleKG in your research, please cite:
```bibtex
@software{simplekg,
  author = {Hadar Miller},
  title = {SimpleKG: Knowledge Graph Generation from Hebrew Text},
  url = {https://gitlab.com/millerhadar/simplekg},
  version = {0.1.0},
  year = {2025}
}
```
## 🔗 Links
- [Documentation](https://gitlab.com/millerhadar/simplekg)
- [Issues](https://gitlab.com/millerhadar/simplekg/-/issues)
- [PyPI Package](https://pypi.org/project/simplekg/)
---
Made with ❤️ for the Hebrew NLP community
---
## 🚀 Installation
### From PyPI (recommended)
```bash
pip install simplekg
```
### From Source
```bash
git clone https://gitlab.com/millerhadar/simplekg.git
cd simplekg
pip install -e .
```
### Development Installation
```bash
git clone https://gitlab.com/millerhadar/simplekg.git
cd simplekg
pip install -e ".[dev]"
```
---
## 🎯 Features
- Extract entities and relations from raw text using an LLM.
- Normalize entities into SKOS-compatible **concepts** (canonical + alt labels).
- Normalize relations into **predicates** with multiple strategies:
  - LLM clustering with adjustable granularity (LOW, MEDIUM, HIGH).
  - Embedding-based clustering + LLM naming.
  - Alignment to known ontologies (SKOS, Dublin Core, CIDOC CRM).
- Support for **multiple ontologies** in parallel.
- Graph export and HTML visualization.
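The embedding-based relation clustering listed above can be pictured roughly as follows. This is a minimal sketch, not SimpleKG's implementation: it assumes character-trigram count vectors as a stand-in for real embeddings, a greedy cosine-similarity threshold (the hypothetical `threshold=0.5`), and it names each cluster after its most frequent label where SimpleKG would ask an LLM.

```python
import math
from collections import Counter
from typing import Dict, List


def trigram_vector(label: str) -> Counter:
    """Character-trigram counts: a crude stand-in for a real embedding model."""
    padded = f"  {label.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_relations(labels: List[str], threshold: float = 0.5) -> Dict[str, List[str]]:
    """Greedily put each label into the first cluster whose seed is similar enough."""
    clusters: List[List[str]] = []
    seeds: List[Counter] = []
    for label in labels:
        vec = trigram_vector(label)
        for members, seed in zip(clusters, seeds):
            if cosine(vec, seed) >= threshold:
                members.append(label)
                break
        else:
            # No existing cluster is close enough: this label seeds a new one.
            clusters.append([label])
            seeds.append(vec)
    # Name each cluster after its most frequent member (SimpleKG uses an LLM here).
    return {Counter(members).most_common(1)[0][0]: members for members in clusters}
```

A real embedding model would also group paraphrases that share no characters, which trigram overlap cannot detect.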
---
## 📖 Workflow
### 1. Initialize
```python
import simplekg as kg
import os
kggen = kg.KnowledgeGraphGenerator(
    model="openai/gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY")
)
```
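`os.getenv` returns `None` when the variable is unset, so the call above would pass `None` through without complaint. A small guard such as the hypothetical `require_api_key` helper below (not part of SimpleKG) fails early with a clearer message:

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a descriptive error."""
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell profile "
            "or pass api_key=... explicitly."
        )
    return key
```

It would then be used as `api_key=require_api_key()` when constructing `KnowledgeGraphGenerator`.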
### 2. Extract Entities
```python
kggen.extractConcepts(text=text, chunk_size=chunk_size, verbose=True)
```
- `chunk_size=0`: analyze the text as a whole (may dilute context).
- `chunk_size>0`: split text into smaller parts, generate subgraphs, then merge.
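The chunked mode can be sketched as below. This assumes `chunk_size` counts whitespace-separated words (the README does not say whether it counts words, characters, or tokens) and represents each subgraph as a set of `(subject, predicate, object)` triples; `split_into_chunks` and `merge_subgraphs` are hypothetical names, not SimpleKG functions.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into pieces of at most chunk_size words; 0 keeps the whole text."""
    if chunk_size <= 0:
        return [text]
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def merge_subgraphs(subgraphs: List[Set[Triple]]) -> Set[Triple]:
    """Union per-chunk triples; identical triples found in several chunks collapse to one."""
    merged: Set[Triple] = set()
    for triples in subgraphs:
        merged |= triples
    return merged
```

A plain union only merges entities whose surface forms match exactly; spelling variants that appear in different chunks are what the entity-normalization step is for.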
### 3. Extract Relations
```python
kggen.extractRelations(verbose=True)
```
### 4. Normalize Entities
Groups entities into **concepts** (canonical + alternate names).
```python
kggen.groupConcepts(chunk_size=160, threshold=160, max_iterations=2, verbose=True)
```
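To make the output shape concrete, here is a toy version of entity grouping. SimpleKG performs this step with an LLM; the sketch instead assumes a simple normalization key (case-folding and treating spaces and hyphens alike) as a stand-in, picks the most frequent surface form as the canonical label, and emits SKOS-style `prefLabel`/`altLabel` fields. The function names and dictionary layout are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List


def normalize(label: str) -> str:
    """Crude grouping key: case-fold and treat runs of spaces/hyphens as one space."""
    return re.sub(r"[\s\-]+", " ", label.casefold()).strip()


def group_entities(mentions: List[str]) -> List[Dict[str, object]]:
    """Group entity mentions into concepts with one prefLabel and sorted altLabels."""
    buckets: Dict[str, Counter] = defaultdict(Counter)
    for mention in mentions:
        buckets[normalize(mention)][mention] += 1
    concepts = []
    for forms in buckets.values():
        pref = forms.most_common(1)[0][0]  # most frequent surface form wins
        alts = sorted(form for form in forms if form != pref)
        concepts.append({"prefLabel": pref, "altLabel": alts})
    return concepts
```

For example, the mentions `"Tel Aviv"`, `"tel aviv"`, and `"Tel-Aviv"` collapse into a single concept whose `prefLabel` is whichever form occurs most often.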
### 5. Normalize Relations
- **LLM clustering**: more detailed, but text-specific.
- **Ontology alignment**: more general, enables cross-text comparison.
- **Multiple ontologies**: select the best relation across several vocabularies.
- **LLM validation** *(planned)*: confirm embedding-based alignment.
```python
kggen.relations2ontology(["SKOS"])
kggen.relations2ontology(["CIDOC_CRM", "DUBLIN_CORE"])
```
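Alignment across several vocabularies can be pictured as scoring each extracted relation against every candidate predicate and keeping the best match. SimpleKG describes its alignment as embedding-based; the sketch below substitutes `difflib.SequenceMatcher` string similarity as a dependency-free stand-in, uses hand-written, hypothetical glosses for a few of the predicates listed under Supported Ontologies, and falls back to `skos:related` below an assumed `min_score`.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical mini-vocabularies: predicate -> short natural-language gloss.
VOCABULARIES = {
    "SKOS": {
        "skos:broader": "is a kind of",
        "skos:narrower": "includes",
        "skos:related": "is related to",
    },
    "DUBLIN_CORE": {
        "dcterms:isPartOf": "is part of",
        "dcterms:references": "references",
    },
    "CIDOC_CRM": {
        "P7_took_place_at": "took place at",
        "P94_has_created": "created",
    },
}


def align_relation(label: str, ontologies: List[str], min_score: float = 0.6) -> Tuple[str, float]:
    """Return the best-matching predicate across all chosen vocabularies."""
    best, best_score = "skos:related", 0.0
    for name in ontologies:
        for predicate, gloss in VOCABULARIES[name].items():
            score = SequenceMatcher(None, label.lower(), gloss).ratio()
            if score > best_score:
                best, best_score = predicate, score
    # Weak matches fall back to the generic SKOS relation.
    return (best, best_score) if best_score >= min_score else ("skos:related", best_score)
```

Passing several vocabularies is similar in spirit to `relations2ontology(["CIDOC_CRM", "DUBLIN_CORE"])`: every vocabulary competes and the highest-scoring predicate wins.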
### 6. Visualization
Convert the KG into a chosen ontology and render HTML.
```python
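file_name = "input"  # hypothetical base name for the output file
chunk_size = 0  # should match the value passed to extractConcepts
ontology = "SKOS"  # or "MIX" for multiple ontologies
kggen.graph2Ontology(ontology)
viz = kggen.visualize(f"../../vis/{file_name}_c{chunk_size}_ontology_{ontology}.html")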
```
---
## Command Line Interface
SimpleKG also provides a command-line interface:
```bash
# Basic usage
simplekg --input-file text.txt --output-dir ./output
# With specific model and ontologies
simplekg --input-file text.txt --model "openai/gpt-4o" --ontologies SKOS CIDOC_CRM --verbose
# Get help
simplekg --help
```
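For batch processing, the CLI can be driven from a script. The sketch below uses only the flags shown above (`--input-file`, `--output-dir`, `--model`, `--ontologies`, `--verbose`); the one-output-directory-per-file layout and the `build_command`/`run_batch` helpers are assumptions, not part of SimpleKG.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_command(input_file: Path, output_root: Path, model: str = "openai/gpt-4o",
                  ontologies: Sequence[str] = ("SKOS",), verbose: bool = False) -> List[str]:
    """Build one simplekg invocation; each input file gets its own output directory."""
    cmd = [
        "simplekg",
        "--input-file", str(input_file),
        "--output-dir", str(output_root / input_file.stem),
        "--model", model,
        "--ontologies", *ontologies,
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_batch(input_dir: str, output_root: str, **options) -> None:
    """Run simplekg on every .txt file in input_dir, stopping at the first failure."""
    for path in sorted(Path(input_dir).glob("*.txt")):
        subprocess.run(build_command(path, Path(output_root), **options), check=True)
```

Usage might look like `run_batch("texts", "output", ontologies=("SKOS", "CIDOC_CRM"))`; `check=True` makes a failing file raise `CalledProcessError` instead of being skipped silently.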
---
## 📚 Supported Ontologies
### SKOS
- `skos:broader` / `skos:narrower`
- `skos:related`
- Mapping properties: `skos:exactMatch`, `skos:closeMatch`, etc.
### Dublin Core
- `dcterms:isPartOf` / `dcterms:hasPart`
- `dcterms:references` / `dcterms:isReferencedBy`
- Versioning, formats, replacements, requirements.
### CIDOC CRM (selected)
- `P5_consists_of` (part-of)
- `P7_took_place_at` (event location)
- `P13_destroyed` (destruction)
- `P53_has_former_or_current_location`
- `P94_has_created` (production/creation)
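When an aligned graph is exported as RDF, prefixed names like the ones above expand to full IRIs. The sketch below writes N-Triples using the published SKOS, DCMI Terms, and CIDOC CRM namespaces; the `crm:` prefix for CIDOC CRM properties and the `http://example.org/kg/` base for entity IRIs are assumptions, and SimpleKG's own export format may differ.

```python
from typing import List, Tuple
from urllib.parse import quote

NAMESPACES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
}
BASE = "http://example.org/kg/"  # hypothetical base IRI for extracted entities


def expand(curie: str) -> str:
    """Turn 'skos:broader' into a full IRI; unknown prefixes raise KeyError."""
    prefix, local = curie.split(":", 1)
    return NAMESPACES[prefix] + local


def entity_iri(label: str) -> str:
    """Mint an IRI from a label; non-ASCII (e.g. Hebrew) characters are percent-encoded."""
    return f"<{BASE}{quote(label.replace(' ', '_'))}>"


def to_ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """Serialize (subject label, predicate CURIE, object label) triples as N-Triples."""
    return "\n".join(
        f"{entity_iri(s)} <{expand(p)}> {entity_iri(o)} ." for s, p, o in triples
    )
```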
---
## ⚖️ Design Trade-offs
- **Whole text vs. chunks**: global context vs. fine-grained extraction.
- **LLM relation clustering**: richer inside one graph, less comparable across graphs.
- **Ontology alignment**: more comparable across graphs, but risk of oversimplification.
---
## 📌 References
- [SKOS Specification](https://www.w3.org/TR/skos-reference/)
- [Dublin Core Metadata Terms](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/)
- [CIDOC CRM Official Site](https://cidoc-crm.org/)
---
## 🔮 Roadmap
- LLM-based validation for ontology alignment.
- Integration with additional ontologies.
- More advanced visualization options.
---