simplekg


- **Name**: simplekg
- **Version**: 0.1.2
- **Home page**: https://gitlab.com/millerhadar/simplekg
- **Summary**: A complete workflow for generating, normalizing, and visualizing Knowledge Graphs from unstructured Hebrew text
- **Upload time**: 2025-08-20 04:46:18
- **Maintainer**: None
- **Docs URL**: None
- **Author**: Hadar Miller
- **Requires Python**: >=3.8
- **License**: None
- **Keywords**: knowledge-graph, nlp, hebrew, ontology, skos, rdf, extraction
- **Requirements**: No requirements were recorded.

# SimpleKG 🧠

[![PyPI version](https://badge.fury.io/py/simplekg.svg)](https://badge.fury.io/py/simplekg)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**SimpleKG** is a Python package for generating Knowledge Graphs from Hebrew text using large language models such as OpenAI's GPT-4o. It extracts entities and relationships, aligns them with ontologies, and renders the result as a visual knowledge graph.

## 🚀 Features

- **Hebrew Text Processing**: Specialized for Hebrew text analysis and knowledge extraction
- **Entity & Relation Extraction**: Automatically identifies concepts and their relationships
- **Ontology Integration**: Supports SKOS and other ontology standards
- **Interactive Visualizations**: Generates beautiful HTML visualizations of knowledge graphs
- **Command Line Interface**: Easy-to-use CLI for batch processing
- **Python API**: Flexible programmatic interface for integration
- **Multiple Output Formats**: JSON, HTML visualizations, and more

## 📦 Installation

Install SimpleKG using pip:

```bash
pip install simplekg
```

## 🔧 Quick Start

### Command Line Interface

```bash
# Process a Hebrew text file
simplekg -i input.txt -o output/ --model "openai/gpt-4o" --api-key YOUR_API_KEY

# Use environment variable for API key
export OPENAI_API_KEY="your-api-key"
simplekg -i input.txt -o output/

# Get help
simplekg --help
```

### Python API

```python
from simplekg import KnowledgeGraphGenerator

# Initialize the generator
kggen = KnowledgeGraphGenerator(
    model="openai/gpt-4o",
    api_key="your-api-key"
)

# Process Hebrew text
text = "טקסט עברי לעיבוד..."
kggen.extractConcepts(text=text)
kggen.extractRelations()
kggen.groupConcepts()

# Apply ontology
kggen.relations2ontology(["SKOS"])

# Generate visualization
kggen.visualize("output.html")

# Save as JSON
kggen.dump_graph("graph.json")
```

## 🛠️ API Reference

### KnowledgeGraphGenerator

The main class for knowledge graph generation.

**Parameters:**
- `model` (str): Language model to use (default: "openai/gpt-4o")
- `api_key` (str): API key for the language model
- `temperature` (float): Model temperature (default: 0.0)
- `api_base` (str): Custom API base URL (optional)
- `chunk_size` (int): Text chunk size for processing (default: 0)

**Key Methods:**
- `extractConcepts(text, chunk_size=0, verbose=False)`: Extract concepts from text
- `extractRelations(verbose=False)`: Extract relationships between concepts
- `groupConcepts(verbose=False)`: Group similar concepts (the Workflow section also passes `chunk_size`, `threshold`, and `max_iterations`)
- `relations2ontology(ontologies)`: Apply ontology standards
- `visualize(output_file)`: Generate HTML visualization
- `dump_graph(output_file)`: Save graph as JSON
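
The Quick Start calls these methods in a fixed order. As a minimal sketch, the helper below wraps that order. `run_pipeline` and `RecordingStub` are hypothetical names and not part of SimpleKG; the helper uses only the method names documented above, so a stub can stand in for a real `KnowledgeGraphGenerator` when no API key is available.

```python
from typing import List


def run_pipeline(kggen, text: str, ontologies: List[str],
                 html_path: str, json_path: str) -> None:
    """Call the documented methods in the order used in the Quick Start.

    `run_pipeline` is a hypothetical helper, not part of SimpleKG.
    """
    kggen.extractConcepts(text=text)
    kggen.extractRelations()
    kggen.groupConcepts()
    kggen.relations2ontology(ontologies)
    kggen.visualize(html_path)
    kggen.dump_graph(json_path)


class RecordingStub:
    """Stand-in that records method names instead of contacting an LLM."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that are not set on the instance.
        def method(*args, **kwargs):
            self.calls.append(name)
        return method


stub = RecordingStub()
run_pipeline(stub, "טקסט עברי", ["SKOS"], "output.html", "graph.json")
print(stub.calls)
```

Passing a real generator instead of the stub should run the same sequence against the configured model.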

## 📊 Supported Ontologies

- **SKOS** (Simple Knowledge Organization System)
- **Dublin Core** (DCMI Metadata Terms)
- **CIDOC CRM** (selected properties)
- More ontologies are planned.

## 🎯 Use Cases

- **Academic Research**: Process Hebrew academic papers and texts
- **Digital Humanities**: Analyze Hebrew literature and historical documents
- **Knowledge Management**: Create knowledge bases from Hebrew content
- **Content Analysis**: Understand relationships in Hebrew texts
- **Educational Tools**: Build learning resources from Hebrew materials

## 🔑 Environment Setup

Set up your API key as an environment variable:

```bash
# Add to your shell profile (.bashrc, .zshrc, etc.)
export OPENAI_API_KEY="your-openai-api-key"
```
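
On the Python side, the same variable can be read before constructing the generator. The sketch below is one way to fail early with a clear message when the key is missing; `require_api_key` is a hypothetical helper, not part of SimpleKG.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `var_name`, or raise a clear error if unset."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key
```

The returned value can then be passed as `api_key=require_api_key()` when creating `KnowledgeGraphGenerator`.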

## 📁 Output Files

SimpleKG generates several types of output:

- **JSON Graph**: Complete graph data structure
- **HTML Visualization**: Interactive web-based visualization
- **Ontology Files**: Structured ontology representations

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📚 Citation

If you use SimpleKG in your research, please cite:

```bibtex
@software{simplekg,
  author = {Hadar Miller},
  title = {SimpleKG: Knowledge Graph Generation from Hebrew Text},
  url = {https://gitlab.com/millerhadar/simplekg},
  version = {0.1.2},
  year = {2025}
}
```

## 🔗 Links

- [Documentation](https://gitlab.com/millerhadar/simplekg)
- [Issues](https://gitlab.com/millerhadar/simplekg/-/issues)
- [PyPI Package](https://pypi.org/project/simplekg/)

---

Made with ❤️ for the Hebrew NLP community

---

## 🚀 Installation

### From PyPI (recommended)
```bash
pip install simplekg
```

### From Source
```bash
git clone https://gitlab.com/millerhadar/simplekg.git
cd simplekg
pip install -e .
```

### Development Installation
```bash
git clone https://gitlab.com/millerhadar/simplekg.git
cd simplekg
pip install -e ".[dev]"
```

---

## 🎯 Features
- Extract entities and relations from raw text using an LLM.
- Normalize entities into SKOS-compatible **concepts** (canonical + alt labels).
- Normalize relations into **predicates** with multiple strategies:
  - LLM clustering with adjustable granularity (LOW, MEDIUM, HIGH).
  - Embedding-based clustering + LLM naming.
  - Alignment to known ontologies (SKOS, Dublin Core, CIDOC CRM).
- Support for **multiple ontologies** in parallel.
- Graph export and HTML visualization.
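
To illustrate what embedding-based clustering of relation labels can look like, here is a minimal sketch using greedy threshold clustering over cosine similarity. It is not SimpleKG's implementation: the function names, the toy two-dimensional vectors, and the `threshold` parameter are all assumptions, and a real pipeline would obtain vectors from an embedding model.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_relations(vectors: Dict[str, Sequence[float]],
                      threshold: float = 0.9) -> List[List[str]]:
    """Greedy clustering: a label joins the first cluster whose seed
    (first member) is at least `threshold` similar, else starts a new one."""
    clusters: List[List[str]] = []
    for label, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(label)
                break
        else:
            clusters.append([label])
    return clusters


# Toy vectors standing in for real embeddings (hypothetical values).
toy_vectors = {
    "wrote": [1.0, 0.1],
    "authored": [0.9, 0.2],
    "located in": [0.1, 1.0],
    "situated in": [0.2, 0.9],
}
print(cluster_relations(toy_vectors))
```

Raising `threshold` produces more, smaller clusters, which is loosely analogous to choosing a finer granularity; each cluster would then be given a name, for example by an LLM.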

---

## 📖 Workflow

### 1. Initialize
```python
import simplekg as kg
import os

kggen = kg.KnowledgeGraphGenerator(
    model="openai/gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY")
)
```

### 2. Extract Entities
```python
# `text` is the input string and `chunk_size` an int, both defined by the caller
kggen.extractConcepts(text=text, chunk_size=chunk_size, verbose=True)
```

- `chunk_size=0`: analyze the text as a whole (keeps global context, but fine-grained details may be diluted).
- `chunk_size>0`: split text into smaller parts, generate subgraphs, then merge.
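
The two modes can be pictured with a small splitting sketch. This README does not state the unit of `chunk_size`, so the example assumes whitespace-separated words; `split_into_chunks` is a hypothetical helper and not SimpleKG's own splitter.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split `text` into chunks of at most `chunk_size` whitespace-separated
    words; `chunk_size=0` returns the whole text as a single chunk."""
    if chunk_size < 0:
        raise ValueError("chunk_size must be >= 0")
    if chunk_size == 0:
        return [text]
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


print(split_into_chunks("א ב ג ד ה", 2))  # three chunks: two words, two words, one word
```

Each chunk would be processed into its own subgraph, and the subgraphs merged afterwards.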

### 3. Extract Relations
```python
kggen.extractRelations(verbose=True)
```

### 4. Normalize Entities
Groups entities into **concepts** (canonical + alternate names).

```python
kggen.groupConcepts(chunk_size=160, threshold=160, max_iterations=2, verbose=True)
```
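
A concept in this sense pairs one canonical label with its alternate names, much like `skos:prefLabel` and `skos:altLabel`. The sketch below shows one possible record shape. The `Concept` class and `build_concepts` helper are hypothetical and do not reflect SimpleKG's internal data model; in the library the grouping itself is done by the model, not by this code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    canonical: str                                        # role of skos:prefLabel
    alt_labels: List[str] = field(default_factory=list)   # role of skos:altLabel


def build_concepts(groups: Dict[str, List[str]]) -> List[Concept]:
    """Turn {canonical: [surface forms]} into Concept records, dropping
    duplicates and any alternate label equal to the canonical one."""
    concepts = []
    for canonical, forms in groups.items():
        alts: List[str] = []
        for form in forms:
            if form != canonical and form not in alts:
                alts.append(form)
        concepts.append(Concept(canonical, alts))
    return concepts


# Hypothetical grouping result for illustration.
groups = {"ירושלים": ["ירושלים", "העיר ירושלים", "י-ם", "י-ם"]}
print(build_concepts(groups))
```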

### 5. Normalize Relations
- **LLM clustering**: more detailed, but text-specific.
- **Ontology alignment**: more general, enables cross-text comparison.
- **Multiple ontologies**: select the best relation across several vocabularies.
- **LLM validation** *(planned)*: confirm embedding-based alignment.

```python
kggen.relations2ontology(["SKOS"])
kggen.relations2ontology(["CIDOC_CRM", "DUBLIN_CORE"])
```
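
Selecting the best relation across several vocabularies amounts to scoring every candidate predicate and keeping the top one. The sketch below illustrates that idea only. It swaps in plain string similarity (`difflib.SequenceMatcher`) for embeddings, and the English glosses, the `VOCABULARIES` table, and `best_predicate` are hypothetical; only the predicate names come from this README.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical glosses; predicate names are taken from the lists in this README.
VOCABULARIES: Dict[str, Dict[str, str]] = {
    "SKOS": {
        "skos:broader": "has broader concept",
        "skos:related": "is related to",
    },
    "DUBLIN_CORE": {
        "dcterms:isPartOf": "is part of",
        "dcterms:references": "references",
    },
    "CIDOC_CRM": {
        "P7_took_place_at": "took place at",
    },
}


def best_predicate(relation: str, ontologies: List[str]) -> Tuple[str, float]:
    """Return the (predicate, score) whose gloss best matches `relation`
    among all predicates of the requested vocabularies."""
    scored = [
        (SequenceMatcher(None, relation.lower(), gloss).ratio(), predicate)
        for name in ontologies
        for predicate, gloss in VOCABULARIES[name].items()
    ]
    score, predicate = max(scored)
    return predicate, score


print(best_predicate("is a part of", ["SKOS", "DUBLIN_CORE"]))
```

With a single vocabulary in the list, the same function reduces to alignment against that vocabulary alone.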

### 6. Visualization
Convert the KG into a chosen ontology and render HTML.

```python
ontology = "SKOS"  # or "MIX" for multiple ontologies
kggen.graph2Ontology(ontology)
# `file_name` and `chunk_size` are assumed to be defined earlier in your script
viz = kggen.visualize(f"../../vis/{file_name}_c{chunk_size}_ontology_{ontology}.html")
```

---

## Command Line Interface

SimpleKG also provides a command-line interface:

```bash
# Basic usage
simplekg --input-file text.txt --output-dir ./output

# With specific model and ontologies
simplekg --input-file text.txt --model "openai/gpt-4o" --ontologies SKOS CIDOC_CRM --verbose

# Get help
simplekg --help
```
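
For batch processing, one option is to build a command per input file from Python. The sketch below only constructs and prints the commands; it uses just the `--input-file` and `--output-dir` flags shown above. `build_commands` and the per-file output subdirectory layout are assumptions, not SimpleKG features.

```python
import shlex
from pathlib import Path
from typing import Iterable, List


def build_commands(input_files: Iterable[Path], output_dir: Path) -> List[List[str]]:
    """Build one `simplekg` argv list per input file, writing each file's
    results to its own subdirectory of `output_dir` (a hypothetical layout)."""
    return [
        ["simplekg", "--input-file", str(path),
         "--output-dir", str(output_dir / path.stem)]
        for path in input_files
    ]


for argv in build_commands([Path("a.txt"), Path("b.txt")], Path("output")):
    # To execute instead of printing: subprocess.run(argv, check=True)
    print(shlex.join(argv))
```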

---

## 📚 Supported Ontologies

### SKOS
- `skos:broader` / `skos:narrower`
- `skos:related`
- Mapping terms: `exactMatch`, `closeMatch`, etc.

### Dublin Core
- `dcterms:isPartOf` / `dcterms:hasPart`
- `dcterms:references` / `dcterms:isReferencedBy`
- Versioning, formats, replacements, requirements.

### CIDOC CRM (selected)
- `P5_consists_of` (part-of)
- `P7_took_place_at` (event location)
- `P13_destroyed` (destruction)
- `P53_has_former_or_current_location`
- `P94_has_created` (production/creation)
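
The SKOS and Dublin Core properties listed in pairs above are inverses of each other, so a statement in one direction implies the other. As a minimal sketch with triples as plain tuples, the helper below adds the implied statements; `add_inverses` and the tuple representation are assumptions and not part of SimpleKG.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Property pairs listed together in this section.
INVERSES = {
    "skos:broader": "skos:narrower",
    "dcterms:isPartOf": "dcterms:hasPart",
    "dcterms:references": "dcterms:isReferencedBy",
}
# Make the table usable in both directions.
INVERSES.update({v: k for k, v in list(INVERSES.items())})


def add_inverses(triples: List[Triple]) -> Set[Triple]:
    """Return the triples plus the inverse statement for each paired property."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSES:
            result.add((obj, INVERSES[predicate], subject))
    return result


# Hypothetical triples for illustration.
print(sorted(add_inverses([("chapter", "dcterms:isPartOf", "book")])))
```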

---

## ⚖️ Design Trade-offs
- **Whole text vs. chunks**: global context vs. fine-grained extraction.
- **LLM relation clustering**: richer inside one graph, less comparable across graphs.
- **Ontology alignment**: more comparable across graphs, but risk of oversimplification.

---

## 📌 References
- [SKOS Specification](https://www.w3.org/TR/skos-reference/)
- [Dublin Core Metadata Terms](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/)
- [CIDOC CRM Official Site](https://cidoc-crm.org/)

---

## 🔮 Roadmap
- LLM-based validation for ontology alignment.
- Integration with additional ontologies.
- More advanced visualization options.

---

            
