<div align="center">
<img src="https://raw.githubusercontent.com/lineagentic/lineagentic-kg/main/images/lineagentickg.jpg" alt="Lineagentic Logo" width="880" height="500">
</div>
# Lineagentic-KG
Lineagentic-KG is a knowledge graph builder library that converts YAML definitions into a fully operational and customizable knowledge graph. While one key use case is building a data catalog, the framework is generic and extensible—making it easy to define entities, aspects, and relationships for any domain.
With automatic REST API and CLI tooling generation, Lineagentic-KG delivers a “batteries included” experience for quickly turning YAML into a production-ready knowledge graph.
## Features
- **Generic Metadata Model Generator**: Define flexible, extensible models with entities, aspects, and relationships.
- **REST API Generator**: Automatically expose FastAPI endpoints from your YAML registry.
- **CLI Generator**: Instantly get CLI commands derived from the registry.
- **Type-Safe Code**: Ensure correctness and reliability in data handling.
## Quick Start
```
pip install lineagentic-kg
```
```python
from lineagentic_kg.registry.factory import RegistryFactory
# Initialize the registry factory with your config file
registry_factory = RegistryFactory("lineagentic_kg/config/main_registry.yaml")
# Create a Neo4j writer instance
neo4j_writer = registry_factory.create_writer(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password"
)
# Create a dataset
dataset_urn = neo4j_writer.upsert_dataset(
    platform="snowflake",
    name="customer_data",
    env="PROD"
)
# Create a corp user
user_urn = neo4j_writer.upsert_corpuser(
    username="john.doe"
)
# Create a tag
tag_urn = neo4j_writer.upsert_tag(
    key="sensitive",
    value="true"
)
# Retrieve entities
dataset = neo4j_writer.get_dataset(dataset_urn)
print(f"Retrieved dataset: {dataset}")
user = neo4j_writer.get_corpuser(user_urn)
print(f"Retrieved user: {user}")
# Clean up
neo4j_writer.close()
print("LineAgentic KG example completed successfully!")
```
## RESTful API Generator
To auto-generate RESTful APIs and save time on API creation, run:
```
make generate-and-run-api
```
The examples folder contains sample API calls made with curl. To run them: `./examples/api_calls_examples.sh`
## CLI Generator
You can also auto-generate CLI tooling by running `make generate-cli`. Example CLI calls are in `./examples/cli_calls_examples.sh`, which you can run directly.
## How Lineagentic-KG Works
Lineagentic-KG is a **dynamic code generation system** that creates a knowledge graph from YAML configuration files. It has three modules:
1. **Registry module**: The core of Lineagentic-KG, built on the registry design pattern. Think of it as a code factory that reads configuration (in this case, YAML files) and builds classes automatically.
2. **API-Generator**: Generates RESTful APIs. It is built on the FastAPI framework and wraps the methods generated by the Registry module in API endpoints.
3. **CLI-Generator**: Generates the command-line interface. It is built on the Click framework and uses the methods generated by the Registry module to build CLI commands.
### Why This Architecture is Powerful
This system essentially turns YAML configuration into a working knowledge graph backend at runtime. It provides:
1. **Flexibility**: Change data models without code changes
2. **Consistency**: All entities follow the same patterns
3. **Maintainability**: Business logic is separated from implementation
4. **Extensibility**: Easy to add new entity types and relationships
5. **Type Safety**: Generated code ensures proper data handling
## Detailed Flow Diagrams
### 1. Bootstrap Phase: RegistryFactory Initialization
This diagram shows the complete initialization flow from YAML configuration to the generated class.
<p align="center">
<img src="images/01_bootstrap_phase.png" alt="Bootstrap Phase Diagram" width="800">
</p>
*Shows how RegistryFactory loads config, validates it, generates functions, and creates the final writer class.*
### 2. Runtime Phase: Using Generated Methods
This diagram shows what happens when you use the generated methods.
<p align="center">
<img src="images/02_runtime_phase.png" alt="Runtime Phase Diagram" width="800">
</p>
*Flow when calling `upsert_dataset()` and `add_aspect()`.*
### 3. Configuration Loading Flow
<p align="center">
<img src="images/03_config_loading.png" alt="Configuration Loading Diagram" width="800">
</p>
### 4. Method Generation Flow
<p align="center">
<img src="images/04_method_generation.png" alt="Method Generation Diagram" width="800">
</p>
### 5. Overall System Architecture
<p align="center">
<img src="images/05_system_architecture.png" alt="System Architecture Diagram" width="800">
</p>
### 6. Data Flow Overview
<p align="center">
<img src="images/06_data_flow.png" alt="Data Flow Diagram" width="800">
</p>
## Step-by-Step Process
1. Configuration Files (`lineagentic_kg/config/` folder)
The system starts with YAML configuration files that define the data model.
- **`main_registry.yaml`**: The main entry point that includes all other config files
- **`entities.yaml`**: Defines what types of data objects exist (Dataset, DataFlow, CorpUser, etc.)
- **`urn_patterns.yaml`**: Defines how to create unique identifiers (URNs) for each entity
- **`aspects.yaml`**: Defines properties and metadata for entities (e.g. datasetProperties, dataflowProperties, etc.)
- **`relationships.yaml`**: Defines how entities connect to each other (e.g. dataset -> dataflow)
- **`utilities.yaml`**: Defines helper functions for data processing (e.g. data cleaning, data transformation, etc.)
2. Registry Loading (`lineagentic_kg/registry/loaders.py`)
- Reads the main registry file
- Merges all included YAML files into one big configuration
- Handles file dependencies and deep merging
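The deep merging mentioned above can be sketched as follows. This is a minimal illustration, not the code in `loaders.py`: the function name `deep_merge` and the two sample fragments are hypothetical, and plain dictionaries stand in for parsed YAML files.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` merged recursively into `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the later file wins
            merged[key] = deepcopy(value)
    return merged


# Hypothetical fragments standing in for two included config files
entities_cfg = {"entities": {"Dataset": {"properties": ["platform", "name", "env"]}}}
extra_cfg = {
    "entities": {"CorpUser": {"properties": ["username"]}},
    "aspects": {"datasetProperties": {}},
}

merged = deep_merge(entities_cfg, extra_cfg)
print(sorted(merged["entities"]))
```

Merging recursively keeps sections from different files side by side instead of letting a later file replace an entire top-level section.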
3. Validation (`lineagentic_kg/registry/validators.py`)
- Checks that all required sections exist
- Validates configuration structure
- Ensures everything is properly configured
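A structural check of this kind might look like the sketch below. The section names in `REQUIRED_SECTIONS` and the function `validate_config` are assumptions made for illustration; the actual rules live in `validators.py` and may differ.

```python
# Assumed section names, chosen to mirror the config files listed earlier
REQUIRED_SECTIONS = ("entities", "urn_patterns", "aspects", "relationships")


def validate_config(config: dict) -> None:
    """Raise if a required section is missing or is not a mapping."""
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise ValueError(f"Missing required sections: {', '.join(missing)}")
    for name in REQUIRED_SECTIONS:
        if not isinstance(config[name], dict):
            raise TypeError(f"Section '{name}' must be a mapping")


# A config with every required section passes silently
validate_config({name: {} for name in REQUIRED_SECTIONS})

# A config without 'relationships' is rejected
try:
    validate_config({"entities": {}, "urn_patterns": {}, "aspects": {}})
except ValueError as exc:
    print(exc)
```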
4. Code Generation (`lineagentic_kg/registry/generators.py`)
- **URNGenerator**: Creates functions that generate unique identifiers
- **AspectProcessor**: Creates functions that process entity metadata
- **UtilityFunctionBuilder**: Creates helper functions for data cleaning/processing
5. Class Generation (`lineagentic_kg/registry/writers.py`)
- Takes all the generated functions and configuration
- Dynamically creates a Python class called `Neo4jMetadataWriter`
- This class has methods like:
  - `upsert_dataset()`, `get_dataset()`, `delete_dataset()`
  - `upsert_dataflow()`, `get_dataflow()`, `delete_dataflow()`
  - And so on for each entity type
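To make the idea of dynamic class creation concrete, here is a small sketch that builds a class with per-entity methods using Python's built-in `type()`. It is an assumption-laden illustration: `build_writer_class` and `InMemoryWriter` are hypothetical names, the URN format is simplified, and an in-memory dictionary replaces Neo4j so the example runs on its own.

```python
def build_writer_class(entity_names):
    """Create a class with upsert_/get_/delete_ methods for each entity name."""

    def make_upsert(entity):
        def upsert(self, **props):
            # Simplified identifier; the real library derives URNs from config
            urn = f"urn:li:{entity}:({','.join(str(v) for v in props.values())})"
            self._store[urn] = dict(props)
            return urn
        return upsert

    def make_get(entity):
        def get(self, urn):
            return self._store.get(urn)
        return get

    def make_delete(entity):
        def delete(self, urn):
            self._store.pop(urn, None)
        return delete

    def __init__(self):
        self._store = {}  # stands in for the graph database

    namespace = {"__init__": __init__}
    for entity in entity_names:
        namespace[f"upsert_{entity}"] = make_upsert(entity)
        namespace[f"get_{entity}"] = make_get(entity)
        namespace[f"delete_{entity}"] = make_delete(entity)
    return type("InMemoryWriter", (), namespace)


Writer = build_writer_class(["dataset", "dataflow"])
writer = Writer()
urn = writer.upsert_dataset(platform="mysql", name="users", env="PROD")
print(writer.get_dataset(urn))
```

Each method is produced by a small factory function so that it closes over its own `entity` value; defining the methods directly inside the loop would make them all share the last entity name.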
6. Factory (`lineagentic_kg/registry/factory.py`)
- Orchestrates the entire process
- Creates the final writer class
- Provides a simple interface to use the generated code
## Fine-Grained Example: How a Dataset Gets Created
1. **Config says**: "Dataset entities need platform, name, env, versionId properties"
2. **URN Pattern says**: "Dataset URNs should look like: `urn:li:dataset:(platform,name,env)`"
3. **Generator creates**: A function that builds URNs from the input data
4. **Writer gets**: A method `upsert_dataset(platform="mysql", name="users", env="PROD")`
5. **Result**: Creates a dataset node in Neo4j with the URN `urn:li:dataset:(mysql,users,PROD)`
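Steps 2 and 3 can be illustrated with a short sketch that turns a pattern into a URN-building function. The helper `make_urn_builder` and the `{field}` template syntax are assumptions for this example; the syntax actually used in `urn_patterns.yaml` may be different.

```python
def make_urn_builder(template: str):
    """Return a function that fills `template` with keyword arguments."""
    def build(**fields):
        return template.format(**fields)
    return build


# Hypothetical template mirroring the dataset URN pattern shown above
dataset_urn = make_urn_builder("urn:li:dataset:({platform},{name},{env})")

print(dataset_urn(platform="mysql", name="users", env="PROD"))
# urn:li:dataset:(mysql,users,PROD)
```

Because the builder is created from a string, changing the pattern changes every generated URN without touching the calling code.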
## Key Benefits
- **No hardcoded entity/aspect/relationship types**: Add new entities/aspects/relationships by just editing YAML
- **Flexible URN patterns**: Change how IDs are generated without touching code
- **Dynamic methods**: New entity types automatically get create/read/delete methods
- **Configuration-driven**: Business logic is in config files, not code
- **Maintainable**: Changes to data model only require config updates