lineagentic-kg

Name: lineagentic-kg
Version: 1.6.0
Summary: Turn YAML into a Knowledge Graph in minutes
Upload time: 2025-08-30 08:59:47
Requires Python: >=3.8
Keywords: neo4j, metadata, registry, lineage, catalog

<div align="center">
  <img src="https://raw.githubusercontent.com/lineagentic/lineagentic-kg/main/images/lineagentickg.jpg" alt="Lineagentic Logo" width="880" height="500">
</div>

# Lineagentic-KG

Lineagentic-KG is a knowledge graph builder library that converts YAML definitions into a fully operational and customizable knowledge graph. While one key use case is building a data catalog, the framework is generic and extensible—making it easy to define entities, aspects, and relationships for any domain.

With automatic REST API and CLI tooling generation, Lineagentic-KG delivers a “batteries included” experience for quickly turning YAML into a production-ready knowledge graph.

## Features

- **Generic Metadata Model Generator**: Define flexible, extensible models with entities, aspects, and relationships.
- **REST API Generator**: Automatically expose FastAPI endpoints from your YAML registry.
- **CLI Generator**: Instantly get CLI commands derived from the registry.
- **Type-Safe Code**: Ensure correctness and reliability in data handling.

## Quick Start

```
pip install lineagentic-kg
```
```python
from lineagentic_kg.registry.factory import RegistryFactory

# Initialize the registry factory with your config file
registry_factory = RegistryFactory("lineagentic_kg/config/main_registry.yaml")

# Create a Neo4j writer instance
neo4j_writer = registry_factory.create_writer(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password"
)


# Create a dataset
dataset_urn = neo4j_writer.upsert_dataset(
    platform="snowflake",
    name="customer_data",
    env="PROD"
)

# Create a corp user
user_urn = neo4j_writer.upsert_corpuser(
    username="john.doe"
)

# Create a tag
tag_urn = neo4j_writer.upsert_tag(
    key="sensitive",
    value="true"
)

# Retrieve entities
dataset = neo4j_writer.get_dataset(dataset_urn)
print(f"Retrieved dataset: {dataset}")

user = neo4j_writer.get_corpuser(user_urn)
print(f"Retrieved user: {user}")

# Clean up
neo4j_writer.close()
print("LineAgentic KG example completed successfully!")
```
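
If an upsert raises, the `close()` call at the end of the example above is skipped. One way to guard against that is the standard library's `contextlib.closing`, which calls `close()` on exit from the `with` block. The sketch below is only an illustration: `StubWriter` is a hypothetical stand-in so the snippet runs without Neo4j, and it assumes nothing about the real writer beyond the `close()` method used in the Quick Start.

```python
from contextlib import closing


class StubWriter:
    """Hypothetical stand-in for the generated writer; not part of the library."""

    def __init__(self):
        self.closed = False

    def upsert_dataset(self, **props):
        # Placeholder return value; the real writer returns the entity URN.
        return "urn:stub:dataset"

    def close(self):
        self.closed = True


writer = StubWriter()
# closing() guarantees writer.close() runs, even if the body raises.
with closing(writer) as w:
    urn = w.upsert_dataset(platform="snowflake", name="customer_data", env="PROD")

print(writer.closed)  # True
```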

## RESTful API Generator

To auto-generate RESTful APIs instead of writing them by hand, run:

```
make generate-and-run-api
```

The `examples` folder contains sample `curl` calls against the generated API. To run them, execute `./examples/api_calls_examples.sh`.

## CLI Generator

You can also auto-generate CLI tooling by running `make generate-cli`. Example CLI calls are collected in `./examples/cli_calls_examples.sh`, which you can run directly.


## How Lineagentic-KG Works

Lineagentic-KG is a **dynamic code generation system** that creates a knowledge graph from YAML configuration files. It consists of three modules:

1. **Registry module**: The core of Lineagentic-KG, built on the registry design pattern. Think of it as a code factory that reads configuration (here, YAML files) and builds classes automatically.

2. **API-Generator**: Generates RESTful APIs. It is built on the FastAPI framework and wraps the methods generated by the Registry module in API endpoints.

3. **CLI-Generator**: Generates the command-line interface. It is built on the Click framework and uses the methods generated by the Registry module to build CLI commands.

### Why This Architecture is Powerful

This system turns YAML configuration into a working knowledge graph backend at runtime. It provides:

1. **Flexibility**: Change data models without code changes
2. **Consistency**: All entities follow the same patterns
3. **Maintainability**: Business logic is separated from implementation
4. **Extensibility**: Easy to add new entity types and relationships
5. **Type Safety**: Generated code ensures proper data handling


## Detailed Flow Diagrams

### 1. Bootstrap Phase: RegistryFactory Initialization
This diagram shows the complete initialization flow from YAML configuration to the generated class.

<p align="center">
  <img src="images/01_bootstrap_phase.png" alt="Bootstrap Phase Diagram" width="800">
</p>
*Shows how RegistryFactory loads config, validates it, generates functions, and creates the final writer class.*

### 2. Runtime Phase: Using Generated Methods
This diagram shows what happens when you use the generated methods.

<p align="center">
  <img src="images/02_runtime_phase.png" alt="Runtime Phase Diagram" width="800">
</p>
*Flow when calling `upsert_dataset()` and `add_aspect()`.*

### 3. Configuration Loading Flow
<p align="center">
  <img src="images/03_config_loading.png" alt="Configuration Loading Diagram" width="800">
</p>

### 4. Method Generation Flow
<p align="center">
  <img src="images/04_method_generation.png" alt="Method Generation Diagram" width="800">
</p>

### 5. Overall System Architecture
<p align="center">
  <img src="images/05_system_architecture.png" alt="System Architecture Diagram" width="800">
</p>

### 6. Data Flow Overview
<p align="center">
  <img src="images/06_data_flow.png" alt="Data Flow Diagram" width="800">
</p>

## Step-by-Step Process

1. Configuration Files (`lineagentic_kg/config/` folder)

The system starts with YAML configuration files that define the data model.

- **`main_registry.yaml`**: The main entry point that includes all other config files
- **`entities.yaml`**: Defines what types of data objects exist (Dataset, DataFlow, CorpUser, etc.)
- **`urn_patterns.yaml`**: Defines how to create unique identifiers (URNs) for each entity
- **`aspects.yaml`**: Defines properties and metadata for entities (e.g. datasetProperties, dataflowProperties)
- **`relationships.yaml`**: Defines how entities connect to each other (e.g. dataset -> dataflow)
- **`utilities.yaml`**: Defines helper functions for data processing (e.g. data cleaning, data transformation)

2. Registry Loading (`lineagentic_kg/registry/loaders.py`)

- Reads the main registry file
- Merges all included YAML files into one big configuration
- Handles file dependencies and deep merging
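
The deep merging described in this step can be pictured with a small sketch. This is not the library's loader: `deep_merge` and the sample dictionaries are hypothetical, and plain dicts stand in for the parsed contents of the included YAML files.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which nested mappings are merged recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge their keys instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys replace or extend the base entry.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-ins for two parsed config files (keys are illustrative).
entities_cfg = {"entities": {"Dataset": {"properties": ["platform", "name", "env"]}}}
extra_cfg = {
    "entities": {"CorpUser": {"properties": ["username"]}},
    "aspects": {"datasetProperties": {"description": "free text"}},
}

registry = deep_merge(entities_cfg, extra_cfg)
print(sorted(registry["entities"]))  # ['CorpUser', 'Dataset']
```

A shallow `dict.update` would have replaced the whole `entities` section with the second file's version; merging recursively keeps both entity definitions.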

3. Validation (`lineagentic_kg/registry/validators.py`)

- Checks that all required sections exist
- Validates configuration structure
- Ensures everything is properly configured
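
As a rough illustration of the kind of checks listed in this step, the sketch below collects problems from a merged configuration. The section names in `REQUIRED_SECTIONS` and the function `validate_registry` are assumptions made for the example, loosely mirroring the config file names; the library's actual validators may check different things.

```python
# Assumed section names, loosely mirroring the config file names above.
REQUIRED_SECTIONS = ("entities", "urn_patterns", "aspects", "relationships")


def validate_registry(config: dict) -> list:
    """Collect readable problems instead of failing on the first one."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in config:
            problems.append(f"missing section: {section}")
        elif not isinstance(config[section], dict):
            problems.append(f"section {section} must be a mapping")

    # Cross-check: every entity should have a matching URN pattern.
    entities = config.get("entities")
    patterns = config.get("urn_patterns")
    if isinstance(entities, dict) and isinstance(patterns, dict):
        for entity in entities:
            if entity not in patterns:
                problems.append(f"entity {entity} has no URN pattern")
    return problems


sample = {"entities": {"Dataset": {}}, "urn_patterns": {}, "aspects": {}}
for problem in validate_registry(sample):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a user fix a configuration in a single pass.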

4. Code Generation (`lineagentic_kg/registry/generators.py`)

- **URNGenerator**: Creates functions that generate unique identifiers
- **AspectProcessor**: Creates functions that process entity metadata
- **UtilityFunctionBuilder**: Creates helper functions for data cleaning/processing

5. Class Generation (`lineagentic_kg/registry/writers.py`)

- Takes all the generated functions and configuration
- Dynamically creates a Python class called `Neo4jMetadataWriter`
- This class has methods like:
  - `upsert_dataset()`, `get_dataset()`, `delete_dataset()`
  - `upsert_dataflow()`, `get_dataflow()`, `delete_dataflow()`
  - And so on for each entity type
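
To make "dynamically creates a Python class" concrete, here is a minimal sketch of the general technique using the built-in three-argument `type()`. It is not the library's implementation: `build_writer_class`, `InMemoryWriter`, and the simplified URN format are hypothetical, and a plain dict stands in for Neo4j.

```python
def _make_methods(entity: str) -> dict:
    """Build upsert/get/delete functions bound to one entity type."""

    def upsert(self, **props):
        # Simplified URN: join the property values in call order.
        urn = f"urn:li:{entity}:({','.join(str(v) for v in props.values())})"
        self._store[urn] = dict(props)
        return urn

    def get(self, urn):
        return self._store.get(urn)

    def delete(self, urn):
        return self._store.pop(urn, None) is not None

    return {
        f"upsert_{entity}": upsert,
        f"get_{entity}": get,
        f"delete_{entity}": delete,
    }


def build_writer_class(entity_names):
    """Assemble a class whose methods are derived from the entity names."""

    def __init__(self):
        self._store = {}  # in-memory stand-in for the graph database

    namespace = {"__init__": __init__}
    for entity in entity_names:
        namespace.update(_make_methods(entity))
    return type("InMemoryWriter", (), namespace)


Writer = build_writer_class(["dataset", "dataflow"])
writer = Writer()
urn = writer.upsert_dataset(platform="mysql", name="users", env="PROD")
print(urn)  # urn:li:dataset:(mysql,users,PROD)
```

Each entity gets its own closure from `_make_methods`, so the methods do not all end up bound to the last entity in the loop.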

6. Factory (`lineagentic_kg/registry/factory.py`)

- Orchestrates the entire process
- Creates the final writer class
- Provides a simple interface to use the generated code

## Fine-Grained Example: How a Dataset Gets Created

1. **Config says**: "Dataset entities need platform, name, env, versionId properties"
2. **URN Pattern says**: "Dataset URNs should look like: `urn:li:dataset:(platform,name,env)`"
3. **Generator creates**: A function that builds URNs from the input data
4. **Writer gets**: A method `upsert_dataset(platform="mysql", name="users", env="PROD")`
5. **Result**: Creates a dataset node in Neo4j with the URN `urn:li:dataset:(mysql,users,PROD)`
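
Steps 2 and 3 can be sketched as a function factory that turns a pattern string into a URN builder. This is an illustration under assumptions: `make_urn_builder` is a hypothetical name, and the sketch assumes the pattern has exactly the `prefix(field,field,...)` shape shown in step 2, which may not match how `urn_patterns.yaml` actually expresses patterns.

```python
import re


def make_urn_builder(pattern: str):
    """Turn a pattern like 'urn:li:dataset:(platform,name,env)' into a function."""
    match = re.fullmatch(r"(?P<prefix>[^()]+)\((?P<fields>[^()]*)\)", pattern)
    if match is None:
        raise ValueError(f"unsupported URN pattern: {pattern}")
    prefix = match.group("prefix")
    fields = [name.strip() for name in match.group("fields").split(",")]

    def build_urn(**values):
        # Fail early when a field named in the pattern was not supplied.
        missing = [name for name in fields if name not in values]
        if missing:
            raise ValueError(f"missing URN fields: {missing}")
        return f"{prefix}({','.join(str(values[name]) for name in fields)})"

    return build_urn


build_dataset_urn = make_urn_builder("urn:li:dataset:(platform,name,env)")
print(build_dataset_urn(platform="mysql", name="users", env="PROD"))
# urn:li:dataset:(mysql,users,PROD)
```

Because the field order comes from the pattern rather than from the call, changing the pattern changes the generated identifiers without touching the calling code.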

## Key Benefits

- **No hardcoded entity/aspect/relationship types**: Add new entities/aspects/relationships by just editing YAML
- **Flexible URN patterns**: Change how IDs are generated without touching code
- **Dynamic methods**: New entity types automatically get `upsert_`, `get_`, and `delete_` methods
- **Configuration-driven**: Business logic is in config files, not code
- **Maintainable**: Changes to data model only require config updates

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "lineagentic-kg",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "LineAgentic Team <team@lineagentic.com>",
    "keywords": "neo4j, metadata, registry, lineage, catalog",
    "author": null,
    "author_email": "LineAgentic Team <team@lineagentic.com>",
    "download_url": "https://files.pythonhosted.org/packages/79/37/7011ec853f716c1120baf161ff7d1d52fef4d9652358e16a315f22fce01d/lineagentic_kg-1.6.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Turn YAML into a Knowledge Graph in minutes",
    "version": "1.6.0",
    "project_urls": {
        "Documentation": "https://github.com/lineagentic/lineagentic-kg#readme",
        "Homepage": "https://lineagentic.com",
        "Issues": "https://github.com/lineagentic/lineagentic-kg/issues",
        "Repository": "https://github.com/lineagentic/lineagentic-kg"
    },
    "split_keywords": [
        "neo4j",
        " metadata",
        " registry",
        " lineage",
        " catalog"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "dc24e31c109830c5f8d318ea9624135f2526f56f8e6aca0724126e3d308abfc9",
                "md5": "1e571ae9d1a8ea505497b1cdc26be5c9",
                "sha256": "0035912726babf628fbda51d1a8426712d176daa610c3ec554b0554dbf000fca"
            },
            "downloads": -1,
            "filename": "lineagentic_kg-1.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1e571ae9d1a8ea505497b1cdc26be5c9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 59951,
            "upload_time": "2025-08-30T08:59:45",
            "upload_time_iso_8601": "2025-08-30T08:59:45.392057Z",
            "url": "https://files.pythonhosted.org/packages/dc/24/e31c109830c5f8d318ea9624135f2526f56f8e6aca0724126e3d308abfc9/lineagentic_kg-1.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "79377011ec853f716c1120baf161ff7d1d52fef4d9652358e16a315f22fce01d",
                "md5": "91226fdbbe9142bfc40e239c3cd997fe",
                "sha256": "b98b5195ee710f217a111c16aa00635928b3b243555501afda15e388b3216c3a"
            },
            "downloads": -1,
            "filename": "lineagentic_kg-1.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "91226fdbbe9142bfc40e239c3cd997fe",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 1370573,
            "upload_time": "2025-08-30T08:59:47",
            "upload_time_iso_8601": "2025-08-30T08:59:47.250056Z",
            "url": "https://files.pythonhosted.org/packages/79/37/7011ec853f716c1120baf161ff7d1d52fef4d9652358e16a315f22fce01d/lineagentic_kg-1.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-30 08:59:47",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lineagentic",
    "github_project": "lineagentic-kg#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "lineagentic-kg"
}
        