lineagentic-catalog

- **Version**: 0.6.0
- **Summary**: A dynamic registry system for Neo4j metadata management
- **Uploaded**: 2025-08-22 10:36:44
- **Requires Python**: >=3.8
- **Keywords**: neo4j, metadata, registry, lineage, catalog
- **Homepage**: https://lineagentic.com
- **Repository**: https://github.com/lineagentic/lineagentic-catalog
            <div align="center">
  <img src="https://raw.githubusercontent.com/lineagentic/lineagentic-catalog/main/images/lineagenticcatalog.jpg" alt="Lineagentic Logo" width="880" height="300">
</div>

# Lineagentic-Catalog

Lineagentic-Catalog is more than just a data catalog: it's a graph-native metadata platform that turns simple YAML definitions into a fully operational, customizable, and governed data model. With built-in extensibility, automatic REST API generation, and automatic CLI tooling generation, it delivers a "batteries included" experience for modern data teams.

## Features
- **Generic Metadata Model Generator**: Highly customizable metadata model that can be extended to support new entities, aspects, and relationships.
- **REST API Generator**: Generates FastAPI endpoints from the registry system.
- **CLI Tooling Generator**: Generates CLI commands from the registry system.
- **Type Safety**: Generated code ensures proper data handling.


## Quick Start

### Use Lineagentic-Catalog as a library

Lineagentic-Catalog is a data-catalog-as-code factory. Define a YAML file in the `lineagentic_catalog/config` folder for your desired architecture and taxonomy, and Lineagentic-Catalog will generate all the code for you: a graph-native Python library with all the methods your data catalog needs.

1. Install the package:

   ```bash
   pip install lineagentic-catalog
   # for development: clone the repo and run `uv sync`
   ```

2. The package comes with default YAML configuration files in `lineagentic_catalog/config/`.

3. Run the example:
```python
from lineagentic_catalog.registry.factory import RegistryFactory
from lineagentic_catalog.utils import get_logger, setup_logging, log_function_call, log_function_result

# Setup logging with custom configuration
setup_logging(
    default_level="INFO",
    log_file="logs/lineagentic_catalog.log"
)

# Get logger for this application
logger = get_logger("lineagentic.example")

logger.info("Starting LineAgentic Catalog example", config_path="lineagentic_catalog/config/main_registry.yaml")

try:
    # 1. Initialize the registry factory with your config file
    log_function_call(logger, "RegistryFactory initialization", config_path="lineagentic_catalog/config/main_registry.yaml")
    registry_factory = RegistryFactory("lineagentic_catalog/config/main_registry.yaml")
    log_function_result(logger, "RegistryFactory initialization", 
                       factory_created=True, registry_path="lineagentic_catalog/config/main_registry.yaml")
    
    # 2. Create a Neo4j writer instance
    logger.info("Creating Neo4j writer instance", uri="bolt://localhost:7687")
    neo4j_writer = registry_factory.create_writer(
        uri="bolt://localhost:7687",
        user="neo4j",
        password="password"
    )
    logger.info("Neo4j writer instance created successfully")
    
    # 3. Create entities using the dynamically generated methods
    # The methods are generated based on your YAML configuration
    
    # Create a dataset
    logger.info("Creating dataset", platform="snowflake", name="customer_data", env="PROD")
    dataset_urn = neo4j_writer.upsert_dataset(
        platform="snowflake",
        name="customer_data",
        env="PROD"
    )
    logger.info("Dataset created successfully", dataset_urn=dataset_urn)
    
    # Create a data flow
    logger.info("Creating data flow", 
                platform="airflow", 
                flow_id="customer_etl", 
                namespace="data_engineering")
    flow_urn = neo4j_writer.upsert_dataflowinfo_aspect(
        payload={
            "name": "Customer ETL Pipeline",
            "namespace": "data_engineering",
            "description": "Customer data ETL pipeline"
        },
        platform="airflow",
        flow_id="customer_etl",
        env="PROD"
    )
    logger.info("Data flow created successfully", flow_urn=flow_urn)
    
    # Create a data job
    logger.info("Creating data job", flow_urn=flow_urn, job_name="transform_customer_data")
    job_urn = neo4j_writer.upsert_datajobinfo_aspect(
        payload={
            "name": "Transform Customer Data",
            "namespace": "data_engineering",
            "description": "Transform customer data job"
        },
        flow_urn=flow_urn,
        job_name="transform_customer_data"
    )
    logger.info("Data job created successfully", job_urn=job_urn)
    
    # 4. Retrieve entities
    logger.info("Retrieving dataset", dataset_urn=dataset_urn)
    dataset = neo4j_writer.get_dataset(dataset_urn)
    logger.info("Dataset retrieved successfully", dataset_urn=dataset_urn, dataset_size=len(str(dataset)))
    print(f"Retrieved dataset: {dataset}")
    
    # 5. Clean up
    logger.info("Closing Neo4j writer connection")
    neo4j_writer.close()
    logger.info("Neo4j writer connection closed successfully")
    
    logger.info("LineAgentic Catalog example completed successfully", 
                entities_created=3, 
                dataset_urn=dataset_urn,
                flow_urn=flow_urn,
                job_urn=job_urn)

except Exception as e:
    logger.error("Error in LineAgentic Catalog example", 
                error_type=type(e).__name__,
                error_message=str(e))
    raise
```

The library automatically generates methods like `upsert_dataset()`, `upsert_dataflow_aspect()`, `upsert_datajob()`, etc., based on your YAML configuration files. Each entity type defined in your `entities.yaml` and `aspects.yaml` gets its own set of CRUD operations.

## Out-of-the-box REST APIs

You can also use the auto-generated REST APIs out of the box to save significant time. Just run the following commands and you will have a FastAPI server running on your local machine with endpoints for your entire graph data model.

1. Install the package:

   ```bash
   pip install lineagentic-catalog
   # for development: clone the repo and run `uv sync`
   ```

2. The package comes with default YAML configuration files in `lineagentic_catalog/config/`.

3. Generate the FastAPI endpoints:

   ```bash
   generate-api
   ```

4. Run the API server:

   ```bash
   cd generated_api && pip install -r requirements.txt && python main.py
   ```

After generating the API, you can use curl commands to interact with your metadata:

```bash
# 1. Create a dataset
curl -X POST "http://localhost:8000/api/v1/entities/Dataset" \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "snowflake",
    "name": "customer_data",
    "env": "PROD"
  }'

# 2. Get a dataset by URN
curl -X GET "http://localhost:8000/api/v1/entities/Dataset/urn:li:dataset:(urn:li:dataPlatform:snowflake,customer_data,PROD)"

# 3. View API documentation
# Open http://localhost:8000/docs in your browser for interactive API documentation
```

## Out-of-the-box CLI tooling

Lineagentic-Catalog can also auto-generate CLI tooling. Just run the following commands and you will have a CLI on your local machine with commands for every entity in your graph data model.

1. Install the package:

   ```bash
   pip install lineagentic-catalog
   # for development: clone the repo and run `uv sync`
   ```

2. The package comes with default YAML configuration files in `lineagentic_catalog/config/`.

3. Generate the CLI commands:

   ```bash
   generate-cli
   ```

4. Install the CLI dependencies:

   ```bash
   cd generated_cli && pip install -r requirements.txt
   ```

After generating the CLI, you can use command-line tools to manage your metadata:

```bash
# 1. Create a dataset
registry-cli upsert-dataset --platform "snowflake" --name "customer_data" --env "PROD"

# 2. Get dataset information
registry-cli get-dataset "urn:li:dataset:(urn:li:dataPlatform:snowflake,customer_data,PROD)" --output table

# 3. Add ownership aspect to the dataset
registry-cli upsert-ownership-aspect --entity-label "Dataset" --entity-urn "urn:li:dataset:(urn:li:dataPlatform:snowflake,customer_data,PROD)" --owners '[{"owner": "urn:li:corpuser:john.doe", "type": "DATAOWNER"}]'

# 4. Health check
registry-cli health
```


## How Lineagentic-Catalog Works

Lineagentic-Catalog is a **dynamic code generation system** that creates graph database writers from YAML configuration files. It consists of three modules:

1. **Registry module**: The core of Lineagentic-Catalog, built around the registry design pattern. Think of it as a code factory that reads configuration (here, the YAML files describing your graph data model) and builds Python classes automatically.

2. **API-Generator**: Generates RESTful APIs for your graph data model. It is built on the FastAPI framework and leverages the methods generated by the Registry module.

3. **CLI-Generator**: Generates CLI commands for your graph data model. It is built on the Click framework and leverages the methods generated by the Registry module.
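The idea of generating CLI commands from entity definitions can be illustrated with a small sketch. The real project uses Click; this hypothetical stand-in uses only the standard library's `argparse`, and the `ENTITY_FIELDS` dict is an assumption standing in for what the YAML registry would supply:

```python
import argparse

# Hypothetical entity definition; in the real system this comes from the YAML registry
ENTITY_FIELDS = {"dataset": ["platform", "name", "env"]}

def build_cli() -> argparse.ArgumentParser:
    """Build a parser with one upsert-<entity> subcommand per entity definition."""
    parser = argparse.ArgumentParser(prog="registry-cli")
    subparsers = parser.add_subparsers(dest="command")
    for entity, fields in ENTITY_FIELDS.items():
        sub = subparsers.add_parser(f"upsert-{entity}")
        for field in fields:
            sub.add_argument(f"--{field}", required=True)
    return parser

args = build_cli().parse_args(
    ["upsert-dataset", "--platform", "snowflake", "--name", "customer_data", "--env", "PROD"]
)
# args.command == "upsert-dataset", args.platform == "snowflake"
```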

### Why This Architecture is Powerful

This system essentially turns YAML configuration into working Python code at runtime! It provides:

1. **Flexibility**: Change data models without code changes
2. **Consistency**: All entities follow the same patterns
3. **Maintainability**: Business logic is separated from implementation
4. **Extensibility**: Easy to add new entity types and relationships
5. **Type Safety**: Generated code ensures proper data handling

The registry system transforms declarative configuration into executable code, making it easy to adapt to changing business requirements while maintaining code quality and consistency.




## Detailed Flow Diagrams

### 1. Bootstrap Phase: RegistryFactory Initialization
This diagram shows the complete initialization flow from YAML configuration to generated class.

<p align="center">
  <img src="images/01_bootstrap_phase.png" alt="Bootstrap Phase Diagram" width="800">
</p>
*Shows how RegistryFactory loads config, validates it, generates functions, and creates the final writer class.*

### 2. Runtime Phase: Using Generated Methods
This diagram shows what happens when you use the generated methods.

<p align="center">
  <img src="images/02_runtime_phase.png" alt="Runtime Phase Diagram" width="800">
</p>
*Flow when calling `upsert_dataset()` and `add_aspect()`.*

### 3. Configuration Loading Flow
<p align="center">
  <img src="images/03_config_loading.png" alt="Configuration Loading Diagram" width="800">
</p>

### 4. Method Generation Flow
<p align="center">
  <img src="images/04_method_generation.png" alt="Method Generation Diagram" width="800">
</p>

### 5. Overall System Architecture
<p align="center">
  <img src="images/05_system_architecture.png" alt="System Architecture Diagram" width="800">
</p>

## 6. Data Flow Overview
<p align="center">
  <img src="images/06_data_flow.png" alt="Data Flow Diagram" width="800">
</p>

## Step-by-Step Process

1. Configuration Files (`lineagentic_catalog/config/` folder)

The system starts with YAML configuration files that define the data model.

- **`main_registry.yaml`**: The main entry point that includes all other config files
- **`entities.yaml`**: Defines what types of data objects exist (Dataset, DataFlow, CorpUser, etc.) 
- **`urn_patterns.yaml`**: Defines how to create unique identifiers (URNs) for each entity
- **`aspects.yaml`**: Defines properties and metadata for entities (e.g. datasetProperties, dataflowProperties, etc.)
- **`relationships.yaml`**: Defines how entities connect to each other (e.g. dataset -> dataflow)
- **`utilities.yaml`**: Defines helper functions for data processing (e.g. data cleaning, data transformation, etc.)

2. Registry Loading (`lineagentic_catalog/registry/loaders.py`)

- Reads the main registry file
- Merges all included YAML files into one big configuration
- Handles file dependencies and deep merging

3. Validation (`lineagentic_catalog/registry/validators.py`)

- Checks that all required sections exist
- Validates configuration structure
- Ensures everything is properly configured
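A minimal sketch of that kind of check follows; the real validators are more thorough, and the section names used here are assumptions:

```python
REQUIRED_SECTIONS = ("entities", "urn_patterns", "aspects")  # hypothetical section names

def validate_config(config: dict) -> list:
    """Return the required sections missing from the merged config."""
    return [section for section in REQUIRED_SECTIONS if section not in config]

missing = validate_config({"entities": {}, "aspects": {}})
# missing == ["urn_patterns"]
```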

4. Code Generation (`lineagentic_catalog/registry/generators.py`)

- **URNGenerator**: Creates functions that generate unique identifiers
- **AspectProcessor**: Creates functions that process entity metadata
- **UtilityFunctionBuilder**: Creates helper functions for data cleaning/processing
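The URN-generation step can be sketched as a closure factory that turns a field list into a URN-building function, matching the `urn:li:dataset:(platform,name,env)` pattern described later in this README. This is a simplified illustration, not the project's actual generator:

```python
def make_urn_builder(entity_type, fields):
    """Return a function that formats urn:li:<entity_type>:(v1,v2,...)."""
    def build_urn(**kwargs):
        values = ",".join(str(kwargs[field]) for field in fields)
        return f"urn:li:{entity_type}:({values})"
    return build_urn

build_dataset_urn = make_urn_builder("dataset", ["platform", "name", "env"])
urn = build_dataset_urn(platform="mysql", name="users", env="PROD")
# urn == "urn:li:dataset:(mysql,users,PROD)"
```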

5. Class Generation (`lineagentic_catalog/registry/writers.py`)

- Takes all the generated functions and configuration
- Dynamically creates a Python class called `Neo4jMetadataWriter`
- This class has methods like:
  - `upsert_dataset()`, `get_dataset()`, `delete_dataset()`
  - `upsert_dataflow()`, `get_dataflow()`, `delete_dataflow()`
  - And so on for each entity type
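Dynamic class creation of this kind can be sketched with Python's built-in `type()`. This toy version just returns the URN instead of writing to Neo4j:

```python
def _upsert_dataset(self, platform, name, env):
    # A real writer would run a Cypher MERGE here; this stub only returns the URN
    return f"urn:li:dataset:({platform},{name},{env})"

# Assemble the class at runtime from a dict of generated methods
generated_methods = {"upsert_dataset": _upsert_dataset}
Neo4jMetadataWriter = type("Neo4jMetadataWriter", (object,), generated_methods)

writer = Neo4jMetadataWriter()
urn = writer.upsert_dataset(platform="mysql", name="users", env="PROD")
# urn == "urn:li:dataset:(mysql,users,PROD)"
```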

6. Factory (`lineagentic_catalog/registry/factory.py`)

- Orchestrates the entire process
- Creates the final writer class
- Provides a simple interface to use the generated code

## A Fine-Grained Example: How a Dataset Gets Created

1. **Config says**: "Dataset entities need platform, name, env, versionId properties"
2. **URN Pattern says**: "Dataset URNs should look like: `urn:li:dataset:(platform,name,env)`"
3. **Generator creates**: A function that builds URNs from the input data
4. **Writer gets**: A method `upsert_dataset(platform="mysql", name="users", env="PROD")`
5. **Result**: Creates a dataset node in Neo4j with the URN `urn:li:dataset:(mysql,users,PROD)`

## Key Benefits

- **No hardcoded entity types**: Add new entities by just editing YAML
- **Flexible URN patterns**: Change how IDs are generated without touching code
- **Dynamic methods**: New entity types automatically get create/read/delete methods
- **Configuration-driven**: Business logic is in config files, not code
- **Maintainable**: Changes to data model only require config updates

            
