lineagentic-flow


- Name: lineagentic-flow
- Version: 1.0.2
- Summary: Lineagentic-flow is an agentic AI approach for building data lineage across diverse data processing scripts, including Python, SQL, Java, Airflow, Spark, etc.
- Upload time: 2025-08-18 16:30:41
- Requires Python: >=3.13
- Keywords: data-lineage, ai-agents, data-processing, lineage-tracking
            
<div align="center">
  <img src="https://raw.githubusercontent.com/lineagentic/lineagentic-flow/main/images/logo.jpg" alt="Lineagentic Logo" width="880" height="300">
</div>

## Lineagentic-flow

Lineagentic-flow is an agentic AI solution for building end-to-end data lineage across diverse data processing scripts and platforms. It is modular and customizable, and can be extended to support new script types. In a nutshell, this is what it does:

```
┌─────────────┐    ┌───────────────────────────────┐    ┌──────────────────┐
│ source-code │───▶│   lineagentic-flow-algorithm  │───▶│  lineage output  │
│             │    │                               │    │                  │
└─────────────┘    └───────────────────────────────┘    └──────────────────┘
```
### Features

- Plugin-based design that is simple to extend and customize.
- Command line interface for quick analysis.
- Support for multiple data processing script types (SQL, Python, Airflow, Spark, etc.).
- Simple demo server that runs locally and in Hugging Face Spaces.

## Quick Start

### Installation

Install the package from PyPI:

```bash
pip install lineagentic-flow
```

### Basic Usage

```python
import asyncio
import logging

from lf_algorithm.framework_agent import FrameworkAgent

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

async def main():
    # Create an agent for SQL lineage extraction
    agent = FrameworkAgent(
        agent_name="sql-lineage-agent",
        model_name="gpt-4o-mini",
        source_code="SELECT id, name FROM users WHERE active = true"
    )
    
    # Run the agent to extract lineage
    result = await agent.run_agent()
    print(result)

# Run the example
asyncio.run(main())
```
### Supported Agents

The following table shows the development status of each agent in the Lineagentic-flow algorithm:

| **Agent Name**        | **Done** | **Under Development** | **In Backlog** | **Comment** |
|-----------------------|:--------:|:---------------------:|:--------------:|-------------|
| python-lineage_agent  |    ✓     |                       |                |             |
| airflow_lineage_agent |    ✓     |                       |                |             |
| java_lineage_agent    |    ✓     |                       |                |             |
| spark_lineage_agent   |    ✓     |                       |                |             |
| sql_lineage_agent     |    ✓     |                       |                |             |
| flink_lineage_agent   |          |                       |       ✓        |             |
| beam_lineage_agent    |          |                       |       ✓        |             |
| shell_lineage_agent   |          |                       |       ✓        |             |
| scala_lineage_agent   |          |                       |       ✓        |             |
| dbt_lineage_agent     |          |                       |       ✓        |             |


### Environment Variables

Set your API keys:

```bash
export OPENAI_API_KEY="your-openai-api-key"
export HF_TOKEN="your-huggingface-token"  # Optional
```
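A small pre-flight check can catch a missing key before an agent is started. The sketch below is not part of the package: `check_api_keys` is a hypothetical helper, and treating `OPENAI_API_KEY` as required and `HF_TOKEN` as optional simply follows the comment in the snippet above.

```python
import os

# Assumption: required/optional split taken from the export snippet above.
REQUIRED_KEYS = ("OPENAI_API_KEY",)
OPTIONAL_KEYS = ("HF_TOKEN",)


def check_api_keys(env=None):
    """Return (missing_required, missing_optional) lists of variable names.

    `env` defaults to the process environment; pass a dict to test.
    Empty strings are treated as missing.
    """
    env = os.environ if env is None else env
    missing_required = [key for key in REQUIRED_KEYS if not env.get(key)]
    missing_optional = [key for key in OPTIONAL_KEYS if not env.get(key)]
    return missing_required, missing_optional
```

Calling `check_api_keys()` before `asyncio.run(main())` and stopping when the first list is non-empty gives a clearer error than a failed model call later on.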

## What are the components of Lineagentic-flow?

- Algorithm module: the brain of Lineagentic-flow. It contains the agents, which are implemented as plugins and act as a chain-of-thought process to extract lineage from different types of data processing scripts. The plugin-based design lets you develop and integrate your own custom agents.

- CLI module: a command line wrapper around the algorithm API that connects to the unified service layer.

- Demo module: for teams who want to demo Lineagentic-flow quickly and simply; it can be deployed to Hugging Face Spaces.

### Command Line Interface (CLI)

Lineagentic-flow provides a powerful CLI tool for quick analysis:

```bash
# Basic SQL query analysis
lineagentic analyze --agent-name sql-lineage-agent --query "SELECT user_id, name FROM users WHERE active = true" --verbose

# Analyze a Python script read from a file
lineagentic analyze --agent-name python-lineage-agent --query-file "my_script.py" --verbose
```
For more details, see the [CLI documentation](cli/README.md).
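When the CLI is driven from a Python script, it can help to build the argument list in one place. The following is a minimal sketch that uses only the flags shown in the examples above; `build_analyze_command` and `run_analyze` are hypothetical helper names, and the rule that exactly one of `--query` or `--query-file` is passed is an assumption rather than documented CLI behavior.

```python
import subprocess


def build_analyze_command(agent_name, query=None, query_file=None, verbose=False):
    """Build the argument list for `lineagentic analyze`.

    Assumption: exactly one of `query` or `query_file` is supplied.
    """
    if (query is None) == (query_file is None):
        raise ValueError("Provide exactly one of query or query_file")
    cmd = ["lineagentic", "analyze", "--agent-name", agent_name]
    if query is not None:
        cmd += ["--query", query]
    else:
        cmd += ["--query-file", query_file]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_analyze(**kwargs):
    """Run the CLI and return its stdout (needs the package installed)."""
    result = subprocess.run(
        build_analyze_command(**kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the command as a list avoids shell quoting problems when the query itself contains quotes or spaces.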

### Environment variables

- `HF_TOKEN` (HUGGINGFACE_TOKEN)
- `OPENAI_API_KEY`

### Architecture

The following figure illustrates the architecture of Lineagentic-flow: a multi-layer design that combines a backend with an agentic AI algorithm, which uses a chain-of-thought process to construct lineage across various script types.

![Architecture Diagram](https://raw.githubusercontent.com/lineagentic/lineagentic-flow/main/images/architecture.png)


## Mathematics behind the algorithm

The following formulas describe each layer of the algorithm.

### Agent framework
The agent framework handles I/O operations, memory management, and prompt engineering according to the script type (T) and its content (C).

$$
P := f(T, C)
$$
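To make the mapping concrete, here is a minimal sketch of what a function of this shape could look like. It is an illustration only, not the package's implementation: the `TEMPLATES` table, the `Prompt` class, and the instruction strings are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical instruction templates keyed by script type T.
TEMPLATES = {
    "sql": "Extract data lineage from the following SQL script.",
    "python": "Extract data lineage from the following Python script.",
}


@dataclass(frozen=True)
class Prompt:
    """P: the prepared input handed to the downstream agents."""
    script_type: str
    text: str


def f(script_type: str, content: str) -> Prompt:
    """P := f(T, C): build a prompt from script type T and content C."""
    try:
        instruction = TEMPLATES[script_type]
    except KeyError:
        raise ValueError(f"Unsupported script type: {script_type}") from None
    # The instruction comes first, followed by the raw script content.
    return Prompt(script_type, f"{instruction}\n\n{content}")
```

For example, `f("sql", "SELECT id FROM users")` returns a `Prompt` whose text starts with the SQL instruction and ends with the query.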

### Runtime orchestration agent

The runtime orchestration agent takes the output of the agent framework (P) and coordinates execution by pairing each agent (A_i) with its corresponding task (T_i). Here T_i denotes a task, not the script type T used above.

$$
G := h([\{(A_1, T_1), (A_2, T_2), (A_3, T_3), (A_4, T_4)\}], P)
$$

### Syntax Analysis Agent

The Syntax Analysis agent analyzes the syntactic structure of the raw script to identify subqueries and nested structures, and decomposes the script into multiple subscripts.

$$
\{sa_1, \cdots, sa_n\} := h([A_1, T_1], P)
$$

### Field Derivation Agent
The Field Derivation agent processes each subscript produced by the Syntax Analysis agent to derive field-level mapping relationships and processing logic.

$$
\{fd_1, \cdots, fd_n\} := h([A_2, T_2], \{sa_1, \cdots, sa_n\})
$$

### Operation Tracing Agent
The Operation Tracing agent analyzes the complex conditions within each subscript identified by the Syntax Analysis agent, including filter, join, grouping, and sorting conditions.

$$
\{ot_1, \cdots, ot_n\} := h([A_3, T_3], \{sa_1, \cdots, sa_n\})
$$

### Event Composer Agent
The Event Composer agent consolidates the results from the Syntax Analysis agent, the Field Derivation agent, and the Operation Tracing agent to generate the final lineage result.

$$
\{A\} := h([A_4, T_4], \{sa_1, \cdots, sa_n\}, \{fd_1, \cdots, fd_n\}, \{ot_1, \cdots, ot_n\})
$$
$$
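The data flow described by these formulas can be sketched with plain functions. The code below is a toy illustration only: each function is a hypothetical stand-in for an LLM-backed agent, and the string-based logic (splitting on `;`, looking for the word `where`) is a placeholder, not how the package works. What it does mirror is the dependency structure: field derivation and operation tracing both consume the syntax analysis output, and the event composer consumes all three.

```python
def syntax_analysis(p: str) -> list[str]:
    """Toy A_1: split the prepared script P into subscripts on ';'."""
    return [part.strip() for part in p.split(";") if part.strip()]


def field_derivation(subscripts: list[str]) -> list[dict]:
    """Toy A_2: one placeholder field mapping per subscript."""
    return [{"subscript": s, "fields": []} for s in subscripts]


def operation_tracing(subscripts: list[str]) -> list[dict]:
    """Toy A_3: record whether each subscript has a filter condition."""
    return [
        {"subscript": s, "has_filter": "where" in s.lower().split()}
        for s in subscripts
    ]


def event_composer(sa: list[str], fd: list[dict], ot: list[dict]) -> dict:
    """Toy A_4: merge the three result sets into one lineage record."""
    return {
        "steps": [
            {"subscript": s, "fields": d["fields"], "has_filter": o["has_filter"]}
            for s, d, o in zip(sa, fd, ot)
        ]
    }


def run_pipeline(p: str) -> dict:
    """Run the four steps in the order given by the formulas above."""
    sa = syntax_analysis(p)      # {sa_1..sa_n} from P
    fd = field_derivation(sa)    # {fd_1..fd_n} from {sa_i}
    ot = operation_tracing(sa)   # {ot_1..ot_n} from {sa_i}
    return event_composer(sa, fd, ot)
```

Because field derivation and operation tracing depend only on the syntax analysis output, a real implementation could run them concurrently before composing the result.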



## Activation and Deployment

To simplify the use of Lineagentic-flow, a Makefile manages the activation and deployment tasks. The most common targets are listed below; see the Makefile itself for the full set.

1- To start the demo server:

```bash
make start-demo-server
```
2- To run all tests:

```bash
make test
```
3- To build the package:

```bash
make build-package
```
4- To clean the whole stack:

```bash
make clean-all-stack
```

5- To deploy Lineagentic-flow to Hugging Face Spaces, run the following command (you need a Hugging Face account, and must add your secret keys there if you plan to use paid models):

```bash
make gradio-deploy
```

            
