intugle

Name: intugle
Version: 0.1.4
Summary: A GenAI-powered Python library for building semantic layers.
Upload time: 2025-09-10 10:28:28
Requires Python: >=3.10
License: Apache-2.0
Keywords: data, semantic layer, genai, llm, data profiling, link prediction, sql generation
<p align="center">
      <img alt="Intugle Logo" width="350" src="https://github.com/user-attachments/assets/18f4627b-af6c-4133-994b-830c30a9533b" />
 <h3 align="center"><i>The GenAI-powered toolkit for automated data intelligence.</i></h3>
</p>

[![Release](https://img.shields.io/github/release/Intugle/data-tools)](https://github.com/Intugle/data-tools/releases/tag/v0.1.0)     
[![Made with Python](https://img.shields.io/badge/Made_with-Python-blue?logo=python&logoColor=white)](https://www.python.org/) 
![contributions - welcome](https://img.shields.io/badge/contributions-welcome-blue)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Open Issues](https://img.shields.io/github/issues-raw/Intugle/data-tools)](https://github.com/Intugle/data-tools/issues)
[![GitHub star chart](https://img.shields.io/github/stars/Intugle/data-tools?style=social)](https://github.com/Intugle/data-tools/stargazers)

*Transform Fragmented Data into a Connected Semantic Layer*

## Overview

Intugle’s GenAI-powered open-source Python library builds an intelligent semantic layer over your existing data systems. At its core, it discovers meaningful links and relationships across data assets — enriching them with profiles, classifications, and business glossaries. With this connected knowledge layer, you can enable semantic search and auto-generate queries to create unified data products, making data integration and exploration faster, more accurate, and far less manual.

## Who is this for?

*   **Data Engineers & Architects** often spend weeks manually profiling, classifying, and stitching together fragmented data assets. With Intugle, they can automate this process end-to-end, uncovering meaningful links and relationships to instantly generate a connected semantic layer.
*   **Data Analysts & Scientists** spend endless hours on data readiness and preparation before they can even start the real analysis. Intugle accelerates this by providing contextual intelligence, automatically generating SQL and reusable data products enriched with relationships and business meaning.
*   **Business Analysts & Decision Makers** are slowed down by constant dependence on technical teams for answers. Intugle removes this bottleneck by enabling natural language queries and semantic search, giving them trusted insights on demand.

## Features

*   **Semantic Intelligence:** Transform raw, fragmented datasets into an intelligent semantic graph that captures entities, relationships, and context — the foundation for connected intelligence.
*   **Business Glossary & Semantic Search:** Auto-generate a business glossary and enable search that understands meaning, not just keywords — making data more accessible across technical and business users.
*   **Smart SQL & Data Products:** Instantly generate SQL and reusable data products enriched with context, eliminating manual pipelines and accelerating data-to-insight.

## Getting Started

### Installation

For Windows and Linux, you can follow these steps. For macOS, please see the additional steps in the macOS section below.

Before installing, it is recommended to create a virtual environment:

```bash
# Create and activate a virtual environment (Linux/macOS activation shown)
python -m venv .venv
source .venv/bin/activate
```
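
On Windows, activate the environment from the `Scripts` folder instead (use `Activate.ps1` in PowerShell):

```bash
.venv\Scripts\activate
```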

Then, install the package:

```bash
pip install intugle
```

#### macOS

For macOS users, you may need to install the `libomp` library:

```bash
brew install libomp
```

If you installed Python using the official installer from python.org, you may also need to install SSL certificates by running the following command in your terminal. Please replace `3.XX` with your specific Python version. This step is not necessary if you installed Python using Homebrew.

```bash
/Applications/Python\ 3.XX/Install\ Certificates.command
```
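
To quickly confirm that certificate verification works afterwards, you can make a test HTTPS request from the patched interpreter (a minimal check, not specific to Intugle):

```bash
python -c "import urllib.request; urllib.request.urlopen('https://pypi.org'); print('SSL OK')"
```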

### Configuration

Before running the project, you need to configure an LLM. It is used for tasks like generating business glossaries and predicting links between tables.

You can configure the LLM by setting the following environment variables:

*   `LLM_PROVIDER`: The LLM provider and model to use (e.g., `openai:gpt-3.5-turbo`), following LangChain's [conventions](https://python.langchain.com/docs/integrations/chat/).
*   `API_KEY`: Your API key for the LLM provider. The exact variable name varies by provider (for OpenAI it is `OPENAI_API_KEY`, as in the example below).

Here's an example of how to set these variables in your environment:

```bash
export LLM_PROVIDER="openai:gpt-3.5-turbo"
export OPENAI_API_KEY="your-openai-api-key"
```
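
If you prefer to configure this from within Python (for example, at the top of a notebook), you can set the same variables via `os.environ` before using the library. A minimal sketch with placeholder values:

```python
import os

# Equivalent to the shell exports above; set these before building
# the knowledge base so the LLM client can pick them up.
os.environ["LLM_PROVIDER"] = "openai:gpt-3.5-turbo"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
```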

## Quickstart

For a detailed, hands-on introduction to the project, please see the [`quickstart.ipynb`](quickstart.ipynb) notebook. It will walk you through the entire process of building a semantic layer, including:

*   **Building a Knowledge Base:** Use the `KnowledgeBuilder` to automatically profile your data, generate a business glossary, and predict links between tables.
*   **Accessing Enriched Metadata:** Learn how to access the profiling results and business glossary for each dataset.
*   **Visualizing Relationships:** Visualize the predicted links between your tables.
*   **Generating Data Products:** Use the semantic layer to generate data products and retrieve data.
*   **Searching the Knowledge Base:** Use semantic search to find relevant columns in your datasets using natural language.

## Usage

The core workflow of the project involves using the `KnowledgeBuilder` to build a semantic layer, and then using the `DataProductBuilder` to generate data products from that layer.

```python
from intugle import KnowledgeBuilder, DataProductBuilder

# Define your datasets
datasets = {
    "allergies": {"path": "path/to/allergies.csv", "type": "csv"},
    "patients": {"path": "path/to/patients.csv", "type": "csv"},
    "claims": {"path": "path/to/claims.csv", "type": "csv"},
    # ... add other datasets
}

# Build the knowledge base
kb = KnowledgeBuilder(datasets, domain="Healthcare")
kb.build()

# Create a DataProductBuilder
dp_builder = DataProductBuilder()

# Define an ETL model
etl = {
  "name": "top_patients_by_claim_count",
  "fields": [
    {
      "id": "patients.first",
      "name": "first_name",
    },
    {
      "id": "patients.last",
      "name": "last_name",
    },
    {
      "id": "claims.id",
      "name": "number_of_claims",
      "category": "measure",
      "measure_func": "count"
    }
  ],
  "filter": {
    "sort_by": [
      {
        "id": "claims.id",
        "alias": "number_of_claims",
        "direction": "desc"
      }
    ],
    "limit": 10
  }
}

# Generate the data product
data_product = dp_builder.build(etl)

# View the data product as a DataFrame
print(data_product.to_df())
```
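
Assuming `to_df()` returns a standard pandas DataFrame (as the example above suggests), the result plugs straight into the usual pandas tooling. For example, persisting the data product to disk (a sketch; the output filename is illustrative):

```python
# Plain pandas from here on: save the generated data product as CSV.
df = data_product.to_df()
df.to_csv("top_patients_by_claim_count.csv", index=False)
```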

For detailed code examples and a complete walkthrough, please see the [`quickstart.ipynb`](quickstart.ipynb) notebook.

### Semantic Search

The semantic search feature allows you to search for columns in your datasets using natural language. It is built on top of the [Qdrant](https://qdrant.tech/) vector database.

#### Prerequisites

To use the semantic search feature, you need to have a running Qdrant instance. You can start one using the following Docker command:

```bash
# Start a local Qdrant instance (port 6333 serves REST, 6334 serves gRPC)
docker run -p 6333:6333 -p 6334:6334 \
    -v qdrant_storage:/qdrant/storage:z \
    --name qdrant qdrant/qdrant
```
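
To confirm the instance is reachable before building the knowledge base, you can query Qdrant's REST API (assuming the default port mapping above):

```bash
curl http://localhost:6333/collections
```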

You also need to configure the Qdrant URL and API key (if using authorization) in your environment variables:

```bash
export QDRANT_URL="http://localhost:6333"
export QDRANT_API_KEY="your-qdrant-api-key" # if authorization is used
```

Currently, the semantic search feature only supports OpenAI embedding models, served either directly from OpenAI or through Azure OpenAI. You therefore need an OpenAI API key set up in your environment. The default model is `text-embedding-ada-002`; you can change it by setting the `EMBEDDING_MODEL_NAME` environment variable.

**For OpenAI models:**

```bash
export OPENAI_API_KEY="your-openai-api-key"
export EMBEDDING_MODEL_NAME="openai:ada"
```

**For Azure OpenAI models:**

```bash
export AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
export AZURE_OPENAI_ENDPOINT="your-azure-openai-endpoint"
export OPENAI_API_VERSION="your-openai-api-version"
export EMBEDDING_MODEL_NAME="azure_openai:ada"
```

#### Usage

Once you have built the knowledge base, you can use the `search` method to perform a semantic search. The search function returns a pandas DataFrame containing the search results, including the column's profiling metrics, category, table name, and table glossary.

```python
from intugle import KnowledgeBuilder

# Define your datasets
datasets = {
    "allergies": {"path": "path/to/allergies.csv", "type": "csv"},
    "patients": {"path": "path/to/patients.csv", "type": "csv"},
    "claims": {"path": "path/to/claims.csv", "type": "csv"},
    # ... add other datasets
}

# Build the knowledge base
kb = KnowledgeBuilder(datasets, domain="Healthcare")
kb.build()

# Perform a semantic search
search_results = kb.search("patient allergies")

# View the search results
print(search_results)
```
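
Since the results come back as a plain pandas DataFrame, the usual pandas operations apply, for example keeping only the top matches:

```python
# search_results is a pandas DataFrame; inspect the five best matches.
print(search_results.head(5))
```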
For detailed code examples and a complete walkthrough, please see the [`quickstart.ipynb`](quickstart.ipynb) notebook.

## Community

Join our community to ask questions, share your projects, and connect with other users.

*   [Join our Slack](https://join.slack.com/share/enQtOTQ4NDc1MzYzOTg2MC02OTc2MTU1Njg3NDEyZjQwN2IzMzEwMjc5NmU4MjhiZmJlMDdiMzMzYjI5YWJiNDhkYWM4ODU0MGY4NTUyNjhi)
*   [Join our Discord](https://discord.gg/4PNPsQVA)


## Contributing

Contributions are welcome! Please see the [`CONTRIBUTING.md`](CONTRIBUTING.md) file for guidelines.

## License

This project is licensed under the Apache License, Version 2.0. See the [`LICENSE`](LICENSE) file for details.