adel-lite

- Name: adel-lite
- Version: 0.1.0
- Home page: https://github.com/Parthnuwal7/adel-lite.git
- Summary: Automated Data Elements Linking - Lite
- Upload time: 2025-09-10 22:09:23
- Maintainer: None
- Docs URL: None
- Author: Parth Nuwal
- Requires Python: >=3.8
- License: MIT
- Keywords: data, schema, profiling, pandas, automation
- Requirements: pandas, numpy, pyyaml, networkx, matplotlib, graphviz, fuzzywuzzy, python-levenshtein

# Adel-Lite: Automated Data Elements Linking - Lite

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-380/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Adel-Lite** is a Python library for automated schema generation, data profiling, and relationship discovery for Pandas DataFrames. It helps you understand your data structure and relationships with minimal effort.

## Features

🔍 **Schema Generation**: Automatic structural schema detection  
📊 **Data Profiling**: Comprehensive statistics and semantic type inference  
🔗 **Relationship Mapping**: Primary/Foreign key detection using heuristics  
⚡ **Constraint Discovery**: Intra-row constraint detection (GT, EQ)  
📈 **Visualization**: Schema graphs with Graphviz  
📤 **Multi-format Export**: JSON, YAML, SQL DDL, Avro  
🛠️ **CLI Support**: Command-line interface for batch processing  

## Installation

```bash
pip install adel-lite
```
### Development installation
```bash
git clone https://github.com/Parthnuwal7/adel-lite.git
cd adel-lite
pip install -e .
```
## Quick Start
### Basic Usage

```python
import pandas as pd
from adel_lite import schema, profile, map_relationships, build_meta
```
### Load your data

```python
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],  # illustrative IDs
    'name': ['Alice', 'Bob', 'Charlie'],
    'email': ['alice@test.com', 'bob@test.com', 'charlie@test.com']
})

orders = pd.DataFrame({
    'order_id': [101, 102, 103],  # illustrative IDs
    'customer_id': [1, 2, 1],  # illustrative references to customers.customer_id
    'amount': [100.0, 150.0, 75.0]
})

df_list = [customers, orders]
table_names = ['customers', 'orders']
```
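
When the tables live in CSV files rather than in-memory literals, the same two lists can be assembled with pandas alone. The following is a minimal sketch, not part of adel-lite: `load_csv_tables` is a hypothetical helper, and using each file stem as the table name is an assumption.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_tables(directory):
    """Read every CSV in a directory; return (df_list, table_names)."""
    df_list, table_names = [], []
    # Sort the paths so the table order is deterministic.
    for path in sorted(Path(directory).glob("*.csv")):
        df_list.append(pd.read_csv(path))
        table_names.append(path.stem)  # assumption: file stem = table name
    return df_list, table_names


# Demonstration with two small throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "customers.csv").write_text("customer_id,name\n1,Alice\n2,Bob\n")
    Path(tmp, "orders.csv").write_text("order_id,customer_id\n101,1\n")
    df_list, table_names = load_csv_tables(tmp)
```

The resulting `df_list` and `table_names` have the same shape as the hand-built lists above.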
### Generate comprehensive analysis

```python
schema_result = schema(df_list, table_names)
profile_result = profile(df_list, table_names)
relationships_result = map_relationships(df_list, table_names)
```
### Build final meta structure

```python
import json

meta = build_meta(schema_result, profile_result, relationships_result)
print(json.dumps(meta, indent=2))
```
### Command Line Usage

#### Analyze CSV files

```bash
adel-lite --input data/*.csv --output schema.json
```
#### Generate visualization

```bash
adel-lite --input *.csv --visualize --output schema.json
```
#### Export as SQL DDL

```bash
adel-lite --input data/*.csv --format ddl --output schema.sql
```
#### Skip constraint detection for faster processing

```bash
adel-lite --input *.csv --no-constraints --output schema.json
```
## Core Functions

### 1. Schema Generation

```python
from adel_lite import schema

# Generate structural schema
result = schema(df_list, table_names)
```

**Returns:**
- Table names and column information
- Data types (pandas + high-level)
- Nullable flags and positions
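
To make those fields concrete, here is a minimal pandas-only sketch of the kind of information a structural schema holds for one table. It does not reproduce adel-lite's internals or its output format; `sketch_schema` and the dictionary keys are hypothetical names chosen for illustration.

```python
import pandas as pd
from pandas.api import types as ptypes


def sketch_schema(df, table_name):
    """Collect name, dtype, nullability and position for each column."""
    columns = []
    for position, column in enumerate(df.columns):
        series = df[column]
        # Map the pandas dtype to a coarse, high-level type label.
        if ptypes.is_bool_dtype(series):
            high_level = "boolean"
        elif ptypes.is_integer_dtype(series):
            high_level = "integer"
        elif ptypes.is_float_dtype(series):
            high_level = "float"
        elif ptypes.is_datetime64_any_dtype(series):
            high_level = "datetime"
        else:
            high_level = "string"
        columns.append({
            "name": column,
            "pandas_dtype": str(series.dtype),
            "dtype": high_level,
            # Nullable here means "contains at least one missing value".
            "nullable": bool(series.isna().any()),
            "position": position,
        })
    return {"table": table_name, "columns": columns}


example = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Alice", "Bob", None]})
schema_sketch = sketch_schema(example, "customers")
```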

### 2. Data Profiling
```python
from adel_lite import profile

# Generate comprehensive profiles
result = profile(df_list, table_names)
```
**Returns:**
- Statistical summaries (min, max, mean, etc.)
- Uniqueness and null ratios
- Semantic type inference (id, datetime, categorical, etc.)
- Primary key candidates
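
The uniqueness and null ratios, and how they can point to primary key candidates, can be illustrated with plain pandas. This is a sketch under the assumption that a key candidate is a column with no nulls and no duplicates; `sketch_profile` is a hypothetical helper, and the library's own rules may differ.

```python
import pandas as pd


def sketch_profile(df):
    """Return per-column null/uniqueness ratios and primary key candidates."""
    row_count = len(df)
    profile = {}
    for column in df.columns:
        series = df[column]
        non_null = series.dropna()
        profile[column] = {
            "null_ratio": float(series.isna().mean()) if row_count else 0.0,
            "unique_ratio": non_null.nunique() / row_count if row_count else 0.0,
        }
    # Assumption: a key candidate has no nulls and no duplicate values.
    pk_candidates = [
        name for name, stats in profile.items()
        if stats["null_ratio"] == 0.0 and stats["unique_ratio"] == 1.0
    ]
    return profile, pk_candidates


orders_example = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "customer_id": [1, 2, 1, None],
})
profile_sketch, pk_candidates = sketch_profile(orders_example)
```

In this example only `order_id` qualifies, because `customer_id` contains both a repeated value and a missing one.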

### 3. Relationship Mapping
```python
from adel_lite import map_relationships

# Detect relationships
result = map_relationships(df_list, table_names, fk_threshold=0.8)
```
**Returns:**
- Primary key detection
- Foreign key relationships with confidence scores
- Composite key candidates
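
The README does not spell out which heuristics are used. One common approach, shown here purely as an assumption, is value containment: a column is a foreign key candidate when most of its values appear in a unique column of the same name in another table. `sketch_fk_candidates` is a hypothetical helper; only the `fk_threshold` name and the result keys are borrowed from this document.

```python
import pandas as pd


def sketch_fk_candidates(tables, fk_threshold=0.8):
    """Propose foreign keys by checking value containment between tables."""
    candidates = []
    for parent_name, parent in tables.items():
        for parent_col in parent.columns:
            parent_values = parent[parent_col].dropna()
            # Only unique, non-empty columns can act as a referenced key.
            if parent_values.empty or not parent_values.is_unique:
                continue
            for child_name, child in tables.items():
                if child_name == parent_name or parent_col not in child.columns:
                    continue
                child_values = child[parent_col].dropna()
                if child_values.empty:
                    continue
                # Confidence = share of child values found in the parent key.
                confidence = float(child_values.isin(parent_values).mean())
                if confidence >= fk_threshold:
                    candidates.append({
                        "foreign_table": child_name,
                        "foreign_column": parent_col,
                        "referenced_table": parent_name,
                        "referenced_column": parent_col,
                        "confidence": confidence,
                    })
    return candidates


fk_sketch = sketch_fk_candidates({
    "customers": pd.DataFrame({"customer_id": [1, 2, 3], "name": ["A", "B", "C"]}),
    "orders": pd.DataFrame({"order_id": [101, 102, 103], "customer_id": [1, 2, 1]}),
})
```

This simplified version only compares columns with identical names; matching differently named columns would need a name-similarity step as well.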

### 4. Constraint Detection
```python
from adel_lite import detect_constraints

# Find intra-row constraints
result = detect_constraints(df_list, table_names, threshold=0.95)
```
**Returns:**
- GT constraints: `A > B`
- EQ constraints: `A + B = C`
- Confidence scores
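
As an illustration of what a GT constraint with a confidence score could mean, the sketch below scores every ordered pair of numeric columns by the share of rows where the first exceeds the second. It is an assumption about the idea, not the library's algorithm; `sketch_gt_constraints` is a hypothetical helper and EQ constraints are left out for brevity.

```python
from itertools import permutations

import pandas as pd


def sketch_gt_constraints(df, threshold=0.95):
    """Find column pairs (A, B) where A > B holds in most rows."""
    numeric = df.select_dtypes(include="number")
    found = []
    for left, right in permutations(numeric.columns, 2):
        # Compare only rows where both values are present.
        pair = numeric[[left, right]].dropna()
        if pair.empty:
            continue
        confidence = float((pair[left] > pair[right]).mean())
        if confidence >= threshold:
            found.append({
                "type": "GT",
                "left": left,
                "right": right,
                "confidence": confidence,
            })
    return found


invoices = pd.DataFrame({
    "total": [120.0, 55.0, 80.0],
    "tax": [20.0, 5.0, 8.0],
})
gt_sketch = sketch_gt_constraints(invoices)
```

The pairwise loop grows quadratically with the number of numeric columns, which is consistent with the advice elsewhere in this README to skip constraint detection on large datasets.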

### 5. Visualization
```python
from adel_lite import visualize

# Generate schema graph
path = visualize(schema_result, relationships_result, format='png')
```
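
Graphviz renders graphs described in its DOT language. The sketch below builds such a description by hand from a list of relationships, to show roughly what a schema graph encodes. It is not the output of `visualize`; `sketch_dot` is a hypothetical helper and the node and edge styling are assumptions.

```python
def sketch_dot(table_names, relationships):
    """Build a DOT graph with one node per table and one edge per foreign key."""
    lines = ["digraph schema {", "    node [shape=box];"]
    for name in table_names:
        lines.append(f'    "{name}";')
    for rel in relationships:
        # The edge points from the referencing table to the referenced table.
        lines.append(
            f'    "{rel["foreign_table"]}" -> "{rel["referenced_table"]}" '
            f'[label="{rel["foreign_column"]}"];'
        )
    lines.append("}")
    return "\n".join(lines)


dot_source = sketch_dot(
    ["customers", "orders"],
    [{
        "foreign_table": "orders",
        "foreign_column": "customer_id",
        "referenced_table": "customers",
    }],
)
```

Saving `dot_source` to a file and passing it to the Graphviz `dot` command with `-Tpng` would render it as an image.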
### 6. Export

```python
from adel_lite import export_schema

# Export to different formats
json_content = export_schema(meta, format='json')
yaml_content = export_schema(meta, format='yaml')
ddl_content = export_schema(meta, format='ddl')
```
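
To give a feel for what a DDL export involves, the sketch below turns a small meta-style dictionary into `CREATE TABLE` statements. The type mapping, the statement layout, and the `sketch_ddl` helper are all assumptions for illustration; the text produced by `export_schema(meta, format='ddl')` may look different.

```python
# Assumed mapping from high-level dtypes to generic SQL types.
SQL_TYPES = {
    "integer": "INTEGER",
    "float": "REAL",
    "string": "TEXT",
    "boolean": "BOOLEAN",
    "datetime": "TIMESTAMP",
}


def sketch_ddl(meta):
    """Render one CREATE TABLE statement per table in the meta dictionary."""
    statements = []
    for table in meta["tables"]:
        lines = []
        for field in table["fields"]:
            sql_type = SQL_TYPES.get(field["dtype"], "TEXT")
            null_clause = "" if field.get("nullable", True) else " NOT NULL"
            lines.append(f'    {field["name"]} {sql_type}{null_clause}')
        if table.get("primary_key"):
            lines.append(f'    PRIMARY KEY ({table["primary_key"]})')
        body = ",\n".join(lines)
        statements.append(f'CREATE TABLE {table["name"]} (\n{body}\n);')
    return "\n\n".join(statements)


meta_example = {
    "tables": [{
        "name": "customers",
        "primary_key": "customer_id",
        "fields": [
            {"name": "customer_id", "dtype": "integer", "nullable": False},
            {"name": "name", "dtype": "string", "nullable": True},
        ],
    }]
}
ddl_sketch = sketch_ddl(meta_example)
```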

## Example Output
```json
{
  "metadata": {
    "generated_at": "2025-09-10T12:42:00",
    "generator": "adel-lite",
    "version": "0.1.0"
  },
  "tables": [
    {
      "name": "customers",
      "primary_key": "customer_id",
      "fields": [
        {
          "name": "customer_id",
          "dtype": "integer",
          "semantic_type": "id",
          "subtype": "primary",
          "nullable": false
        }
      ]
    }
  ],
  "relationships": [
    {
      "type": "foreign_key",
      "foreign_table": "orders",
      "foreign_column": "customer_id",
      "referenced_table": "customers",
      "referenced_column": "customer_id",
      "confidence": 0.92
    }
  ]
}
```
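
Because the result is a plain nested structure, it can be consumed with ordinary dictionary access. The sketch below lists the foreign keys above a chosen confidence; `describe_foreign_keys` and `min_confidence` are hypothetical names, and the second relationship is an invented low-confidence entry included only to show the filter at work.

```python
def describe_foreign_keys(meta, min_confidence=0.9):
    """Return 'table.column -> table.column' strings for confident links."""
    lines = []
    for rel in meta.get("relationships", []):
        if rel["type"] != "foreign_key" or rel["confidence"] < min_confidence:
            continue
        lines.append(
            f'{rel["foreign_table"]}.{rel["foreign_column"]} -> '
            f'{rel["referenced_table"]}.{rel["referenced_column"]}'
        )
    return lines


meta_example = {
    "relationships": [
        {
            "type": "foreign_key",
            "foreign_table": "orders",
            "foreign_column": "customer_id",
            "referenced_table": "customers",
            "referenced_column": "customer_id",
            "confidence": 0.92,
        },
        {
            # Hypothetical weak match, filtered out by min_confidence.
            "type": "foreign_key",
            "foreign_table": "orders",
            "foreign_column": "amount",
            "referenced_table": "customers",
            "referenced_column": "customer_id",
            "confidence": 0.35,
        },
    ]
}
confident_links = describe_foreign_keys(meta_example)
```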

## Advanced Usage

### Custom Thresholds

Raise the detection thresholds for stricter matching, or lower them to surface more candidates:

```python
relationships = map_relationships(
    df_list, table_names,
    fk_threshold=0.9,  # Stricter FK detection
    name_similarity_threshold=0.8
)

constraints = detect_constraints(
    df_list, table_names,
    threshold=0.98  # Very strict constraints
)
```
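
To build intuition for a name similarity threshold such as 0.8, the sketch below scores column-name pairs on a 0 to 1 scale. It uses the standard library's `difflib.SequenceMatcher` as a stand-in; adel-lite lists fuzzywuzzy among its requirements, so its actual scores may differ. `name_similarity` and `similar_column_pairs` are hypothetical helpers.

```python
from difflib import SequenceMatcher


def name_similarity(left, right):
    """Case-insensitive similarity between two column names, from 0.0 to 1.0."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()


def similar_column_pairs(left_columns, right_columns, threshold=0.8):
    """Return the (left, right) name pairs whose similarity meets the threshold."""
    return [
        (left, right)
        for left in left_columns
        for right in right_columns
        if name_similarity(left, right) >= threshold
    ]


pairs = similar_column_pairs(
    ["customer_id", "name"],
    ["Customer_ID", "amount"],
)
```

Here only `customer_id` and `Customer_ID` pass, since they are identical once case is ignored, while unrelated names such as `name` and `amount` score well below the threshold.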

### Sampling and Inspection
```python
# Assumes sample_by_condition is exported alongside sample
from adel_lite import sample, sample_by_condition

# Get sample data for inspection
samples = sample(df_list, table_names, n=10, method='random')

# Conditional sampling
samples = sample_by_condition(
    df_list,
    ['age > 25', 'amount > 100'],
    table_names
)
```
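
Conditional sampling of this kind can be approximated with pandas' own `DataFrame.query` and `DataFrame.sample`. The sketch below assumes one condition per table, paired by position, which is only a guess at how `sample_by_condition` interprets its arguments; `sketch_sample_by_condition` and `seed` are hypothetical names.

```python
import pandas as pd


def sketch_sample_by_condition(df_list, conditions, table_names, n=10, seed=0):
    """Filter each table with its condition, then draw up to n random rows."""
    samples = {}
    for df, condition, name in zip(df_list, conditions, table_names):
        matching = df.query(condition)
        # Never ask for more rows than the filter returned.
        samples[name] = matching.sample(n=min(n, len(matching)), random_state=seed)
    return samples


people = pd.DataFrame({"age": [22, 31, 40]})
orders_example = pd.DataFrame({"amount": [100.0, 150.0, 75.0]})
condition_samples = sketch_sample_by_condition(
    [people, orders_example],
    ["age > 25", "amount > 100"],
    ["people", "orders"],
)
```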

## Configuration

### CLI Configuration

Full configuration example:
```bash
adel-lite \
  --input data/*.csv \
  --output analysis.json \
  --format json \
  --visualize \
  --viz-format svg \
  --sample 5 \
  --constraint-threshold 0.9 \
  --fk-threshold 0.8 \
  --verbose
```

### Logging

```python
import logging

# Enable debug logging
logging.getLogger('adel_lite').setLevel(logging.DEBUG)

```
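
Setting the level alone prints nothing unless a handler is attached somewhere, because Python's fallback handler only emits warnings and above. The sketch below attaches a stream handler to the `adel_lite` logger; `enable_debug_logging` is a hypothetical helper, and the child logger name `adel_lite.demo` is invented for the demonstration.

```python
import io
import logging


def enable_debug_logging(stream=None):
    """Attach a DEBUG-level stream handler to the 'adel_lite' logger."""
    logger = logging.getLogger("adel_lite")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return handler


# Demonstration: capture output in memory instead of writing to stderr.
buffer = io.StringIO()
handler = enable_debug_logging(buffer)
logging.getLogger("adel_lite.demo").debug("profiling started")
captured = buffer.getvalue()

# Detach the demo handler so later logging is unaffected.
logging.getLogger("adel_lite").removeHandler(handler)
```

Calling `enable_debug_logging()` with no argument sends the messages to standard error, which is usually what is wanted in a script or notebook.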

## Performance Tips

1. **Skip constraints** for large datasets: `--no-constraints`
2. **Limit sampling** for inspection: `--sample 100`
3. **Use appropriate thresholds** based on data quality
4. **Process in batches** for very large datasets
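
For the batching tip, one option outside the library is to stream a large CSV in chunks with `pandas.read_csv(chunksize=...)` and accumulate statistics incrementally. The sketch below does this for null ratios only; `null_ratios_in_batches` is a hypothetical helper, and whether adel-lite itself can combine partial results is not stated in this README.

```python
import io

import pandas as pd


def null_ratios_in_batches(csv_source, chunksize=50_000):
    """Compute per-column null ratios without loading the whole file at once."""
    total_rows = 0
    null_counts = None
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        total_rows += len(chunk)
        counts = chunk.isna().sum()
        # Add this chunk's null counts to the running totals.
        null_counts = counts if null_counts is None else null_counts.add(counts, fill_value=0)
    if null_counts is None or total_rows == 0:
        return {}
    return (null_counts / total_rows).to_dict()


# Demonstration with an in-memory CSV read two rows at a time.
csv_text = "order_id,amount\n1,100.0\n2,\n3,75.0\n4,\n"
batch_ratios = null_ratios_in_batches(io.StringIO(csv_text), chunksize=2)
```

Statistics that are simple sums, such as null counts, combine cleanly across chunks; uniqueness and cross-table relationships need more care because they depend on values seen in other batches.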

## Requirements

- Python 3.8+
- pandas >= 1.3.0
- numpy >= 1.21.0
- pyyaml >= 6.0
- networkx >= 2.6
- matplotlib >= 3.5.0
- graphviz >= 0.20.0
- fuzzywuzzy >= 0.18.0
- python-levenshtein >= 0.12.0

## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make changes and add tests
4. Run tests: `pytest`
5. Submit a pull request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Roadmap

- [ ] Support for more data sources (databases, APIs)
- [ ] Advanced constraint types (LIKE patterns, regex)
- [ ] Machine learning-based relationship detection
- [ ] Interactive web interface
- [ ] Integration with data catalogs

## Support

- 📖 [Documentation](https://github.com/Parthnuwal7/adel-lite)
- 🐛 [Issue Tracker](https://github.com/Parthnuwal7/adel-lite)
- 💬 [Discussions](https://github.com/Parthnuwal7/adel-lite.git)

---

Made with ❤️ for the data community by Parth Nuwal

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Parthnuwal7/adel-lite.git",
    "name": "adel-lite",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "data, schema, profiling, pandas, automation",
    "author": "Parth Nuwal",
    "author_email": "Parth Nuwal <parthnuwal7@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/3d/1d/ce6ea63c4dc6495f635aec6e2461bdfd718f902ac0907ad961fc3761d517/adel_lite-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Automated Data Elements Linking - Lite",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/Parthnuwal7/adel-lite.git"
    },
    "split_keywords": [
        "data",
        " schema",
        " profiling",
        " pandas",
        " automation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c2aa0a453bb80db8e9b4a7003f508fc86e36cf733cf383bf4a4cad1d76f7a411",
                "md5": "6c4337b600621a401634dd132f9621db",
                "sha256": "314c86ea6502d9a33ff80c852b7b56d50b1cf5a8445c07740c3cdfc320c109b9"
            },
            "downloads": -1,
            "filename": "adel_lite-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6c4337b600621a401634dd132f9621db",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 28309,
            "upload_time": "2025-09-10T22:09:21",
            "upload_time_iso_8601": "2025-09-10T22:09:21.294126Z",
            "url": "https://files.pythonhosted.org/packages/c2/aa/0a453bb80db8e9b4a7003f508fc86e36cf733cf383bf4a4cad1d76f7a411/adel_lite-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3d1dce6ea63c4dc6495f635aec6e2461bdfd718f902ac0907ad961fc3761d517",
                "md5": "3562e562ec94230685009ea3bbe0dc75",
                "sha256": "37f8b1372a241797697fa5fc4b7e3537aba76035064c011a5524e673a93baf28"
            },
            "downloads": -1,
            "filename": "adel_lite-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "3562e562ec94230685009ea3bbe0dc75",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 24446,
            "upload_time": "2025-09-10T22:09:23",
            "upload_time_iso_8601": "2025-09-10T22:09:23.195044Z",
            "url": "https://files.pythonhosted.org/packages/3d/1d/ce6ea63c4dc6495f635aec6e2461bdfd718f902ac0907ad961fc3761d517/adel_lite-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-10 22:09:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Parthnuwal7",
    "github_project": "adel-lite",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.21.0"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    ">=",
                    "6.0"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    ">=",
                    "2.6"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    ">=",
                    "3.5.0"
                ]
            ]
        },
        {
            "name": "graphviz",
            "specs": [
                [
                    ">=",
                    "0.20.0"
                ]
            ]
        },
        {
            "name": "fuzzywuzzy",
            "specs": [
                [
                    ">=",
                    "0.18.0"
                ]
            ]
        },
        {
            "name": "python-levenshtein",
            "specs": [
                [
                    ">=",
                    "0.12.0"
                ]
            ]
        }
    ],
    "lcname": "adel-lite"
}
        