Name: ddn-metadata-bootstrap
Version: 1.0.12
Summary: AI-powered metadata enhancement for Hasura DDN schema files
Author: Kenneth Stott <kenneth@hasura.io>
Upload time: 2025-07-16 10:50:34
Requires Python: >=3.8
License: MIT
Keywords: hasura, ddn, graphql, schema, metadata, ai, anthropic, descriptions, relationships
# DDN Metadata Bootstrap

[![PyPI version](https://badge.fury.io/py/ddn-metadata-bootstrap.svg)](https://badge.fury.io/py/ddn-metadata-bootstrap)
[![Python versions](https://img.shields.io/pypi/pyversions/ddn-metadata-bootstrap.svg)](https://pypi.org/project/ddn-metadata-bootstrap/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

AI-powered metadata enhancement for Hasura DDN (Data Delivery Network) schema files. Automatically generate high-quality descriptions and detect sophisticated relationships in your YAML/HML schema definitions using advanced AI with comprehensive configuration management.

## 🚀 Features

### 🤖 **AI-Powered Description Generation**
- **Quality Assessment with Retry Logic**: Multi-attempt generation with configurable scoring thresholds
- **Context-Aware Business Descriptions**: Domain-specific system prompts with industry context
- **Smart Field Analysis**: Automatically detects and skips self-explanatory, generic, or cryptic fields
- **Configurable Length Controls**: Precise control over description length and token usage

### 🧠 **Intelligent Caching System**
- **Similarity-Based Matching**: Reuses descriptions for similar fields across entities (85% similarity threshold)
- **Performance Optimization**: Reduces API calls by up to 70% on large schemas through intelligent caching
- **Cache Statistics**: Real-time performance monitoring with hit rates and API cost savings tracking
- **Type-Aware Matching**: Considers field types and entity context for better cache accuracy

### 🔍 **WordNet-Based Linguistic Analysis**
- **Generic Term Detection**: Uses NLTK and WordNet for sophisticated term analysis to skip meaningless fields
- **Semantic Density Analysis**: Evaluates conceptual richness and specificity of field names
- **Definition Quality Scoring**: Ensures meaningful, non-circular descriptions through linguistic validation
- **Abstraction Level Calculation**: Determines appropriate description depth based on semantic analysis

### 📝 **Enhanced Acronym Expansion**
- **Comprehensive Mappings**: 200+ pre-configured acronyms for technology, finance, and business domains
- **Context-Aware Expansion**: Industry-specific acronym interpretation based on domain context
- **Pre-Generation Enhancement**: Expands acronyms BEFORE AI generation for better context
- **Custom Domain Support**: Fully configurable acronym mappings via YAML configuration
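
As a rough illustration, pre-generation expansion can be sketched like this (the `expand_acronyms` helper and the token-splitting rule are illustrative assumptions, not the package's actual API):

```python
import re

# Hypothetical subset of the configurable acronym_mappings
ACRONYM_MAPPINGS = {"mfa": "Multi-Factor Authentication", "sso": "Single Sign-On", "db": "Database"}

def expand_acronyms(field_name: str) -> str:
    """Expand known acronyms in a snake_case/camelCase field name before AI generation."""
    # Split on underscores and on lowercase-to-uppercase camelCase boundaries
    tokens = re.split(r"_|(?<=[a-z])(?=[A-Z])", field_name)
    expanded = [ACRONYM_MAPPINGS.get(t.lower(), t) for t in tokens if t]
    return " ".join(expanded)

print(expand_acronyms("mfaEnabled"))  # → "Multi-Factor Authentication Enabled"
```

Expanding before generation gives the model real words to work with instead of opaque abbreviations.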

### 🔗 **Advanced Relationship Detection**
- **Template-Based FK Detection**: Sophisticated foreign key detection with confidence scoring and semantic validation
- **Shared Business Key Relationships**: Many-to-many relationships via shared field analysis with FK-aware filtering
- **Cross-Subgraph Intelligence**: Smart entity matching across different subgraphs
- **Configurable Templates**: Flexible FK template patterns with placeholders for complex naming conventions
- **Advanced Blacklisting**: Multi-source rules to prevent inappropriate relationship generation

### ⚙️ **Comprehensive Configuration System**
- **YAML-First Configuration**: Central `config.yaml` file for all settings with full documentation
- **Waterfall Precedence**: CLI args > Environment variables > config.yaml > defaults
- **Configuration Validation**: Comprehensive validation with helpful error messages and source tracking
- **Feature Toggles**: Granular control over processing features (descriptions vs relationships)
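
A minimal sketch of how waterfall precedence can be resolved (the function name and the env-var naming convention are assumptions based on the variables shown in Quick Start, not the package's actual internals):

```python
import os

def resolve_setting(name: str, cli_args: dict, yaml_config: dict, defaults: dict):
    """Waterfall precedence: CLI args > environment variables > config.yaml > defaults.

    Returns (value, source) so configuration validation can report where each
    setting came from. Note that environment values arrive as strings.
    """
    if cli_args.get(name) is not None:
        return cli_args[name], "cli"
    env_var = f"METADATA_BOOTSTRAP_{name.upper()}"
    if env_var in os.environ:
        return os.environ[env_var], "env"
    if name in yaml_config:
        return yaml_config[name], "config.yaml"
    return defaults.get(name), "default"

value, source = resolve_setting("field_tokens", cli_args={}, yaml_config={"field_tokens": 25}, defaults={"field_tokens": 25})
```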

### 🎯 **Advanced Quality Controls**
- **Buzzword Detection**: Avoids corporate jargon and meaningless generic terms
- **Pattern-Based Filtering**: Regex-based rejection of poor description formats
- **Technical Language Translation**: Converts technical terms to business-friendly language
- **Length Optimization**: Multiple validation layers with hard limits and target lengths
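
To make the filtering concrete, here is a hedged sketch of how buzzword and pattern rejection might work, using the `buzzwords` and `forbidden_patterns` lists from the sample configuration below (the `passes_quality_filters` helper is hypothetical):

```python
import re

# Values taken from the sample config.yaml in this README
BUZZWORDS = {"synergy", "leverage", "paradigm", "ecosystem", "contains", "stores", "holds", "represents"}
FORBIDDEN_PATTERNS = [r"this\s+field\s+represents", r"used\s+to\s+(track|manage|identify)"]

def passes_quality_filters(description: str, max_length: int = 120) -> bool:
    """Reject descriptions that exceed the hard length limit, use buzzwords,
    or match a forbidden regex pattern."""
    if len(description) > max_length:
        return False
    words = {w.strip(".,").lower() for w in description.split()}
    if words & BUZZWORDS:
        return False
    return not any(re.search(p, description, re.IGNORECASE) for p in FORBIDDEN_PATTERNS)

print(passes_quality_filters("This field represents customer data"))  # → False
```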

### 🔍 **Intelligent Field Selection**
- **Generic Field Detection**: Skips overly common fields that don't benefit from descriptions
- **Cryptic Abbreviation Handling**: Configurable handling of unclear field names with vowel analysis
- **Self-Explanatory Pattern Recognition**: Automatically identifies fields that don't need descriptions
- **Value Assessment**: Only generates descriptions that add meaningful business value
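
The cryptic-abbreviation heuristic can be sketched roughly as follows (`looks_cryptic` and the 0.2 vowel-ratio cutoff are illustrative assumptions, not the package's actual thresholds; the length cutoff mirrors the `max_cryptic_field_length` setting shown below):

```python
def looks_cryptic(field_name: str, max_cryptic_length: int = 4) -> bool:
    """Heuristic: very short names, or names with almost no vowels,
    are likely abbreviations that an AI cannot describe reliably."""
    name = field_name.replace("_", "")
    if len(name) <= max_cryptic_length:
        return True
    vowels = sum(ch in "aeiou" for ch in name.lower())
    return vowels / len(name) < 0.2  # e.g. "tmstmp" has no vowels at all

print(looks_cryptic("tmstmp"))    # → True
print(looks_cryptic("customer"))  # → False
```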

## 📦 Installation

### From PyPI (Recommended)

```bash
pip install ddn-metadata-bootstrap
```

### From Source

```bash
git clone https://github.com/hasura/ddn-metadata-bootstrap.git
cd ddn-metadata-bootstrap
pip install -e .
```

## ๐Ÿƒ Quick Start

### 1. Set up your environment

```bash
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export METADATA_BOOTSTRAP_INPUT_DIR="./app/metadata"
export METADATA_BOOTSTRAP_OUTPUT_DIR="./enhanced_metadata"
```

### 2. Create a configuration file (Recommended)

Create a `config.yaml` file in your project directory:

```yaml
# config.yaml - DDN Metadata Bootstrap Configuration

# =============================================================================
# FEATURE CONTROL
# =============================================================================
relationships_only: false          # Set to true to only generate relationships, skip descriptions
enable_quality_assessment: true    # Enable AI quality scoring and retry logic

# =============================================================================
# AI GENERATION SETTINGS
# =============================================================================
# API Configuration
model: "claude-3-haiku-20240307"
# api_key: null  # Set via environment variable ANTHROPIC_API_KEY

# Domain-specific system prompt for your organization
system_prompt: |
  You generate concise field descriptions for database schema metadata at a global financial services firm.
  
  DOMAIN CONTEXT:
  - Organization: Global bank
  - Department: Cybersecurity operations  
  - Use case: Risk management and security compliance
  - Regulatory environment: Financial services (SOX, Basel III, GDPR, etc.)
  
  Think: "What would a cybersecurity analyst at a bank need to know about this field?"

# Token and length limits
field_tokens: 25                    # Max tokens AI can generate for field descriptions
kind_tokens: 50                     # Max tokens AI can generate for kind descriptions
field_desc_max_length: 120          # Maximum total characters for field descriptions
kind_desc_max_length: 250           # Maximum total characters for entity descriptions

# Quality thresholds
minimum_description_score: 70       # Minimum score (0-100) to accept a description
max_description_retry_attempts: 3   # How many times to retry for better quality

# =============================================================================
# ENHANCED ACRONYM EXPANSION
# =============================================================================
acronym_mappings:
  # Technology & Computing
  api: "Application Programming Interface"
  ui: "User Interface"
  db: "Database"
  
  # Security & Access Management
  mfa: "Multi-Factor Authentication"
  sso: "Single Sign-On"
  iam: "Identity and Access Management"
  siem: "Security Information and Event Management"
  
  # Financial Services & Compliance
  pci: "Payment Card Industry"
  sox: "Sarbanes-Oxley Act"
  kyc: "Know-Your-Customer"
  aml: "Anti-Money Laundering"
  # ... 200+ total mappings available

# =============================================================================
# INTELLIGENT FIELD SELECTION
# =============================================================================
# Fields to skip entirely - these will not get descriptions at all
skip_field_patterns:
  - "^id$"
  - "^_id$"
  - "^uuid$"
  - "^created_at$"
  - "^updated_at$"
  - "^debug_.*"
  - "^test_.*"
  - "^temp_.*"

# Generic fields - won't get unique descriptions (too common)
generic_fields:
  - "id"
  - "key"
  - "uid"
  - "guid"
  - "name"

# Self-explanatory fields - simple patterns that don't need descriptions
self_explanatory_patterns:
  - '^id$'
  - '^_id$'
  - '^guid$'
  - '^uuid$'
  - '^key$'

# Cryptic Field Handling
skip_cryptic_abbreviations: true   # Skip fields with unclear abbreviations
skip_ultra_short_fields: true      # Skip very short field names that are likely abbreviations
max_cryptic_field_length: 4        # Field names this length or shorter are considered cryptic

# Content quality controls
buzzwords: [
  'synergy', 'leverage', 'paradigm', 'ecosystem',
  'contains', 'stores', 'holds', 'represents'
]

forbidden_patterns: [
  'this\\s+field\\s+represents',
  'used\\s+to\\s+(track|manage|identify)',
  'business.*information'
]

# =============================================================================
# RELATIONSHIP DETECTION
# =============================================================================
# FK Template Patterns for relationship detection
# Format: "{pk_pattern}|{fk_pattern}"
# Placeholders: {gi}=generic_id, {pt}=primary_table, {ps}=primary_subgraph, {pm}=prefix_modifier
fk_templates:
  - "{gi}|{pm}_{pt}_{gi}"           # active_service_name โ†’ Services.name
  - "{gi}|{pt}_{gi}"                # user_id โ†’ Users.id
  - "{pt}_{gi}|{pm}_{pt}_{gi}"      # user_id โ†’ ActiveUsers.active_user_id

# Relationship blacklist rules
fk_key_blacklist:
  - sources: ['gcp', 'azure']
    entity_pattern: "^(gcp_|az_).*"
    field_pattern: ".*(resource|project|policy).*"
    logic: "or"
    reason: "Block cross-cloud resource references"

# Shared relationship limits
max_shared_relationships: 10000
max_shared_per_entity: 10
min_shared_confidence: 30
```
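
To illustrate how an FK template's placeholders expand into a field match, here is a minimal sketch (the `template_to_regex` helper is hypothetical; the real matcher also scores confidence and validates semantics):

```python
import re

def template_to_regex(template: str, primary_table: str, generic_id: str) -> str:
    """Expand the fk_pattern side of a "{pk_pattern}|{fk_pattern}" template
    into a regex that candidate FK field names must match."""
    _, fk_pattern = template.split("|")
    pattern = (fk_pattern
               .replace("{pm}", r"(?P<pm>[a-z]+)")      # prefix modifier, e.g. "active"
               .replace("{pt}", re.escape(primary_table))
               .replace("{gi}", re.escape(generic_id)))
    return f"^{pattern}$"

# "{gi}|{pt}_{gi}" says: a field named <table>_<id> points at <Table>.<id>
regex = template_to_regex("{gi}|{pt}_{gi}", primary_table="user", generic_id="id")
print(re.fullmatch(regex, "user_id") is not None)  # → True
```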

### 3. Run the tool

```bash
# Process entire directory with intelligent caching
ddn-metadata-bootstrap

# Show configuration sources and validation
ddn-metadata-bootstrap --show-config

# Process only relationships (skip descriptions)
ddn-metadata-bootstrap --relationships-only

# Use custom configuration file
ddn-metadata-bootstrap --config custom-config.yaml

# Enable verbose logging to see caching and linguistic analysis
ddn-metadata-bootstrap --verbose
```

## 📝 Enhanced Examples

### High-Quality Description Generation with Caching

#### Input Schema (HML)
```yaml
kind: ObjectType
version: v1
definition:
  name: ThreatAssessment
  fields:
    - name: riskId
      type: String!
    - name: mfaEnabled
      type: Boolean!
    - name: ssoConfig
      type: String
    - name: iamPolicy
      type: String
```

#### Enhanced Output with Acronym Expansion
```yaml
kind: ObjectType
version: v1
definition:
  name: ThreatAssessment
  description: |
    Security risk evaluation and compliance status tracking for 
    organizational threat management and regulatory oversight.
  fields:
    - name: riskId
      type: String!
      description: Risk assessment identifier for tracking security evaluations.
    - name: mfaEnabled
      type: Boolean!
      description: Multi-Factor Authentication enablement status for security policy compliance.
    - name: ssoConfig
      type: String
      description: Single Sign-On configuration settings for identity management.
    - name: iamPolicy
      type: String
      description: Identity and Access Management policy governing user permissions.
```

### Intelligent Caching in Action

```yaml
# First entity processed - API call made
kind: ObjectType
definition:
  name: UserProfile
  fields:
    - name: userId
      type: String!
      # Generated: "User account identifier for authentication and access control"

# Second entity processed - CACHE HIT! (85% similarity)
kind: ObjectType
definition:
  name: CustomerProfile  
  fields:
    - name: customerId
      type: String!
      # Reused: "User account identifier for authentication and access control"
      # No API call made - description adapted from cache
```

### WordNet-Based Quality Analysis

```bash
# Verbose logging shows linguistic analysis
๐Ÿ” ANALYZING 'data_value' - WordNet analysis:
   - 'data': Generic term (specificity: 0.2, abstraction: 8)
   - 'value': Generic term (specificity: 0.3, abstraction: 7)
   - Overall clarity: UNCLEAR (unresolved generic terms)
โญ๏ธ SKIPPING 'data_value' - Contains unresolved generic terms

๐Ÿ” ANALYZING 'customer_id' - WordNet analysis:
   - 'customer': Specific term (specificity: 0.8, abstraction: 3)
   - 'id': Known identifier pattern
   - Overall clarity: CLEAR (specific business context)
๐ŸŽฏ GENERATING 'customer_id' - Business context adds value
```

### Advanced Relationship Detection

#### Input: Multiple Subgraphs
```yaml
# users/subgraph.yaml
kind: ObjectType
definition:
  name: Users
  fields:
    - name: id
      type: String!
    - name: employee_id
      type: String

# security/subgraph.yaml  
kind: ObjectType
definition:
  name: AccessLogs
  fields:
    - name: user_id
      type: String!
    - name: employee_id  
      type: String
```

#### Generated Relationships with FK-Aware Filtering
```yaml
# Generated FK relationship (high confidence)
kind: Relationship
version: v1
definition:
  name: user
  source: AccessLogs
  target:
    model:
      name: Users
      subgraph: users
  mapping:
    - source:
        fieldPath:
          - fieldName: user_id
      target:
        modelField:
          - fieldName: id

# Shared field relationship filtered out due to existing FK relationship
# This prevents redundant relationships on the same entity pair
```

## ⚙️ Advanced Configuration

### Performance vs Quality Tuning

```yaml
# High-performance configuration for large schemas (favors speed; note that
# relationships_only: true skips description generation entirely)
enable_quality_assessment: false    # Disable retry logic for speed
max_description_retry_attempts: 1   # Single attempt only
minimum_description_score: 50       # Lower quality threshold
field_tokens: 15                    # Shorter responses
skip_cryptic_abbreviations: true    # Skip unclear fields
relationships_only: true            # Skip descriptions entirely
```

```yaml
# High-quality configuration for critical schemas (favors quality over speed)
enable_quality_assessment: true     # Full quality validation
max_description_retry_attempts: 5   # More retries for quality
minimum_description_score: 80       # Higher quality threshold
field_tokens: 40                    # Longer responses allowed
skip_cryptic_abbreviations: false   # Try to describe all fields
```

## 🐍 Python API with Enhanced Features

```python
from ddn_metadata_bootstrap import BootstrapperConfig, MetadataBootstrapper
import logging

# Configure logging to see caching and linguistic analysis
logging.basicConfig(level=logging.INFO)

# Load configuration with caching enabled
config = BootstrapperConfig(
    config_file="./custom-config.yaml",
    cli_args=None
)

# Create bootstrapper with enhanced features
bootstrapper = MetadataBootstrapper(config)

# Process directory with all enhancements
results = bootstrapper.process_directory(
    input_dir="./app/metadata",
    output_dir="./enhanced_metadata"
)

# Get comprehensive statistics including new features
stats = bootstrapper.get_statistics()
print(f"Entities processed: {stats['entities_processed']}")
print(f"Descriptions generated: {stats['descriptions_generated']}")
print(f"Relationships generated: {stats['relationships_generated']}")

# Get caching performance statistics
if hasattr(bootstrapper.description_generator, 'cache'):
    cache_stats = bootstrapper.description_generator.get_cache_performance()
    if cache_stats:
        print(f"Cache hit rate: {cache_stats['hit_rate']:.1%}")
        print(f"API calls saved: {cache_stats['api_calls_saved']}")
        print(f"Estimated cost savings: ~${cache_stats['api_calls_saved'] * 0.01:.2f}")
```

## 📊 Enhanced Statistics & Monitoring

The tool provides comprehensive statistics including advanced features:

```python
# Detailed processing statistics with enhanced features
stats = bootstrapper.get_statistics()

# Core processing metrics
print(f"Entities processed: {stats['entities_processed']}")
print(f"Fields analyzed: {stats['fields_analyzed']}")

# Description generation metrics with intelligent filtering
print(f"Descriptions generated: {stats['descriptions_generated']}")
print(f"Fields skipped (generic): {stats['generic_fields_skipped']}")
print(f"Fields skipped (self-explanatory): {stats['self_explanatory_skipped']}")
print(f"Fields skipped (cryptic): {stats['cryptic_fields_skipped']}")
print(f"Acronyms expanded: {stats['acronyms_expanded']}")

# Caching performance metrics (if enabled)
if 'cache_hit_rate' in stats:
    print(f"Cache hit rate: {stats['cache_hit_rate']:.1%}")
    print(f"API calls saved: {stats['api_calls_saved']}")
    print(f"Processing time saved: {stats['time_saved_minutes']:.1f} minutes")

# Quality assessment metrics  
print(f"Average quality score: {stats['average_quality_score']}")
print(f"Quality retries attempted: {stats['quality_retries']}")
print(f"High quality descriptions: {stats['high_quality_descriptions']}")

# Linguistic analysis statistics (WordNet-based)
print(f"Generic terms detected: {stats['generic_terms_detected']}")
print(f"WordNet analyses performed: {stats['wordnet_analyses']}")

# Relationship generation metrics with advanced filtering
print(f"FK relationships generated: {stats['fk_relationships_generated']}")
print(f"Shared relationships generated: {stats['shared_relationships_generated']}")
print(f"Relationships blocked by rules: {stats['relationships_blocked']}")
print(f"FK-aware filtering applied: {stats['fk_aware_filtering_applied']}")
```

## 🚀 Performance Improvements

### Caching Performance (Real Implementation)

Real-world performance improvements from the similarity-based caching:

```bash
# Before intelligent caching
Processing 500 fields across 50 entities...
API calls made: 425
Processing time: 8.5 minutes
Estimated cost: $4.25

# After intelligent caching  
Processing 500 fields across 50 entities...
Cache hits: 298 (70.1% hit rate)
API calls made: 127 (70% reduction)
Processing time: 2.8 minutes (67% faster)
Estimated cost: $1.27 (70% savings)
```

### Quality Improvements (WordNet + Quality Assessment)

```bash
# Before enhanced quality controls and linguistic analysis
Descriptions generated: 425
Average quality score: 62
Rejected for generic language: 89 (21%)
Manual review required: 127 (30%)

# After WordNet analysis and enhanced quality controls
Descriptions generated: 312
Average quality score: 78
Rejected for generic language: 15 (5%)
Manual review required: 31 (10%)
WordNet generic detection: 67 fields skipped automatically
```

## 🔄 Enhanced Processing Pipeline

### 1. **Intelligent Description Generation with Caching**

```python
def generate_field_description_with_quality_check(self, field_name, field_data, context):
    # 1. Value assessment - should we generate a description at all?
    value_assessment = self._should_generate_description_for_value(field_name, field_data, context)
    if not value_assessment['should_generate']:
        return None

    # 2. WordNet-based generic detection
    if self._generic_detector:
        clarity_check = self._generic_detector.assess_field_name_clarity(field_name)
        if not clarity_check['is_clear']:
            return None  # Skip unclear/generic fields

    # 3. Acronym expansion before AI generation
    acronym_expansions = self._expand_acronyms_in_field_name(field_name, context)
    enhanced_prompt = self._build_prompt(field_name, field_data, context, acronym_expansions)

    entity_name = context.get('entity_name')
    field_type = field_data.get('type')

    # 4. Check cache first (similarity-based with type awareness)
    if self.cache:
        cached_description = self.cache.get_cached_description(
            field_name, entity_name, field_type, context
        )
        if cached_description:
            return cached_description

    # 5. Multi-attempt generation with quality scoring
    for attempt in range(self.config.max_description_retry_attempts):
        description = self._make_api_call(enhanced_prompt, self.config.field_tokens)
        quality_assessment = self._assess_description_quality(description, field_name, entity_name)
        if quality_assessment['should_include']:
            if self.cache:
                self.cache.cache_description(field_name, entity_name, field_type, context, description)
            return description

    return None  # Quality threshold not met
```

### 2. **WordNet-Based Linguistic Analysis**

```python
def analyze_term(self, word: str) -> TermAnalysis:
    synsets = wn.synsets(word)
    max_specificity = 0.0

    # Multi-dimensional analysis of the top 3 meanings
    for synset in synsets[:3]:
        # Definition specificity analysis
        definition = synset.definition()
        specificity = self._analyze_definition_specificity(definition)

        # Taxonomic position analysis
        abstraction_level = self._calculate_abstraction_level(synset)

        # Semantic relationship analysis
        relation_specificity = self._analyze_lexical_relations(synset)

        # Concreteness analysis
        concreteness = self._analyze_concreteness(definition.split())

        # Combine the dimensions and keep the most specific interpretation
        combined = self._combine_scores(
            specificity, abstraction_level, relation_specificity, concreteness
        )
        max_specificity = max(max_specificity, combined)

    # A term is generic when even its most specific sense scores low
    is_generic = max_specificity < 0.4
    return TermAnalysis(word=word, is_generic=is_generic, specificity_score=max_specificity)
```

### 3. **Similarity-Based Caching Architecture**

```python
from collections import defaultdict
from typing import Dict, List, Optional

class DescriptionCache:
    def __init__(self, similarity_threshold=0.85):
        self.similarity_threshold = similarity_threshold

        # Exact match cache
        self.exact_cache: Dict[str, CachedDescription] = {}

        # Similarity cache organized by normalized field patterns
        self.similarity_cache: Dict[str, List[CachedDescription]] = defaultdict(list)

        # Performance tracking
        self.stats = {'exact_hits': 0, 'similarity_hits': 0, 'api_calls_saved': 0}

    def get_cached_description(self, field_name, entity_name, field_type, context) -> Optional[str]:
        # Try exact context match first
        context_hash = self._generate_context_hash(field_name, entity_name, field_type, context)
        if context_hash in self.exact_cache:
            self.stats['exact_hits'] += 1
            return self.exact_cache[context_hash].description

        # Try similarity matching with type awareness
        normalized_field = self._normalize_field_name(field_name)
        candidates = self.similarity_cache.get(normalized_field, [])

        for cached in candidates:
            similarity = self._calculate_similarity(
                field_name, cached.field_name,
                entity_name, cached.entity_name,
                field_type, cached.field_type
            )
            if similarity >= self.similarity_threshold:
                self.stats['similarity_hits'] += 1
                return cached.description

        return None
```

## 🧪 Testing Enhanced Features

```bash
# Test caching performance
pytest tests/test_caching.py -v

# Test WordNet integration  
pytest tests/test_linguistic_analysis.py -v

# Test configuration system
pytest tests/test_config.py -v

# Test acronym expansion
pytest tests/test_acronym_expansion.py -v

# Test quality assessment
pytest tests/test_quality_assessment.py -v

# Test relationship detection with FK-aware filtering
pytest tests/test_relationship_detection.py -v

# Run all tests with coverage
pytest --cov=ddn_metadata_bootstrap --cov-report=html
```

## 🤝 Contributing

### Areas for Contribution

1. **Caching Enhancements**
   - Persistent cache storage across sessions
   - Cross-project cache sharing
   - Advanced similarity algorithms

2. **Linguistic Analysis Improvements**
   - Additional language support beyond English
   - Industry-specific term recognition
   - Enhanced semantic relationship detection

3. **Quality Assessment Refinements**
   - Machine learning-based quality scoring
   - Domain-specific quality metrics
   - User feedback integration

4. **Relationship Detection Advances**
   - Advanced FK pattern detection
   - Semantic relationship analysis
   - Cross-platform relationship mapping

### Development Guidelines

- Add tests for caching algorithms and WordNet integration
- Include linguistic analysis test cases
- Document configuration options thoroughly
- Test performance impact of new features
- Follow existing architecture patterns

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🆘 Support

- 📖 [Documentation](https://github.com/hasura/ddn-metadata-bootstrap#readme)
- 🐛 [Bug Reports](https://github.com/hasura/ddn-metadata-bootstrap/issues)
- 💬 [Discussions](https://github.com/hasura/ddn-metadata-bootstrap/discussions)
- 🧠 [Caching Issues](https://github.com/hasura/ddn-metadata-bootstrap/issues?q=label%3Acaching)
- 🔍 [Quality Assessment Issues](https://github.com/hasura/ddn-metadata-bootstrap/issues?q=label%3Aquality)
- 🎯 [WordNet Integration Issues](https://github.com/hasura/ddn-metadata-bootstrap/issues?q=label%3Awordnet)

## ๐Ÿท๏ธ Version History

See [CHANGELOG.md](CHANGELOG.md) for complete version history and breaking changes.

## ⭐ Acknowledgments

- Built for [Hasura DDN](https://hasura.io/ddn)
- Powered by [Anthropic Claude](https://www.anthropic.com/)
- Linguistic analysis powered by [NLTK](https://www.nltk.org/) and [WordNet](https://wordnet.princeton.edu/)
- Inspired by the GraphQL and OpenAPI communities
- Caching algorithms inspired by database query optimization techniques

---

Made with ❤️ by the Hasura team

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "ddn-metadata-bootstrap",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Kenneth Stott <kenneth@hasura.io>",
    "keywords": "hasura, ddn, graphql, schema, metadata, ai, anthropic, descriptions, relationships",
    "author": null,
    "author_email": "Kenneth Stott <kenneth@hasura.io>",
    "download_url": "https://files.pythonhosted.org/packages/7c/a7/90bbbc888574188f233932f1ac64daaaa59f6d8fe22ab9ea71f7d6385f3a/ddn_metadata_bootstrap-1.0.12.tar.gz",
    "platform": null,
    "description": "# DDN Metadata Bootstrap\n\n[![PyPI version](https://badge.fury.io/py/ddn-metadata-bootstrap.svg)](https://badge.fury.io/py/ddn-metadata-bootstrap)\n[![Python versions](https://img.shields.io/pypi/pyversions/ddn-metadata-bootstrap.svg)](https://pypi.org/project/ddn-metadata-bootstrap/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\nAI-powered metadata enhancement for Hasura DDN (Data Delivery Network) schema files. Automatically generate high-quality descriptions and detect sophisticated relationships in your YAML/HML schema definitions using advanced AI with comprehensive configuration management.\n\n## \ud83d\ude80 Features\n\n### \ud83e\udd16 **AI-Powered Description Generation**\n- **Quality Assessment with Retry Logic**: Multi-attempt generation with configurable scoring thresholds\n- **Context-Aware Business Descriptions**: Domain-specific system prompts with industry context\n- **Smart Field Analysis**: Automatically detects and skips self-explanatory, generic, or cryptic fields\n- **Configurable Length Controls**: Precise control over description length and token usage\n\n### \ud83e\udde0 **Intelligent Caching System** \n- **Similarity-Based Matching**: Reuses descriptions for similar fields across entities (85% similarity threshold)\n- **Performance Optimization**: Reduces API calls by up to 70% on large schemas through intelligent caching\n- **Cache Statistics**: Real-time performance monitoring with hit rates and API cost savings tracking\n- **Type-Aware Matching**: Considers field types and entity context for better cache accuracy\n\n### \ud83d\udd0d **WordNet-Based Linguistic Analysis**\n- **Generic Term Detection**: Uses NLTK and WordNet for sophisticated term analysis to skip meaningless fields\n- **Semantic Density Analysis**: Evaluates conceptual richness and specificity of field names\n- **Definition Quality Scoring**: Ensures meaningful, non-circular descriptions 
through linguistic validation\n- **Abstraction Level Calculation**: Determines appropriate description depth based on semantic analysis\n\n### \ud83d\udcdd **Enhanced Acronym Expansion**\n- **Comprehensive Mappings**: 200+ pre-configured acronyms for technology, finance, and business domains\n- **Context-Aware Expansion**: Industry-specific acronym interpretation based on domain context\n- **Pre-Generation Enhancement**: Expands acronyms BEFORE AI generation for better context\n- **Custom Domain Support**: Fully configurable acronym mappings via YAML configuration\n\n### \ud83d\udd17 **Advanced Relationship Detection**\n- **Template-Based FK Detection**: Sophisticated foreign key detection with confidence scoring and semantic validation\n- **Shared Business Key Relationships**: Many-to-many relationships via shared field analysis with FK-aware filtering\n- **Cross-Subgraph Intelligence**: Smart entity matching across different subgraphs\n- **Configurable Templates**: Flexible FK template patterns with placeholders for complex naming conventions\n- **Advanced Blacklisting**: Multi-source rules to prevent inappropriate relationship generation\n\n### \u2699\ufe0f **Comprehensive Configuration System**\n- **YAML-First Configuration**: Central `config.yaml` file for all settings with full documentation\n- **Waterfall Precedence**: CLI args > Environment variables > config.yaml > defaults\n- **Configuration Validation**: Comprehensive validation with helpful error messages and source tracking\n- **Feature Toggles**: Granular control over processing features (descriptions vs relationships)\n\n### \ud83c\udfaf **Advanced Quality Controls**\n- **Buzzword Detection**: Avoids corporate jargon and meaningless generic terms\n- **Pattern-Based Filtering**: Regex-based rejection of poor description formats\n- **Technical Language Translation**: Converts technical terms to business-friendly language\n- **Length Optimization**: Multiple validation layers with hard limits and target 
lengths\n\n### \ud83d\udd0d **Intelligent Field Selection**\n- **Generic Field Detection**: Skips overly common fields that don't benefit from descriptions\n- **Cryptic Abbreviation Handling**: Configurable handling of unclear field names with vowel analysis\n- **Self-Explanatory Pattern Recognition**: Automatically identifies fields that don't need descriptions\n- **Value Assessment**: Only generates descriptions that add meaningful business value\n\n## \ud83d\udce6 Installation\n\n### From PyPI (Recommended)\n\n```bash\npip install ddn-metadata-bootstrap\n```\n\n### From Source\n\n```bash\ngit clone https://github.com/hasura/ddn-metadata-bootstrap.git\ncd ddn-metadata-bootstrap\npip install -e .\n```\n\n## \ud83c\udfc3 Quick Start\n\n### 1. Set up your environment\n\n```bash\nexport ANTHROPIC_API_KEY=\"your-anthropic-api-key\"\nexport METADATA_BOOTSTRAP_INPUT_DIR=\"./app/metadata\"\nexport METADATA_BOOTSTRAP_OUTPUT_DIR=\"./enhanced_metadata\"\n```\n\n### 2. Create a configuration file (Recommended)\n\nCreate a `config.yaml` file in your project directory:\n\n```yaml\n# config.yaml - DDN Metadata Bootstrap Configuration\n\n# =============================================================================\n# FEATURE CONTROL\n# =============================================================================\nrelationships_only: false          # Set to true to only generate relationships, skip descriptions\nenable_quality_assessment: true    # Enable AI quality scoring and retry logic\n\n# =============================================================================\n# AI GENERATION SETTINGS\n# =============================================================================\n# API Configuration\nmodel: \"claude-3-haiku-20240307\"\n# api_key: null  # Set via environment variable ANTHROPIC_API_KEY\n\n# Domain-specific system prompt for your organization\nsystem_prompt: |\n  You generate concise field descriptions for database schema metadata at a global financial services 
  DOMAIN CONTEXT:
  - Organization: Global bank
  - Department: Cybersecurity operations
  - Use case: Risk management and security compliance
  - Regulatory environment: Financial services (SOX, Basel III, GDPR, etc.)

  Think: "What would a cybersecurity analyst at a bank need to know about this field?"

# Token and length limits
field_tokens: 25                    # Max tokens AI can generate for field descriptions
kind_tokens: 50                     # Max tokens AI can generate for kind descriptions
field_desc_max_length: 120          # Maximum total characters for field descriptions
kind_desc_max_length: 250           # Maximum total characters for entity descriptions

# Quality thresholds
minimum_description_score: 70       # Minimum score (0-100) to accept a description
max_description_retry_attempts: 3   # How many times to retry for better quality

# =============================================================================
# ENHANCED ACRONYM EXPANSION
# =============================================================================
acronym_mappings:
  # Technology & Computing
  api: "Application Programming Interface"
  ui: "User Interface"
  db: "Database"

  # Security & Access Management
  mfa: "Multi-Factor Authentication"
  sso: "Single Sign-On"
  iam: "Identity and Access Management"
  siem: "Security Information and Event Management"

  # Financial Services & Compliance
  pci: "Payment Card Industry"
  sox: "Sarbanes-Oxley Act"
  kyc: "Know-Your-Customer"
  aml: "Anti-Money Laundering"
  # ... 200+ total mappings available
# =============================================================================
# INTELLIGENT FIELD SELECTION
# =============================================================================
# Fields to skip entirely - these will not get descriptions at all
skip_field_patterns:
  - "^id$"
  - "^_id$"
  - "^uuid$"
  - "^created_at$"
  - "^updated_at$"
  - "^debug_.*"
  - "^test_.*"
  - "^temp_.*"

# Generic fields - won't get unique descriptions (too common)
generic_fields:
  - "id"
  - "key"
  - "uid"
  - "guid"
  - "name"

# Self-explanatory fields - simple patterns that don't need descriptions
self_explanatory_patterns:
  - '^id$'
  - '^_id$'
  - '^guid$'
  - '^uuid$'
  - '^key$'

# Cryptic field handling
skip_cryptic_abbreviations: true   # Skip fields with unclear abbreviations
skip_ultra_short_fields: true      # Skip very short field names that are likely abbreviations
max_cryptic_field_length: 4        # Field names this length or shorter are considered cryptic

# Content quality controls
buzzwords: [
  'synergy', 'leverage', 'paradigm', 'ecosystem',
  'contains', 'stores', 'holds', 'represents'
]

forbidden_patterns: [
  'this\s+field\s+represents',
  'used\s+to\s+(track|manage|identify)',
  'business.*information'
]

# =============================================================================
# RELATIONSHIP DETECTION
# =============================================================================
# FK template patterns for relationship detection
# Format: "{pk_pattern}|{fk_pattern}"
# Placeholders: {gi}=generic_id, {pt}=primary_table, {ps}=primary_subgraph, {pm}=prefix_modifier
fk_templates:
  - "{gi}|{pm}_{pt}_{gi}"           # active_service_name → Services.name
  - "{gi}|{pt}_{gi}"                # user_id → Users.id
  - "{pt}_{gi}|{pm}_{pt}_{gi}"      # user_id → ActiveUsers.active_user_id
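# Illustrative walkthrough (comment only, not a config key): with the template
# "{gi}|{pt}_{gi}" and gi=id, a field AccessLogs.user_id parses as pt=user plus
# gi=id, so the detector proposes AccessLogs.user_id → Users.id and scores the
# match's confidence before emitting a relationship.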
# Relationship blacklist rules
fk_key_blacklist:
  - sources: ['gcp', 'azure']
    entity_pattern: "^(gcp_|az_).*"
    field_pattern: ".*(resource|project|policy).*"
    logic: "or"
    reason: "Block cross-cloud resource references"

# Shared relationship limits
max_shared_relationships: 10000
max_shared_per_entity: 10
min_shared_confidence: 30
```

### 3. Run the tool

```bash
# Process entire directory with intelligent caching
ddn-metadata-bootstrap

# Show configuration sources and validation
ddn-metadata-bootstrap --show-config

# Process only relationships (skip descriptions)
ddn-metadata-bootstrap --relationships-only

# Use custom configuration file
ddn-metadata-bootstrap --config custom-config.yaml

# Enable verbose logging to see caching and linguistic analysis
ddn-metadata-bootstrap --verbose
```

## 📝 Enhanced Examples

### High-Quality Description Generation with Caching

#### Input Schema (HML)
```yaml
kind: ObjectType
version: v1
definition:
  name: ThreatAssessment
  fields:
    - name: riskId
      type: String!
    - name: mfaEnabled
      type: Boolean!
    - name: ssoConfig
      type: String
    - name: iamPolicy
      type: String
```

#### Enhanced Output with Acronym Expansion
```yaml
kind: ObjectType
version: v1
definition:
  name: ThreatAssessment
  description: |
    Security risk evaluation and compliance status tracking for
    organizational threat management and regulatory oversight.
  fields:
    - name: riskId
      type: String!
      description: Risk assessment identifier for tracking security evaluations.
    - name: mfaEnabled
      type: Boolean!
      description: Multi-Factor Authentication enablement status for security policy compliance.
    - name: ssoConfig
      type: String
      description: Single Sign-On configuration settings for identity management.
    - name: iamPolicy
      type: String
      description: Identity and Access Management policy governing user permissions.
```
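The pre-generation acronym expansion shown above can be sketched in a few lines. `expand_field_name` and the trimmed mapping below are illustrative assumptions, not the package's API; the real tool drives this from the full `acronym_mappings` config.

```python
import re

# Hypothetical sketch: acronym tokens in a field name are replaced with their
# configured expansions so the AI prompt carries full business context.
ACRONYM_MAPPINGS = {
    "mfa": "Multi-Factor Authentication",
    "sso": "Single Sign-On",
    "iam": "Identity and Access Management",
}

def expand_field_name(field_name: str) -> str:
    # Split camelCase and snake_case into tokens, then expand known acronyms.
    tokens = re.findall(r"[A-Za-z][a-z0-9]*", field_name)
    expanded = [ACRONYM_MAPPINGS.get(t.lower(), t) for t in tokens]
    return " ".join(expanded)

print(expand_field_name("mfaEnabled"))   # Multi-Factor Authentication Enabled
print(expand_field_name("iam_policy"))   # Identity and Access Management policy
```

Expanding before generation, rather than after, is what lets the model describe `mfaEnabled` in terms of authentication policy instead of guessing at the abbreviation.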
### Intelligent Caching in Action

```yaml
# First entity processed - API call made
kind: ObjectType
definition:
  name: UserProfile
  fields:
    - name: userId
      type: String!
      # Generated: "User account identifier for authentication and access control"

# Second entity processed - CACHE HIT! (85% similarity)
kind: ObjectType
definition:
  name: CustomerProfile
  fields:
    - name: customerId
      type: String!
      # Reused: "User account identifier for authentication and access control"
      # No API call made - description adapted from cache
```

### WordNet-Based Quality Analysis

```bash
# Verbose logging shows linguistic analysis
🔍 ANALYZING 'data_value' - WordNet analysis:
   - 'data': Generic term (specificity: 0.2, abstraction: 8)
   - 'value': Generic term (specificity: 0.3, abstraction: 7)
   - Overall clarity: UNCLEAR (unresolved generic terms)
⏭️ SKIPPING 'data_value' - Contains unresolved generic terms

🔍 ANALYZING 'customer_id' - WordNet analysis:
   - 'customer': Specific term (specificity: 0.8, abstraction: 3)
   - 'id': Known identifier pattern
   - Overall clarity: CLEAR (specific business context)
🎯 GENERATING 'customer_id' - Business context adds value
```

### Advanced Relationship Detection

#### Input: Multiple Subgraphs
```yaml
# users/subgraph.yaml
kind: ObjectType
definition:
  name: Users
  fields:
    - name: id
      type: String!
    - name: employee_id
      type: String

# security/subgraph.yaml
kind: ObjectType
definition:
  name: AccessLogs
  fields:
    - name: user_id
      type: String!
    - name: employee_id
      type: String
```

#### Generated Relationships with FK-Aware Filtering
```yaml
# Generated FK relationship (high confidence)
kind: Relationship
version: v1
definition:
  name: user
  source: AccessLogs
  target:
    model:
      name: Users
      subgraph: users
  mapping:
    - source:
        fieldPath:
          - fieldName: user_id
      target:
        modelField:
          - fieldName: id

# Shared field relationship filtered out due to existing FK relationship
# This prevents redundant relationships on the same entity pair
```

## ⚙️ Advanced Configuration

### Performance vs. Quality Tuning

```yaml
# High-performance configuration for large schemas (enables all optimizations)
enable_quality_assessment: false    # Disable retry logic for speed
max_description_retry_attempts: 1   # Single attempt only
minimum_description_score: 50       # Lower quality threshold
field_tokens: 15                    # Shorter responses
skip_cryptic_abbreviations: true    # Skip unclear fields
relationships_only: true            # Skip descriptions entirely

# High-quality configuration for critical schemas (enables all features)
enable_quality_assessment: true     # Full quality validation
max_description_retry_attempts: 5   # More retries for quality
minimum_description_score: 80       # Higher quality threshold
field_tokens: 40                    # Longer responses allowed
skip_cryptic_abbreviations: false   # Try to describe all fields
```

## 🐍 Python API with Enhanced Features

```python
from ddn_metadata_bootstrap import BootstrapperConfig, MetadataBootstrapper
import logging

# Configure logging to see caching and linguistic analysis
logging.basicConfig(level=logging.INFO)

# Load configuration with caching enabled
config = BootstrapperConfig(
    config_file="./custom-config.yaml",
    cli_args=None
)

# Create bootstrapper with enhanced features
bootstrapper = MetadataBootstrapper(config)

# Process directory with all enhancements
results = bootstrapper.process_directory(
    input_dir="./app/metadata",
    output_dir="./enhanced_metadata"
)

# Get comprehensive statistics including new features
stats = bootstrapper.get_statistics()
print(f"Entities processed: {stats['entities_processed']}")
print(f"Descriptions generated: {stats['descriptions_generated']}")
print(f"Relationships generated: {stats['relationships_generated']}")

# Get caching performance statistics
if hasattr(bootstrapper.description_generator, 'cache'):
    cache_stats = bootstrapper.description_generator.get_cache_performance()
    if cache_stats:
        print(f"Cache hit rate: {cache_stats['hit_rate']:.1%}")
        print(f"API calls saved: {cache_stats['api_calls_saved']}")
        print(f"Estimated cost savings: ~${cache_stats['api_calls_saved'] * 0.01:.2f}")
```

## 📊 Enhanced Statistics & Monitoring

The tool provides comprehensive statistics covering the advanced features:

```python
# Detailed processing statistics with enhanced features
stats = bootstrapper.get_statistics()

# Core processing metrics
print(f"Entities processed: {stats['entities_processed']}")
print(f"Fields analyzed: {stats['fields_analyzed']}")

# Description generation metrics with intelligent filtering
print(f"Descriptions generated: {stats['descriptions_generated']}")
print(f"Fields skipped (generic): {stats['generic_fields_skipped']}")
print(f"Fields skipped (self-explanatory): {stats['self_explanatory_skipped']}")
print(f"Fields skipped (cryptic): {stats['cryptic_fields_skipped']}")
print(f"Acronyms expanded: {stats['acronyms_expanded']}")

# Caching performance metrics (if enabled)
if 'cache_hit_rate' in stats:
    print(f"Cache hit rate: {stats['cache_hit_rate']:.1%}")
    print(f"API calls saved: {stats['api_calls_saved']}")
    print(f"Processing time saved: {stats['time_saved_minutes']:.1f} minutes")

# Quality assessment metrics
print(f"Average quality score: {stats['average_quality_score']}")
print(f"Quality retries attempted: {stats['quality_retries']}")
print(f"High quality descriptions: {stats['high_quality_descriptions']}")
# Linguistic analysis statistics (WordNet-based)
print(f"Generic terms detected: {stats['generic_terms_detected']}")
print(f"WordNet analyses performed: {stats['wordnet_analyses']}")

# Relationship generation metrics with advanced filtering
print(f"FK relationships generated: {stats['fk_relationships_generated']}")
print(f"Shared relationships generated: {stats['shared_relationships_generated']}")
print(f"Relationships blocked by rules: {stats['relationships_blocked']}")
print(f"FK-aware filtering applied: {stats['fk_aware_filtering_applied']}")
```

## 🚀 Performance Improvements

### Caching Performance (Real Implementation)

Real-world performance improvements from similarity-based caching:

```bash
# Before intelligent caching
Processing 500 fields across 50 entities...
API calls made: 425
Processing time: 8.5 minutes
Estimated cost: $4.25

# After intelligent caching
Processing 500 fields across 50 entities...
Cache hits: 298 (70.1% hit rate)
API calls made: 127 (70% reduction)
Processing time: 2.8 minutes (67% faster)
Estimated cost: $1.27 (70% savings)
```

### Quality Improvements (WordNet + Quality Assessment)

```bash
# Before enhanced quality controls and linguistic analysis
Descriptions generated: 425
Average quality score: 62
Rejected for generic language: 89 (21%)
Manual review required: 127 (30%)

# After WordNet analysis and enhanced quality controls
Descriptions generated: 312
Average quality score: 78
Rejected for generic language: 15 (5%)
Manual review required: 31 (10%)
WordNet generic detection: 67 fields skipped automatically
```

## 🔄 Enhanced Processing Pipeline

### 1. **Intelligent Description Generation with Caching**

```python
def generate_field_description_with_quality_check(self, field_name, field_data, context):
    # 1. Value assessment - should we generate?
    value_assessment = self._should_generate_description_for_value(field_name, field_data, context)

    # 2. WordNet-based generic detection
    if self._generic_detector:
        clarity_check = self._generic_detector.assess_field_name_clarity(field_name)
        if not clarity_check['is_clear']:
            return None  # Skip unclear/generic fields

    # 3. Acronym expansion before AI generation
    acronym_expansions = self._expand_acronyms_in_field_name(field_name, context)

    # 4. Check cache first (similarity-based with type awareness)
    # (entity_name and field_type come from context/field_data, elided here)
    if self.cache:
        cached_description = self.cache.get_cached_description(
            field_name, entity_name, field_type, context
        )
        if cached_description:
            return cached_description

    # 5. Multi-attempt generation with quality scoring
    # (max_attempts comes from max_description_retry_attempts in config)
    for attempt in range(max_attempts):
        description = self._make_api_call(enhanced_prompt, config.field_tokens)
        quality_assessment = self._assess_description_quality(description, field_name, entity_name)
        if quality_assessment['should_include']:
            if self.cache:
                self.cache.cache_description(field_name, entity_name, field_type, context, description)
            return description

    return None  # Quality threshold not met
```
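The quality gate in step 5 can be approximated with a simple heuristic. The weights and `score_description` below are illustrative assumptions, not the tool's real scorer; the buzzword list and forbidden patterns are a subset of the `config.yaml` example above.

```python
import re

# Illustrative heuristic for the quality gate: not the tool's actual scorer.
BUZZWORDS = {'synergy', 'leverage', 'paradigm', 'ecosystem',
             'contains', 'stores', 'holds', 'represents'}
FORBIDDEN = [r'this\s+field\s+represents', r'used\s+to\s+(track|manage|identify)']

def score_description(text: str, max_length: int = 120) -> int:
    """Return a 0-100 score; descriptions below the threshold get retried."""
    score = 100
    words = set(re.findall(r"[a-z]+", text.lower()))
    score -= 15 * len(words & BUZZWORDS)           # corporate jargon penalty
    if any(re.search(p, text.lower()) for p in FORBIDDEN):
        score -= 30                                # formulaic phrasing penalty
    if len(text) > max_length:
        score -= 20                                # hard length overrun penalty
    return max(score, 0)

good = score_description("Risk assessment identifier for tracking security evaluations.")
bad = score_description("This field represents business information stored in the ecosystem.")
# A minimum_description_score of 70 would accept the first and retry the second.
```

Stacking independent penalties like this is why a single formulaic phrase plus a couple of buzzwords is enough to push a description below the retry threshold.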
### 2. **WordNet-Based Linguistic Analysis**

```python
def analyze_term(self, word: str) -> TermAnalysis:
    synsets = wn.synsets(word)
    max_specificity = 0.0

    # Multi-dimensional analysis
    for synset in synsets[:3]:  # Top 3 meanings
        # Definition specificity analysis
        definition = synset.definition()
        specificity = self._analyze_definition_specificity(definition)
        max_specificity = max(max_specificity, specificity)

        # Taxonomic position analysis
        abstraction_level = self._calculate_abstraction_level(synset)

        # Semantic relationship analysis
        relation_specificity = self._analyze_lexical_relations(synset)

        # Concreteness analysis
        concreteness = self._analyze_concreteness(definition.split())

    # Use most specific interpretation
    is_generic = max_specificity < 0.4
    return TermAnalysis(word=word, is_generic=is_generic, specificity_score=max_specificity)
```

### 3. **Similarity-Based Caching Architecture**

```python
class DescriptionCache:
    def __init__(self, similarity_threshold=0.85):
        self.similarity_threshold = similarity_threshold

        # Exact match cache
        self.exact_cache: Dict[str, CachedDescription] = {}

        # Similarity cache organized by normalized field patterns
        self.similarity_cache: Dict[str, List[CachedDescription]] = defaultdict(list)

        # Performance tracking
        self.stats = {'exact_hits': 0, 'similarity_hits': 0, 'api_calls_saved': 0}

    def get_cached_description(self, field_name, entity_name, field_type, context):
        # Try exact context match first
        context_hash = self._generate_context_hash(field_name, entity_name, field_type, context)
        if context_hash in self.exact_cache:
            self.stats['exact_hits'] += 1
            return self.exact_cache[context_hash].description

        # Try similarity matching with type awareness
        normalized_field = self._normalize_field_name(field_name)
        candidates = self.similarity_cache.get(normalized_field, [])
        for cached in candidates:
            similarity = self._calculate_similarity(
                field_name, cached.field_name,
                entity_name, cached.entity_name,
                field_type, cached.field_type
            )
            if similarity >= self.similarity_threshold:
                self.stats['similarity_hits'] += 1
                return cached.description

        return None
```

## 🧪 Testing Enhanced Features

```bash
# Test caching performance
pytest tests/test_caching.py -v

# Test WordNet integration
pytest tests/test_linguistic_analysis.py -v

# Test configuration system
pytest tests/test_config.py -v

# Test acronym expansion
pytest tests/test_acronym_expansion.py -v

# Test quality assessment
pytest tests/test_quality_assessment.py -v

# Test relationship detection with FK-aware filtering
pytest tests/test_relationship_detection.py -v

# Run all tests with coverage
pytest --cov=ddn_metadata_bootstrap --cov-report=html
```

## 🤝 Contributing

### Areas for Contribution

1. **Caching Enhancements**
   - Persistent cache storage across sessions
   - Cross-project cache sharing
   - Advanced similarity algorithms

2. **Linguistic Analysis Improvements**
   - Additional language support beyond English
   - Industry-specific term recognition
   - Enhanced semantic relationship detection

3. **Quality Assessment Refinements**
   - Machine learning-based quality scoring
   - Domain-specific quality metrics
   - User feedback integration
4. **Relationship Detection Advances**
   - Advanced FK pattern detection
   - Semantic relationship analysis
   - Cross-platform relationship mapping

### Development Guidelines

- Add tests for caching algorithms and WordNet integration
- Include linguistic analysis test cases
- Document configuration options thoroughly
- Test performance impact of new features
- Follow existing architecture patterns

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🆘 Support

- 📖 [Documentation](https://github.com/hasura/ddn-metadata-bootstrap#readme)
- 🐛 [Bug Reports](https://github.com/hasura/ddn-metadata-bootstrap/issues)
- 💬 [Discussions](https://github.com/hasura/ddn-metadata-bootstrap/discussions)
- 🧠 [Caching Issues](https://github.com/hasura/ddn-metadata-bootstrap/issues?q=label%3Acaching)
- 🔍 [Quality Assessment Issues](https://github.com com/hasura/ddn-metadata-bootstrap/issues?q=label%3Aquality)
- 🎯 [WordNet Integration Issues](https://github.com/hasura/ddn-metadata-bootstrap/issues?q=label%3Awordnet)

## 🏷️ Version History

See [CHANGELOG.md](CHANGELOG.md) for complete version history and breaking changes.

## ⭐ Acknowledgments

- Built for [Hasura DDN](https://hasura.io/ddn)
- Powered by [Anthropic Claude](https://www.anthropic.com/)
- Linguistic analysis powered by [NLTK](https://www.nltk.org/) and [WordNet](https://wordnet.princeton.edu/)
- Inspired by the GraphQL and OpenAPI communities
- Caching algorithms inspired by database query optimization techniques

---

Made with ❤️ by the Hasura team
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "AI-powered metadata enhancement for Hasura DDN schema files",
    "version": "1.0.12",
    "project_urls": {
        "Bug Reports": "https://github.com/hasura/ddn-metadata-bootstrap/issues",
        "Changelog": "https://github.com/hasura/ddn-metadata-bootstrap/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/hasura/ddn-metadata-bootstrap#readme",
        "Homepage": "https://github.com/hasura/ddn-metadata-bootstrap",
        "Repository": "https://github.com/hasura/ddn-metadata-bootstrap.git"
    },
    "split_keywords": [
        "hasura",
        " ddn",
        " graphql",
        " schema",
        " metadata",
        " ai",
        " anthropic",
        " descriptions",
        " relationships"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a0f223698c313c1bc753ccabd24f5ff9d91f9c4753a323a57a8d92026429ff41",
                "md5": "e9263ad3bd43792aaa43e090aecff2b2",
                "sha256": "c7be600b40849b20274a2821e19aadbe67ebaa02c4c95a3074231deee1696f8b"
            },
            "downloads": -1,
            "filename": "ddn_metadata_bootstrap-1.0.12-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e9263ad3bd43792aaa43e090aecff2b2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 138275,
            "upload_time": "2025-07-16T10:50:33",
            "upload_time_iso_8601": "2025-07-16T10:50:33.196487Z",
            "url": "https://files.pythonhosted.org/packages/a0/f2/23698c313c1bc753ccabd24f5ff9d91f9c4753a323a57a8d92026429ff41/ddn_metadata_bootstrap-1.0.12-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7ca790bbbc888574188f233932f1ac64daaaa59f6d8fe22ab9ea71f7d6385f3a",
                "md5": "01b43b988af408bc894e8d19e8553108",
                "sha256": "ad5ebb247f88f01f824889cb0ac35246ad7b55e0f3245c84c3d7a30efd4fa3a0"
            },
            "downloads": -1,
            "filename": "ddn_metadata_bootstrap-1.0.12.tar.gz",
            "has_sig": false,
            "md5_digest": "01b43b988af408bc894e8d19e8553108",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 135223,
            "upload_time": "2025-07-16T10:50:34",
            "upload_time_iso_8601": "2025-07-16T10:50:34.395433Z",
            "url": "https://files.pythonhosted.org/packages/7c/a7/90bbbc888574188f233932f1ac64daaaa59f6d8fe22ab9ea71f7d6385f3a/ddn_metadata_bootstrap-1.0.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-16 10:50:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hasura",
    "github_project": "ddn-metadata-bootstrap",
    "github_not_found": true,
    "lcname": "ddn-metadata-bootstrap"
}
        
Elapsed time: 0.55609s