semantic-rdf-mapper

Name	semantic-rdf-mapper
Version	0.1.0
home_page	None
Summary	Convert tabular data (CSV, Excel, JSON, XML) to RDF triples aligned with OWL ontologies using SKOS-based semantic mapping
upload_time	2025-11-03 06:19:07
maintainer	None
docs_url	None
author	Enterprise Data Engineering
requires_python	>=3.11
license	None
keywords	rdf ontology semantic-web knowledge-graph owl shacl linked-data data-conversion skos owl2 ttl json-ld csv-to-rdf excel-to-rdf json-to-rdf xml-to-rdf semantic-mapping ontology-alignment data-integration
requirements	rdflib pandas openpyxl pydantic pydantic-settings pyshacl typer PyYAML python-dateutil rich click pytest pytest-cov mypy black ruff types-PyYAML types-python-dateutil pandas-stubs

# RDFMap - Semantic Model Data Mapper

Convert tabular and structured data (CSV, Excel, JSON, XML) into RDF triples aligned with OWL ontologies using intelligent SKOS-based semantic mapping.

## ✨ Features

### 📊 **Multi-Format Data Sources**
- **CSV/TSV**: Standard delimited files with configurable separators
- **Excel (XLSX)**: Multi-sheet workbooks with automatic type detection
- **JSON**: Complex nested structures with array expansion
- **XML**: Structured documents with namespace support

### 🧠 **Intelligent Semantic Mapping**
- **SKOS-Based Matching**: Automatic column-to-property alignment using SKOS labels
- **Ontology Imports**: Modular ontology architecture with `--import` flag
- **Semantic Alignment Reports**: Confidence scoring and mapping quality metrics
- **OWL2 Best Practices**: NamedIndividual declarations and standards compliance

### 🛠 **Advanced Processing**
- **IRI Templating**: Deterministic, idempotent IRI construction
- **Data Transformation**: Type casting, normalization, value transforms
- **Array Expansion**: Complex nested JSON array processing
- **Object Linking**: Cross-sheet joins and multi-valued cell unpacking

### 📋 **Enterprise Features**
- **Multiple Output Formats**: Turtle, RDF/XML, JSON-LD, N-Triples
- **SHACL Validation**: Validate generated RDF against ontology shapes
- **Batch Processing**: Handle 100k+ row datasets efficiently
- **Error Reporting**: Comprehensive validation and processing reports

## 🚀 Installation

### Requirements
- Python 3.11+ (recommended: Python 3.13)

### Install from PyPI

```bash
# Published on PyPI as semantic-rdf-mapper
pip install semantic-rdf-mapper
```

### Development Installation

```bash
# Clone the repository
git clone https://github.com/rxcthefirst/SemanticModelDataMapper.git
cd SemanticModelDataMapper

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"
```

## Quick Start

### 1. Run the Mortgage Example

```bash
# Convert mortgage loans data to RDF with validation
rdfmap convert \
  --ontology examples/mortgage/ontology/mortgage.ttl \
  --mapping examples/mortgage/config/mortgage_mapping.yaml \
  --format ttl \
  --output output/mortgage.ttl \
  --validate \
  --report output/validation_report.json

# Dry run with first 10 rows
rdfmap convert \
  --mapping examples/mortgage/config/mortgage_mapping.yaml \
  --limit 10 \
  --validate \
  --dry-run

# 🆕 Or auto-generate mapping from ontology + spreadsheet
rdfmap generate \
  --ontology examples/mortgage/ontology/mortgage.ttl \
  --spreadsheet examples/mortgage/data/loans.csv \
  --output auto_mapping.yaml \
  --export-schema
```

### 2. Understanding the Mortgage Example

The example converts loan data with this structure:

**Input CSV** (`examples/mortgage/data/loans.csv`):
```csv
LoanID,BorrowerID,BorrowerName,PropertyID,PropertyAddress,Principal,InterestRate,OriginationDate
L-1001,B-9001,Alex Morgan,P-7001,12 Oak St,250000,0.0525,2023-06-15
```

**Mapping Config** (`examples/mortgage/config/mortgage_mapping.yaml`):
- Maps `LoanID` → `ex:loanNumber`
- Creates linked resources for Borrower and Property
- Applies proper XSD datatypes
- Constructs IRIs using templates

**Output RDF** (Turtle):
```turtle
@prefix ex: <https://example.com/mortgage#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://data.example.com/loan/L-1001> a ex:MortgageLoan ;
  ex:loanNumber "L-1001"^^xsd:string ;
  ex:principalAmount "250000"^^xsd:decimal ;
  ex:hasBorrower <https://data.example.com/borrower/B-9001> ;
  ex:collateralProperty <https://data.example.com/property/P-7001> .
```
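To make the row-to-triples step concrete, here is a standard-library sketch that emits N-Triples for the sample row. It is not rdfmap's emitter, which is built on rdflib. The `EX`, `XSD` and `BASE` constants mirror the namespaces and `base_iri` from the configuration reference below, and `literal` and `row_to_ntriples` are hypothetical helpers.

```python
import csv
import io

# Namespaces and base IRI as declared in the example mapping configuration
EX = "https://example.com/mortgage#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://data.example.com/"

SAMPLE = """LoanID,BorrowerID,BorrowerName,PropertyID,PropertyAddress,Principal,InterestRate,OriginationDate
L-1001,B-9001,Alex Morgan,P-7001,12 Oak St,250000,0.0525,2023-06-15
"""


def literal(value: str, datatype: str) -> str:
    """Serialize a typed literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD}{datatype}>'


def row_to_ntriples(row: dict[str, str]) -> list[str]:
    """Build the loan triples for one CSV row (hypothetical helper)."""
    loan = f"<{BASE}loan/{row['LoanID']}>"
    return [
        f"{loan} <{RDF_TYPE}> <{EX}MortgageLoan> .",
        f"{loan} <{EX}loanNumber> {literal(row['LoanID'], 'string')} .",
        f"{loan} <{EX}principalAmount> {literal(row['Principal'], 'decimal')} .",
        f"{loan} <{EX}hasBorrower> <{BASE}borrower/{row['BorrowerID']}> .",
        f"{loan} <{EX}collateralProperty> <{BASE}property/{row['PropertyID']}> .",
    ]


for row in csv.DictReader(io.StringIO(SAMPLE)):
    print("\n".join(row_to_ntriples(row)))
```

Because every IRI is derived only from column values, rerunning this on the same row yields identical triples, which is the idempotency property the README describes.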

## Configuration Reference

### Mapping File Structure

```yaml
# Namespace declarations
namespaces:
  ex: https://example.com/mortgage#
  xsd: http://www.w3.org/2001/XMLSchema#
  rdfs: http://www.w3.org/2000/01/rdf-schema#

# Default settings
defaults:
  base_iri: https://data.example.com/
  language: en  # Optional default language tag

# Sheet/file mappings
sheets:
  - name: loans
    source: loans.csv  # Relative to mapping file or absolute
    
    # Main resource for each row
    row_resource:
      class: ex:MortgageLoan
      iri_template: "{base_iri}loan/{LoanID}"
    
    # Column mappings
    columns:
      LoanID:
        as: ex:loanNumber
        datatype: xsd:string
        required: true
      
      Principal:
        as: ex:principalAmount
        datatype: xsd:decimal
        transform: to_decimal  # Built-in transform
        default: 0  # Optional default value
      
      Notes:
        as: rdfs:comment
        language: en  # Language tag; an RDF literal has a language tag or a datatype, not both
    
    # Linked objects (object properties)
    objects:
      borrower:
        predicate: ex:hasBorrower
        class: ex:Borrower
        iri_template: "{base_iri}borrower/{BorrowerID}"
        properties:
          - column: BorrowerName
            as: ex:borrowerName
            datatype: xsd:string

# Validation configuration
validation:
  shacl:
    enabled: true
    shapes_file: shapes/mortgage_shapes.ttl

# Processing options
options:
  delimiter: ","
  header: true
  on_error: "report"  # "report" or "fail-fast"
  skip_empty_values: true
```

### Built-in Transforms

- `to_decimal`: Convert to decimal number
- `to_integer`: Convert to integer
- `to_date`: Parse date (ISO format)
- `to_datetime`: Parse datetime with timezone support
- `to_boolean`: Convert to boolean
- `uppercase`: Convert string to uppercase
- `lowercase`: Convert string to lowercase
- `strip`: Trim whitespace
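These transforms turn raw cell strings into values before a datatype is attached. The following is a hypothetical sketch of a few of them; the function bodies (for example, which spellings `to_boolean` accepts) are assumptions, not rdfmap's implementation.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def to_decimal(value: str) -> Decimal:
    """Parse a numeric string exactly (no float rounding)."""
    try:
        return Decimal(value.strip())
    except InvalidOperation as exc:
        raise ValueError(f"cannot convert {value!r} to xsd:decimal") from exc


def to_integer(value: str) -> int:
    return int(value.strip())


def to_date(value: str) -> date:
    """Parse an ISO 8601 date such as "2023-06-15"."""
    return date.fromisoformat(value.strip())


def to_boolean(value: str) -> bool:
    """Map common truthy/falsy spellings; the accepted set is an assumption."""
    normalized = value.strip().lower()
    if normalized in {"true", "yes", "y", "1"}:
        return True
    if normalized in {"false", "no", "n", "0"}:
        return False
    raise ValueError(f"cannot convert {value!r} to xsd:boolean")


print(to_decimal("0.0525"), to_date("2023-06-15"), to_boolean("Yes"))
# 0.0525 2023-06-15 True
```

Using `Decimal` rather than `float` keeps values such as `0.0525` exact, which matters when they end up as `xsd:decimal` literals.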

### IRI Templates

Use Python-style string formatting with column names:
- `{base_iri}loan/{LoanID}` → `https://data.example.com/loan/L-1001`
- `{base_iri}{EntityType}/{ID}` → Combine multiple columns
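Template expansion with `str.format` can be sketched as follows. This assumes placeholders are filled from the row plus `base_iri`. Percent-encoding cell values with `urllib.parse.quote` is an added assumption (it keeps values containing spaces or slashes from producing broken IRIs) and may differ from what rdfmap does.

```python
from urllib.parse import quote


def expand_iri(template: str, row: dict[str, str], base_iri: str) -> str:
    """Fill an IRI template from a row; the same row always yields the same IRI."""
    # Percent-encode each cell value so characters like spaces stay IRI-safe
    values = {key: quote(str(val), safe="") for key, val in row.items()}
    return template.format(base_iri=base_iri, **values)


row = {"LoanID": "L-1001", "EntityType": "loan", "ID": "L 1001"}
print(expand_iri("{base_iri}loan/{LoanID}", row, "https://data.example.com/"))
# https://data.example.com/loan/L-1001
print(expand_iri("{base_iri}{EntityType}/{ID}", row, "https://data.example.com/"))
# https://data.example.com/loan/L%201001
```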

## CLI Reference

### Commands

#### `convert`

Convert spreadsheet data to RDF.

```bash
rdfmap convert [OPTIONS]
```

**Options:**

- `--ontology PATH`: Path to ontology file (supports TTL, RDF/XML, JSON-LD, N-Triples, etc.)
- `--mapping PATH`: Path to mapping configuration (YAML/JSON) [required]
- `--format, -f TEXT`: Output format: ttl, xml, jsonld, nt (default: ttl)
- `--output, -o FILE`: Output file path
- `--validate`: Run SHACL validation after conversion
- `--report PATH`: Write validation report to file (JSON)
- `--limit N`: Process only first N rows (for testing)
- `--dry-run`: Parse and validate without writing output
- `--verbose, -v`: Enable detailed logging
- `--log PATH`: Write log to file

**Examples:**

```bash
# Basic conversion to Turtle
rdfmap convert --mapping config.yaml --format ttl --output output.ttl

# With an ontology, SHACL validation, and a JSON validation report
rdfmap convert \
  --mapping config.yaml \
  --ontology ontology.ttl \
  --format jsonld \
  --output output.jsonld \
  --validate \
  --report validation.json

# Test with limited rows
rdfmap convert --mapping config.yaml --limit 100 --dry-run --verbose
```

#### `generate`

**NEW**: Automatically generate mapping configuration from ontology and spreadsheet.

```bash
rdfmap generate [OPTIONS]
```

**Options:**

- `--ontology, -ont PATH`: Path to ontology file (TTL, RDF/XML, etc.) [required]
- `--spreadsheet, -s PATH`: Path to spreadsheet file (CSV/XLSX) [required]
- `--output, -o PATH`: Output path for generated mapping config [required]
- `--base-iri, -b TEXT`: Base IRI for resources (default: http://example.org/)
- `--class, -c TEXT`: Target ontology class (auto-detects if omitted)
- `--format, -f TEXT`: Output format: yaml or json (default: yaml)
- `--analyze-only`: Show analysis without generating mapping
- `--export-schema`: Export JSON Schema for validation
- `--verbose, -v`: Enable detailed logging

**Examples:**

```bash
# Auto-generate mapping configuration
rdfmap generate \
  --ontology ontology.ttl \
  --spreadsheet data.csv \
  --output mapping.yaml

# Specify target class and export JSON Schema
rdfmap generate \
  -ont ontology.ttl \
  -s data.csv \
  -o mapping.yaml \
  --class MortgageLoan \
  --export-schema

# Analyze only (no generation)
rdfmap generate \
  --ontology ontology.ttl \
  --spreadsheet data.csv \
  --output mapping.yaml \
  --analyze-only
```

**What it does:**
- Analyzes ontology classes and properties
- Examines spreadsheet columns and data types
- Intelligently matches columns to properties
- Suggests appropriate XSD datatypes
- Generates IRI templates from identifier columns
- Detects relationships for linked objects
- Exports JSON Schema for validation
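Datatype suggestion can be illustrated with a simple heuristic: pick the first XSD type that every non-empty value in the column parses as. This is an assumed approach for illustration only; the generator's actual rules may differ.

```python
from collections.abc import Callable
from datetime import date
from decimal import Decimal, InvalidOperation


def _parse_bool(value: str) -> bool:
    """Accept only 'true'/'false' so that 0/1 integer columns are not read as booleans."""
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    raise ValueError(value)


# Tried in order; the first type that every value satisfies wins
CHECKS: list[tuple[str, Callable[[str], object]]] = [
    ("xsd:boolean", _parse_bool),
    ("xsd:integer", int),
    ("xsd:decimal", Decimal),
    ("xsd:date", date.fromisoformat),
]


def _parses(value: str, parser: Callable[[str], object]) -> bool:
    try:
        parser(value)
        return True
    except (ValueError, InvalidOperation):
        return False


def suggest_xsd_datatype(values: list[str]) -> str:
    """Suggest an XSD datatype for a column of raw strings (heuristic sketch)."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "xsd:string"
    for datatype, parser in CHECKS:
        if all(_parses(v, parser) for v in non_empty):
            return datatype
    return "xsd:string"


columns = {"LoanID": ["L-1001"], "Principal": ["250000"],
           "InterestRate": ["0.0525"], "OriginationDate": ["2023-06-15"]}
for name, sample in columns.items():
    print(name, suggest_xsd_datatype(sample))
# LoanID xsd:string
# Principal xsd:integer
# InterestRate xsd:decimal
# OriginationDate xsd:date
```

Note that `Principal` comes out as `xsd:integer` here, while the example mapping declares it `xsd:decimal`; a value-based heuristic cannot know the intended domain, so generated suggestions are worth reviewing.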

See [docs/MAPPING_GENERATOR.md](docs/MAPPING_GENERATOR.md) for details.

#### `validate`

Validate an existing RDF file against SHACL shapes.

```bash
rdfmap validate --rdf PATH --shapes PATH [--report PATH]
```

#### `info`

Display information about a mapping configuration.

```bash
rdfmap info --mapping PATH
```

## Architecture

```
rdfmap/
├── parsers/          # CSV/XLSX data source parsers
├── models/           # Pydantic schemas for mapping config
├── transforms/       # Data transformation functions
├── iri/              # IRI templating and generation
├── emitter/          # RDF graph construction with rdflib
├── validator/        # SHACL validation integration
└── cli/              # Command-line interface
```

### Key Design Principles

1. **Configuration-Driven**: All mappings declarative in YAML/JSON
2. **Modular**: Clear separation between parsing, transformation, and emission
3. **Deterministic**: Same input always produces same IRIs (idempotency)
4. **Extensible**: Easy to add new transforms, datatypes, or ontology patterns
5. **Robust**: Comprehensive error handling with row-level tracking

## Extending the Application

### Adding Custom Transforms

Edit `rdfmap/transforms/functions.py`:

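```python
from typing import Any

# register_transform is defined in this module (rdfmap/transforms/functions.py)
@register_transform("custom_transform")
def custom_transform(value: Any, **kwargs: Any) -> Any:
    """Your custom transformation logic."""
    transformed_value = value  # replace with your transformation
    return transformed_value
```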
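For readers curious how a name-based decorator like `register_transform` can work, the sketch below shows one common pattern: a module-level dictionary that maps the name used in a mapping's `transform:` field to a function. The `TRANSFORMS` registry, `apply_transform` helper, and `strip_currency` transform are hypothetical; the real implementation lives in `rdfmap/transforms/functions.py` and may differ.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical registry: transform name (as written in the mapping YAML) -> function
TRANSFORMS: dict[str, Callable[..., Any]] = {}


def register_transform(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a function under `name` and returns it unchanged."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        if name in TRANSFORMS:
            raise ValueError(f"transform {name!r} is already registered")
        TRANSFORMS[name] = func
        return func
    return decorator


def apply_transform(name: str, value: Any, **kwargs: Any) -> Any:
    """Look up a transform by name and apply it to a cell value."""
    try:
        func = TRANSFORMS[name]
    except KeyError:
        raise ValueError(f"unknown transform {name!r}") from None
    return func(value, **kwargs)


@register_transform("strip_currency")
def strip_currency(value: Any, **kwargs: Any) -> Any:
    """Example custom transform: "$250,000" -> "250000"."""
    return str(value).replace("$", "").replace(",", "").strip()


print(apply_transform("strip_currency", "$250,000"))  # 250000
```

Rejecting duplicate names at registration time surfaces accidental overrides of built-in transforms early rather than at conversion time.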

### Supporting New Ontology Patterns

1. Update mapping schema in `rdfmap/models/mapping.py` if needed
2. Implement pattern handler in `rdfmap/emitter/graph_builder.py`
3. Add test cases in `tests/test_patterns.py`

### Adding New Output Formats

Extend `rdfmap/emitter/serializer.py`:

```python
from pathlib import Path

from rdflib import Graph

def serialize(graph: Graph, format: str, output_path: Path) -> None:
    if format == "your_format":
        # Custom serialization logic for the new format goes here
        pass
```

## Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=rdfmap --cov-report=html

# Run specific test file
pytest tests/test_transforms.py

# Run mortgage example test
pytest tests/test_mortgage_example.py -v
```

## Error Handling

The application provides detailed error reporting:

### Row-Level Errors

```json
{
  "row": 42,
  "error": "Invalid datatype for column 'Principal': cannot convert 'N/A' to xsd:decimal",
  "severity": "error"
}
```
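Row-level records like this pair with the `on_error` option from the mapping file (`"report"` or `"fail-fast"`). A minimal sketch of the two behaviours follows, assuming errors are collected as dictionaries shaped like the JSON above. The `convert_rows` function and numbering data rows from 1 are assumptions.

```python
from decimal import Decimal, InvalidOperation


def convert_rows(
    rows: list[dict[str, str]], on_error: str = "report"
) -> tuple[list[Decimal], list[dict[str, object]]]:
    """Convert the Principal column to Decimal, collecting or raising row errors."""
    results: list[Decimal] = []
    errors: list[dict[str, object]] = []
    for row_number, row in enumerate(rows, start=1):  # data rows numbered from 1
        raw = row["Principal"]
        try:
            results.append(Decimal(raw))
        except InvalidOperation:
            message = (f"Invalid datatype for column 'Principal': "
                       f"cannot convert {raw!r} to xsd:decimal")
            if on_error == "fail-fast":
                raise ValueError(message) from None  # stop at the first bad row
            errors.append({"row": row_number, "error": message, "severity": "error"})
    return results, errors


values, errors = convert_rows([{"Principal": "250000"}, {"Principal": "N/A"}])
print(errors[0]["error"])
# Invalid datatype for column 'Principal': cannot convert 'N/A' to xsd:decimal
```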

### Validation Reports

```json
{
  "conforms": false,
  "results": [
    {
      "focusNode": "https://data.example.com/loan/L-1001",
      "resultPath": "ex:principalAmount",
      "resultMessage": "Value must be greater than 0"
    }
  ]
}
```
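A report in this shape is easy to post-process. The sketch below assumes the file written by `--report` has the `conforms` and `results` keys shown above, and counts violations per property so you can see which mappings need attention first. For a file on disk, pass the text of, for example, `output/validation_report.json`.

```python
import json
from collections import Counter


def summarize_report(report_text: str) -> Counter[str]:
    """Count SHACL violations per resultPath in a JSON validation report."""
    report = json.loads(report_text)
    if report.get("conforms", False):
        return Counter()
    return Counter(r.get("resultPath", "<no path>") for r in report.get("results", []))


example = """{"conforms": false, "results": [
  {"focusNode": "https://data.example.com/loan/L-1001",
   "resultPath": "ex:principalAmount",
   "resultMessage": "Value must be greater than 0"}]}"""
for path, count in summarize_report(example).most_common():
    print(f"{path}: {count} violation(s)")
# ex:principalAmount: 1 violation(s)
```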

## Performance Tips

1. **Large Files**: The application automatically streams data for files >10MB
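2. **Chunking**: `--limit N` processes only the first N rows and no offset option is documented, so use it for sampling rather than batching; for very large inputs, split the source file and convert each part separately
3. **Validation**: Omit `--validate` during development and enable it for final runs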
4. **Dry Runs**: Test mappings with `--limit 100 --dry-run` before full processing
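Because `--limit N` processes only the first N rows, one way to convert a very large CSV in batches is to split it into smaller files that each keep the header row, then run `rdfmap convert` once per file with a mapping whose `source` points at that part. Since IRIs are built deterministically from column values, the per-part outputs can be combined afterwards. A minimal standard-library sketch; the `chunk_csv` helper and the `_partNNN` file naming are assumptions.

```python
import csv
import itertools
from pathlib import Path


def chunk_csv(source: Path, out_dir: Path, rows_per_chunk: int) -> list[Path]:
    """Split a CSV into files of at most `rows_per_chunk` data rows, each with the header."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with source.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return written
        for index in itertools.count(1):
            batch = list(itertools.islice(reader, rows_per_chunk))
            if not batch:
                break
            target = out_dir / f"{source.stem}_part{index:03d}.csv"
            with target.open("w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(batch)
            written.append(target)
    return written


# Example: chunk_csv(Path("examples/mortgage/data/loans.csv"), Path("chunks"), 50_000)
```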

## Troubleshooting

### "Column not found" errors
- Check that CSV column names match the mapping config exactly (matching is case-sensitive)
- Verify that the CSV delimiter matches the config (`delimiter: ","`)

### Invalid IRIs
- Ensure IRI template variables match column names exactly
- Check that base_iri ends with `/` or `#`
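Both IRI checks can be automated before a full run. This sketch uses the standard library's `string.Formatter().parse` to list the placeholders in an IRI template and compare them with the CSV header. The `check_template` helper is hypothetical and treats `base_iri` as the only non-column placeholder, matching the templates shown in this README.

```python
import string


def check_template(template: str, columns: list[str], base_iri: str) -> list[str]:
    """Return human-readable problems with an IRI template (empty list means OK)."""
    problems = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    for name in sorted(fields - {"base_iri"} - set(columns)):
        problems.append(f"placeholder {{{name}}} does not match any column")
    if not base_iri.endswith(("/", "#")):
        problems.append(f"base_iri {base_iri!r} should end with '/' or '#'")
    return problems


header = ["LoanID", "BorrowerID", "Principal"]
for problem in check_template("{base_iri}loan/{LoanId}", header, "https://data.example.com"):
    print(problem)
# placeholder {LoanId} does not match any column
# base_iri 'https://data.example.com' should end with '/' or '#'
```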

### Datatype conversion errors
- Review data for unexpected values (nulls, text in numeric fields)
- Use `transform` to normalize before typing
- Set `skip_empty_values: true` to ignore nulls

### SHACL validation failures
- Review validation report for specific violations
- Ensure ontology and shapes are compatible
- Check that required properties are mapped

## Contributing

Contributions welcome! Please:

1. Follow PEP 8 style guidelines
2. Add unit tests for new features
3. Update documentation
4. Run `pytest` and `mypy` before submitting

## License

MIT License - See LICENSE file for details

## Support

For issues, questions, or feature requests, please open an issue at https://github.com/rxcthefirst/SemanticModelDataMapper/issues.

## Acknowledgments

Built with:
- [rdflib](https://rdflib.readthedocs.io/) - RDF processing
- [pandas](https://pandas.pydata.org/) - Data manipulation
- [pydantic](https://docs.pydantic.dev/) - Data validation
- [pyshacl](https://github.com/RDFLib/pySHACL) - SHACL validation
- [typer](https://typer.tiangolo.com/) - CLI framework

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "semantic-rdf-mapper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "RDFMap Team <rxcthefirst@gmail.com>",
    "keywords": "rdf, ontology, semantic-web, knowledge-graph, owl, shacl, linked-data, data-conversion, skos, owl2, ttl, json-ld, csv-to-rdf, excel-to-rdf, json-to-rdf, xml-to-rdf, semantic-mapping, ontology-alignment, data-integration",
    "author": "Enterprise Data Engineering",
    "author_email": "RDFMap Contributors <rxcthefirst@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/95/8e/9247a5dd568474f4b5b40b38d922a54cf2f8280098cd136ca447f9449454/semantic_rdf_mapper-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Convert tabular data (CSV, Excel, JSON, XML) to RDF triples aligned with OWL ontologies using SKOS-based semantic mapping",
    "version": "0.1.0",
    "project_urls": {
        "Changelog": "https://github.com/rxcthefirst/SemanticModelDataMapper/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/rxcthefirst/SemanticModelDataMapper#readme",
        "Homepage": "https://github.com/rxcthefirst/SemanticModelDataMapper",
        "Issues": "https://github.com/rxcthefirst/SemanticModelDataMapper/issues",
        "Repository": "https://github.com/rxcthefirst/SemanticModelDataMapper"
    },
    "split_keywords": [
        "rdf",
        " ontology",
        " semantic-web",
        " knowledge-graph",
        " owl",
        " shacl",
        " linked-data",
        " data-conversion",
        " skos",
        " owl2",
        " ttl",
        " json-ld",
        " csv-to-rdf",
        " excel-to-rdf",
        " json-to-rdf",
        " xml-to-rdf",
        " semantic-mapping",
        " ontology-alignment",
        " data-integration"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a442984dbd04928c384c6b23be72ab28d49b4d30bea807ba295d4f5dbd765a78",
                "md5": "522cbc7c1ce90b96d45565d66b0def23",
                "sha256": "d6f4fc10155aa33c1202e039c277db75e86c30012d55118e0fb4cdcda18582f9"
            },
            "downloads": -1,
            "filename": "semantic_rdf_mapper-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "522cbc7c1ce90b96d45565d66b0def23",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 76282,
            "upload_time": "2025-11-03T06:19:06",
            "upload_time_iso_8601": "2025-11-03T06:19:06.070598Z",
            "url": "https://files.pythonhosted.org/packages/a4/42/984dbd04928c384c6b23be72ab28d49b4d30bea807ba295d4f5dbd765a78/semantic_rdf_mapper-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "958e9247a5dd568474f4b5b40b38d922a54cf2f8280098cd136ca447f9449454",
                "md5": "324e39d04d1adb46e839e8f8117bccda",
                "sha256": "991c0bd8e53fe04ac8013723426c89cc147f814ed0c26df36f7ee571ea3adf97"
            },
            "downloads": -1,
            "filename": "semantic_rdf_mapper-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "324e39d04d1adb46e839e8f8117bccda",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 231109,
            "upload_time": "2025-11-03T06:19:07",
            "upload_time_iso_8601": "2025-11-03T06:19:07.708376Z",
            "url": "https://files.pythonhosted.org/packages/95/8e/9247a5dd568474f4b5b40b38d922a54cf2f8280098cd136ca447f9449454/semantic_rdf_mapper-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-03 06:19:07",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "rxcthefirst",
    "github_project": "SemanticModelDataMapper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "rdflib",
            "specs": [
                [
                    ">=",
                    "7.0.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    ">=",
                    "3.1.0"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    ">=",
                    "2.5.0"
                ]
            ]
        },
        {
            "name": "pydantic-settings",
            "specs": [
                [
                    ">=",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "pyshacl",
            "specs": [
                [
                    ">=",
                    "0.25.0"
                ]
            ]
        },
        {
            "name": "typer",
            "specs": [
                [
                    ">=",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": [
                [
                    ">=",
                    "6.0.1"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    ">=",
                    "2.8.2"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    ">=",
                    "13.7.0"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    ">=",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    ">=",
                    "7.4.3"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    ">=",
                    "4.1.0"
                ]
            ]
        },
        {
            "name": "mypy",
            "specs": [
                [
                    ">=",
                    "1.7.1"
                ]
            ]
        },
        {
            "name": "black",
            "specs": [
                [
                    ">=",
                    "23.12.0"
                ]
            ]
        },
        {
            "name": "ruff",
            "specs": [
                [
                    ">=",
                    "0.1.8"
                ]
            ]
        },
        {
            "name": "types-PyYAML",
            "specs": [
                [
                    ">=",
                    "6.0.12"
                ]
            ]
        },
        {
            "name": "types-python-dateutil",
            "specs": [
                [
                    ">=",
                    "2.8.19"
                ]
            ]
        },
        {
            "name": "pandas-stubs",
            "specs": [
                [
                    ">=",
                    "2.1.1"
                ]
            ]
        }
    ],
    "lcname": "semantic-rdf-mapper"
}
