urarovite


Nameurarovite JSON
Version 1.3.4 PyPI version JSON
download
home_pageNone
SummaryA Google Sheets validation library
upload_time2025-08-22 19:46:13
maintainerNone
docs_urlNone
authorNone
requires_python>=3.9
licenseGNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007 Copyright (c) 2024 Mo Dorff Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.
keywords data-quality google-sheets gspread spreadsheets validation
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Urarovite ๐Ÿ”

[![PyPI version](https://badge.fury.io/py/urarovite.svg)](https://badge.fury.io/py/urarovite)
[![Python versions](https://img.shields.io/pypi/pyversions/urarovite.svg)](https://pypi.org/project/urarovite/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A comprehensive spreadsheet validation library with **universal format support** for both Google Sheets and Excel files. Urarovite provides robust data validation, modern authentication, formula preservation, and seamless integration with contemporary codebases through a clean abstraction layer.

## ๐Ÿš€ Features

### Universal Spreadsheet Support
- **Google Sheets**: Full gspread integration with modern authentication
- **Excel Files**: Native `.xlsx` and `.xls` support via openpyxl
- **Format Agnostic**: Validators work identically across all spreadsheet formats
- **Formula Preservation**: Maintains formulas during conversion between formats
- **Intelligent Defaults**: Smart target location detection and file management

### Modern Architecture
- **Abstraction Layer**: Clean SpreadsheetInterface for format-independent operations
- **Template Methods**: Consistent validation patterns with automatic resource management
- **Performance Optimized**: Smart read-only mode detection and client caching
- **Type Safety**: Comprehensive type hints throughout (Python 3.9+)

### Authentication & Security
- **Service Account Only**: Base64-encoded credentials (no file storage required)
- **Domain-wide Delegation**: Enterprise-grade user impersonation support
- **Secure Credential Handling**: No sensitive data in logs or error messages
- **Environment Integration**: Seamless .env file support

### Validation System
- **19+ Built-in Validators**: Data quality, range validation, platform neutralization
- **Fix & Flag Modes**: Automatic issue resolution or reporting-only modes
- **Comprehensive Coverage**: Empty cells, duplicates, formatting, tab names, ranges
- **Excel Compatibility**: Tab name validation, formula detection, range verification

## ๐Ÿ“ฆ Installation

```bash
pip install urarovite
```

### Optional Dependencies

```bash
# For Excel file support
pip install urarovite[excel]

# For development
pip install urarovite[dev]

# For Jupyter notebook support
pip install urarovite[notebook]

# Install all extras
pip install urarovite[excel,dev,notebook]
```

## ๐Ÿ”‘ Authentication Setup

### Service Account (Recommended)

1. **Create a Google Cloud Project**:
   - Go to [Google Cloud Console](https://console.cloud.google.com/)
   - Create a new project or select an existing one

2. **Enable APIs**:
   - Navigate to "APIs & Services" > "Library"
   - Enable "Google Sheets API" and "Google Drive API"

3. **Create Service Account**:
   - Go to "APIs & Services" > "Credentials"
   - Click "Create Credentials" > "Service Account"
   - Download the JSON key file

4. **Prepare Credentials**:

   ```python
   import base64
   import json
   
   # Load your service account JSON
   with open('path/to/service-account.json', 'r') as f:
       service_account = json.load(f)
   
   # Encode for use with urarovite
   encoded_creds = base64.b64encode(json.dumps(service_account).encode()).decode()
   ```

### Domain-wide Delegation (Enterprise)

For enterprise users who need to impersonate other users:

1. **Enable Domain-wide Delegation** in your service account settings
2. **Add OAuth Scopes** in Google Admin Console:
   - `https://www.googleapis.com/auth/spreadsheets`
   - `https://www.googleapis.com/auth/drive.readonly`

## ๐Ÿ’ป Usage

### Basic Validation

```python
from urarovite.core.api import execute_validation, get_available_validation_criteria

# List available validators
validators = get_available_validation_criteria()
print(validators)
# [{"id": "empty_cells", "name": "Fix Empty Cells (with range targeting)"}, ...]

# Google Sheets validation (requires authentication)
encoded_creds = "eyJ0eXBlIjogInNlcnZpY2VfYWNjb3VudCIsIC4uLn0="
result = execute_validation(
    check={"id": "empty_cells", "mode": "fix"},
    sheet_url="https://docs.google.com/spreadsheets/d/1ABC123/edit",
    auth_secret=encoded_creds,
    subject="user@domain.com"  # Optional: for domain-wide delegation
)

# Excel file validation (no authentication required)
result = execute_validation(
    check={"id": "empty_cells", "mode": "fix"},
    sheet_url="./data/spreadsheet.xlsx"
)

# Excel to Google Sheets conversion with validation
result = execute_validation(
    check={"id": "tab_names", "mode": "fix"},
    sheet_url="./data/spreadsheet.xlsx",
    auth_secret=encoded_creds,
    target="1hWMAXridd8Gd_ND6p8r4bGLQYZnL0b52",  # Drive folder ID
    target_format="sheets"
)

print(f"Fixed {result['fixes_applied']} flags")
print(f"Found {result['flags_found']} additional flags")
print(f"Logs: {result['automated_log']}")
```

### Advanced Validation: Target Specific Ranges

The `empty_cells` validator now supports targeting specific cell ranges instead of checking all cells. This is useful when you only want to validate certain areas of your spreadsheet.

#### Range Targeting Examples

```python
from urarovite.validators import get_validator

validator = get_validator("empty_cells")

# Target a specific range (e.g., 'Case'!D72:D76)
result = validator.validate(
    "spreadsheet.xlsx",
    mode="flag",
    target_ranges="Case!D72:D76"
)

# Target individual cells
result = validator.validate(
    "spreadsheet.xlsx", 
    mode="fix",
    target_ranges=["Case!D72", "Case!D73", "Case!D74"],
    fill_value="N/A"
)

# Target multiple ranges
result = validator.validate(
    "spreadsheet.xlsx",
    mode="flag", 
    target_ranges=["Sheet1!A1:B10", "Sheet1!D5:F15"]
)

# Without sheet name (uses default sheet)
result = validator.validate(
    "spreadsheet.xlsx",
    mode="flag",
    target_ranges="A1:C5"
)
```

#### Range Format Support

- **Range notation**: `'Sheet'!A1:B10` - checks all cells from A1 to B10
- **Individual cells**: `'Sheet'!A1, 'Sheet'!B2` - checks only specific cells
- **Mixed formats**: Combine ranges and individual cells in a list
- **Sheet names**: Use quotes for sheets with spaces: `'My Sheet'!A1`
- **Backward compatibility**: No `target_ranges` parameter = check all cells (default behavior)

#### Command Line Usage with Ranges

```bash
# Target specific range
./run_validation.sh --check '{"id": "empty_cells", "mode": "flag", "target_ranges": "Case!D72:D76"}' 'spreadsheet.xlsx'

# Target individual cells  
./run_validation.sh --check '{"id": "empty_cells", "mode": "fix", "target_ranges": ["Case!D72", "Case!D73"], "fill_value": "N/A"}' 'spreadsheet.xlsx'
```

### Sheet Crawling & Batch Validation

Urarovite includes powerful sheet crawling capabilities that can automatically discover and validate all sheets referenced in a metadata spreadsheet. This is perfect for processing large datasets with multiple input/output sheet pairs.

#### ๐Ÿš€ Super Simple Usage

```bash
# Just provide the URL - saves fixed sheets as Google Sheets in same folder as source
./run_crawl_validation.sh "https://docs.google.com/spreadsheets/d/1Jx5CHYvKt3y2aO-1vFKT7botQUWnvp-CcZftvFGz2pQ/edit#gid=114720924"
```

**What this does:**
- โœ… Crawls through your metadata sheet
- โœ… Finds all input/output sheet URLs automatically
- โœ… Runs ALL available validations on each sheet
- โœ… **Saves fixed sheets as Google Sheets in the same folder as the source** (NEW DEFAULT!)
- โœ… **Automatically adds fixed sheet URLs back to your metadata sheet** (NEW!)
- โœ… Provides comprehensive results and statistics

#### Command Line Options

```bash
# With domain-wide delegation
./run_crawl_validation.sh "https://docs.google.com/spreadsheets/d/your-sheet-id" "user@yourdomain.com"

# Flag mode only (no fixes applied)
./run_crawl_validation.sh "https://docs.google.com/spreadsheets/d/your-sheet-id" --mode flag

# Save to local Excel files instead
./run_crawl_validation.sh "https://docs.google.com/spreadsheets/d/your-sheet-id" --target local --format excel

# Save to specific Google Drive folder
./run_crawl_validation.sh "https://docs.google.com/spreadsheets/d/your-sheet-id" --target "1A2B3C4D5E6F7G8H9I0J"
```

#### ๐Ÿ“‹ New Defaults (Google-First)

- **Target**: Same folder as source (intelligent default) โ† was "local"
- **Format**: Google Sheets โ† was "excel"

**Why This is Better:**
1. **Seamless Google Workflow**: Fixed sheets stay in Google Drive where you can easily access them
2. **Same Folder Organization**: Fixed sheets are created right next to your source sheets
3. **No File Management**: No need to upload/download Excel files
4. **Team Collaboration**: Everyone can access the fixed sheets immediately
5. **Version History**: Google Sheets maintains version history of fixes
6. **Automatic Audit Trail**: Fixed sheet URLs are automatically added to your metadata sheet

#### ๐Ÿ“ New Metadata Columns

After running the crawling script, your metadata sheet will automatically get new columns added:

**For Google Sheets Output (Default):**
- **`input_sheet_url_fixed`**: URLs to the fixed versions of input sheets
- **`example_output_sheet_url_fixed`**: URLs to the fixed versions of output sheets
- **`input_fixes_applied`**: Number of fixes applied to input sheets
- **`input_flags_found`**: Number of flags found in input sheets
- **`input_validation_summary`**: Summary of validation results for input sheets
- **`input_validation_errors`**: Any validation errors for input sheets
- **`output_fixes_applied`**: Number of fixes applied to output sheets
- **`output_flags_found`**: Number of flags found in output sheets
- **`output_validation_summary`**: Summary of validation results for output sheets
- **`output_validation_errors`**: Any validation errors for output sheets

**For Excel Output:**
- **`input_sheet_path_fixed`**: Relative paths to the fixed Excel files for input sheets
- **`example_output_sheet_path_fixed`**: Relative paths to the fixed Excel files for output sheets
- **Plus all the same validator output columns as above**

#### Example Results

**Before:**
| worker_id | input_sheet_url | example_output_sheet_url |
|-----------|----------------|--------------------------|
| ABC123    | https://docs.google.com/.../input123 | https://docs.google.com/.../output123 |

**After (showing key columns):**
| worker_id | input_sheet_url_fixed | input_fixes_applied | input_flags_found | input_validation_summary | output_sheet_url_fixed | output_fixes_applied | output_flags_found |
|-----------|----------------------|--------------------|--------------------|-------------------------|----------------------|--------------------|--------------------|
| ABC123    | https://docs.google.com/.../input123_fixed | 15 | 3 | โœ… 12 successful; โŒ 1 failed | https://docs.google.com/.../output123_fixed | 8 | 0 |

This creates a **complete audit trail** showing:
- ๐Ÿ”— **Where the fixed sheets are located** (direct links)
- ๐Ÿ“Š **Exactly what was fixed** (number of fixes applied)
- โš ๏ธ **What flags remain** (flags found but not fixed)
- โœ… **Validation success rate** (how many validators succeeded)
- ๐Ÿšจ **Any errors encountered** (validation errors for troubleshooting)

#### ๐Ÿ“Š Expected Output

```
๐Ÿš€ Starting Urarovite Sheet Crawling and Validation
==================================================
[INFO] Metadata Sheet: https://docs.google.com/spreadsheets/d/...
[INFO] Authentication: โœ“ Configured
[INFO] Validation Mode: fix
[INFO] Target: Same folder as source (intelligent default)
[INFO] Format: sheets
[INFO] Preserve Formatting: true

๐Ÿ” Starting crawling and validation...
   Metadata Sheet: https://docs.google.com/spreadsheets/d/...
   Authentication: โœ“ Configured
   Validation Mode: fix
   Target: Same folder as source (intelligent default)
   Format: sheets

๐Ÿ“‹ CRAWLING AND VALIDATION RESULTS
==================================================
โœ… Overall Status: SUCCESS

๐Ÿ“Š Summary Statistics:
   Total Sheet Pairs: 15
   Successful Pairs: 15
   Failed Pairs: 0
   Total Input Fixes: 47
   Total Output Fixes: 23
   Total Input flags: 12
   Total Output flags: 8
   Total Errors: 0

โฑ๏ธ  Performance Metrics:
   Total Time: 125.30 seconds
   Crawling Time: 5.20 seconds
   Validation Time: 120.10 seconds
   Processing Rate: 0.12 pairs/second

๐Ÿ’พ Output Files:
   Results JSON: ./output/crawl_validation_results_20241220_143022.json
   Processing Log: ./output/crawl_validation_20241220_143022.log
   Validated Files: Check Google Drive - fixed sheets created in source folders

๐ŸŽ‰ Crawling and validation completed successfully!
   Applied 70 fixes across all sheets
   Check Google Drive for the fixed sheets
```

#### ๐ŸŽฏ Perfect for Batch Processing

This crawling functionality is ideal for data cleaning tasks because:

1. **Batch Processing**: Processes all your input/output sheet pairs at once
2. **Intelligent Detection**: Automatically finds sheet URLs in your metadata
3. **Comprehensive Validation**: Runs all available validators on each sheet
4. **Google-Native**: Keeps everything in Google Drive for easy access
5. **Detailed Reporting**: Shows exactly what was fixed and where

Just run it once and get all your sheets validated and fixed! ๐Ÿš€

#### Prerequisites for Crawling

```bash
# Set your authentication (required)
export AUTH_SECRET="eyJ0eXBlIjogInNlcnZpY2VfYWNjb3VudCIsIC4uLn0="

# Optional: For domain-wide delegation
export DELEGATION_SUBJECT="user@yourdomain.com"
```

### Advanced Usage with gspread

```python
from urarovite.auth import get_gspread_client, create_sheets_service_from_encoded_creds
from urarovite.utils.sheets import extract_sheet_id, get_sheet_values

# Create gspread client (recommended)
client = get_gspread_client(encoded_creds, subject="user@domain.com")
spreadsheet = client.open_by_key(sheet_id)

# Or create traditional Google Sheets API service
service = create_sheets_service_from_encoded_creds(encoded_creds)

# Use utility functions
sheet_id = extract_sheet_id("https://docs.google.com/spreadsheets/d/1ABC123/edit")
data = get_sheet_values(service, sheet_id, "Sheet1!A1:Z1000")
```

## โš™๏ธ Configuration

### Range Separator Configuration

Urarovite supports configurable range separators for parsing verification field ranges. By default, ranges are separated by `@@`, but you can configure this to use any separator (e.g., commas).

#### Global Configuration

```python
from urarovite.utils.sheets import get_segment_separator, set_segment_separator

# Check current separator
print(f"Current separator: '{get_segment_separator()}'")  # Default: '@@'

# Change to comma separator
set_segment_separator(",")

# Now all functions will use comma by default
from urarovite.utils.sheets import split_segments
ranges = "Sheet1!A1:B2,Sheet2!C3:D4"
segments = split_segments(ranges)  # Uses comma separator
print(segments)  # ['Sheet1!A1:B2', 'Sheet2!C3:D4']

# Restore default
set_segment_separator("@@")
```

#### Per-Validator Configuration

You can also specify separators per validation call:

```python
from urarovite.validators import get_validator

validator = get_validator("open_ended_ranges")

# Use comma separator for this validation
result = validator.validate(
    spreadsheet_source=sheet_url,
    mode="flag",
    auth_credentials=auth_creds,
    row=row_data,
    separator=","  # Override separator for this call
)
```

#### Supported Separators

- **Default**: `@@` (double at-sign)
- **Comma**: `,` (comma)
- **Pipe**: `|` (pipe)
- **Semicolon**: `;` (semicolon)
- **Any string**: Custom separators as needed

#### Backward Compatibility

All existing code continues to work unchanged:
- Functions default to `@@` separator if no separator is specified
- Explicit separator parameters override global settings
- No breaking changes to existing APIs

## ๐Ÿ”ง Migration from OAuth

If you're migrating from OAuth-based authentication:

```python
# OLD: OAuth-based authentication
from urarovite.checker.auth import get_credentials, get_sheets_service
creds = get_credentials()  # Interactive OAuth flow
service = get_sheets_service()

# NEW: Service account with base64 credentials
from urarovite.auth import create_sheets_service_from_encoded_creds
service = create_sheets_service_from_encoded_creds(encoded_creds)

# Or use modern gspread client (recommended)
from urarovite.auth import get_gspread_client
client = get_gspread_client(encoded_creds)
```

## ๐Ÿ“š Documentation

- **Migration Guide**: See [MIGRATION_SUMMARY.md](MIGRATION_SUMMARY.md) for detailed changes
- **Validator Migration**: See [VALIDATOR_MIGRATION_GUIDE.md](VALIDATOR_MIGRATION_GUIDE.md) for validator development
- **Spreadsheet Abstraction**: See [SPREADSHEET_ABSTRACTION_GUIDE.md](SPREADSHEET_ABSTRACTION_GUIDE.md) for multi-format support
- **API Reference**: Full type hints and docstrings throughout the codebase
- **Command Line Usage**: Use `./run_validation.sh` for batch validation operations
- **Examples**: Check the `/tests` directory for comprehensive usage examples

## ๐Ÿงช Testing

```bash
# Install with dev dependencies
pip install urarovite[dev]

# Run tests
pytest

# Run with coverage
pytest --cov=urarovite
```

## ๐Ÿ–ฅ๏ธ Command Line Usage

The repository includes a dynamic CLI and legacy shell scripts for running validations, plus powerful batch utilities with automatic result tracking.

### ๐Ÿš€ CLI Utilities (New)

Urarovite provides a comprehensive set of CLI utilities for both single operations and batch processing. All batch utilities automatically add result columns to metadata spreadsheets, showing "Passed" or "Failed" for each processed row.

#### Available Utilities

```bash
# View all available utilities
uv run python -m urarovite.cli run_util --help

# Batch processing utilities (automatically add result columns)
uv run python -m urarovite.cli run_util batch-sanitize-tab-names "metadata_sheet_url" --url-columns "tabs_to_fix"
uv run python -m urarovite.cli run_util batch-validate-a1-ranges "metadata_sheet_url" --url-columns "tabs_to_fix" --column "formula_column"
uv run python -m urarovite.cli run_util batch-sheets-to-excel-drive "metadata_sheet_url" "drive_folder_id" --url-columns "sheet_url"
uv run python -m urarovite.cli run_util batch-excel-to-sheets-drive "metadata_sheet_url" "drive_folder_id" --url-columns "excel_url"
uv run python -m urarovite.cli run_util batch-rename-tabs-from-sheet "metadata_sheet_url" --url-column "spreadsheet_url" --old-name-column "old_tab" --new-name-column "new_tab"

# Single operation utilities
uv run python -m urarovite.cli run_util sanitize-tab-names "spreadsheet_url"
uv run python -m urarovite.cli run_util validate-a1-ranges "spreadsheet_url" --column "formula_column"
uv run python -m urarovite.cli run_util sheets-to-excel-drive "sheet_url" "drive_folder_id"
uv run python -m urarovite.cli run_util excel-to-sheets-drive "excel_file" "drive_folder_id"

# Specialized utilities
uv run python -m urarovite.cli run_util process-forte "csv_file" --target "drive_folder_id"
uv run python -m urarovite.cli run_util folder-batch "input_folder" "drive_folder_id"
uv run python -m urarovite.cli run_util validate "spreadsheet_url" --validator "empty_cells" --validation-mode fix
```

#### ๐Ÿ”„ Automatic Result Column Tracking

**Batch utilities automatically add result columns to your metadata spreadsheet:**

- **Result Column**: `"[utility-name] result"` (e.g., `"batch-sanitize-tab-names result"`)
  - **Values**: `"Passed"` for successful operations, `"Failed"` for failed operations
- **Converted File URL Columns**: Automatically added based on utility type
  - **Sheets โ†’ Excel**: `"excel_url"` column with Google Drive URLs
  - **Excel โ†’ Sheets**: `"sheets_url"` column with Google Sheets URLs
  - **Custom Names**: Use `--excel-url-column` or `--sheets-url-column` to customize
- **Position**: Added to the right of existing data
- **Automatic**: No manual configuration required

**Example Result Columns:**
```
| sheet_url | batch-sheets-to-excel-drive result | excel_url                    |
|-----------|-----------------------------------|------------------------------|
| sheet1    | Passed                            | https://drive.google.com/... |
| sheet2    | Passed                            | https://drive.google.com/... |
| sheet3    | Failed                            |                              |
```

#### ๐Ÿ“‹ Requirements for Result Columns

- **Google Sheets URLs**: Must be `https://docs.google.com/spreadsheets/...` format
- **Authentication**: Requires valid `AUTH_SECRET` in `.env` file
- **Batch Mode**: Automatically detected when using `--url-columns`
- **Metadata Structure**: Input sheet must have URL columns specified

#### ๐ŸŽฏ Smart Mode Detection

- **Automatic**: Utilities with `--url-columns` automatically switch to batch mode
- **Manual Override**: Use `--mode batch` or `--mode single` to override
- **Result Tracking**: Only batch mode adds result columns to metadata sheets

### Per-Validator CLI Commands (Legacy)

### Per-Validator CLI Commands (New)

Each validator now has its own subcommand that accepts explicit parameters mapped to its `validate()` signature. Run `urarovite <validator-id> --help` to see arguments.

Examples:

```bash
# Empty cells: fill all empty values with N/A on a local Excel file
urarovite empty_cells ./data/spreadsheet.xlsx --mode fix --fill-value "N/A"

# Empty cells with targeted ranges
urarovite empty_cells ./data/spreadsheet.xlsx --mode flag --target-ranges "'Case'!D72:D76"

# Tab names with custom replacement char
urarovite tab_names 'https://docs.google.com/spreadsheets/d/abc123' --mode fix --auth-secret "$AUTH_SECRET" --replacement-char _

# Numeric rounding (flag-only example)
urarovite numeric_rounding ./data/spreadsheet.xlsx --mode flag

# Cell value validation using JSON for expected_values
urarovite cell_value_validation ./data/spreadsheet.xlsx --mode fix \
  --expected-values '{"A1": "Title", "B2": 100}' --tolerance 0.01

# Spreadsheet differences with verification ranges
urarovite run_util spreadsheet-differences "task_sheet_url" --validation-mode flag --input-url-column "I" --output-url-column "J" --verification-range-column "H"

# Spreadsheet differences with custom columns and whole-sheet comparison disabled
urarovite run_util spreadsheet-differences "task_sheet_url" --validation-mode fix --input-url-column "A" --output-url-column "B" --verification-range-column "C" --compare-whole-sheet false

```

Notes:
- Common options on all validator commands:
  - `--mode {flag,fix}`: required
  - `--auth-secret`: base64 service account for Google Sheets
  - `--subject`: delegation subject (optional)
  - `--output {table,json}`: result display format (default: table)

### Single Sheet Validation (`run_validation.sh`) (Legacy)

For validating individual spreadsheets with the legacy script:

### Prerequisites

1. **Make the script executable** (if needed):
   ```bash
   chmod +x run_validation.sh
   ```

2. **For Google Sheets validation**, create a `.env` file with your base64-encoded service account:
   ```bash
   # .env file
   AUTH_SECRET=eyJ0eXBlIjogInNlcnZpY2VfYWNjb3VudCIsIC4uLn0=
   ```

### Usage Examples

```bash
# Run all validations on Google Sheets
./run_validation.sh --all 'https://docs.google.com/spreadsheets/d/abc123'

# Run all validations on local Excel file
./run_validation.sh --all './data/spreadsheet.xlsx'

# Run single validation with JSON
./run_validation.sh --check '{"id": "empty_cells", "mode": "fix"}' 'https://docs.google.com/spreadsheets/d/abc123'

# Run single validation on Excel file
./run_validation.sh --check '{"id": "tab_names", "mode": "fix"}' './spreadsheet.xlsx'

# With delegation subject (Google Sheets only)
./run_validation.sh --all 'https://docs.google.com/spreadsheets/d/abc123' 'user@domain.com'

# Load check from JSON file
echo '{"id": "duplicate_rows", "mode": "flag"}' > check.json
./run_validation.sh --check check.json 'https://docs.google.com/spreadsheets/d/abc123'
```

### Script Options

- `--all`: Run all available validation criteria
- `--check <json_or_file>`: Run a single validation check (JSON string or file path)

### Supported Input Types

- **Google Sheets**: URLs containing `docs.google.com` (requires authentication)
- **Excel Files**: Local `.xlsx` or `.xls` files (no authentication required)

The script automatically detects the input type and applies appropriate authentication requirements.

### Batch Validation with Crawling (`run_crawl_validation.sh`)

For processing multiple sheets referenced in a metadata spreadsheet, see the [Sheet Crawling & Batch Validation](#sheet-crawling--batch-validation) section above. This script can automatically discover and validate all sheets in your data processing workflow.

## ๐Ÿ“š Documentation

For comprehensive guides and technical details:

- **[CLI Utilities Complete Guide](./CLI_UTILITIES_COMPREHENSIVE_GUIDE.md)** - Complete CLI usage, technical implementation, and quick reference
- **[CLI Utilities README](./CLI_UTILITIES_README.md)** - CLI utilities overview
- **[Validation Guide](./COMPLETE_VALIDATORS_USER_GUIDE.md)** - Validator usage and examples
- **[API Documentation](./SPREADSHEET_ABSTRACTION_GUIDE.md)** - Programmatic usage patterns
- **[Authentication Guide](./USER_GUIDE.md)** - Detailed authentication setup

## ๐Ÿค Contributing

1. Fork the repository
2. Create a feature branch
3. Install development dependencies: `pip install urarovite[dev]`
4. Make your changes with proper type hints and tests
5. Run tests and linting: `pytest && ruff check`
6. Submit a pull request

## ๐Ÿ“„ License

This project is licensed under the GNU General Public License v3 (GPLv3) - see the [LICENSE](LICENSE) file for details.

## ๐Ÿ”— Links

- **PyPI**: <https://pypi.org/project/urarovite/>
- **GitHub**: <https://github.com/ParetoWorkers/Urarovite>
- **flags**: <https://github.com/ParetoWorkers/Urarovite/flags>

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "urarovite",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "data-quality, google-sheets, gspread, spreadsheets, validation",
    "author": null,
    "author_email": "Mo Dorff <mohonrid@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/54/8c/8e3c6d1aafa5a097d8578905b19cd46ba6e392f0935cbd9725eca7efb705/urarovite-1.3.4.tar.gz",
    "platform": null,
    "description": "# Urarovite \ud83d\udd0d\n\n[![PyPI version](https://badge.fury.io/py/urarovite.svg)](https://badge.fury.io/py/urarovite)\n[![Python versions](https://img.shields.io/pypi/pyversions/urarovite.svg)](https://pypi.org/project/urarovite/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\nA comprehensive spreadsheet validation library with **universal format support** for both Google Sheets and Excel files. Urarovite provides robust data validation, modern authentication, formula preservation, and seamless integration with contemporary codebases through a clean abstraction layer.\n\n## \ud83d\ude80 Features\n\n### Universal Spreadsheet Support\n- **Google Sheets**: Full gspread integration with modern authentication\n- **Excel Files**: Native `.xlsx` and `.xls` support via openpyxl\n- **Format Agnostic**: Validators work identically across all spreadsheet formats\n- **Formula Preservation**: Maintains formulas during conversion between formats\n- **Intelligent Defaults**: Smart target location detection and file management\n\n### Modern Architecture\n- **Abstraction Layer**: Clean SpreadsheetInterface for format-independent operations\n- **Template Methods**: Consistent validation patterns with automatic resource management\n- **Performance Optimized**: Smart read-only mode detection and client caching\n- **Type Safety**: Comprehensive type hints throughout (Python 3.9+)\n\n### Authentication & Security\n- **Service Account Only**: Base64-encoded credentials (no file storage required)\n- **Domain-wide Delegation**: Enterprise-grade user impersonation support\n- **Secure Credential Handling**: No sensitive data in logs or error messages\n- **Environment Integration**: Seamless .env file support\n\n### Validation System\n- **19+ Built-in Validators**: Data quality, range validation, platform neutralization\n- **Fix & Flag Modes**: Automatic issue resolution or reporting-only modes\n- **Comprehensive Coverage**: Empty cells, duplicates, formatting, tab names, ranges\n- **Excel Compatibility**: Tab name validation, formula detection, range verification\n\n## \ud83d\udce6 Installation\n\n```bash\npip install urarovite\n```\n\n### Optional Dependencies\n\n```bash\n# For Excel file support\npip install urarovite[excel]\n\n# For development\npip install urarovite[dev]\n\n# For Jupyter notebook support\npip install urarovite[notebook]\n\n# Install all extras\npip install urarovite[excel,dev,notebook]\n```\n\n## \ud83d\udd11 Authentication Setup\n\n### Service Account (Recommended)\n\n1. **Create a Google Cloud Project**:\n   - Go to [Google Cloud Console](https://console.cloud.google.com/)\n   - Create a new project or select an existing one\n\n2. **Enable APIs**:\n   - Navigate to \"APIs & Services\" > \"Library\"\n   - Enable \"Google Sheets API\" and \"Google Drive API\"\n\n3. **Create Service Account**:\n   - Go to \"APIs & Services\" > \"Credentials\"\n   - Click \"Create Credentials\" > \"Service Account\"\n   - Download the JSON key file\n\n4. **Prepare Credentials**:\n\n   ```python\n   import base64\n   import json\n   \n   # Load your service account JSON\n   with open('path/to/service-account.json', 'r') as f:\n       service_account = json.load(f)\n   \n   # Encode for use with urarovite\n   encoded_creds = base64.b64encode(json.dumps(service_account).encode()).decode()\n   ```\n\n### Domain-wide Delegation (Enterprise)\n\nFor enterprise users who need to impersonate other users:\n\n1. **Enable Domain-wide Delegation** in your service account settings\n2. **Add OAuth Scopes** in Google Admin Console:\n   - `https://www.googleapis.com/auth/spreadsheets`\n   - `https://www.googleapis.com/auth/drive.readonly`\n\n## \ud83d\udcbb Usage\n\n### Basic Validation\n\n```python\nfrom urarovite.core.api import execute_validation, get_available_validation_criteria\n\n# List available validators\nvalidators = get_available_validation_criteria()\nprint(validators)\n# [{\"id\": \"empty_cells\", \"name\": \"Fix Empty Cells (with range targeting)\"}, ...]\n\n# Google Sheets validation (requires authentication)\nencoded_creds = \"eyJ0eXBlIjogInNlcnZpY2VfYWNjb3VudCIsIC4uLn0=\"\nresult = execute_validation(\n    check={\"id\": \"empty_cells\", \"mode\": \"fix\"},\n    sheet_url=\"https://docs.google.com/spreadsheets/d/1ABC123/edit\",\n    auth_secret=encoded_creds,\n    subject=\"user@domain.com\"  # Optional: for domain-wide delegation\n)\n\n# Excel file validation (no authentication required)\nresult = execute_validation(\n    check={\"id\": \"empty_cells\", \"mode\": \"fix\"},\n    sheet_url=\"./data/spreadsheet.xlsx\"\n)\n\n# Excel to Google Sheets conversion with validation\nresult = execute_validation(\n    check={\"id\": \"tab_names\", \"mode\": \"fix\"},\n    sheet_url=\"./data/spreadsheet.xlsx\",\n    auth_secret=encoded_creds,\n    target=\"1hWMAXridd8Gd_ND6p8r4bGLQYZnL0b52\",  # Drive folder ID\n    target_format=\"sheets\"\n)\n\nprint(f\"Fixed {result['fixes_applied']} flags\")\nprint(f\"Found {result['flags_found']} additional flags\")\nprint(f\"Logs: {result['automated_log']}\")\n```\n\n### Advanced Validation: Target Specific Ranges\n\nThe `empty_cells` validator now supports targeting specific cell ranges instead of checking all cells. This is useful when you only want to validate certain areas of your spreadsheet.\n\n#### Range Targeting Examples\n\n```python\nfrom urarovite.validators import get_validator\n\nvalidator = get_validator(\"empty_cells\")\n\n# Target a specific range (e.g., 'Case'!D72:D76)\nresult = validator.validate(\n    \"spreadsheet.xlsx\",\n    mode=\"flag\",\n    target_ranges=\"Case!D72:D76\"\n)\n\n# Target individual cells\nresult = validator.validate(\n    \"spreadsheet.xlsx\", \n    mode=\"fix\",\n    target_ranges=[\"Case!D72\", \"Case!D73\", \"Case!D74\"],\n    fill_value=\"N/A\"\n)\n\n# Target multiple ranges\nresult = validator.validate(\n    \"spreadsheet.xlsx\",\n    mode=\"flag\", \n    target_ranges=[\"Sheet1!A1:B10\", \"Sheet1!D5:F15\"]\n)\n\n# Without sheet name (uses default sheet)\nresult = validator.validate(\n    \"spreadsheet.xlsx\",\n    mode=\"flag\",\n    target_ranges=\"A1:C5\"\n)\n```\n\n#### Range Format Support\n\n- **Range notation**: `'Sheet'!A1:B10` - checks all cells from A1 to B10\n- **Individual cells**: `'Sheet'!A1, 'Sheet'!B2` - checks only specific cells\n- **Mixed formats**: Combine ranges and individual cells in a list\n- **Sheet names**: Use quotes for sheets with spaces: `'My Sheet'!A1`\n- **Backward compatibility**: No `target_ranges` parameter = check all cells (default behavior)\n\n#### Command Line Usage with Ranges\n\n```bash\n# Target specific range\n./run_validation.sh --check '{\"id\": \"empty_cells\", \"mode\": \"flag\", \"target_ranges\": \"Case!D72:D76\"}' 'spreadsheet.xlsx'\n\n# Target individual cells  \n./run_validation.sh --check '{\"id\": \"empty_cells\", \"mode\": \"fix\", \"target_ranges\": [\"Case!D72\", \"Case!D73\"], \"fill_value\": \"N/A\"}' 'spreadsheet.xlsx'\n```\n\n### Sheet Crawling & Batch Validation\n\nUrarovite includes powerful sheet crawling capabilities that can automatically discover and validate all sheets referenced in a metadata spreadsheet. This is perfect for processing large datasets with multiple input/output sheet pairs.\n\n#### \ud83d\ude80 Super Simple Usage\n\n```bash\n# Just provide the URL - saves fixed sheets as Google Sheets in same folder as source\n./run_crawl_validation.sh \"https://docs.google.com/spreadsheets/d/1Jx5CHYvKt3y2aO-1vFKT7botQUWnvp-CcZftvFGz2pQ/edit#gid=114720924\"\n```\n\n**What this does:**\n- \u2705 Crawls through your metadata sheet\n- \u2705 Finds all input/output sheet URLs automatically\n- \u2705 Runs ALL available validations on each sheet\n- \u2705 **Saves fixed sheets as Google Sheets in the same folder as the source** (NEW DEFAULT!)\n- \u2705 **Automatically adds fixed sheet URLs back to your metadata sheet** (NEW!)\n- \u2705 Provides comprehensive results and statistics\n\n#### Command Line Options\n\n```bash\n# With domain-wide delegation\n./run_crawl_validation.sh \"https://docs.google.com/spreadsheets/d/your-sheet-id\" \"user@yourdomain.com\"\n\n# Flag mode only (no fixes applied)\n./run_crawl_validation.sh \"https://docs.google.com/spreadsheets/d/your-sheet-id\" --mode flag\n\n# Save to local Excel files instead\n./run_crawl_validation.sh \"https://docs.google.com/spreadsheets/d/your-sheet-id\" --target local --format excel\n\n# Save to specific Google Drive folder\n./run_crawl_validation.sh \"https://docs.google.com/spreadsheets/d/your-sheet-id\" --target \"1A2B3C4D5E6F7G8H9I0J\"\n```\n\n#### \ud83d\udccb New Defaults (Google-First)\n\n- **Target**: Same folder as source (intelligent default) \u2190 was \"local\"\n- **Format**: Google Sheets \u2190 was \"excel\"\n\n**Why This is Better:**\n1. **Seamless Google Workflow**: Fixed sheets stay in Google Drive where you can easily access them\n2. **Same Folder Organization**: Fixed sheets are created right next to your source sheets\n3. **No File Management**: No need to upload/download Excel files\n4. **Team Collaboration**: Everyone can access the fixed sheets immediately\n5. **Version History**: Google Sheets maintains version history of fixes\n6. **Automatic Audit Trail**: Fixed sheet URLs are automatically added to your metadata sheet\n\n#### \ud83d\udcdd New Metadata Columns\n\nAfter running the crawling script, your metadata sheet will automatically get new columns added:\n\n**For Google Sheets Output (Default):**\n- **`input_sheet_url_fixed`**: URLs to the fixed versions of input sheets\n- **`example_output_sheet_url_fixed`**: URLs to the fixed versions of output sheets\n- **`input_fixes_applied`**: Number of fixes applied to input sheets\n- **`input_flags_found`**: Number of flags found in input sheets\n- **`input_validation_summary`**: Summary of validation results for input sheets\n- **`input_validation_errors`**: Any validation errors for input sheets\n- **`output_fixes_applied`**: Number of fixes applied to output sheets\n- **`output_flags_found`**: Number of flags found in output sheets\n- **`output_validation_summary`**: Summary of validation results for output sheets\n- **`output_validation_errors`**: Any validation errors for output sheets\n\n**For Excel Output:**\n- **`input_sheet_path_fixed`**: Relative paths to the fixed Excel files for input sheets\n- **`example_output_sheet_path_fixed`**: Relative paths to the fixed Excel files for output sheets\n- **Plus all the same validator output columns as above**\n\n#### Example Results\n\n**Before:**\n| worker_id | input_sheet_url | example_output_sheet_url |\n|-----------|----------------|--------------------------|\n| ABC123    | https://docs.google.com/.../input123 | https://docs.google.com/.../output123 |\n\n**After (showing key columns):**\n| worker_id | input_sheet_url_fixed | input_fixes_applied | input_flags_found | input_validation_summary | output_sheet_url_fixed | output_fixes_applied | output_flags_found |\n|-----------|----------------------|--------------------|--------------------|-------------------------|----------------------|--------------------|--------------------|\n| ABC123    | https://docs.google.com/.../input123_fixed | 15 | 3 | \u2705 12 successful; \u274c 1 failed | https://docs.google.com/.../output123_fixed | 8 | 0 |\n\nThis creates a **complete audit trail** showing:\n- \ud83d\udd17 **Where the fixed sheets are located** (direct links)\n- \ud83d\udcca **Exactly what was fixed** (number of fixes applied)\n- \u26a0\ufe0f **What flags remain** (flags found but not fixed)\n- \u2705 **Validation success rate** (how many validators succeeded)\n- \ud83d\udea8 **Any errors encountered** (validation errors for troubleshooting)\n\n#### \ud83d\udcca Expected Output\n\n```\n\ud83d\ude80 Starting Urarovite Sheet Crawling and Validation\n==================================================\n[INFO] Metadata Sheet: https://docs.google.com/spreadsheets/d/...\n[INFO] Authentication: \u2713 Configured\n[INFO] Validation Mode: fix\n[INFO] Target: Same folder as source (intelligent default)\n[INFO] Format: sheets\n[INFO] Preserve Formatting: true\n\n\ud83d\udd0d Starting crawling and validation...\n   Metadata Sheet: https://docs.google.com/spreadsheets/d/...\n   Authentication: \u2713 Configured\n   Validation Mode: fix\n   Target: Same folder as source (intelligent default)\n   Format: sheets\n\n\ud83d\udccb CRAWLING AND VALIDATION RESULTS\n==================================================\n\u2705 Overall Status: SUCCESS\n\n\ud83d\udcca Summary Statistics:\n   Total Sheet Pairs: 15\n   Successful Pairs: 15\n   Failed Pairs: 0\n   Total Input Fixes: 47\n   Total Output Fixes: 23\n   Total Input flags: 12\n   Total Output flags: 8\n   Total Errors: 0\n\n\u23f1\ufe0f  Performance Metrics:\n   Total Time: 125.30 seconds\n   Crawling Time: 5.20 seconds\n   Validation Time: 120.10 seconds\n   Processing Rate: 0.12 pairs/second\n\n\ud83d\udcbe Output Files:\n   Results JSON: ./output/crawl_validation_results_20241220_143022.json\n   Processing Log: ./output/crawl_validation_20241220_143022.log\n   Validated Files: Check Google Drive - fixed sheets created in source folders\n\n\ud83c\udf89 Crawling and validation completed successfully!\n   Applied 70 fixes across all sheets\n   Check Google Drive for the fixed sheets\n```\n\n#### \ud83c\udfaf Perfect for Batch Processing\n\nThis crawling functionality is ideal for data cleaning tasks because:\n\n1. **Batch Processing**: Processes all your input/output sheet pairs at once\n2. **Intelligent Detection**: Automatically finds sheet URLs in your metadata\n3. **Comprehensive Validation**: Runs all available validators on each sheet\n4. **Google-Native**: Keeps everything in Google Drive for easy access\n5. **Detailed Reporting**: Shows exactly what was fixed and where\n\nJust run it once and get all your sheets validated and fixed! \ud83d\ude80\n\n#### Prerequisites for Crawling\n\n```bash\n# Set your authentication (required)\nexport AUTH_SECRET=\"eyJ0eXBlIjogInNlcnZpY2VfYWNjb3VudCIsIC4uLn0=\"\n\n# Optional: For domain-wide delegation\nexport DELEGATION_SUBJECT=\"user@yourdomain.com\"\n```\n\n### Advanced Usage with gspread\n\n```python\nfrom urarovite.auth import get_gspread_client, create_sheets_service_from_encoded_creds\nfrom urarovite.utils.sheets import extract_sheet_id, get_sheet_values\n\n# Create gspread client (recommended)\nclient = get_gspread_client(encoded_creds, subject=\"user@domain.com\")\nspreadsheet = client.open_by_key(sheet_id)\n\n# Or create traditional Google Sheets API service\nservice = create_sheets_service_from_encoded_creds(encoded_creds)\n\n# Use utility functions\nsheet_id = extract_sheet_id(\"https://docs.google.com/spreadsheets/d/1ABC123/edit\")\ndata = get_sheet_values(service, sheet_id, \"Sheet1!A1:Z1000\")\n```\n\n## \u2699\ufe0f Configuration\n\n### Range Separator Configuration\n\nUrarovite supports configurable range separators for parsing verification field ranges. By default, ranges are separated by `@@`, but you can configure this to use any separator (e.g., commas).\n\n#### Global Configuration\n\n```python\nfrom urarovite.utils.sheets import get_segment_separator, set_segment_separator\n\n# Check current separator\nprint(f\"Current separator: '{get_segment_separator()}'\")  # Default: '@@'\n\n# Change to comma separator\nset_segment_separator(\",\")\n\n# Now all functions will use comma by default\nfrom urarovite.utils.sheets import split_segments\nranges = \"Sheet1!A1:B2,Sheet2!C3:D4\"\nsegments = split_segments(ranges)  # Uses comma separator\nprint(segments)  # ['Sheet1!A1:B2', 'Sheet2!C3:D4']\n\n# Restore default\nset_segment_separator(\"@@\")\n```\n\n#### Per-Validator Configuration\n\nYou can also specify separators per validation call:\n\n```python\nfrom urarovite.validators import get_validator\n\nvalidator = get_validator(\"open_ended_ranges\")\n\n# Use comma separator for this validation\nresult = validator.validate(\n    spreadsheet_source=sheet_url,\n    mode=\"flag\",\n    auth_credentials=auth_creds,\n    row=row_data,\n    separator=\",\"  # Override separator for this call\n)\n```\n\n#### Supported Separators\n\n- **Default**: `@@` (double at-sign)\n- **Comma**: `,` (comma)\n- **Pipe**: `|` (pipe)\n- **Semicolon**: `;` (semicolon)\n- **Any string**: Custom separators as needed\n\n#### Backward Compatibility\n\nAll existing code continues to work unchanged:\n- Functions default to `@@` separator if no separator is specified\n- Explicit separator parameters override global settings\n- No breaking changes to existing APIs\n\n## \ud83d\udd27 Migration from OAuth\n\nIf you're migrating from OAuth-based authentication:\n\n```python\n# OLD: OAuth-based authentication\nfrom urarovite.checker.auth import get_credentials, get_sheets_service\ncreds = get_credentials()  # Interactive OAuth flow\nservice = get_sheets_service()\n\n# NEW: Service account with base64 credentials\nfrom urarovite.auth import create_sheets_service_from_encoded_creds\nservice = create_sheets_service_from_encoded_creds(encoded_creds)\n\n# Or use modern gspread client (recommended)\nfrom urarovite.auth import get_gspread_client\nclient = get_gspread_client(encoded_creds)\n```\n\n## \ud83d\udcda Documentation\n\n- **Migration Guide**: See [MIGRATION_SUMMARY.md](MIGRATION_SUMMARY.md) for detailed changes\n- **Validator Migration**: See [VALIDATOR_MIGRATION_GUIDE.md](VALIDATOR_MIGRATION_GUIDE.md) for validator development\n- **Spreadsheet Abstraction**: See [SPREADSHEET_ABSTRACTION_GUIDE.md](SPREADSHEET_ABSTRACTION_GUIDE.md) for multi-format support\n- **API Reference**: Full type hints and docstrings throughout the codebase\n- **Command Line Usage**: Use `./run_validation.sh` for batch validation operations\n- **Examples**: Check the `/tests` directory for comprehensive usage examples\n\n## \ud83e\uddea Testing\n\n```bash\n# Install with dev dependencies\npip install urarovite[dev]\n\n# Run tests\npytest\n\n# Run with coverage\npytest --cov=urarovite\n```\n\n## \ud83d\udda5\ufe0f Command Line Usage\n\nThe repository includes a dynamic CLI and legacy shell scripts for running validations, plus powerful batch utilities with automatic result tracking.\n\n### \ud83d\ude80 CLI Utilities (New)\n\nUrarovite provides a comprehensive set of CLI utilities for both single operations and batch processing. All batch utilities automatically add result columns to metadata spreadsheets, showing \"Passed\" or \"Failed\" for each processed row.\n\n#### Available Utilities\n\n```bash\n# View all available utilities\nuv run python -m urarovite.cli run_util --help\n\n# Batch processing utilities (automatically add result columns)\nuv run python -m urarovite.cli run_util batch-sanitize-tab-names \"metadata_sheet_url\" --url-columns \"tabs_to_fix\"\nuv run python -m urarovite.cli run_util batch-validate-a1-ranges \"metadata_sheet_url\" --url-columns \"tabs_to_fix\" --column \"formula_column\"\nuv run python -m urarovite.cli run_util batch-sheets-to-excel-drive \"metadata_sheet_url\" \"drive_folder_id\" --url-columns \"sheet_url\"\nuv run python -m urarovite.cli run_util batch-excel-to-sheets-drive \"metadata_sheet_url\" \"drive_folder_id\" --url-columns \"excel_url\"\nuv run python -m urarovite.cli run_util batch-rename-tabs-from-sheet \"metadata_sheet_url\" --url-column \"spreadsheet_url\" --old-name-column \"old_tab\" --new-name-column \"new_tab\"\n\n# Single operation utilities\nuv run python -m urarovite.cli run_util sanitize-tab-names \"spreadsheet_url\"\nuv run python -m urarovite.cli run_util validate-a1-ranges \"spreadsheet_url\" --column \"formula_column\"\nuv run python -m urarovite.cli run_util sheets-to-excel-drive \"sheet_url\" \"drive_folder_id\"\nuv run python -m urarovite.cli run_util excel-to-sheets-drive \"excel_file\" \"drive_folder_id\"\n\n# Specialized utilities\nuv run python -m urarovite.cli run_util process-forte \"csv_file\" --target \"drive_folder_id\"\nuv run python -m urarovite.cli run_util folder-batch \"input_folder\" \"drive_folder_id\"\nuv run python -m urarovite.cli run_util validate \"spreadsheet_url\" --validator \"empty_cells\" --validation-mode fix\n```\n\n#### \ud83d\udd04 Automatic Result Column Tracking\n\n**Batch utilities automatically add result columns to your metadata spreadsheet:**\n\n- **Result Column**: `\"[utility-name] result\"` (e.g., `\"batch-sanitize-tab-names result\"`)\n  - **Values**: `\"Passed\"` for successful operations, `\"Failed\"` for failed operations\n- **Converted File URL Columns**: Automatically added based on utility type\n  - **Sheets \u2192 Excel**: `\"excel_url\"` column with Google Drive URLs\n  - **Excel \u2192 Sheets**: `\"sheets_url\"` column with Google Sheets URLs\n  - **Custom Names**: Use `--excel-url-column` or `--sheets-url-column` to customize\n- **Position**: Added to the right of existing data\n- **Automatic**: No manual configuration required\n\n**Example Result Columns:**\n```\n| sheet_url | batch-sheets-to-excel-drive result | excel_url                    |\n|-----------|-----------------------------------|------------------------------|\n| sheet1    | Passed                            | https://drive.google.com/... |\n| sheet2    | Passed                            | https://drive.google.com/... |\n| sheet3    | Failed                            |                              |\n```\n\n#### \ud83d\udccb Requirements for Result Columns\n\n- **Google Sheets URLs**: Must be `https://docs.google.com/spreadsheets/...` format\n- **Authentication**: Requires valid `AUTH_SECRET` in `.env` file\n- **Batch Mode**: Automatically detected when using `--url-columns`\n- **Metadata Structure**: Input sheet must have URL columns specified\n\n#### \ud83c\udfaf Smart Mode Detection\n\n- **Automatic**: Utilities with `--url-columns` automatically switch to batch mode\n- **Manual Override**: Use `--mode batch` or `--mode single` to override\n- **Result Tracking**: Only batch mode adds result columns to metadata sheets\n\n### Per-Validator CLI Commands (Legacy)\n\n### Per-Validator CLI Commands (New)\n\nEach validator now has its own subcommand that accepts explicit parameters mapped to its `validate()` signature. Run `urarovite <validator-id> --help` to see arguments.\n\nExamples:\n\n```bash\n# Empty cells: fill all empty values with N/A on a local Excel file\nurarovite empty_cells ./data/spreadsheet.xlsx --mode fix --fill-value \"N/A\"\n\n# Empty cells with targeted ranges\nurarovite empty_cells ./data/spreadsheet.xlsx --mode flag --target-ranges \"'Case'!D72:D76\"\n\n# Tab names with custom replacement char\nurarovite tab_names 'https://docs.google.com/spreadsheets/d/abc123' --mode fix --auth-secret \"$AUTH_SECRET\" --replacement-char _\n\n# Numeric rounding (flag-only example)\nurarovite numeric_rounding ./data/spreadsheet.xlsx --mode flag\n\n# Cell value validation using JSON for expected_values\nurarovite cell_value_validation ./data/spreadsheet.xlsx --mode fix \\\n  --expected-values '{\"A1\": \"Title\", \"B2\": 100}' --tolerance 0.01\n\n# Spreadsheet differences with verification ranges\nurarovite run_util spreadsheet-differences \"task_sheet_url\" --validation-mode flag --input-url-column \"I\" --output-url-column \"J\" --verification-range-column \"H\"\n\n# Spreadsheet differences with custom columns and whole-sheet comparison disabled\nurarovite run_util spreadsheet-differences \"task_sheet_url\" --validation-mode fix --input-url-column \"A\" --output-url-column \"B\" --verification-range-column \"C\" --compare-whole-sheet false\n\n```\n\nNotes:\n- Common options on all validator commands:\n  - `--mode {flag,fix}`: required\n  - `--auth-secret`: base64 service account for Google Sheets\n  - `--subject`: delegation subject (optional)\n  - `--output {table,json}`: result display format (default: table)\n\n### Single Sheet Validation (`run_validation.sh`) (Legacy)\n\nFor validating individual spreadsheets with the legacy script:\n\n### Prerequisites\n\n1. **Make the script executable** (if needed):\n   ```bash\n   chmod +x run_validation.sh\n   ```\n\n2. **For Google Sheets validation**, create a `.env` file with your base64-encoded service account:\n   ```bash\n   # .env file\n   AUTH_SECRET=eyJ0eXBlIjogInNlcnZpY2VfYWNjb3VudCIsIC4uLn0=\n   ```\n\n### Usage Examples\n\n```bash\n# Run all validations on Google Sheets\n./run_validation.sh --all 'https://docs.google.com/spreadsheets/d/abc123'\n\n# Run all validations on local Excel file\n./run_validation.sh --all './data/spreadsheet.xlsx'\n\n# Run single validation with JSON\n./run_validation.sh --check '{\"id\": \"empty_cells\", \"mode\": \"fix\"}' 'https://docs.google.com/spreadsheets/d/abc123'\n\n# Run single validation on Excel file\n./run_validation.sh --check '{\"id\": \"tab_names\", \"mode\": \"fix\"}' './spreadsheet.xlsx'\n\n# With delegation subject (Google Sheets only)\n./run_validation.sh --all 'https://docs.google.com/spreadsheets/d/abc123' 'user@domain.com'\n\n# Load check from JSON file\necho '{\"id\": \"duplicate_rows\", \"mode\": \"flag\"}' > check.json\n./run_validation.sh --check check.json 'https://docs.google.com/spreadsheets/d/abc123'\n```\n\n### Script Options\n\n- `--all`: Run all available validation criteria\n- `--check <json_or_file>`: Run a single validation check (JSON string or file path)\n\n### Supported Input Types\n\n- **Google Sheets**: URLs containing `docs.google.com` (requires authentication)\n- **Excel Files**: Local `.xlsx` or `.xls` files (no authentication required)\n\nThe script automatically detects the input type and applies appropriate authentication requirements.\n\n### Batch Validation with Crawling (`run_crawl_validation.sh`)\n\nFor processing multiple sheets referenced in a metadata spreadsheet, see the [Sheet Crawling & Batch Validation](#sheet-crawling--batch-validation) section above. This script can automatically discover and validate all sheets in your data processing workflow.\n\n## \ud83d\udcda Documentation\n\nFor comprehensive guides and technical details:\n\n- **[CLI Utilities Complete Guide](./CLI_UTILITIES_COMPREHENSIVE_GUIDE.md)** - Complete CLI usage, technical implementation, and quick reference\n- **[CLI Utilities README](./CLI_UTILITIES_README.md)** - CLI utilities overview\n- **[Validation Guide](./COMPLETE_VALIDATORS_USER_GUIDE.md)** - Validator usage and examples\n- **[API Documentation](./SPREADSHEET_ABSTRACTION_GUIDE.md)** - Programmatic usage patterns\n- **[Authentication Guide](./USER_GUIDE.md)** - Detailed authentication setup\n\n## \ud83e\udd1d Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Install development dependencies: `pip install urarovite[dev]`\n4. Make your changes with proper type hints and tests\n5. Run tests and linting: `pytest && ruff check`\n6. Submit a pull request\n\n## \ud83d\udcc4 License\n\nThis project is licensed under the GNU General Public License v3 (GPLv3) - see the [LICENSE](LICENSE) file for details.\n\n## \ud83d\udd17 Links\n\n- **PyPI**: <https://pypi.org/project/urarovite/>\n- **GitHub**: <https://github.com/ParetoWorkers/Urarovite>\n- **flags**: <https://github.com/ParetoWorkers/Urarovite/flags>\n",
    "bugtrack_url": null,
    "license": "GNU GENERAL PUBLIC LICENSE\n        Version 3, 29 June 2007\n        \n        Copyright (c) 2024 Mo Dorff\n        \n        Everyone is permitted to copy and distribute verbatim copies\n        of this license document, but changing it is not allowed.\n        \n        This program is free software: you can redistribute it and/or modify\n        it under the terms of the GNU General Public License as published by\n        the Free Software Foundation, either version 3 of the License, or\n        (at your option) any later version.\n        \n        This program is distributed in the hope that it will be useful,\n        but WITHOUT ANY WARRANTY; without even the implied warranty of\n        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n        GNU General Public License for more details.\n        \n        You should have received a copy of the GNU General Public License\n        along with this program.  If not, see <https://www.gnu.org/licenses/>.",
    "summary": "A Google Sheets validation library",
    "version": "1.3.4",
    "project_urls": {
        "Changelog": "https://github.com/ParetoWorkers/Urarovite/blob/main/MIGRATION_SUMMARY.md",
        "Documentation": "https://github.com/ParetoWorkers/Urarovite#readme",
        "Homepage": "https://github.com/ParetoWorkers/Urarovite",
        "Repository": "https://github.com/ParetoWorkers/Urarovite.git",
        "flags": "https://github.com/ParetoWorkers/Urarovite/flags"
    },
    "split_keywords": [
        "data-quality",
        " google-sheets",
        " gspread",
        " spreadsheets",
        " validation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "01e49a4cdfd467a89c049cc11eb0a764653d7086484b29cdfd104457ab151a1f",
                "md5": "857b982c1c7a4111385eb01bb69d8c81",
                "sha256": "b294b5054d3f128a6bc6d7c2d868724527cc4276c0f30a574187d7649fdc15da"
            },
            "downloads": -1,
            "filename": "urarovite-1.3.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "857b982c1c7a4111385eb01bb69d8c81",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 340006,
            "upload_time": "2025-08-22T19:46:12",
            "upload_time_iso_8601": "2025-08-22T19:46:12.255872Z",
            "url": "https://files.pythonhosted.org/packages/01/e4/9a4cdfd467a89c049cc11eb0a764653d7086484b29cdfd104457ab151a1f/urarovite-1.3.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "548c8e3c6d1aafa5a097d8578905b19cd46ba6e392f0935cbd9725eca7efb705",
                "md5": "fa8fdaf105408a0847416d51fd1e94b8",
                "sha256": "79593dcc2611fbacf1ec5e12dbcafe9ec2d39b6f3785937bcef56d81633d04ad"
            },
            "downloads": -1,
            "filename": "urarovite-1.3.4.tar.gz",
            "has_sig": false,
            "md5_digest": "fa8fdaf105408a0847416d51fd1e94b8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 390175,
            "upload_time": "2025-08-22T19:46:13",
            "upload_time_iso_8601": "2025-08-22T19:46:13.411898Z",
            "url": "https://files.pythonhosted.org/packages/54/8c/8e3c6d1aafa5a097d8578905b19cd46ba6e392f0935cbd9725eca7efb705/urarovite-1.3.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-22 19:46:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ParetoWorkers",
    "github_project": "Urarovite",
    "github_not_found": true,
    "lcname": "urarovite"
}
        
Elapsed time: 4.80068s