# splurge-dsv

- **Version**: 2025.1.0
- **Summary**: A utility library for working with DSV (Delimited String Values) files
- **Author**: Jim Schilling
- **Upload time**: 2025-08-26 16:49:43
- **Requires Python**: >=3.10
- **Keywords**: dsv, csv, tsv, delimited, parsing, file-processing
- **Repository**: https://github.com/jim-schilling/splurge-dsv

A robust Python library for parsing and processing delimiter-separated values (DSV) files, with advanced features for data validation, streaming, and error handling.

## Features

### 🔧 Core Functionality
- **Multi-format DSV Support**: Parse CSV, TSV, pipe-delimited, semicolon-delimited, and custom delimiter files
- **Flexible Parsing Options**: Configurable whitespace handling, bookend removal, and encoding support
- **Memory-Efficient Streaming**: Process large files without loading entire content into memory
- **Header/Footer Skipping**: Skip specified numbers of rows from start or end of files
- **Unicode Support**: Full Unicode character and delimiter support

### 🛡️ Security & Validation
- **Path Validation**: Comprehensive file path security validation with traversal attack prevention
- **File Permission Checks**: Automatic file accessibility and permission validation
- **Encoding Validation**: Robust encoding error detection and handling
- **Resource Management**: Automatic file handle cleanup and resource management

### 📊 Advanced Processing
- **Chunked Processing**: Configurable chunk sizes for streaming large datasets
- **Mixed Content Handling**: Support for quoted and unquoted values in the same file
- **Line Ending Flexibility**: Automatic handling of different line ending formats
- **Error Recovery**: Graceful error handling with detailed error messages
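The line-ending flexibility listed above can be illustrated with the standard library alone: `str.splitlines()` treats `\n`, `\r\n`, and `\r` as row boundaries and drops the terminators. This is a minimal sketch under that assumption, not splurge-dsv's internal code; `split_rows` is a hypothetical helper and does no quote handling.

```python
def split_rows(text: str, delimiter: str = ",") -> list[list[str]]:
    """Split text into rows on any common line ending, then split each row on the delimiter."""
    # splitlines() recognizes "\n", "\r\n" and "\r" (plus a few other Unicode
    # line separators) and does not keep the terminators.
    return [line.split(delimiter) for line in text.splitlines()]


mixed = "a,b\r\nc,d\re,f\n"
print(split_rows(mixed))  # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```

When reading from a file opened in text mode, Python's default universal-newlines handling already translates these endings to `\n` before your code sees them.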

### 🧪 Testing & Quality
- **Comprehensive Test Suite**: 90%+ code coverage with 250+ tests
- **Cross-Platform Support**: Tested on Windows, Linux, and macOS
- **Type Safety**: Full type annotations and validation
- **Documentation**: Complete API documentation with examples

## Installation

```bash
pip install splurge-dsv
```

## Quick Start

### Basic CSV Parsing

```python
from splurge_dsv import DsvHelper

# Parse a simple CSV string
data = DsvHelper.parse("a,b,c", delimiter=",")
print(data)  # ['a', 'b', 'c']

# Parse a CSV file
rows = DsvHelper.parse_file("data.csv", delimiter=",")
for row in rows:
    print(row)  # each row is a list of strings, e.g. ['col1', 'col2', 'col3']
```

### Streaming Large Files

```python
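from splurge_dsv import DsvHelper


def process_row(row: list[str]) -> None:
    # Placeholder: replace with your own per-row logic
    print(row)


# Stream a large CSV file; each chunk holds up to chunk_size parsed rows
for chunk in DsvHelper.parse_stream("large_file.csv", delimiter=",", chunk_size=1000):
    for row in chunk:
        process_row(row)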
```
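Chunked streaming keeps memory bounded by holding at most `chunk_size` rows at a time. A minimal sketch of that pattern with a generator and `itertools.islice`, assuming a plain split on the delimiter (no quote handling); `stream_chunks` is a hypothetical helper, not part of splurge-dsv:

```python
import io
from collections.abc import Iterable, Iterator
from itertools import islice


def stream_chunks(
    lines: Iterable[str], delimiter: str, chunk_size: int = 500
) -> Iterator[list[list[str]]]:
    """Yield lists of at most chunk_size parsed rows, consuming lines lazily."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(lines)
    while True:
        # islice pulls at most chunk_size lines; the rest of the source is untouched.
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        yield [line.rstrip("\r\n").split(delimiter) for line in batch]


# An open file object works the same way as this in-memory stand-in.
source = io.StringIO("a,b\nc,d\ne,f\n")
for chunk in stream_chunks(source, ",", chunk_size=2):
    print(chunk)
# [['a', 'b'], ['c', 'd']]
# [['e', 'f']]
```

Because the generator only advances when the caller asks for the next chunk, a consumer that stops early never reads the remainder of the file.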

### Advanced Parsing Options

```python
from splurge_dsv import DsvHelper

# Parse with custom options
data = DsvHelper.parse(
    '"a","b","c"',
    delimiter=",",
    bookend='"',
    strip=True,
    bookend_strip=True
)
print(data)  # ['a', 'b', 'c']

# Skip header and footer rows
rows = DsvHelper.parse_file(
    "data.csv",
    delimiter=",",
    skip_header_rows=1,
    skip_footer_rows=2
)
```

### Text File Operations

```python
from splurge_dsv import TextFileHelper

# Count lines in a file
line_count = TextFileHelper.line_count("data.txt")

# Preview first N lines
preview = TextFileHelper.preview("data.txt", max_lines=10)

# Read entire file with options
lines = TextFileHelper.read(
    "data.txt",
    strip=True,
    skip_header_rows=1,
    skip_footer_rows=1
)

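# Stream file content; process_chunk is a placeholder for your own logic
def process_chunk(chunk: list[str]) -> None:
    print(len(chunk))  # each chunk is assumed to be a list of up to chunk_size lines

for chunk in TextFileHelper.read_as_stream("large_file.txt", chunk_size=500):
    process_chunk(chunk)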
```
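Skipping footer rows while streaming is slightly tricky: a line cannot be known to belong to the footer until the lines after it have been seen. A common approach is a small look-behind buffer, sketched here with `collections.deque`. `skip_rows` is a hypothetical helper; whether splurge-dsv uses this technique is not stated in this README.

```python
from collections import deque
from collections.abc import Iterable, Iterator
from itertools import islice


def skip_rows(
    lines: Iterable[str], skip_header_rows: int = 0, skip_footer_rows: int = 0
) -> Iterator[str]:
    """Lazily drop the first and last N lines of an iterable of lines."""
    it = iter(lines)
    # Discard the header lines up front.
    for _ in islice(it, skip_header_rows):
        pass
    if skip_footer_rows <= 0:
        yield from it
        return
    # Hold back the most recent skip_footer_rows lines; anything pushed out of
    # the window cannot be part of the footer, so it is safe to emit.
    window: deque[str] = deque()
    for line in it:
        window.append(line)
        if len(window) > skip_footer_rows:
            yield window.popleft()


lines = ["header", "r1", "r2", "r3", "total", "end"]
print(list(skip_rows(lines, skip_header_rows=1, skip_footer_rows=2)))  # ['r1', 'r2', 'r3']
```

Memory use is proportional to `skip_footer_rows`, not to the file size.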

### Path Validation

```python
from splurge_dsv import PathValidator

# Validate a file path
valid_path = PathValidator.validate_path(
    "data.csv",
    must_exist=True,
    must_be_file=True,
    must_be_readable=True
)

# Check if path is safe
is_safe = PathValidator.is_safe_path("user_input_path.txt")
```
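Traversal prevention usually comes down to resolving a path and checking that it stays under an allowed root. A minimal standard-library sketch, assuming containment in a base directory is the policy (the API reference lists a `base_directory` parameter for `validate_path`); `is_within_base` is hypothetical and is not the library's `is_safe_path`.

```python
import tempfile
from pathlib import Path


def is_within_base(candidate: str, base_directory: str) -> bool:
    """Return True if candidate resolves to a location inside base_directory."""
    base = Path(base_directory).resolve()
    # Relative candidates are joined to the base; resolve() then collapses ".."
    # segments and follows existing symlinks, so escapes via either are caught.
    target = (base / candidate).resolve()
    return target.is_relative_to(base)


with tempfile.TemporaryDirectory() as base_dir:
    print(is_within_base("data.csv", base_dir))       # True
    print(is_within_base("../etc/passwd", base_dir))  # False
```

Resolving both sides before comparing matters: a plain string-prefix check on unresolved paths would accept `base/../outside.txt`.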

## API Reference

### DsvHelper

Main class for DSV parsing operations.

#### Methods

- `parse(content, delimiter, strip=True, bookend=None, bookend_strip=True)` - Parse a single string
- `parses(content_list, delimiter, strip=True, bookend=None, bookend_strip=True)` - Parse multiple strings
- `parse_file(file_path, delimiter, strip=True, bookend=None, bookend_strip=True, skip_header_rows=0, skip_footer_rows=0, encoding='utf-8')` - Parse a file
- `parse_stream(file_path, delimiter, strip=True, bookend=None, bookend_strip=True, skip_header_rows=0, skip_footer_rows=0, encoding='utf-8', chunk_size=500)` - Stream parse a file

### TextFileHelper

Utility class for text file operations.

#### Methods

- `line_count(file_path, encoding='utf-8')` - Count lines in a file
- `preview(file_path, max_lines=100, strip=True, encoding='utf-8', skip_header_rows=0)` - Preview file content
- `read(file_path, strip=True, encoding='utf-8', skip_header_rows=0, skip_footer_rows=0)` - Read entire file
- `read_as_stream(file_path, strip=True, encoding='utf-8', skip_header_rows=0, skip_footer_rows=0, chunk_size=500)` - Stream read file

### PathValidator

Security-focused path validation utilities.

#### Methods

- `validate_path(file_path, must_exist=False, must_be_file=False, must_be_readable=False, allow_relative=False, base_directory=None)` - Validate file path
- `is_safe_path(file_path)` - Check if path is safe
- `sanitize_filename(filename, default_name='file')` - Sanitize filename
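What counts as a sanitized filename is not defined in this README. This sketch assumes a common policy: replace characters that are invalid on Windows (and ASCII control characters) with `_`, trim leading and trailing spaces and dots, and fall back to `default_name` if nothing usable remains. `sanitize_name` is hypothetical and does not handle reserved Windows names such as `CON` or `NUL`.

```python
import re

# Characters that are invalid in Windows filenames, plus ASCII control characters.
_UNSAFE_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_name(filename: str, default_name: str = "file") -> str:
    """Replace unsafe characters with '_' and fall back to default_name if nothing usable remains."""
    cleaned = _UNSAFE_CHARS.sub("_", filename).strip(" .")
    # Empty, or nothing but underscores, is treated as unusable.
    if not cleaned.strip("_"):
        return default_name
    return cleaned


print(sanitize_name("report?.csv"))    # report_.csv
print(sanitize_name("../etc/passwd"))  # _etc_passwd
print(sanitize_name("..."))            # file
```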

### ResourceManager

Context managers for safe resource handling.

#### Classes

- `FileResourceManager` - Context manager for file operations
- `StreamResourceManager` - Context manager for stream operations

#### Functions

- `safe_file_operation(file_path, mode='r', encoding='utf-8', ...)` - Safe file operation context manager
- `safe_stream_operation(stream, auto_close=True)` - Safe stream operation context manager
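The README does not show these context managers in use. As an illustration of the general pattern (translate acquisition failures, guarantee cleanup), here is a standard-library sketch built on `contextlib.contextmanager`. `open_text` and `ResourceAcquisitionError` are hypothetical; the real `safe_file_operation` takes further parameters that are elided above.

```python
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path
from typing import IO


class ResourceAcquisitionError(Exception):
    """Hypothetical stand-in for an acquisition error such as SplurgeResourceAcquisitionError."""


@contextmanager
def open_text(file_path: str, mode: str = "r", encoding: str = "utf-8") -> Iterator[IO[str]]:
    """Open a text file, translate open failures, and always close the handle."""
    try:
        handle = open(file_path, mode, encoding=encoding)
    except OSError as exc:
        # Surface acquisition failures as a domain-specific error, keeping the cause.
        raise ResourceAcquisitionError(f"cannot open {file_path!r}") from exc
    try:
        yield handle
    finally:
        # Runs on normal exit and when the with-body raises.
        handle.close()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = str(Path(tmp_dir) / "note.txt")
    with open_text(path, "w") as f:
        f.write("hello")
    with open_text(path) as f:
        print(f.read())  # hello
    print(f.closed)      # True
```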

## Error Handling

The library provides comprehensive error handling with custom exception classes:

- `SplurgeParameterError` - Invalid parameter values
- `SplurgeFileNotFoundError` - File not found
- `SplurgeFilePermissionError` - File permission issues
- `SplurgeFileEncodingError` - File encoding problems
- `SplurgePathValidationError` - Path validation failures
- `SplurgeResourceAcquisitionError` - Resource acquisition failures
- `SplurgeResourceReleaseError` - Resource cleanup failures
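A custom exception hierarchy lets callers catch either one broad base class or a specific subclass. The README does not state whether the classes listed above share a common base, so this sketch uses its own hypothetical names (`AppError` and subclasses) and translates built-in exceptions with `raise ... from` so the original cause is preserved.

```python
from pathlib import Path


class AppError(Exception):
    """Hypothetical base class: catch this to handle every library-specific failure."""


class AppFileNotFoundError(AppError):
    pass


class AppFileEncodingError(AppError):
    pass


def read_text(file_path: str, encoding: str = "utf-8") -> str:
    """Read a file, translating built-in errors into the custom hierarchy."""
    try:
        return Path(file_path).read_text(encoding=encoding)
    except FileNotFoundError as exc:
        raise AppFileNotFoundError(f"file not found: {file_path}") from exc
    except UnicodeDecodeError as exc:
        raise AppFileEncodingError(f"cannot decode {file_path} as {encoding}") from exc


try:
    read_text("does-not-exist.csv")  # assumes no such file in the working directory
except AppError as err:  # one handler covers every subclass
    print(type(err).__name__)  # AppFileNotFoundError
```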

## Development

### Running Tests

```bash
# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=splurge_dsv --cov-report=html

# Run specific test file
pytest tests/test_dsv_helper.py -v
```

### Code Quality

The project follows strict coding standards:
- PEP 8 compliance
- Type annotations for all functions
- Google-style docstrings
- 90%+ test coverage requirement
- Comprehensive error handling

## Changelog

### 2025.1.0 (2025-08-25)

#### 🎉 Major Features
- **Complete DSV Parser**: Full-featured delimiter-separated value parser with support for CSV, TSV, and custom delimiters
- **Streaming Support**: Memory-efficient streaming for large files with configurable chunk sizes
- **Advanced Parsing Options**: Bookend removal, whitespace handling, and encoding support
- **Header/Footer Skipping**: Skip specified numbers of rows from start or end of files

#### 🛡️ Security Enhancements
- **Path Validation System**: Comprehensive file path security validation with traversal attack prevention
- **File Permission Checks**: Automatic file accessibility and permission validation
- **Encoding Validation**: Robust encoding error detection and handling

#### 🔧 Core Components
- **DsvHelper**: Main DSV parsing class with parse, parses, parse_file, and parse_stream methods
- **TextFileHelper**: Utility class for text file operations (line counting, preview, reading, streaming)
- **PathValidator**: Security-focused path validation utilities
- **ResourceManager**: Context managers for safe resource handling
- **StringTokenizer**: Core string parsing functionality

#### 🧪 Testing & Quality
- **Comprehensive Test Suite**: 250+ tests with 90%+ code coverage
- **Cross-Platform Testing**: Tested on Windows, Linux, and macOS
- **Type Safety**: Full type annotations throughout the codebase
- **Error Handling**: Custom exception hierarchy with detailed error messages

#### 📚 Documentation
- **Complete API Documentation**: Google-style docstrings for all public methods
- **Usage Examples**: Comprehensive examples for all major features
- **Error Documentation**: Detailed error handling documentation

#### 🚀 Performance
- **Memory Efficiency**: Streaming support for large files
- **Optimized Parsing**: Efficient string tokenization and processing
- **Resource Management**: Automatic cleanup and resource management

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

## Support

For support, please open an issue on the [GitHub repository](https://github.com/jim-schilling/splurge-dsv/issues) or contact the maintainers.
