# splurge-dsv
A robust Python library for parsing and processing delimiter-separated values (DSV) files, with features for path and encoding validation, streaming, and error handling.
## Features
### 🔧 Core Functionality
- **Multi-format DSV Support**: Parse CSV, TSV, pipe-delimited, semicolon-delimited, and custom-delimited files
- **Flexible Parsing Options**: Configurable whitespace stripping, bookend removal (stripping a character such as `"` from both ends of each value), and file encoding
- **Memory-Efficient Streaming**: Process large files without loading the entire file into memory
- **Header/Footer Skipping**: Skip a specified number of rows at the start or end of a file
- **Unicode Support**: Full Unicode support for both content and delimiters
### 🛡️ Security & Validation
- **Path Validation**: File path security checks, including protection against path traversal attacks
- **File Permission Checks**: Automatic file accessibility and permission validation
- **Encoding Validation**: Robust encoding error detection and handling
- **Resource Management**: Automatic cleanup of file handles and other resources
### 📊 Advanced Processing
- **Chunked Processing**: Configurable chunk sizes for streaming large datasets
- **Mixed Content Handling**: Support for quoted and unquoted values in the same file
- **Line Ending Flexibility**: Automatic handling of different line ending formats
- **Error Recovery**: Graceful error handling with detailed error messages
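Python's universal newlines mode is one standard way to get line-ending independence. The standard-library sketch below only illustrates that technique; it makes no claim about how splurge-dsv itself normalizes line endings.

```python
import os
import tempfile

# Raw bytes mixing Windows (\r\n), classic Mac (\r) and Unix (\n) line endings
raw = b"a,b\r\nc,d\re,f\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # newline=None (the default) enables universal newlines:
    # "\r\n" and "\r" are both translated to "\n" while reading
    with open(path, encoding="utf-8", newline=None) as f:
        lines = [line.rstrip("\n") for line in f]
finally:
    os.remove(path)

print(lines)  # ['a,b', 'c,d', 'e,f']
```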
### 🧪 Testing & Quality
- **Comprehensive Test Suite**: 90%+ code coverage with 250+ tests
- **Cross-Platform Support**: Tested on Windows, Linux, and macOS
- **Type Safety**: Full type annotations and validation
- **Documentation**: Complete API documentation with examples
## Installation
```bash
pip install splurge-dsv
```
## Quick Start
### Basic CSV Parsing
```python
from splurge_dsv import DsvHelper
# Parse a simple CSV string
data = DsvHelper.parse("a,b,c", delimiter=",")
print(data) # ['a', 'b', 'c']
# Parse a CSV file
rows = DsvHelper.parse_file("data.csv", delimiter=",")
for row in rows:
    print(row)  # e.g. ['col1', 'col2', 'col3']
```
### Streaming Large Files
```python
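from splurge_dsv import DsvHelper

def process_row(row):
    # Placeholder: replace with your own per-row logic
    print(row)

# Stream a large CSV file in chunks of up to 1000 parsed rows
for chunk in DsvHelper.parse_stream("large_file.csv", delimiter=",", chunk_size=1000):
    for row in chunk:
        process_row(row)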
```
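Conceptually, chunked streaming reads lines lazily and yields fixed-size batches of parsed rows, so memory use is bounded by `chunk_size` rather than by file size. The sketch below shows that idea with the standard library only; `iter_row_chunks` is a hypothetical helper, not part of splurge-dsv, and its plain `str.split` does not handle delimiters inside quoted values.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def iter_row_chunks(
    lines: Iterable[str], delimiter: str, chunk_size: int = 500
) -> Iterator[list[list[str]]]:
    """Yield lists of at most chunk_size parsed rows (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(lines)
    while True:
        # islice pulls at most chunk_size lines, so only one batch is in memory
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        # Naive split: delimiters inside quoted values are not handled
        yield [
            [token.strip() for token in line.rstrip("\r\n").split(delimiter)]
            for line in batch
        ]


# Any iterable of lines works, including an open text file
sample = ["a,b\n", "c,d\n", "e,f\n"]
chunks = list(iter_row_chunks(sample, ",", chunk_size=2))
print(chunks)  # [[['a', 'b'], ['c', 'd']], [['e', 'f']]]
```

With a real file, pass the open file object (for example inside `with open(path, encoding="utf-8") as f:`) so lines are read on demand rather than all at once.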
### Advanced Parsing Options
```python
from splurge_dsv import DsvHelper
# Parse with custom options
data = DsvHelper.parse(
    '"a","b","c"',
    delimiter=",",
    bookend='"',
    strip=True,
    bookend_strip=True
)
print(data)  # ['a', 'b', 'c']
# Skip header and footer rows
rows = DsvHelper.parse_file(
    "data.csv",
    delimiter=",",
    skip_header_rows=1,
    skip_footer_rows=2
)
```
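Bookend removal strips a matching string (such as a double quote) from both ends of each token. The sketch below shows one plausible reading of the `strip`, `bookend`, and `bookend_strip` options: here `bookend_strip` is assumed to mean "strip whitespace before checking for bookends". `remove_bookends` and `parse_line` are hypothetical helpers, the flag semantics are an assumption rather than the library's documented behavior, and the naive split still breaks on a delimiter inside a quoted value.

```python
from __future__ import annotations


def remove_bookends(value: str, bookend: str, strip: bool = True) -> str:
    """Remove one bookend from each end of value, if present at both ends."""
    if strip:
        value = value.strip()
    n = len(bookend)
    # Require the bookend at both ends and room for two non-overlapping copies
    if n and len(value) >= 2 * n and value.startswith(bookend) and value.endswith(bookend):
        return value[n:-n]
    return value


def parse_line(
    content: str,
    delimiter: str,
    *,
    strip: bool = True,
    bookend: str | None = None,
    bookend_strip: bool = True,
) -> list[str]:
    """Split one line on delimiter, then optionally remove bookends (sketch)."""
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    tokens = [t.strip() if strip else t for t in content.split(delimiter)]
    if bookend:
        tokens = [remove_bookends(t, bookend, strip=bookend_strip) for t in tokens]
    return tokens


# Quoted and unquoted values mixed on one line
print(parse_line('"a", b ,"c"', ","))               # ['"a"', 'b', '"c"']
print(parse_line('"a", b ,"c"', ",", bookend='"'))  # ['a', 'b', 'c']
```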
### Text File Operations
```python
from splurge_dsv import TextFileHelper

def process_chunk(chunk):
    # Placeholder: replace with your own handling of each chunk of lines
    print(len(chunk))

# Count lines in a file
line_count = TextFileHelper.line_count("data.txt")
# Preview first N lines
preview = TextFileHelper.preview("data.txt", max_lines=10)
# Read entire file with options
lines = TextFileHelper.read(
    "data.txt",
    strip=True,
    skip_header_rows=1,
    skip_footer_rows=1
)
# Stream file content
for chunk in TextFileHelper.read_as_stream("large_file.txt", chunk_size=500):
    process_chunk(chunk)
```
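Skipping footer rows while streaming is the trickier case, because the last rows are unknown until the input ends. One common technique, sketched below with the standard library, keeps a sliding buffer of `skip_footer_rows` lines in a `collections.deque` and releases a line only once enough lines have arrived after it. `skip_rows` is a hypothetical helper for illustration, not part of splurge-dsv's API.

```python
from __future__ import annotations

from collections import deque
from collections.abc import Iterable, Iterator
from itertools import islice


def skip_rows(
    lines: Iterable[str], skip_header_rows: int = 0, skip_footer_rows: int = 0
) -> Iterator[str]:
    """Lazily drop leading and trailing lines (hypothetical helper)."""
    if skip_header_rows < 0 or skip_footer_rows < 0:
        raise ValueError("skip counts must be non-negative")
    # islice with a start index discards header lines without storing them
    body = islice(lines, skip_header_rows, None)
    if skip_footer_rows == 0:
        yield from body
        return
    buffer: deque[str] = deque()
    for line in body:
        buffer.append(line)
        # With more than skip_footer_rows lines buffered, the oldest one
        # cannot belong to the footer, so it is safe to emit
        if len(buffer) > skip_footer_rows:
            yield buffer.popleft()
    # Lines still in the buffer are the footer and are discarded


lines = ["header", "r1", "r2", "r3", "total", "end"]
kept = list(skip_rows(lines, skip_header_rows=1, skip_footer_rows=2))
print(kept)  # ['r1', 'r2', 'r3']
```

Memory use stays bounded by `skip_footer_rows + 1` lines no matter how long the input is.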
### Path Validation
```python
from splurge_dsv import PathValidator
# Validate a file path
valid_path = PathValidator.validate_path(
    "data.csv",
    must_exist=True,
    must_be_file=True,
    must_be_readable=True
)
# Check if path is safe
is_safe = PathValidator.is_safe_path("user_input_path.txt")
```
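A common way to guard against path traversal is to resolve the candidate path and confirm it stays inside an allowed base directory. The sketch below uses only `pathlib`; `is_within_directory` is a hypothetical helper and does not describe how `PathValidator` performs its own checks.

```python
import tempfile
from pathlib import Path


def is_within_directory(candidate: str, base_directory: str) -> bool:
    """Return True if candidate resolves to a location inside base_directory.

    Hypothetical helper. Path.resolve() collapses ".." segments and follows
    symlinks, so "../../etc/passwd" is rejected even though it is joined
    onto the base directory. An absolute candidate replaces the base
    entirely when joined and is therefore checked as-is.
    """
    base = Path(base_directory).resolve()
    target = (base / candidate).resolve()
    # Path.is_relative_to is available from Python 3.9
    return target.is_relative_to(base)


with tempfile.TemporaryDirectory() as base_dir:
    inside = is_within_directory("data.csv", base_dir)
    outside = is_within_directory("../../etc/passwd", base_dir)

print(inside, outside)  # True False
```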
## API Reference
### DsvHelper
Main class for DSV parsing operations.
#### Methods
- `parse(content, delimiter, strip=True, bookend=None, bookend_strip=True)` - Parse a single string
- `parses(content_list, delimiter, strip=True, bookend=None, bookend_strip=True)` - Parse multiple strings
- `parse_file(file_path, delimiter, strip=True, bookend=None, bookend_strip=True, skip_header_rows=0, skip_footer_rows=0, encoding='utf-8')` - Parse a file
- `parse_stream(file_path, delimiter, strip=True, bookend=None, bookend_strip=True, skip_header_rows=0, skip_footer_rows=0, encoding='utf-8', chunk_size=500)` - Parse a file as a stream of row chunks
### TextFileHelper
Utility class for text file operations.
#### Methods
- `line_count(file_path, encoding='utf-8')` - Count lines in a file
- `preview(file_path, max_lines=100, strip=True, encoding='utf-8', skip_header_rows=0)` - Preview the first `max_lines` lines of a file
- `read(file_path, strip=True, encoding='utf-8', skip_header_rows=0, skip_footer_rows=0)` - Read an entire file
- `read_as_stream(file_path, strip=True, encoding='utf-8', skip_header_rows=0, skip_footer_rows=0, chunk_size=500)` - Read a file as a stream of line chunks
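Counting lines and previewing a file can both be done lazily by iterating over the file object, so neither needs the whole file in memory. The standard-library sketch below uses the hypothetical helpers `count_lines` and `preview_lines`; their parameters mirror the signatures listed above, but this is not the library's code, and treating `strip=False` as "remove only the trailing newline" is an assumption.

```python
import os
import tempfile
from itertools import islice


def count_lines(file_path: str, encoding: str = "utf-8") -> int:
    """Count lines by iterating over the file lazily (hypothetical helper)."""
    with open(file_path, encoding=encoding) as f:
        return sum(1 for _ in f)


def preview_lines(
    file_path: str,
    max_lines: int = 100,
    strip: bool = True,
    encoding: str = "utf-8",
    skip_header_rows: int = 0,
) -> list[str]:
    """Return up to max_lines lines after the header rows (hypothetical helper)."""
    with open(file_path, encoding=encoding) as f:
        # islice stops reading once max_lines lines have been collected
        window = islice(f, skip_header_rows, skip_header_rows + max_lines)
        # Without strip, only the trailing newline is removed
        return [line.strip() if strip else line.rstrip("\n") for line in window]


fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as fh:
    fh.write("id,name\n1, alice \n2,bob\n3,carol\n")

try:
    n = count_lines(path)
    head = preview_lines(path, max_lines=2, skip_header_rows=1)
finally:
    os.remove(path)

print(n, head)  # 4 ['1, alice', '2,bob']
```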
### PathValidator
Security-focused path validation utilities.
#### Methods
- `validate_path(file_path, must_exist=False, must_be_file=False, must_be_readable=False, allow_relative=False, base_directory=None)` - Validate a file path
- `is_safe_path(file_path)` - Check whether a path is safe
- `sanitize_filename(filename, default_name='file')` - Sanitize a filename
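Filename sanitization usually means replacing characters that are unsafe on common filesystems and falling back to a default when nothing usable remains. The sketch below implements one such policy; the character set, the `_` replacement, and the helper name `make_safe_filename` are assumptions for illustration, not splurge-dsv's documented behavior.

```python
import re

# Characters invalid or risky in filenames on Windows and/or POSIX systems,
# plus ASCII control characters (an assumed policy for this sketch)
_UNSAFE_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def make_safe_filename(filename: str, default_name: str = "file") -> str:
    """Replace unsafe characters with "_" (hypothetical helper)."""
    cleaned = _UNSAFE_CHARS.sub("_", filename)
    # Drop surrounding whitespace, then trailing dots and spaces, which
    # Windows does not allow; names made only of dots ("." or "..") end up empty
    cleaned = cleaned.strip().rstrip(". ")
    return cleaned or default_name


print(make_safe_filename("report:2025/08?.csv"))  # report_2025_08_.csv
print(make_safe_filename(".."))                   # file
```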
### ResourceManager
Context managers for safe resource handling.
#### Classes
- `FileResourceManager` - Context manager for file operations
- `StreamResourceManager` - Context manager for stream operations
#### Functions
- `safe_file_operation(file_path, mode='r', encoding='utf-8', ...)` - Safe file operation context manager
- `safe_stream_operation(stream, auto_close=True)` - Safe stream operation context manager
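The general pattern behind context managers like these can be sketched with `contextlib.contextmanager`: acquire the resource, translate acquisition failures into a library-specific error, and release the resource in a `finally` block. Everything in this sketch, including `open_text_resource` and `ResourceAcquisitionError`, is hypothetical; consult the package's docstrings for the real `safe_file_operation` signature and behavior.

```python
import os
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from typing import TextIO


class ResourceAcquisitionError(Exception):
    """Hypothetical stand-in for a library-specific acquisition error."""


@contextmanager
def open_text_resource(
    file_path: str, mode: str = "r", encoding: str = "utf-8"
) -> Iterator[TextIO]:
    """Open a text file and guarantee it is closed (hypothetical helper)."""
    try:
        handle = open(file_path, mode, encoding=encoding)
    except OSError as exc:
        # Chain the original error so the root cause stays in the traceback
        raise ResourceAcquisitionError(f"cannot open {file_path!r}") from exc
    try:
        yield handle
    finally:
        # Runs on normal exit and also when the with-block raises
        handle.close()


fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open_text_resource(path, "w") as f:
        f.write("a|b|c\n")
    with open_text_resource(path) as f:
        content = f.read()
finally:
    os.remove(path)

cause_name = None
missing = os.path.join(tempfile.gettempdir(), "no_such_dir", "x.txt")
try:
    with open_text_resource(missing):
        pass
except ResourceAcquisitionError as err:
    cause_name = type(err.__cause__).__name__

print(repr(content), cause_name)  # 'a|b|c\n' FileNotFoundError
```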
## Error Handling
The library provides comprehensive error handling with custom exception classes:
- `SplurgeParameterError` - Invalid parameter values
- `SplurgeFileNotFoundError` - File not found
- `SplurgeFilePermissionError` - File permission issues
- `SplurgeFileEncodingError` - File encoding problems
- `SplurgePathValidationError` - Path validation failures
- `SplurgeResourceAcquisitionError` - Resource acquisition failures
- `SplurgeResourceReleaseError` - Resource cleanup failures
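The changelog describes these as a custom exception hierarchy. Such hierarchies are most convenient when the classes share a common base, so callers can catch one type for every library failure and still branch on the specific subclass. The sketch below shows that pattern with local stand-in classes; the base class `SplurgeDsvError` and the helpers `read_text` and `error_name` are hypothetical, and the real hierarchy and import path in splurge-dsv may differ, so check the package source before relying on either.

```python
from __future__ import annotations

import os
import tempfile


class SplurgeDsvError(Exception):
    """Hypothetical common base class used only in this sketch."""


# Local stand-ins named after the documented exceptions; the real classes
# ship with splurge-dsv and their hierarchy may differ
class SplurgeFileNotFoundError(SplurgeDsvError):
    pass


class SplurgeFilePermissionError(SplurgeDsvError):
    pass


class SplurgeFileEncodingError(SplurgeDsvError):
    pass


def read_text(path: str, encoding: str = "utf-8") -> str:
    """Read a file, translating built-in errors into the sketch's exceptions."""
    try:
        with open(path, encoding=encoding) as f:
            return f.read()
    except FileNotFoundError as exc:
        raise SplurgeFileNotFoundError(f"file not found: {path}") from exc
    except PermissionError as exc:
        raise SplurgeFilePermissionError(f"permission denied: {path}") from exc
    except UnicodeDecodeError as exc:
        raise SplurgeFileEncodingError(f"cannot decode {path} as {encoding}") from exc


def error_name(path: str) -> str | None:
    """Return the class name of the library-level error, if any."""
    try:
        read_text(path)
    except SplurgeDsvError as err:
        # A single except clause covers every error; the subclass says which one
        return type(err).__name__
    return None


fd, bad_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as fh:
    fh.write(b"\xff\xfe,bad\n")  # 0xFF is never valid in UTF-8

missing_name = error_name(os.path.join(tempfile.gettempdir(), "no_such_file_123.csv"))
encoding_name = error_name(bad_path)
os.remove(bad_path)
print(missing_name, encoding_name)  # SplurgeFileNotFoundError SplurgeFileEncodingError
```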
## Development
### Running Tests
```bash
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=splurge_dsv --cov-report=html
# Run specific test file
pytest tests/test_dsv_helper.py -v
```
### Code Quality
The project follows strict coding standards:
- PEP 8 compliance
- Type annotations for all functions
- Google-style docstrings
- 90%+ test coverage requirement
- Comprehensive error handling
## Changelog
### 2025.1.0 (2025-08-25)
#### 🎉 Major Features
- **Complete DSV Parser**: Full-featured delimiter-separated values parser with support for CSV, TSV, and custom delimiters
- **Streaming Support**: Memory-efficient streaming for large files with configurable chunk sizes
- **Advanced Parsing Options**: Bookend removal, whitespace handling, and encoding support
- **Header/Footer Skipping**: Skip a specified number of rows at the start or end of a file
#### 🛡️ Security Enhancements
- **Path Validation System**: Comprehensive file path security validation with traversal attack prevention
- **File Permission Checks**: Automatic file accessibility and permission validation
- **Encoding Validation**: Robust encoding error detection and handling
#### 🔧 Core Components
- **DsvHelper**: Main DSV parsing class with parse, parses, parse_file, and parse_stream methods
- **TextFileHelper**: Utility class for text file operations (line counting, preview, reading, streaming)
- **PathValidator**: Security-focused path validation utilities
- **ResourceManager**: Context managers for safe resource handling
- **StringTokenizer**: Core string parsing functionality
#### 🧪 Testing & Quality
- **Comprehensive Test Suite**: 250+ tests with 90%+ code coverage
- **Cross-Platform Testing**: Tested on Windows, Linux, and macOS
- **Type Safety**: Full type annotations throughout the codebase
- **Error Handling**: Custom exception hierarchy with detailed error messages
#### 📚 Documentation
- **Complete API Documentation**: Google-style docstrings for all public methods
- **Usage Examples**: Comprehensive examples for all major features
- **Error Documentation**: Detailed error handling documentation
#### 🚀 Performance
- **Memory Efficiency**: Streaming support for large files
- **Optimized Parsing**: Efficient string tokenization and processing
- **Resource Management**: Automatic cleanup and resource management
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
## Support
For support, please open an issue on the [GitHub repository](https://github.com/jim-schilling/splurge-dsv/issues) or contact the maintainers.