# kiarina-utils-file
A comprehensive Python library for file I/O operations with automatic encoding detection, MIME type detection, and support for various file formats.
[Python 3.12+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
## Features
### 🚀 **Comprehensive File I/O**
- **Multiple file formats**: Text, binary, JSON, YAML
- **Sync & Async support**: Full async/await support for high-performance applications
- **Atomic operations**: Safe file writing with temporary files and locking
- **Thread safety**: File locking mechanisms prevent concurrent access issues
### 🔍 **Smart Detection**
- **Automatic encoding detection**: Smart handling of various text encodings with nkf support
- **MIME type detection**: Automatic content type identification using multiple detection methods
- **Extension handling**: Support for complex multi-part extensions (.tar.gz, .tar.gz.gpg)
### 📦 **Data Containers**
- **FileBlob**: Unified file data container with metadata and path information
- **MIMEBlob**: MIME-typed binary data container with format conversion support
- **Hash-based naming**: Content-addressable file naming using cryptographic hashes
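With content-addressable naming, the file name is derived from a digest of the bytes, so identical content always maps to the same name. The following is a minimal sketch using `hashlib`. It assumes a `<hexdigest><extension>` layout, because this README does not document the exact format of `hashed_file_name`. `hashed_name` is a hypothetical helper. The `"sha256"` default mirrors the documented default of `KIARINA_UTILS_MIME_HASH_ALGORITHM`.

```python
import hashlib


def hashed_name(raw_data: bytes, ext: str, algorithm: str = "sha256") -> str:
    """Hypothetical helper: derive a file name from the content's digest."""
    digest = hashlib.new(algorithm, raw_data).hexdigest()
    return f"{digest}{ext}"


name = hashed_name(b"Hello, World!", ".txt")
```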
### 🛡️ **Production Ready**
- **Error handling**: Graceful handling of missing files with configurable defaults
- **Performance optimized**: Non-blocking I/O operations and efficient caching
- **Type safety**: Full type hints and comprehensive testing
## Installation
```bash
pip install kiarina-utils-file
```
### Optional Dependencies
For enhanced functionality, install optional dependencies:
```bash
# For MIME type detection from file content
pip install "kiarina-utils-file[mime]"
# Or install with all optional dependencies
pip install "kiarina-utils-file[all]"
```
## Quick Start
### Basic File Operations
```python
import kiarina.utils.file as kf
# Read and write text files with automatic encoding detection
text = kf.read_text("document.txt", default="")
kf.write_text("output.txt", "Hello, World! 🌍")
# Binary file operations
data = kf.read_binary("image.jpg")
if data:
    kf.write_binary("copy.jpg", data)
# JSON operations with type safety
config = kf.read_json_dict("config.json", default={})
kf.write_json_dict("output.json", {"key": "value"})
# YAML operations
settings = kf.read_yaml_dict("settings.yaml", default={})
kf.write_yaml_list("list.yaml", [1, 2, 3])
```
### High-Level FileBlob Operations
```python
import kiarina.utils.file as kf
# Read file with automatic MIME type detection
blob = kf.read_file("document.pdf")
if blob:
    print(f"File: {blob.file_path}")
    print(f"MIME type: {blob.mime_type}")
    print(f"Size: {len(blob.raw_data)} bytes")
    print(f"Extension: {blob.ext}")

# Create and write FileBlob
blob = kf.FileBlob(
    "output.txt",
    mime_type="text/plain",
    raw_text="Hello, World!",
)
kf.write_file(blob)
kf.write_file(blob)
# Data URL generation for web use
print(blob.raw_base64_url) # data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==
```
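The `raw_base64_url` value shown above is a standard RFC 2397 data URL: the MIME type followed by the Base64-encoded bytes. The stdlib-only sketch below assembles such a string and reproduces the value in the comment. `to_data_url` is a hypothetical helper and is not this package's API.

```python
import base64


def to_data_url(mime_type: str, raw_data: bytes) -> str:
    """Hypothetical helper: build a base64 data URL (RFC 2397)."""
    encoded = base64.b64encode(raw_data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


print(to_data_url("text/plain", b"Hello, World!"))  # data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==
```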
### Async Operations
```python
import asyncio

import kiarina.utils.file.asyncio as kfa

async def process_files():
    # All operations have async equivalents
    text = await kfa.read_text("large_file.txt")
    await kfa.write_json_dict("result.json", {"processed": True})

    # FileBlob operations
    blob = await kfa.read_file("document.pdf")
    if blob:
        await kfa.write_file(blob, "backup.pdf")

asyncio.run(process_files())
```
### MIME Type and Extension Detection
```python
from pathlib import Path

import kiarina.utils.ext as ke
import kiarina.utils.mime as km

# Sample inputs (replace with your own files)
file_data = Path("document.pdf").read_bytes()
jpeg_data = Path("photo.jpg").read_bytes()

# MIME type detection from content and filename
mime_type = km.detect_mime_type(
    raw_data=file_data,
    file_name_hint="document.pdf",
)

# Extension detection from MIME type
extension = ke.detect_extension("application/json")  # ".json"

# Multi-part extension extraction
extension = ke.extract_extension("archive.tar.gz")  # ".tar.gz"

# Create MIME blob from data
blob = km.create_mime_blob(jpeg_data)
print(f"Detected: {blob.mime_type}")  # "image/jpeg"
```
### Encoding Detection
```python
import kiarina.utils.encoding as kenc
# Automatic encoding detection
with open("mystery_file.txt", "rb") as f:
    raw_data = f.read()
encoding = kenc.detect_encoding(raw_data)
text = kenc.decode_binary_to_text(raw_data)
# Check if data is binary or text
is_binary = kenc.is_binary(raw_data)
```
## Advanced Usage
### Custom Configuration
Configure behavior through environment variables:
```bash
# Encoding detection
export KIARINA_UTILS_ENCODING_USE_NKF=true
export KIARINA_UTILS_ENCODING_DEFAULT_ENCODING=utf-8
# File operations
export KIARINA_UTILS_FILE_LOCK_DIR=/custom/lock/dir
export KIARINA_UTILS_FILE_LOCK_CLEANUP_ENABLED=true
# MIME type detection
export KIARINA_UTILS_MIME_HASH_ALGORITHM=sha256
```
### Error Handling
```python
import json

import kiarina.utils.file as kf

try:
    data = kf.read_json_dict("config.json")
    if data is None:
        print("File not found, using defaults")
        data = {"default": True}
except json.JSONDecodeError as e:
    print(f"Invalid JSON: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
### Performance Considerations
```python
import asyncio

import kiarina.utils.file as kf
import kiarina.utils.file.asyncio as kfa

# For I/O-intensive workloads, use the async versions
async def process_many_files(file_paths):
    tasks = [kfa.read_file(path) for path in file_paths]
    results = await asyncio.gather(*tasks)
    return [r for r in results if r is not None]

# Use appropriate defaults to avoid None checks
config = kf.read_json_dict("config.json", default={})
# Instead of:
# config = kf.read_json_dict("config.json")
# if config is None:
#     config = {}
```
## API Reference
### File Operations
#### Synchronous API (`kiarina.utils.file`)
**High-level operations:**
- `read_file(path, *, fallback_mime_type="application/octet-stream", default=None) -> FileBlob | None`
- `write_file(file_blob, file_path=None) -> None`
**Text operations:**
- `read_text(path, *, default=None) -> str | None`
- `write_text(path, text) -> None`
**Binary operations:**
- `read_binary(path, *, default=None) -> bytes | None`
- `write_binary(path, data) -> None`
**JSON operations:**
- `read_json_dict(path, *, default=None) -> dict[str, Any] | None`
- `write_json_dict(path, data, *, indent=2, ensure_ascii=False, sort_keys=False) -> None`
- `read_json_list(path, *, default=None) -> list[Any] | None`
- `write_json_list(path, data, *, indent=2, ensure_ascii=False, sort_keys=False) -> None`
**YAML operations:**
- `read_yaml_dict(path, *, default=None) -> dict[str, Any] | None`
- `write_yaml_dict(path, data, *, allow_unicode=True, sort_keys=False) -> None`
- `read_yaml_list(path, *, default=None) -> list[Any] | None`
- `write_yaml_list(path, data, *, allow_unicode=True, sort_keys=False) -> None`
**File management:**
- `remove_file(path) -> None`
#### Asynchronous API (`kiarina.utils.file.asyncio`)
All synchronous functions have async equivalents with the same signatures, but they return `Awaitable` objects and must be called with `await`.
### Data Containers
#### FileBlob
```python
class FileBlob:
    def __init__(self, file_path, mime_blob=None, *, mime_type=None, raw_data=None, raw_text=None): ...

    # Properties
    file_path: str
    mime_blob: "MIMEBlob"
    mime_type: str
    raw_data: bytes
    raw_text: str
    raw_base64_str: str
    raw_base64_url: str
    hash_string: str
    ext: str
    hashed_file_name: str

    # Methods
    def is_binary(self) -> bool: ...
    def is_text(self) -> bool: ...
    def replace(self, *, file_path=None, mime_blob=None, mime_type=None, raw_data=None, raw_text=None) -> "FileBlob": ...
```
#### MIMEBlob
```python
class MIMEBlob:
    def __init__(self, mime_type, raw_data=None, *, raw_text=None): ...

    # Properties
    mime_type: str
    raw_data: bytes
    raw_text: str
    raw_base64_str: str
    raw_base64_url: str
    hash_string: str
    ext: str
    hashed_file_name: str

    # Methods
    def is_binary(self) -> bool: ...
    def is_text(self) -> bool: ...
    def replace(self, *, mime_type=None, raw_data=None, raw_text=None) -> "MIMEBlob": ...
```
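Both containers expose a `replace(...)` method that returns a new object, in the style of `dataclasses.replace`. The sketch below models that pattern with a frozen dataclass. `MiniMIMEBlob`, the `text/` prefix rule in `is_text`, and the fixed UTF-8 decoding are simplifying assumptions. They do not describe how `MIMEBlob` is implemented.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniMIMEBlob:
    """Hypothetical, simplified stand-in for an immutable MIME-typed blob."""

    mime_type: str
    raw_data: bytes = b""

    @property
    def raw_text(self) -> str:
        # Assumption: UTF-8 text; the real package detects the encoding.
        return self.raw_data.decode("utf-8")

    def is_text(self) -> bool:
        # Simplified rule: only text/* counts as text here.
        return self.mime_type.startswith("text/")

    def is_binary(self) -> bool:
        return not self.is_text()

    def replace(self, **changes: object) -> "MiniMIMEBlob":
        # Returns a modified copy; the original is left untouched.
        return dataclasses.replace(self, **changes)


original = MiniMIMEBlob("text/plain", b"Hello")
updated = original.replace(raw_data=b"Hello, World!")
```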
### Utility Functions
#### MIME Type Detection (`kiarina.utils.mime`)
- `detect_mime_type(*, raw_data=None, stream=None, file_name_hint=None, **kwargs) -> str | None`
- `create_mime_blob(raw_data, *, fallback_mime_type="application/octet-stream") -> MIMEBlob`
- `apply_mime_alias(mime_type, *, mime_aliases=None) -> str`
#### Extension Detection (`kiarina.utils.ext`)
- `detect_extension(mime_type, *, custom_extensions=None, default=None) -> str | None`
- `extract_extension(file_name_hint, *, multi_extensions=None, default=None, **kwargs) -> str | None`
#### Encoding Detection (`kiarina.utils.encoding`)
- `detect_encoding(raw_data, *, use_nkf=None, **kwargs) -> str | None`
- `decode_binary_to_text(raw_data, *, use_nkf=None, **kwargs) -> str`
- `is_binary(raw_data, *, use_nkf=None, **kwargs) -> bool`
- `get_default_encoding() -> str`
- `normalize_newlines(text) -> str`
## Configuration
### Environment Variables
#### Encoding Detection
- `KIARINA_UTILS_ENCODING_USE_NKF`: Enable/disable nkf usage (bool)
- `KIARINA_UTILS_ENCODING_DEFAULT_ENCODING`: Default encoding (default: "utf-8")
- `KIARINA_UTILS_ENCODING_FALLBACK_ENCODINGS`: Comma-separated list of fallback encodings
- `KIARINA_UTILS_ENCODING_MAX_SAMPLE_SIZE`: Maximum bytes to sample for detection (default: 8192)
- `KIARINA_UTILS_ENCODING_CHARSET_NORMALIZER_CONFIDENCE_THRESHOLD`: Confidence threshold (default: 0.6)
#### File Operations
- `KIARINA_UTILS_FILE_LOCK_DIR`: Custom lock directory path
- `KIARINA_UTILS_FILE_LOCK_CLEANUP_ENABLED`: Enable automatic cleanup (default: true)
- `KIARINA_UTILS_FILE_LOCK_MAX_AGE_HOURS`: Maximum age for lock files in hours (default: 24)
#### MIME Type Detection
- `KIARINA_UTILS_MIME_HASH_ALGORITHM`: Hash algorithm for content addressing (default: "sha256")
#### Extension Detection
- `KIARINA_UTILS_EXT_MAX_MULTI_EXTENSION_PARTS`: Maximum parts for multi-extension detection (default: 4)
## Requirements
- **Python**: 3.12 or higher
- **Core dependencies**:
  - `aiofiles>=24.1.0` - Async file operations
  - `charset-normalizer>=3.4.3` - Encoding detection
  - `filelock>=3.19.1` - File locking
  - `pydantic>=2.11.7` - Data validation
  - `pydantic-settings>=2.10.1` - Settings management
  - `pydantic-settings-manager>=2.1.0` - Advanced settings management
  - `pyyaml>=6.0.2` - YAML support
- **Optional dependencies**:
  - `puremagic>=1.30` - Enhanced MIME type detection from file content
## Development
### Prerequisites
- Python 3.12+
- [uv](https://github.com/astral-sh/uv) for dependency management
- [mise](https://mise.jdx.dev/) for task running
### Setup
```bash
# Clone the repository
git clone https://github.com/kiarina/kiarina-python.git
cd kiarina-python
# Setup development environment
mise run setup
# Install dependencies for this package
cd packages/kiarina-utils-file
uv sync --group dev
```
### Running Tests
```bash
# Run all tests
mise run package:test kiarina-utils-file
# Run with coverage
mise run package:test kiarina-utils-file --coverage
# Run specific test files
uv run --group test pytest tests/file/test_kiarina_utils_file_sync.py
uv run --group test pytest tests/file/test_kiarina_utils_file_async.py
```
### Code Quality
```bash
# Format code
mise run package:format kiarina-utils-file
# Run linting
mise run package:lint kiarina-utils-file
# Type checking
mise run package:typecheck kiarina-utils-file
# Run all checks
mise run package kiarina-utils-file
```
## Performance
### Optimizations
The library relies on several performance-oriented design choices:
- **Lazy loading**: Properties are computed only when accessed
- **Caching**: Expensive operations like encoding detection are cached
- **Async support**: Non-blocking I/O for high-throughput applications
- **Efficient sampling**: Large files are sampled for encoding/MIME detection
- **Atomic operations**: Safe concurrent file access with minimal overhead
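Lazy loading with caching means that a derived value is computed on first access and then stored. `functools.cached_property` provides exactly that behavior. The hypothetical `LazyBlob` below counts how often its digest is computed to demonstrate it. The class illustrates the pattern only and is not the package's code.

```python
import functools
import hashlib


class LazyBlob:
    """Hypothetical example: the digest is computed once, on first access."""

    def __init__(self, raw_data: bytes) -> None:
        self.raw_data = raw_data
        self.digest_computations = 0

    @functools.cached_property
    def hash_string(self) -> str:
        self.digest_computations += 1
        return hashlib.sha256(self.raw_data).hexdigest()


blob = LazyBlob(b"Hello, World!")
first = blob.hash_string
second = blob.hash_string  # served from the instance cache, no recomputation
```

Because the cached value is never invalidated, this pattern fits immutable containers best. If `raw_data` is reassigned after the first access, `hash_string` keeps returning the old digest.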
### Memory Usage
- **Streaming support**: Large files can be processed without loading entirely into memory
- **Configurable sampling**: Detection algorithms use configurable sample sizes
- **Efficient caching**: Only frequently accessed properties are cached
## License
This project is licensed under the MIT License - see the [LICENSE](../../LICENSE) file for details.
## Contributing
This is a personal project, but contributions are welcome! Please feel free to submit issues or pull requests.
### Guidelines
1. **Code Style**: Follow the existing code style (enforced by ruff)
2. **Testing**: Add tests for new functionality
3. **Documentation**: Update documentation for API changes
4. **Type Hints**: Maintain full type hint coverage
## Related Projects
- [kiarina-python](https://github.com/kiarina/kiarina-python) - The main monorepo containing this package
- [pydantic-settings-manager](https://github.com/kiarina/pydantic-settings-manager) - Configuration management library used by this package
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for a detailed history of changes.
## Support
- **Issues**: [GitHub Issues](https://github.com/kiarina/kiarina-python/issues)
- **Discussions**: [GitHub Discussions](https://github.com/kiarina/kiarina-python/discussions)
---
Made with ❤️ by [kiarina](https://github.com/kiarina)