kiarina-utils-file


- **Name**: kiarina-utils-file
- **Version**: 1.0.1
- **Summary**: Comprehensive Python library for file I/O operations with automatic encoding detection, MIME type detection, and support for various file formats
- **Upload time**: 2025-09-11 08:52:32
- **Requires Python**: >=3.12
- **License**: MIT
- **Keywords**: file, io, encoding, mime, async, sync, json, yaml, binary, text, detection, blob, atomic, thread-safe
- **Requirements**: No requirements were recorded.

# kiarina-utils-file

A comprehensive Python library for file I/O operations with automatic encoding detection, MIME type detection, and support for various file formats.

[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Features

### 🚀 **Comprehensive File I/O**
- **Multiple file formats**: Text, binary, JSON, YAML
- **Sync & Async support**: Full async/await support for high-performance applications
- **Atomic operations**: Safe file writing with temporary files and locking
- **Thread safety**: File locking mechanisms prevent concurrent access issues

### 🔍 **Smart Detection**
- **Automatic encoding detection**: Smart handling of various text encodings with nkf support
- **MIME type detection**: Automatic content type identification using multiple detection methods
- **Extension handling**: Support for complex multi-part extensions (.tar.gz, .tar.gz.gpg)

### 📦 **Data Containers**
- **FileBlob**: Unified file data container with metadata and path information
- **MIMEBlob**: MIME-typed binary data container with format conversion support
- **Hash-based naming**: Content-addressable file naming using cryptographic hashes
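
To make "content-addressable file naming" concrete, the following sketch derives a file name from a digest of the content. It is an assumption about the general idea, not the library's code: the helper name `hashed_file_name_sketch` is hypothetical, and the exact name format the library produces is not documented here.

```python
import hashlib


def hashed_file_name_sketch(raw_data: bytes, ext: str, algorithm: str = "sha256") -> str:
    """Build a file name from the content digest plus an extension."""
    digest = hashlib.new(algorithm, raw_data).hexdigest()
    return f"{digest}{ext}"


# Identical content always maps to the same name; different content does not.
first = hashed_file_name_sketch(b"Hello, World!", ".txt")
second = hashed_file_name_sketch(b"Hello, World!", ".txt")
other = hashed_file_name_sketch(b"Goodbye", ".txt")
print(first == second, first == other)
```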

### 🛡️ **Production Ready**
- **Error handling**: Graceful handling of missing files with configurable defaults
- **Performance optimized**: Non-blocking I/O operations and efficient caching
- **Type safety**: Full type hints and comprehensive testing

## Installation

```bash
pip install kiarina-utils-file
```

### Optional Dependencies

For enhanced functionality, install optional dependencies:

```bash
# For MIME type detection from file content
# (quotes keep shells such as zsh from interpreting the brackets)
pip install "kiarina-utils-file[mime]"

# Or install with all optional dependencies
pip install "kiarina-utils-file[all]"
```

## Quick Start

### Basic File Operations

```python
import kiarina.utils.file as kf

# Read and write text files with automatic encoding detection
text = kf.read_text("document.txt", default="")
kf.write_text("output.txt", "Hello, World! 🌍")

# Binary file operations
data = kf.read_binary("image.jpg")
if data is not None:  # an empty file yields b"", which is falsy but valid
    kf.write_binary("copy.jpg", data)

# JSON operations with type safety
config = kf.read_json_dict("config.json", default={})
kf.write_json_dict("output.json", {"key": "value"})

# YAML operations
settings = kf.read_yaml_dict("settings.yaml", default={})
kf.write_yaml_list("list.yaml", [1, 2, 3])
```

### High-Level FileBlob Operations

```python
import kiarina.utils.file as kf

# Read file with automatic MIME type detection
blob = kf.read_file("document.pdf")
if blob:
    print(f"File: {blob.file_path}")
    print(f"MIME type: {blob.mime_type}")
    print(f"Size: {len(blob.raw_data)} bytes")
    print(f"Extension: {blob.ext}")

# Create and write FileBlob
blob = kf.FileBlob(
    "output.txt",
    mime_type="text/plain",
    raw_text="Hello, World!"
)
kf.write_file(blob)

# Data URL generation for web use
print(blob.raw_base64_url)  # data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==
```
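
The data URL printed above follows the standard `data:<mime type>;base64,<payload>` form. The sketch below reproduces that string with the standard library only. The helper `to_data_url` is hypothetical and only illustrates the format, not how `raw_base64_url` is implemented.

```python
import base64


def to_data_url(mime_type: str, raw_data: bytes) -> str:
    """Encode bytes as a base64 data URL."""
    encoded = base64.b64encode(raw_data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


print(to_data_url("text/plain", "Hello, World!".encode("utf-8")))
# data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==
```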

### Async Operations

```python
import kiarina.utils.file.asyncio as kfa

async def process_files():
    # All operations have async equivalents
    text = await kfa.read_text("large_file.txt")
    await kfa.write_json_dict("result.json", {"processed": True})
    
    # FileBlob operations
    blob = await kfa.read_file("document.pdf")
    if blob:
        await kfa.write_file(blob, "backup.pdf")
```

### MIME Type and Extension Detection

```python
import kiarina.utils.mime as km
import kiarina.utils.ext as ke

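# Read the raw bytes to inspect (any bytes object works)
with open("document.pdf", "rb") as f:
    file_data = f.read()

# MIME type detection from content and filename
mime_type = km.detect_mime_type(
    raw_data=file_data,
    file_name_hint="document.pdf"
)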

# Extension detection from MIME type
extension = ke.detect_extension("application/json")  # ".json"

# Multi-part extension extraction
extension = ke.extract_extension("archive.tar.gz")  # ".tar.gz"

# Create MIME blob from data
with open("image.jpg", "rb") as f:
    jpeg_data = f.read()
blob = km.create_mime_blob(jpeg_data)
print(f"Detected: {blob.mime_type}")  # "image/jpeg"
```

### Encoding Detection

```python
import kiarina.utils.encoding as kenc

# Automatic encoding detection
with open("mystery_file.txt", "rb") as f:
    raw_data = f.read()

encoding = kenc.detect_encoding(raw_data)
text = kenc.decode_binary_to_text(raw_data)

# Check if data is binary or text
is_binary = kenc.is_binary(raw_data)
```
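
The idea behind detection with fallback encodings can be sketched without any third-party package: try candidate encodings in order and keep the first one that decodes cleanly. This is a simplified stand-in, not what the library does (which, per the Requirements section, relies on `charset-normalizer` and optionally nkf). The function names, the candidate list, and the NUL-byte heuristic are assumptions.

```python
def sketch_is_binary(raw_data: bytes) -> bool:
    """Treat data containing a NUL byte as binary (a common, imperfect heuristic)."""
    return b"\x00" in raw_data


def sketch_decode(
    raw_data: bytes,
    encodings: tuple[str, ...] = ("utf-8", "shift_jis", "euc_jp"),
) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in encodings:
        try:
            return raw_data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this last resort never fails.
    return raw_data.decode("latin-1"), "latin-1"


text, encoding = sketch_decode("こんにちは".encode("shift_jis"))
print(encoding, text)
```

Note that the NUL-byte check misclassifies UTF-16 text as binary, which is one reason a real detector needs more than this.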

## Advanced Usage

### Custom Configuration

Configure behavior through environment variables:

```bash
# Encoding detection
export KIARINA_UTILS_ENCODING_USE_NKF=true
export KIARINA_UTILS_ENCODING_DEFAULT_ENCODING=utf-8

# File operations
export KIARINA_UTILS_FILE_LOCK_DIR=/custom/lock/dir
export KIARINA_UTILS_FILE_LOCK_CLEANUP_ENABLED=true

# MIME type detection
export KIARINA_UTILS_MIME_HASH_ALGORITHM=sha256
```

### Error Handling

```python
import json

import kiarina.utils.file as kf

try:
    data = kf.read_json_dict("config.json")
    if data is None:
        print("File not found, using defaults")
        data = {"default": True}
except json.JSONDecodeError as e:
    print(f"Invalid JSON: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

### Performance Considerations

```python
import asyncio

import kiarina.utils.file as kf
import kiarina.utils.file.asyncio as kfa

# For I/O intensive operations, use async versions
async def process_many_files(file_paths):
    tasks = [kfa.read_file(path) for path in file_paths]
    results = await asyncio.gather(*tasks)
    return [r for r in results if r is not None]

# Use appropriate defaults to avoid None checks
config = kf.read_json_dict("config.json", default={})
# Instead of:
# config = kf.read_json_dict("config.json")
# if config is None:
#     config = {}
```

## API Reference

### File Operations

#### Synchronous API (`kiarina.utils.file`)

**High-level operations:**
- `read_file(path, *, fallback_mime_type="application/octet-stream", default=None) -> FileBlob | None`
- `write_file(file_blob, file_path=None) -> None`

**Text operations:**
- `read_text(path, *, default=None) -> str | None`
- `write_text(path, text) -> None`

**Binary operations:**
- `read_binary(path, *, default=None) -> bytes | None`
- `write_binary(path, data) -> None`

**JSON operations:**
- `read_json_dict(path, *, default=None) -> dict[str, Any] | None`
- `write_json_dict(path, data, *, indent=2, ensure_ascii=False, sort_keys=False) -> None`
- `read_json_list(path, *, default=None) -> list[Any] | None`
- `write_json_list(path, data, *, indent=2, ensure_ascii=False, sort_keys=False) -> None`

**YAML operations:**
- `read_yaml_dict(path, *, default=None) -> dict[str, Any] | None`
- `write_yaml_dict(path, data, *, allow_unicode=True, sort_keys=False) -> None`
- `read_yaml_list(path, *, default=None) -> list[Any] | None`
- `write_yaml_list(path, data, *, allow_unicode=True, sort_keys=False) -> None`

**File management:**
- `remove_file(path) -> None`

#### Asynchronous API (`kiarina.utils.file.asyncio`)

All synchronous functions have async equivalents with the same signatures, but they return `Awaitable` objects and must be called with `await`.

### Data Containers

#### FileBlob

```python
class FileBlob:
    def __init__(self, file_path, mime_blob=None, *, mime_type=None, raw_data=None, raw_text=None): ...

    # Properties
    file_path: str
    mime_blob: "MIMEBlob"
    mime_type: str
    raw_data: bytes
    raw_text: str
    raw_base64_str: str
    raw_base64_url: str
    hash_string: str
    ext: str
    hashed_file_name: str

    # Methods
    def is_binary(self) -> bool: ...
    def is_text(self) -> bool: ...
    def replace(self, *, file_path=None, mime_blob=None, mime_type=None, raw_data=None, raw_text=None) -> "FileBlob": ...
```

#### MIMEBlob

```python
class MIMEBlob:
    def __init__(self, mime_type, raw_data=None, *, raw_text=None): ...

    # Properties
    mime_type: str
    raw_data: bytes
    raw_text: str
    raw_base64_str: str
    raw_base64_url: str
    hash_string: str
    ext: str
    hashed_file_name: str

    # Methods
    def is_binary(self) -> bool: ...
    def is_text(self) -> bool: ...
    def replace(self, *, mime_type=None, raw_data=None, raw_text=None) -> "MIMEBlob": ...
```

### Utility Functions

#### MIME Type Detection (`kiarina.utils.mime`)

- `detect_mime_type(*, raw_data=None, stream=None, file_name_hint=None, **kwargs) -> str | None`
- `create_mime_blob(raw_data, *, fallback_mime_type="application/octet-stream") -> MIMEBlob`
- `apply_mime_alias(mime_type, *, mime_aliases=None) -> str`

#### Extension Detection (`kiarina.utils.ext`)

- `detect_extension(mime_type, *, custom_extensions=None, default=None) -> str | None`
- `extract_extension(file_name_hint, *, multi_extensions=None, default=None, **kwargs) -> str | None`

#### Encoding Detection (`kiarina.utils.encoding`)

- `detect_encoding(raw_data, *, use_nkf=None, **kwargs) -> str | None`
- `decode_binary_to_text(raw_data, *, use_nkf=None, **kwargs) -> str`
- `is_binary(raw_data, *, use_nkf=None, **kwargs) -> bool`
- `get_default_encoding() -> str`
- `normalize_newlines(text) -> str`

## Configuration

### Environment Variables

#### Encoding Detection
- `KIARINA_UTILS_ENCODING_USE_NKF`: Enable/disable nkf usage (bool)
- `KIARINA_UTILS_ENCODING_DEFAULT_ENCODING`: Default encoding (default: "utf-8")
- `KIARINA_UTILS_ENCODING_FALLBACK_ENCODINGS`: Comma-separated list of fallback encodings
- `KIARINA_UTILS_ENCODING_MAX_SAMPLE_SIZE`: Maximum bytes to sample for detection (default: 8192)
- `KIARINA_UTILS_ENCODING_CHARSET_NORMALIZER_CONFIDENCE_THRESHOLD`: Confidence threshold (default: 0.6)

#### File Operations
- `KIARINA_UTILS_FILE_LOCK_DIR`: Custom lock directory path
- `KIARINA_UTILS_FILE_LOCK_CLEANUP_ENABLED`: Enable automatic cleanup (default: true)
- `KIARINA_UTILS_FILE_LOCK_MAX_AGE_HOURS`: Maximum age for lock files in hours (default: 24)

#### MIME Type Detection
- `KIARINA_UTILS_MIME_HASH_ALGORITHM`: Hash algorithm for content addressing (default: "sha256")

#### Extension Detection
- `KIARINA_UTILS_EXT_MAX_MULTI_EXTENSION_PARTS`: Maximum parts for multi-extension detection (default: 4)

## Requirements

- **Python**: 3.12 or higher
- **Core dependencies**: 
  - `aiofiles>=24.1.0` - Async file operations
  - `charset-normalizer>=3.4.3` - Encoding detection
  - `filelock>=3.19.1` - File locking
  - `pydantic>=2.11.7` - Data validation
  - `pydantic-settings>=2.10.1` - Settings management
  - `pydantic-settings-manager>=2.1.0` - Advanced settings management
  - `pyyaml>=6.0.2` - YAML support

- **Optional dependencies**:
  - `puremagic>=1.30` - Enhanced MIME type detection from file content

## Development

### Prerequisites

- Python 3.12+
- [uv](https://github.com/astral-sh/uv) for dependency management
- [mise](https://mise.jdx.dev/) for task running

### Setup

```bash
# Clone the repository
git clone https://github.com/kiarina/kiarina-python.git
cd kiarina-python

# Setup development environment
mise run setup

# Install dependencies for this package
cd packages/kiarina-utils-file
uv sync --group dev
```

### Running Tests

```bash
# Run all tests
mise run package:test kiarina-utils-file

# Run with coverage
mise run package:test kiarina-utils-file --coverage

# Run specific test files
uv run --group test pytest tests/file/test_kiarina_utils_file_sync.py
uv run --group test pytest tests/file/test_kiarina_utils_file_async.py
```

### Code Quality

```bash
# Format code
mise run package:format kiarina-utils-file

# Run linting
mise run package:lint kiarina-utils-file

# Type checking
mise run package:typecheck kiarina-utils-file

# Run all checks
mise run package kiarina-utils-file
```

## Performance

### Benchmarks

The library is optimized for performance with several key features:

- **Lazy loading**: Properties are computed only when accessed
- **Caching**: Expensive operations like encoding detection are cached
- **Async support**: Non-blocking I/O for high-throughput applications
- **Efficient sampling**: Large files are sampled for encoding/MIME detection
- **Atomic operations**: Safe concurrent file access with minimal overhead
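
Lazy loading and caching of derived properties can be illustrated with `functools.cached_property`: the value is computed on first access and reused afterwards. The class `BlobSketch` below is hypothetical and only demonstrates the pattern. Whether the library uses `cached_property` internally is not stated in this README.

```python
import base64
import hashlib
from functools import cached_property


class BlobSketch:
    def __init__(self, raw_data: bytes) -> None:
        self.raw_data = raw_data
        self.hash_calls = 0  # instrumentation for the demo only

    @cached_property
    def hash_string(self) -> str:
        """Computed on first access, then stored on the instance."""
        self.hash_calls += 1
        return hashlib.sha256(self.raw_data).hexdigest()

    @cached_property
    def raw_base64_str(self) -> str:
        return base64.b64encode(self.raw_data).decode("ascii")


blob = BlobSketch(b"Hello, World!")
blob.hash_string
blob.hash_string
print(blob.hash_calls)  # 1
```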

### Memory Usage

- **Streaming support**: Large files can be processed without loading entirely into memory
- **Configurable sampling**: Detection algorithms use configurable sample sizes
- **Efficient caching**: Only frequently accessed properties are cached
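
The two memory-related ideas above, chunked streaming and bounded sampling, can be sketched with the standard library as follows. The function names and the chunk size are hypothetical, while the 8192-byte sample default mirrors `KIARINA_UTILS_ENCODING_MAX_SAMPLE_SIZE` from the Configuration section.

```python
import hashlib
import io
from typing import BinaryIO


def hash_stream(stream: BinaryIO, algorithm: str = "sha256", chunk_size: int = 64 * 1024) -> str:
    """Hash a stream chunk by chunk so the whole file never sits in memory."""
    digest = hashlib.new(algorithm)
    # iter() with a sentinel stops at the first empty read (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def read_sample(stream: BinaryIO, max_sample_size: int = 8192) -> bytes:
    """Read only the leading bytes that a detector needs to inspect."""
    return stream.read(max_sample_size)


payload = b"a" * 200_000
print(hash_stream(io.BytesIO(payload)) == hashlib.sha256(payload).hexdigest())  # True
print(len(read_sample(io.BytesIO(payload))))  # 8192
```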

## License

This project is licensed under the MIT License - see the [LICENSE](../../LICENSE) file for details.

## Contributing

This is a personal project, but contributions are welcome! Please feel free to submit issues or pull requests.

### Guidelines

1. **Code Style**: Follow the existing code style (enforced by ruff)
2. **Testing**: Add tests for new functionality
3. **Documentation**: Update documentation for API changes
4. **Type Hints**: Maintain full type hint coverage

## Related Projects

- [kiarina-python](https://github.com/kiarina/kiarina-python) - The main monorepo containing this package
- [pydantic-settings-manager](https://github.com/kiarina/pydantic-settings-manager) - Configuration management library used by this package

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for a detailed history of changes.

## Support

- **Issues**: [GitHub Issues](https://github.com/kiarina/kiarina-python/issues)
- **Discussions**: [GitHub Discussions](https://github.com/kiarina/kiarina-python/discussions)

---

Made with ❤️ by [kiarina](https://github.com/kiarina)

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "kiarina-utils-file",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": "kiarina <kiarinadawa@gmail.com>",
    "keywords": "file, io, encoding, mime, async, sync, json, yaml, binary, text, detection, blob, atomic, thread-safe",
    "author": null,
    "author_email": "kiarina <kiarinadawa@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/3e/01/eb17d2f4d72a4f6981059aad27330609fe2b258ebef979376015f2a3dd41/kiarina_utils_file-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Comprehensive Python library for file I/O operations with automatic encoding detection, MIME type detection, and support for various file formats",
    "version": "1.0.1",
    "project_urls": {
        "Changelog": "https://github.com/kiarina/kiarina-python/blob/main/packages/kiarina-utils-file/CHANGELOG.md",
        "Documentation": "https://github.com/kiarina/kiarina-python/tree/main/packages/kiarina-utils-file#readme",
        "Homepage": "https://github.com/kiarina/kiarina-python",
        "Issues": "https://github.com/kiarina/kiarina-python/issues",
        "Repository": "https://github.com/kiarina/kiarina-python"
    },
    "split_keywords": [
        "file",
        " io",
        " encoding",
        " mime",
        " async",
        " sync",
        " json",
        " yaml",
        " binary",
        " text",
        " detection",
        " blob",
        " atomic",
        " thread-safe"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "10a1a894bf1cb0566f83298e8b3c93371bc922bec1e0eefe309ba68b0443c03a",
                "md5": "06a3e2ca887d30945dc7b41e801a3272",
                "sha256": "2af9a51835cf27483cdd3e9abb3deba543c8ca2992a2572b088c27f64930079d"
            },
            "downloads": -1,
            "filename": "kiarina_utils_file-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "06a3e2ca887d30945dc7b41e801a3272",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 73286,
            "upload_time": "2025-09-11T08:52:27",
            "upload_time_iso_8601": "2025-09-11T08:52:27.592860Z",
            "url": "https://files.pythonhosted.org/packages/10/a1/a894bf1cb0566f83298e8b3c93371bc922bec1e0eefe309ba68b0443c03a/kiarina_utils_file-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3e01eb17d2f4d72a4f6981059aad27330609fe2b258ebef979376015f2a3dd41",
                "md5": "f3f79b223df6f000de7002455bab13a1",
                "sha256": "7e62378812f11f4f524008abad69d85d934fbfa3fd02cbac67c9ab476d2deaaa"
            },
            "downloads": -1,
            "filename": "kiarina_utils_file-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "f3f79b223df6f000de7002455bab13a1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 43473,
            "upload_time": "2025-09-11T08:52:32",
            "upload_time_iso_8601": "2025-09-11T08:52:32.081036Z",
            "url": "https://files.pythonhosted.org/packages/3e/01/eb17d2f4d72a4f6981059aad27330609fe2b258ebef979376015f2a3dd41/kiarina_utils_file-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-11 08:52:32",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kiarina",
    "github_project": "kiarina-python",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "kiarina-utils-file"
}
        