# sortx-universal
[CI](https://github.com/Okymi-X/sortx-universal/actions)
[PyPI](https://badge.fury.io/py/sortx-universal)
[License: MIT](https://opensource.org/licenses/MIT)
[Python 3.10+](https://www.python.org/downloads/)
**sortx-universal** is a powerful, universal sorting tool and Python library designed to sort any kind of data: in-memory data structures, CSV/JSONL files, plain text, and even massive datasets using efficient external sorting algorithms.
## ✨ Features
🚀 **Universal Sorting**: Sort any data format (CSV, JSONL, TXT, compressed files)
📊 **Multi-key Sorting**: Sort by multiple columns with different data types and directions
⚡ **External Sorting**: Handle massive files that don't fit in memory using external merge sort
🌍 **Locale-aware**: International text sorting with locale support
🔧 **Smart Detection**: Automatically detect file formats and separators
📦 **Easy Installation**: Simple `pip install sortx-universal`
🛠️ **CLI + Library**: Use as command-line tool or import as Python library
🎯 **Type Support**: Numbers, strings, dates, natural sorting
🔄 **Stable Sorting**: Preserves original order for equal elements
🎛️ **Flexible Options**: Reverse, unique constraints, memory limits
## 📦 Installation
### Basic Installation
```bash
pip install sortx-universal
```
### Full Installation (with CLI and enhanced features)
```bash
pip install sortx-universal[full]
```
The full installation includes:
- `typer` and `rich` for a beautiful CLI experience
- `python-dateutil` for advanced date parsing
- `natsort` for natural sorting
- `chardet` for encoding detection
## 🚀 Quick Start
### Command Line Interface
```bash
# Sort CSV by price (numeric), then name (alphabetic)
sortx-universal data.csv -o sorted.csv -k price:num -k name:str
# Sort large JSONL file by timestamp with memory limit
sortx-universal logs.jsonl.gz -o sorted.jsonl.gz -k timestamp:date --memory-limit=512M
# Natural sort of text file (file2 comes before file10)
sortx-universal filenames.txt -o sorted.txt -k 0:nat
# Sort with uniqueness constraint
sortx-universal users.jsonl -o unique_users.jsonl -k created_at:date --unique=id
# Show sorting statistics
sortx-universal large_data.csv -o sorted_data.csv -k score:num:desc=true --stats
```
### Python Library
```python
import sortx
# Sort in-memory data
data = [
{"name": "Alice", "age": 30, "salary": 50000},
{"name": "Bob", "age": 25, "salary": 45000},
{"name": "Charlie", "age": 35, "salary": 60000}
]
# Single key sorting
sorted_by_age = list(sortx.sort_iter(
data,
keys=[sortx.key("age", "num")]
))
# Multi-key sorting
sorted_multi = list(sortx.sort_iter(
data,
keys=[
sortx.key("salary", "num", desc=True), # Salary descending
sortx.key("name", "str") # Then name ascending
]
))
# Sort file to file
stats = sortx.sort_file(
input_path="input.csv",
output_path="output.csv",
keys=[sortx.key("created_at", "date", desc=True)],
stats=True
)
print(f"Processed {stats.lines_processed} lines in {stats.processing_time:.2f}s")
```
## 📊 Data Types
sortx-universal supports multiple data types for sorting keys:
| Type | Description | Example |
|------|-------------|---------|
| **`num`** | Numeric sorting (integers, floats) | `42`, `3.14`, `-10` |
| **`str`** | String sorting with locale support | `"Alice"`, `"café"` |
| **`date`** | Date/time sorting (ISO 8601 + common formats) | `"2025-01-15"`, `"2025-01-15T10:30:00Z"` |
| **`nat`** | Natural sorting ("file2" < "file10") | `"file1.txt"`, `"file10.txt"` |
### Date Format Support
- ISO 8601: `2025-01-15T10:30:00Z`
- Common formats: `2025-01-15`, `01/15/2025`, `Jan 15, 2025`
- Automatic parsing with `python-dateutil` (when installed)
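The `date` and `nat` behaviours build on the optional dependencies listed above. As a rough illustration only (not the library's actual key machinery), the equivalent plain-Python keys look like this, assuming `python-dateutil` and `natsort` are installed:

```python
# Illustration only: approximating the `date` and `nat` key types with the
# optional dependencies named above.
from dateutil import parser as date_parser  # python-dateutil
from natsort import natsort_keygen          # natsort

dates = ["Jan 15, 2025", "01/14/2025", "2025-01-13"]
# dateutil handles ISO 8601 and the common formats listed above.
print(sorted(dates, key=date_parser.parse))
# -> ['2025-01-13', '01/14/2025', 'Jan 15, 2025']

files = ["file10.txt", "file2.txt", "file1.txt"]
# natsort compares embedded digits numerically, so file2 < file10.
print(sorted(files, key=natsort_keygen()))
# -> ['file1.txt', 'file2.txt', 'file10.txt']
```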
## 📁 File Format Support
| Format | Extensions | Compression | Description |
|--------|------------|-------------|-------------|
| **CSV/TSV** | `.csv`, `.tsv` | ✅ | Automatic delimiter detection |
| **JSONL** | `.jsonl`, `.ndjson` | ✅ | One JSON object per line |
| **Plain Text** | `.txt`, any | ✅ | Line-by-line sorting |
| **Compressed** | `.gz`, `.zst` | - | Transparent compression support |
### Large File Handling
- **External Sorting**: Automatically handles files larger than available RAM
- **Memory Limits**: Configurable memory usage (`--memory-limit=512M`)
- **Streaming**: Processes files line-by-line to minimize memory footprint
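Conceptually, the external path follows the classic external merge-sort recipe: sort chunks that fit in memory, spill each sorted run to disk, then stream a k-way merge of the runs. The sketch below shows that idea in plain Python for newline-terminated text; it is a minimal illustration, not sortx's implementation.

```python
import heapq
import tempfile
from itertools import islice

def external_sort_lines(input_path: str, output_path: str, chunk_size: int = 100_000) -> None:
    """Minimal external merge sort for newline-terminated text lines."""
    runs = []
    with open(input_path) as src:
        while True:
            chunk = list(islice(src, chunk_size))  # read one in-memory chunk
            if not chunk:
                break
            chunk.sort()                           # sort the chunk in RAM
            run = tempfile.TemporaryFile(mode="w+")
            run.writelines(chunk)                  # spill the sorted run to disk
            run.seek(0)
            runs.append(run)
    with open(output_path, "w") as dst:
        dst.writelines(heapq.merge(*runs))         # lazy k-way merge of the runs
    for run in runs:
        run.close()
```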
## 🔧 Command Line Reference
```bash
sortx-universal [INPUT] [OPTIONS]
```
### Options
| Option | Short | Description |
|--------|-------|-------------|
| `--output FILE` | `-o` | Output file path |
| `--key KEY_SPEC` | `-k` | Sort key specification (can be used multiple times) |
| `--reverse` | | Reverse the entire sort order |
| `--stable` | | Use stable sorting (default) |
| `--unique COLUMN` | | Keep only unique values for specified column |
| `--memory-limit SIZE` | | Memory limit for external sorting (e.g., 512M, 2G) |
| `--stats` | | Show detailed sorting statistics |
| `--help` | `-h` | Show help message |
### Key Specification Format
Sort keys use the format: `column:type[:desc=true][:locale=name]`
**Examples:**
- `price:num` - Sort by price as number (ascending)
- `price:num:desc=true` - Sort by price as number (descending)
- `name:str:locale=fr_FR` - Sort by name with French locale
- `timestamp:date` - Sort by timestamp as date
- `0:nat` - Natural sort by first column (for text files)
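These CLI specifications correspond directly to `sortx.key(...)` calls in the Python API; a plausible mapping, based on the `sortx.key` signature documented in the API reference below:

```python
import sortx

# CLI key spec            ->  Python API equivalent (see API reference below)
keys = [
    sortx.key("price", "num", desc=True),           # price:num:desc=true
    sortx.key("name", "str", locale_name="fr_FR"),  # name:str:locale=fr_FR
    sortx.key("timestamp", "date"),                 # timestamp:date
    sortx.key(0, "nat"),                            # 0:nat (column index for text rows)
]
```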
## 💡 Examples
### Example 1: Sales Data Analysis
**Input (`sales.csv`):**
```csv
region,product,revenue,date
North,Widget A,1000,2025-01-15
South,Widget B,1500,2025-01-14
North,Widget C,800,2025-01-16
South,Widget A,1200,2025-01-13
```
**Command:**
```bash
sortx-universal sales.csv -o sorted_sales.csv -k region:str -k revenue:num:desc=true
```
**Output:**
```csv
region,product,revenue,date
North,Widget A,1000,2025-01-15
North,Widget C,800,2025-01-16
South,Widget B,1500,2025-01-14
South,Widget A,1200,2025-01-13
```
### Example 2: Log File Processing
**Input (`server.jsonl`):**
```json
{"timestamp": "2025-01-15T10:30:00Z", "level": "ERROR", "message": "Connection failed"}
{"timestamp": "2025-01-15T10:25:00Z", "level": "INFO", "message": "Server started"}
{"timestamp": "2025-01-15T10:35:00Z", "level": "WARN", "message": "High memory usage"}
```
**Command:**
```bash
sortx-universal server.jsonl -o sorted_logs.jsonl -k timestamp:date --stats
```
**Output includes statistics:**
```
Sorting Statistics:
Input file: server.jsonl
Output file: sorted_logs.jsonl
Lines processed: 3
Processing time: 0.01s
Input size: 312B
Output size: 312B
External sort: No
Throughput: 300 lines/sec
```
### Example 3: Large Dataset Processing
**Processing a 5GB file:**
```bash
sortx-universal huge_dataset.csv.gz -o sorted_huge.csv.gz \
-k timestamp:date \
-k user_id:num \
--memory-limit=1G \
--unique=transaction_id \
--stats
```
This command:
- Sorts by timestamp, then user_id
- Uses at most 1 GB of RAM, switching to external sort when the data exceeds that limit
- Removes duplicate transactions
- Shows detailed performance statistics
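The same job can be scripted through the Python API documented below; a sketch, assuming `unique` is accepted by `sort_file` as in the Advanced Usage example:

```python
import sortx

stats = sortx.sort_file(
    input_path="huge_dataset.csv.gz",
    output_path="sorted_huge.csv.gz",
    keys=[
        sortx.key("timestamp", "date"),   # primary: timestamp
        sortx.key("user_id", "num"),      # secondary: user_id
    ],
    memory_limit="1G",                    # spill to disk beyond ~1 GB of RAM
    unique="transaction_id",              # drop duplicate transactions
    stats=True,
)
print(f"Sorted {stats.lines_processed} rows in {stats.processing_time:.2f}s")
```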
## 🐍 Python API Reference
### Core Functions
#### `sortx.key(column, data_type, desc=False, locale_name=None, **options)`
Create a sort key specification.
**Parameters:**
- `column`: Column name (for dict records) or positional index (for list/tuple records)
- `data_type`: Data type (`'str'`, `'num'`, `'date'`, `'nat'`)
- `desc`: Sort in descending order if True
- `locale_name`: Locale for string sorting (e.g., `'fr_FR.UTF-8'`)
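For list or tuple records, `column` is a positional index rather than a name. A small sketch, assuming the in-memory API shown in the Quick Start:

```python
import sortx

rows = [
    ["banana", 3],
    ["apple", 10],
    ["apple", 2],
]
ordered = list(sortx.sort_iter(
    rows,
    keys=[
        sortx.key(0, "str"),             # first column as text
        sortx.key(1, "num", desc=True),  # second column as number, descending
    ],
))
# Expected: [['apple', 10], ['apple', 2], ['banana', 3]]
```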
#### `sortx.sort_iter(data, keys, stable=True, reverse=False, unique=None)`
Sort an iterator of data in memory.
**Parameters:**
- `data`: Iterator of items to sort
- `keys`: List of SortKey specifications
- `stable`: Use stable sorting algorithm
- `reverse`: Reverse the entire sort order
- `unique`: Column name for uniqueness constraint
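A short sketch of `reverse` and `unique` used together, assuming `unique` keeps a single record per key value:

```python
import sortx

events = [
    {"id": 1, "score": 7},
    {"id": 2, "score": 9},
    {"id": 1, "score": 5},
]
highest_first = list(sortx.sort_iter(
    events,
    keys=[sortx.key("score", "num")],
    reverse=True,   # flip the overall ordering: highest score first
    unique="id",    # one record per id (exact tie-breaking is up to the library)
))
```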
#### `sortx.sort_file(input_path, output_path, keys, memory_limit=None, stats=False, **options)`
Sort a file and write results to another file.
**Parameters:**
- `input_path`: Path to input file
- `output_path`: Path to output file
- `keys`: List of SortKey specifications
- `memory_limit`: Memory limit string (e.g., `'512M'`, `'2G'`)
- `stats`: Return sorting statistics
### Advanced Usage
```python
import sortx
# Complex multi-key sorting with different options per key
keys = [
sortx.key("department", "str"), # Primary: department
sortx.key("salary", "num", desc=True), # Secondary: salary (desc)
sortx.key("hire_date", "date"), # Tertiary: hire date
sortx.key("name", "str", locale_name="en_US") # Quaternary: name
]
result = list(sortx.sort_iter(employee_data, keys=keys))
# File sorting with memory management and statistics
stats = sortx.sort_file(
input_path="employees.csv",
output_path="sorted_employees.csv",
keys=keys,
memory_limit="256M", # Use max 256MB RAM
unique="employee_id", # Remove duplicates by employee ID
stats=True # Return detailed statistics
)
print(f"Sorted {stats.lines_processed} employees")
print(f"Processing time: {stats.processing_time:.2f} seconds")
print(f"Throughput: {stats.throughput:.0f} lines/second")
```
## ⚡ Performance
sortx-universal is optimized for performance across different scenarios:
### In-Memory Sorting
- **Fast**: Optimized Python sorting with custom key functions
- **Memory Efficient**: Streaming processing where possible
- **Stable**: Maintains relative order of equal elements
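The pattern underneath is standard Python: `sorted()` is stable, multi-key ordering comes from composite key tuples, and a descending numeric key can be expressed by negation. A minimal illustration (not sortx's internals):

```python
rows = [
    {"dept": "eng", "salary": 90},
    {"dept": "ops", "salary": 70},
    {"dept": "eng", "salary": 120},
]
# Sort by department ascending, then salary descending; sorted() keeps the
# relative order of ties because Python's Timsort is stable.
ordered = sorted(rows, key=lambda r: (r["dept"], -r["salary"]))
# -> eng/120, eng/90, ops/70
```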
### External Sorting (Large Files)
- **Scalable**: Handles files larger than available RAM
- **Configurable**: Memory usage limits prevent system overload
- **Efficient**: Multi-way merge sort with optimized I/O
### Benchmarks (Approximate)
| File Size | Records | Memory Limit | Processing Time | Throughput |
|-----------|---------|-------------|----------------|------------|
| 100MB | 1M | 512MB | 5s | 200K lines/sec |
| 1GB | 10M | 512MB | 60s | 167K lines/sec |
| 10GB | 100M | 1GB | 15min | 111K lines/sec |
*Benchmarks run on modern hardware (SSD, 16GB RAM). Performance varies based on data complexity and system specifications.*
## 🛠️ Development
### Setup Development Environment
```bash
# Clone the repository
git clone https://github.com/Okymi-X/sortx-universal.git
cd sortx-universal
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in development mode with all dependencies
pip install -e ".[full,dev]"
```
### Run Tests
```bash
# Run all tests
pytest
# Run tests with coverage
pytest --cov=sortx --cov-report=html
# Run specific test file
pytest tests/test_core.py
```
### Code Quality
```bash
# Format code
black sortx tests
# Sort imports
isort sortx tests
# Lint code
flake8 sortx tests
# Type checking
mypy sortx
```
### Running Demo
```bash
# Quick demo
python demo.py
# Comprehensive tests
python main.py
```
## 🤝 Contributing
Contributions are welcome! Here's how to get started:
1. **Fork** the repository
2. **Create** your feature branch (`git checkout -b feature/amazing-feature`)
3. **Make** your changes and add tests
4. **Ensure** code quality (`black`, `isort`, `flake8`, `pytest`)
5. **Commit** your changes (`git commit -m 'Add amazing feature'`)
6. **Push** to the branch (`git push origin feature/amazing-feature`)
7. **Open** a Pull Request
### Areas for Contribution
- 🚀 Performance optimizations
- 📊 Additional file format support
- 🌍 Locale and internationalization improvements
- 📚 Documentation and examples
- 🧪 Test coverage expansion
- 🔧 CLI enhancements
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🗺️ Roadmap
### Version 0.2.0
- [ ] Rust core implementation for 10x performance boost
- [ ] Additional compression formats (bz2, xz, lz4)
- [ ] Memory-mapped file support for better performance
- [ ] Progress bars for long-running operations
### Version 0.3.0
- [ ] Additional file formats (Parquet, Avro, Excel)
- [ ] Database integration (PostgreSQL, SQLite)
- [ ] Parallel sorting with multiple CPU cores
- [ ] Advanced statistics and profiling
### Version 1.0.0
- [ ] Distributed sorting across multiple machines
- [ ] Web-based GUI interface
- [ ] Plugin system for custom data types
- [ ] Real-time streaming sort capabilities
## 🙏 Acknowledgments
- Inspired by **GNU sort** and other Unix sorting utilities
- Built with Python's robust ecosystem for data processing
- Uses **external sorting algorithms** from computer science literature
- Thanks to the open source community for excellent libraries:
- `typer` and `rich` for beautiful CLI
- `python-dateutil` for date parsing
- `natsort` for natural sorting
## 📞 Support
- 📖 **Documentation**: [GitHub README](https://github.com/Okymi-X/sortx-universal#readme)
- 🐛 **Bug Reports**: [GitHub Issues](https://github.com/Okymi-X/sortx-universal/issues)
- 💬 **Discussions**: [GitHub Discussions](https://github.com/Okymi-X/sortx-universal/discussions)
- 📧 **Email**: dev@sortx-universal.io
---