# Gapless Crypto Data
Ultra-fast cryptocurrency data collection with a zero-gap guarantee. The package delivers an 11-column microstructure format from the Binance public data repository, with an automatic monthly-to-daily fallback for seamless coverage.
## Features
- **22x faster** data collection via Binance public data repository
- **Zero gaps guarantee** through intelligent monthly-to-daily fallback
- **Complete 13-timeframe support**: 1s, 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d
- **Ultra-high frequency** to daily data collection (1-second to 1-day intervals)
- **11-column microstructure format** with order flow and liquidity metrics
- **Intelligent fallback system** that automatically switches to daily files when monthly files are unavailable
- **Gap detection and filling** with authentic Binance API data only
- **UV-based Python tooling** for modern dependency management
- **Atomic file operations** ensuring data integrity
- **Multi-symbol & multi-timeframe** concurrent collection
- **CCXT-compatible** dual parameter support (timeframe/interval)
- **Production-grade** with comprehensive test coverage
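The monthly-to-daily fallback listed above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the `data.binance.vision` URL layout, the helper names (`monthly_url`, `daily_url`, `plan_downloads`), and the injected `is_available` callback are all assumptions made for the example.

```python
from datetime import date, timedelta
from typing import Callable, List

# Assumed layout of the Binance public data repository (not taken from this README).
BASE = "https://data.binance.vision/data/spot"


def monthly_url(symbol: str, timeframe: str, year: int, month: int) -> str:
    return f"{BASE}/monthly/klines/{symbol}/{timeframe}/{symbol}-{timeframe}-{year}-{month:02d}.zip"


def daily_url(symbol: str, timeframe: str, day: date) -> str:
    return f"{BASE}/daily/klines/{symbol}/{timeframe}/{symbol}-{timeframe}-{day:%Y-%m-%d}.zip"


def plan_downloads(
    symbol: str,
    timeframe: str,
    year: int,
    month: int,
    is_available: Callable[[str], bool],
) -> List[str]:
    """Prefer the monthly archive; otherwise fall back to the available daily archives."""
    url = monthly_url(symbol, timeframe, year, month)
    if is_available(url):
        return [url]
    urls = []
    day = date(year, month, 1)
    while day.month == month:  # walk every calendar day of the month
        candidate = daily_url(symbol, timeframe, day)
        if is_available(candidate):
            urls.append(candidate)
        day += timedelta(days=1)
    return urls
```

Passing the availability check in as a callback keeps the planning logic testable without network access; in practice it might be an HTTP `HEAD` request.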
## Quick Start
### Installation (UV)
```bash
# Install via UV
uv add gapless-crypto-data
# Or install globally
uv tool install gapless-crypto-data
```
### Installation (pip)
```bash
pip install gapless-crypto-data
```
### CLI Usage
```bash
# Collect data for multiple timeframes (all 13 timeframes supported)
gapless-crypto-data --symbol SOLUSDT --timeframes 1s,1m,5m,1h,4h,1d
# Ultra-high frequency data collection (1-second intervals)
gapless-crypto-data --symbol BTCUSDT --timeframes 1s,1m,3m
# Extended timeframes with intelligent fallback
gapless-crypto-data --symbol ETHUSDT --timeframes 6h,8h,12h,1d
# Collect multiple symbols at once (native multi-symbol support)
gapless-crypto-data --symbol BTCUSDT,ETHUSDT,SOLUSDT --timeframes 1h,4h,1d
# Collect specific date range with custom output directory
gapless-crypto-data --symbol BTCUSDT --timeframes 1h --start 2023-01-01 --end 2023-12-31 --output-dir ./crypto_data
# Multi-symbol with custom settings
gapless-crypto-data --symbol BTCUSDT,ETHUSDT --timeframes 5m,1h --start 2024-01-01 --end 2024-06-30 --output-dir ./crypto_data
# Fill gaps in existing data
gapless-crypto-data --fill-gaps --directory ./data
# Help
gapless-crypto-data --help
```
### Python API
#### Function-based API
```python
import gapless_crypto_data as gcd
# Fetch data for a date range (CCXT-compatible timeframe parameter)
df = gcd.download("BTCUSDT", timeframe="1h", start="2024-01-01", end="2024-06-30")
# Or with limit
df = gcd.fetch_data("ETHUSDT", timeframe="4h", limit=1000)
# Backward compatibility (legacy interval parameter)
df = gcd.fetch_data("ETHUSDT", interval="4h", limit=1000) # DeprecationWarning
# Get available symbols and timeframes
symbols = gcd.get_supported_symbols()
timeframes = gcd.get_supported_timeframes()
# Fill gaps in existing data
results = gcd.fill_gaps("./data")
```
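The dual `timeframe`/`interval` parameter shown above could be handled along these lines. This is a hedged sketch of the general pattern only; `resolve_timeframe` is a hypothetical helper, and the package's own handling may differ.

```python
import warnings
from typing import Optional


def resolve_timeframe(
    timeframe: Optional[str] = None, interval: Optional[str] = None
) -> str:
    """Accept the preferred `timeframe` or the legacy `interval`, but not both."""
    if timeframe is not None and interval is not None:
        raise ValueError("Pass either 'timeframe' or 'interval', not both")
    if interval is not None:
        # Legacy spelling still works, but callers are nudged toward the new name.
        warnings.warn(
            "'interval' is deprecated; use 'timeframe'",
            DeprecationWarning,
            stacklevel=2,
        )
        return interval
    if timeframe is None:
        raise ValueError("'timeframe' is required")
    return timeframe
```

Resolving the two names in one place means the rest of the code only ever sees a single `timeframe` value.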
#### Class-based API
```python
from gapless_crypto_data import BinancePublicDataCollector, UniversalGapFiller
# Custom collection with full control
collector = BinancePublicDataCollector(
    symbol="SOLUSDT",
    start_date="2023-01-01",
    end_date="2023-12-31"
)
result = collector.collect_timeframe_data("1h")
df = result["dataframe"]
csv_file = result["filepath"]  # Path to the CSV file saved by the collector
# Manual gap detection
gap_filler = UniversalGapFiller()
gaps = gap_filler.detect_all_gaps(csv_file, "1h")
```
## Data Structure
The data-fetching functions (`download` and `fetch_data`) return pandas DataFrames with the complete microstructure data:
```python
import gapless_crypto_data as gcd
# Fetch data
df = gcd.download("BTCUSDT", timeframe="1h", start="2024-01-01", end="2024-06-30")
# DataFrame columns (11-column microstructure format)
print(df.columns.tolist())
# ['date', 'open', 'high', 'low', 'close', 'volume',
# 'close_time', 'quote_asset_volume', 'number_of_trades',
# 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume']
# Professional microstructure analysis
buy_pressure = df['taker_buy_base_asset_volume'].sum() / df['volume'].sum()
avg_trade_size = df['volume'].sum() / df['number_of_trades'].sum()
market_impact = df['quote_asset_volume'].std() / df['quote_asset_volume'].mean()
print(f"Taker buy pressure: {buy_pressure:.1%}")
print(f"Average trade size: {avg_trade_size:.4f} BTC")
print(f"Market impact volatility: {market_impact:.3f}")
```
## Data Sources
The package supports two data collection methods:
- **Binance Public Repository**: Pre-generated monthly ZIP files for historical data
- **Binance API**: Real-time data for gap filling and recent data collection
## 🏗️ Architecture
### Core Components
- **BinancePublicDataCollector**: Data collection with full 11-column microstructure format
- **UniversalGapFiller**: Intelligent gap detection and filling with authentic API-first validation
- **AtomicCSVOperations**: Corruption-proof file operations with atomic writes
- **SafeCSVMerger**: Safe merging of data files with integrity validation
### Data Flow
```
Binance Public Data Repository → BinancePublicDataCollector → 11-Column Microstructure Format
↓
Gap Detection → UniversalGapFiller → Authentic API-First Validation
↓
AtomicCSVOperations → Final Gapless Dataset with Order Flow Metrics
```
## 📝 CLI Options
### Data Collection
```bash
gapless-crypto-data [OPTIONS]

Options:
  --symbol TEXT       Trading pair symbol(s): a single symbol or a comma-separated list (e.g., SOLUSDT or BTCUSDT,ETHUSDT)
  --timeframes TEXT   Comma-separated timeframes (1s,1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d)
  --start TEXT        Start date (YYYY-MM-DD)
  --end TEXT          End date (YYYY-MM-DD)
  --output-dir TEXT   Output directory for CSV files (default: src/gapless_crypto_data/sample_data/)
  --help              Show this message and exit
```
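To make the comma-separated option format concrete, here is a small `argparse` sketch that mirrors the flags listed above. It is an illustration only: `split_csv` and `build_parser` are hypothetical names, no defaults are modeled, and the real CLI may be built differently.

```python
import argparse
from datetime import date
from typing import List


def split_csv(value: str) -> List[str]:
    """Turn 'BTCUSDT, ETHUSDT' into ['BTCUSDT', 'ETHUSDT']."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gapless-crypto-data")
    parser.add_argument("--symbol", type=split_csv, help="Symbol or comma-separated symbols")
    parser.add_argument("--timeframes", type=split_csv, help="Comma-separated timeframes")
    parser.add_argument("--start", type=date.fromisoformat, help="Start date (YYYY-MM-DD)")
    parser.add_argument("--end", type=date.fromisoformat, help="End date (YYYY-MM-DD)")
    parser.add_argument("--output-dir", help="Output directory for CSV files")
    return parser


args = build_parser().parse_args(
    ["--symbol", "BTCUSDT,ETHUSDT", "--timeframes", "1h,4h", "--start", "2024-01-01"]
)
print(args.symbol, args.timeframes, args.start)
```

Parsing dates with `date.fromisoformat` rejects malformed input at the command line instead of deep inside the collection code.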
### Gap Filling
```bash
gapless-crypto-data --fill-gaps [OPTIONS]

Options:
  --directory TEXT    Data directory to scan for gaps
  --symbol TEXT       Specific symbol to process (optional)
  --timeframe TEXT    Specific timeframe to process (optional)
  --help              Show this message and exit
```
## 🔧 Advanced Usage
### Batch Processing
#### CLI Multi-Symbol (Recommended)
```bash
# Native multi-symbol support
gapless-crypto-data --symbol BTCUSDT,ETHUSDT,SOLUSDT,ADAUSDT --timeframes 1m,5m,15m,1h,4h --start 2023-01-01 --end 2023-12-31
# Alternative: Multiple separate commands for different settings
gapless-crypto-data --symbol BTCUSDT,ETHUSDT --timeframes 1m,1h --start 2023-01-01 --end 2023-06-30
gapless-crypto-data --symbol SOLUSDT,ADAUSDT --timeframes 5m,4h --start 2023-07-01 --end 2023-12-31
```
#### Simple API (Recommended)
```python
import gapless_crypto_data as gcd
# Process multiple symbols with simple loops
symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT", "ADAUSDT"]
timeframes = ["1h", "4h"]
for symbol in symbols:
    for timeframe in timeframes:
        df = gcd.fetch_data(symbol, timeframe, start="2023-01-01", end="2023-12-31")
        print(f"{symbol} {timeframe}: {len(df)} bars collected")
```
#### Advanced API (Complex Workflows)
```python
from gapless_crypto_data import BinancePublicDataCollector
# Initialize with custom settings
collector = BinancePublicDataCollector(
    start_date="2023-01-01",
    end_date="2023-12-31",
    output_dir="./crypto_data"
)
# Process multiple symbols with detailed control
symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT"]
for symbol in symbols:
    collector.symbol = symbol
    results = collector.collect_multiple_timeframes(["1m", "5m", "1h", "4h"])
    for timeframe, result in results.items():
        print(f"{symbol} {timeframe}: {result['stats']}")
```
### Gap Analysis
#### Simple API (Recommended)
```python
import gapless_crypto_data as gcd
# Quick gap filling for entire directory
results = gcd.fill_gaps("./data")
print(f"Processed {results['files_processed']} files")
print(f"Filled {results['gaps_filled']}/{results['gaps_detected']} gaps")
print(f"Success rate: {results['success_rate']:.1f}%")
# Gap filling for specific symbols only
results = gcd.fill_gaps("./data", symbols=["BTCUSDT", "ETHUSDT"])
```
#### Advanced API (Detailed Control)
```python
from gapless_crypto_data import UniversalGapFiller
gap_filler = UniversalGapFiller()
# Manual gap detection and analysis
gaps = gap_filler.detect_all_gaps("BTCUSDT_1h.csv", "1h")
print(f"Found {len(gaps)} gaps")
for gap in gaps:
    duration_hours = gap['duration'].total_seconds() / 3600
    print(f"Gap: {gap['start_time']} → {gap['end_time']} ({duration_hours:.1f}h)")
# Fill specific gaps
result = gap_filler.process_file("BTCUSDT_1h.csv", "1h")
```
## 🛠️ Development
### Prerequisites
- **UV Package Manager** - [Install UV](https://docs.astral.sh/uv/getting-started/installation/)
- **Python 3.9+** - UV will manage Python versions automatically
- **Git** - For repository cloning and version control
### Development Installation Workflow
**IMPORTANT**: This project uses **mandatory pre-commit hooks** to prevent broken code from being committed. All commits are automatically validated for formatting, linting, and basic quality checks.
#### Step 1: Clone Repository
```bash
git clone https://github.com/Eon-Labs/gapless-crypto-data.git
cd gapless-crypto-data
```
#### Step 2: Development Environment Setup
```bash
# Create isolated virtual environment
uv venv
# Activate virtual environment
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
# Install all dependencies (production + development)
uv sync --dev
```
#### Step 3: Verify Installation
```bash
# Test CLI functionality
uv run gapless-crypto-data --help
# Run test suite
uv run pytest
# Quick data collection test
uv run gapless-crypto-data --symbol BTCUSDT --timeframes 1h --start 2024-01-01 --end 2024-01-01 --output-dir ./test_data
```
#### Step 4: Set Up Pre-Commit Hooks (Mandatory)
```bash
# Install pre-commit hooks (prevents broken code from being committed)
uv run pre-commit install
# Test pre-commit hooks
uv run pre-commit run --all-files
```
#### Step 5: Development Tools
```bash
# Code formatting
uv run ruff format .
# Linting and auto-fixes
uv run ruff check --fix .
# Type checking
uv run mypy src/
# Run specific tests
uv run pytest tests/test_binance_collector.py -v
# Manual pre-commit validation
uv run pre-commit run --all-files
```
### Development Commands Reference
| Task | Command |
|------|---------|
| Install dependencies | `uv sync --dev` |
| Setup pre-commit hooks | `uv run pre-commit install` |
| Add new dependency | `uv add package-name` |
| Add dev dependency | `uv add --dev package-name` |
| Run CLI | `uv run gapless-crypto-data [args]` |
| Run tests | `uv run pytest` |
| Format code | `uv run ruff format .` |
| Lint code | `uv run ruff check --fix .` |
| Type check | `uv run mypy src/` |
| Validate pre-commit | `uv run pre-commit run --all-files` |
| Build package | `uv build` |
### Project Structure for Development
```
gapless-crypto-data/
├── src/gapless_crypto_data/ # Main package
│ ├── __init__.py # Package exports
│ ├── cli.py # CLI interface
│ ├── collectors/ # Data collection modules
│ └── gap_filling/ # Gap detection/filling
├── tests/ # Test suite
├── docs/ # Documentation
├── examples/ # Usage examples
├── pyproject.toml # Project configuration
└── uv.lock # Dependency lock file
```
### Building and Publishing
```bash
# Build package
uv build
# Publish to PyPI (requires API token)
uv publish
```
## 📁 Project Structure
```
gapless-crypto-data/
├── src/
│   └── gapless_crypto_data/
│       ├── __init__.py                  # Package exports
│       ├── cli.py                       # Command-line interface
│       ├── collectors/
│       │   ├── __init__.py
│       │   └── binance_public_data_collector.py
│       ├── gap_filling/
│       │   ├── __init__.py
│       │   ├── universal_gap_filler.py
│       │   └── safe_file_operations.py
│       └── utils/
│           └── __init__.py
├── tests/                               # Test suite
├── docs/                                # Documentation
├── pyproject.toml                       # Project configuration
├── README.md                            # This file
└── LICENSE                              # MIT License
```
## 🔍 Supported Timeframes
All 13 timeframes from 1 second to 1 day are supported:
| Timeframe | Code | Description | Use Case |
|-----------|------|-------------|----------|
| 1 second | `1s` | Ultra-high frequency | HFT, microstructure analysis |
| 1 minute | `1m` | High resolution | Scalping, order flow |
| 3 minutes | `3m` | Short-term analysis | Quick trend detection |
| 5 minutes | `5m` | Common trading timeframe | Day trading signals |
| 15 minutes| `15m`| Medium-term signals | Swing trading entry |
| 30 minutes| `30m`| Longer-term patterns | Position management |
| 1 hour | `1h` | Popular for backtesting | Strategy development |
| 2 hours | `2h` | Extended analysis | Multi-timeframe confluence |
| 4 hours | `4h` | Daily cycle patterns | Trend following |
| 6 hours | `6h` | Quarter-day analysis | Position sizing |
| 8 hours | `8h` | Third-day cycles | Risk management |
| 12 hours | `12h`| Half-day patterns | Overnight positions |
| 1 day | `1d` | Daily analysis | Long-term trends |
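As a quick sanity check on collected data, the timeframe codes in the table can be mapped to bar lengths and used to estimate how many bars a date range should contain. The mapping and the `expected_bars` helper below are illustrative names, not part of the package API.

```python
from datetime import datetime

# Bar length in seconds for each timeframe code from the table above.
TIMEFRAME_SECONDS = {
    "1s": 1, "1m": 60, "3m": 180, "5m": 300, "15m": 900, "30m": 1800,
    "1h": 3600, "2h": 7200, "4h": 14400, "6h": 21600, "8h": 28800,
    "12h": 43200, "1d": 86400,
}


def expected_bars(timeframe: str, start: datetime, end: datetime) -> int:
    """Number of bars whose open time falls in the half-open range [start, end)."""
    span_seconds = int((end - start).total_seconds())
    return span_seconds // TIMEFRAME_SECONDS[timeframe]


# Two full days of hourly bars
print(expected_bars("1h", datetime(2024, 1, 1), datetime(2024, 1, 3)))  # 48
```

Comparing this count with `len(df)` is a cheap first test for missing bars before running full gap detection.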
## ⚠️ Requirements
- Python 3.9+
- pandas >= 2.0.0
- requests >= 2.25.0
- Stable internet connection for data downloads
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Install development dependencies (`uv sync --dev`)
4. Make your changes
5. Run tests (`uv run pytest`)
6. Format code (`uv run ruff format .`)
7. Commit changes (`git commit -m 'Add amazing feature'`)
8. Push to branch (`git push origin feature/amazing-feature`)
9. Open a Pull Request
## 📚 API Reference
### BinancePublicDataCollector
Cryptocurrency spot data collection from Binance's public data repository using pre-generated monthly ZIP files.
#### Key Methods
**`__init__(symbol, start_date, end_date, output_dir)`**
Initialize the collector with trading pair and date range.
```python
collector = BinancePublicDataCollector(
    symbol="BTCUSDT",            # USDT spot pair
    start_date="2023-01-01",     # Start date (YYYY-MM-DD)
    end_date="2023-12-31",       # End date (YYYY-MM-DD)
    output_dir="./crypto_data"   # Output directory (optional)
)
```
**`collect_timeframe_data(trading_timeframe) -> Dict[str, Any]`**
Collect complete historical data for a single timeframe with full 11-column microstructure format.
```python
result = collector.collect_timeframe_data("1h")
df = result["dataframe"] # pandas DataFrame with OHLCV + microstructure
filepath = result["filepath"] # Path to saved CSV file
stats = result["stats"] # Collection statistics
# Access microstructure data
total_trades = df["number_of_trades"].sum()
taker_buy_ratio = df["taker_buy_base_asset_volume"].sum() / df["volume"].sum()
```
**`collect_multiple_timeframes(timeframes) -> Dict[str, Dict[str, Any]]`**
Collect data for multiple timeframes with comprehensive progress tracking.
```python
results = collector.collect_multiple_timeframes(["1h", "4h"])
for timeframe, result in results.items():
    df = result["dataframe"]
    print(f"{timeframe}: {len(df):,} bars")
```
### UniversalGapFiller
Gap detection and filling for various timeframes with 11-column microstructure format using Binance API data.
#### Key Methods
**`detect_all_gaps(csv_file, timeframe) -> List[Dict]`**
Detect timestamp gaps in a CSV file for the given timeframe.
```python
gap_filler = UniversalGapFiller()
gaps = gap_filler.detect_all_gaps("BTCUSDT_1h_data.csv", "1h")
print(f"Found {len(gaps)} gaps to fill")
```
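Conceptually, timestamp gap detection compares consecutive open times against the expected bar length. The sketch below shows that idea with the standard library only; `find_gaps` is a hypothetical helper, and the exact meaning of the `start_time`, `end_time`, and `duration` keys in the package's output is assumed here rather than confirmed.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def find_gaps(open_times: List[datetime], step: timedelta) -> List[Dict]:
    """Report every stretch where consecutive bars are more than one step apart."""
    ordered = sorted(open_times)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > step:
            start = previous + step  # first missing bar
            gaps.append(
                {
                    "start_time": start,
                    "end_time": current,  # first bar present after the gap
                    "duration": current - start,
                }
            )
    return gaps


bars = [datetime(2024, 1, 1, h) for h in (0, 1, 4, 5)]
for gap in find_gaps(bars, timedelta(hours=1)):
    print(gap["start_time"], gap["end_time"], gap["duration"])
```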
**`fill_gap(csv_file, gap_info) -> bool`**
Fill a specific gap with authentic Binance API data.
```python
# Fill first detected gap
success = gap_filler.fill_gap("BTCUSDT_1h_data.csv", gaps[0])
print(f"Gap filled successfully: {success}")
```
**`process_file(directory) -> Dict[str, Dict]`**
Batch process all CSV files in a directory for gap detection and filling.
```python
results = gap_filler.process_file("./crypto_data/")
for filename, result in results.items():
    print(f"{filename}: {result['gaps_filled']} gaps filled")
```
### AtomicCSVOperations
Safe atomic operations for CSV files with header preservation and corruption prevention. Uses temporary files and atomic rename operations to ensure data integrity.
#### Key Methods
**`create_backup() -> Path`**
Create timestamped backup of original file before modifications.
```python
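from pathlib import Path
# Module path inferred from the project structure above (an assumption)
from gapless_crypto_data.gap_filling.safe_file_operations import AtomicCSVOperations
atomic_ops = AtomicCSVOperations(Path("data.csv"))
backup_path = atomic_ops.create_backup()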
```
**`write_dataframe_atomic(df) -> bool`**
Atomically write DataFrame to CSV with integrity validation.
```python
success = atomic_ops.write_dataframe_atomic(df)
if not success:
    atomic_ops.rollback_from_backup()
```
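The temporary-file-plus-atomic-rename technique described for `AtomicCSVOperations` can be illustrated with the standard library. This is a generic sketch of the pattern under the assumption of a POSIX-like filesystem; `write_csv_atomic` is a hypothetical function, not the class's actual implementation.

```python
import csv
import os
import tempfile
from pathlib import Path
from typing import Iterable, Sequence


def write_csv_atomic(path: Path, header: Sequence[str], rows: Iterable[Sequence]) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_name, path)  # atomic swap: readers see old or new, never partial
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # leave no stray temp file behind on failure
        raise
```

Creating the temporary file in the target's own directory matters, because a rename is only atomic within a single filesystem.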
### SafeCSVMerger
Safe CSV data merging with gap filling capabilities and data integrity validation. Handles temporal data insertion while maintaining chronological order.
#### Key Methods
**`merge_gap_data_safe(gap_data, gap_start, gap_end) -> bool`**
Safely merge gap data into existing CSV using atomic operations.
```python
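from datetime import datetime
from pathlib import Path
# Module path inferred from the project structure above (an assumption)
from gapless_crypto_data.gap_filling.safe_file_operations import SafeCSVMerger
merger = SafeCSVMerger(Path("eth_data.csv"))
success = merger.merge_gap_data_safe(
    gap_data,                    # DataFrame with gap data
    datetime(2024, 1, 1, 12),    # Gap start time
    datetime(2024, 1, 1, 15)     # Gap end time
)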
```
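Merging gap data while keeping chronological order comes down to three steps: restrict the new rows to the gap window, never overwrite bars that already exist, and re-sort by open time. The sketch below shows this on plain dictionaries; `merge_gap_rows` and the half-open window convention are assumptions for illustration, not `SafeCSVMerger` internals.

```python
from datetime import datetime
from typing import Dict, List

Row = Dict[str, object]


def merge_gap_rows(
    existing: List[Row], gap_rows: List[Row], gap_start: datetime, gap_end: datetime
) -> List[Row]:
    """Insert rows that fall in [gap_start, gap_end) without touching existing bars."""
    by_time = {row["date"]: row for row in existing}
    for row in gap_rows:
        if gap_start <= row["date"] < gap_end:
            by_time.setdefault(row["date"], row)  # existing bars win on conflict
    return [by_time[moment] for moment in sorted(by_time)]


existing = [{"date": datetime(2024, 1, 1, h), "close": 1.0} for h in (11, 15)]
fetched = [{"date": datetime(2024, 1, 1, h), "close": 2.0} for h in (12, 13, 14, 15)]
merged = merge_gap_rows(existing, fetched, datetime(2024, 1, 1, 12), datetime(2024, 1, 1, 15))
print([row["date"].hour for row in merged])  # [11, 12, 13, 14, 15]
```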
## Output Formats
### DataFrame Structure (Python API)
Returns pandas DataFrame with 11-column microstructure format:
| Column | Type | Description | Example |
|--------|------|-------------|---------|
| `date` | datetime64[ns] | Open timestamp | `2024-01-01 12:00:00` |
| `open` | float64 | Opening price | `42150.50` |
| `high` | float64 | Highest price | `42200.00` |
| `low` | float64 | Lowest price | `42100.25` |
| `close` | float64 | Closing price | `42175.75` |
| `volume` | float64 | Base asset volume | `15.250000` |
| `close_time` | datetime64[ns] | Close timestamp | `2024-01-01 12:59:59` |
| `quote_asset_volume` | float64 | Quote asset volume | `643238.125` |
| `number_of_trades` | int64 | Trade count | `1547` |
| `taker_buy_base_asset_volume` | float64 | Taker buy base volume | `7.825000` |
| `taker_buy_quote_asset_volume` | float64 | Taker buy quote volume | `329891.750` |
### CSV File Structure
CSV files include header comments with metadata followed by data:
```csv
# Binance Spot Market Data v2.5.0
# Generated: 2025-09-18T23:09:25.391126+00:00Z
# Source: Binance Public Data Repository
# Market: SPOT | Symbol: BTCUSDT | Timeframe: 1h
# Coverage: 48 bars
# Period: 2024-01-01 00:00:00 to 2024-01-02 23:00:00
# Collection: direct_download in 0.0s
# Data Hash: 5fba9d2e5d3db849...
# Compliance: Zero-Magic-Numbers, Temporal-Integrity, Official-Binance-Source
#
date,open,high,low,close,volume,close_time,quote_asset_volume,number_of_trades,taker_buy_base_asset_volume,taker_buy_quote_asset_volume
2024-01-01 00:00:00,42283.58,42554.57,42261.02,42475.23,1271.68108,2024-01-01 00:59:59,53957248.973789,47134,682.57581,28957416.819645
```
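Because the data rows are preceded by `#` comment lines, a reader has to skip or collect those lines before parsing. A minimal standard-library sketch follows; `read_commented_csv` is a hypothetical helper written for this example.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_commented_csv(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Split a file into its '#' metadata lines and its parsed data rows."""
    metadata, data_lines = [], []
    for line in io.StringIO(text):
        if line.startswith("#"):
            note = line[1:].strip()
            if note:  # skip the bare '#' separator line
                metadata.append(note)
        elif line.strip():
            data_lines.append(line)
    rows = list(csv.DictReader(data_lines))  # first data line is the column header
    return metadata, rows


sample = (
    "# Market: SPOT | Symbol: BTCUSDT | Timeframe: 1h\n"
    "#\n"
    "date,open,close\n"
    "2024-01-01 00:00:00,42283.58,42475.23\n"
)
metadata, rows = read_commented_csv(sample)
print(metadata[0])
print(rows[0]["close"])
```

With pandas, passing `comment="#"` to `pd.read_csv` skips the same header lines.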
### Metadata JSON Structure
Each CSV file is accompanied by a `.metadata.json` file with comprehensive metadata:
```json
{
  "version": "v2.5.0",
  "generator": "BinancePublicDataCollector",
  "data_source": "Binance Public Data Repository",
  "symbol": "BTCUSDT",
  "timeframe": "1h",
  "enhanced_microstructure_format": {
    "total_columns": 11,
    "analysis_capabilities": [
      "order_flow_analysis",
      "liquidity_metrics",
      "market_microstructure",
      "trade_weighted_prices",
      "institutional_data_patterns"
    ]
  },
  "gap_analysis": {
    "total_gaps_detected": 0,
    "data_completeness_score": 1.0,
    "gap_filling_method": "authentic_binance_api"
  },
  "data_integrity": {
    "chronological_order": true,
    "corruption_detected": false
  }
}
```
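A downstream job might use this metadata to decide whether a dataset is safe to consume. The check below is a sketch that relies only on the keys shown in the example above; `is_gapless` is a hypothetical helper, and real metadata files may carry additional fields.

```python
import json


def is_gapless(metadata: dict) -> bool:
    """True when the metadata reports no gaps, full completeness, and intact data."""
    gaps = metadata.get("gap_analysis", {})
    integrity = metadata.get("data_integrity", {})
    return (
        gaps.get("total_gaps_detected") == 0
        and gaps.get("data_completeness_score") == 1.0
        and integrity.get("chronological_order") is True
        and integrity.get("corruption_detected") is False
    )


raw = """
{
  "gap_analysis": {"total_gaps_detected": 0, "data_completeness_score": 1.0},
  "data_integrity": {"chronological_order": true, "corruption_detected": false}
}
"""
print(is_gapless(json.loads(raw)))  # True
```

Using `.get` with defaults makes a missing section count as a failed check rather than raising `KeyError`.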
### Streaming Output (Memory-Efficient)
For large datasets, Polars streaming provides constant memory usage:
```python
from gapless_crypto_data.streaming import StreamingDataProcessor
processor = StreamingDataProcessor(chunk_size=10_000, memory_limit_mb=100)
for chunk in processor.stream_csv_chunks("large_dataset.csv"):
# Process chunk with constant memory usage
print(f"Chunk shape: {chunk.shape}")
```
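The constant-memory idea behind chunked streaming can also be shown without Polars. The generator below is a plain standard-library stand-in, not the `StreamingDataProcessor` implementation; `stream_rows` is a hypothetical name, and it additionally skips the `#` header lines that the CSV files carry.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def stream_rows(handle: TextIO, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most chunk_size rows, reading the file lazily."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines)
    while True:
        chunk = list(islice(reader, chunk_size))  # pulls only chunk_size rows at a time
        if not chunk:
            return
        yield chunk


sample = "# header comment\ndate,close\n" + "".join(
    f"2024-01-0{day},1.0\n" for day in range(1, 6)
)
for chunk in stream_rows(io.StringIO(sample), chunk_size=2):
    print(len(chunk))
```

Only one chunk is held in memory at a time, so peak usage depends on `chunk_size` rather than on the file size.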
### File Naming Convention
Output files follow a consistent naming pattern:
```
binance_spot_{SYMBOL}-{TIMEFRAME}_{START_DATE}-{END_DATE}_v{VERSION}.csv
binance_spot_{SYMBOL}-{TIMEFRAME}_{START_DATE}-{END_DATE}_v{VERSION}.metadata.json
```
Examples:
- `binance_spot_BTCUSDT-1h_20240101-20240102_v2.5.0.csv`
- `binance_spot_ETHUSDT-4h_20240101-20240201_v2.5.0.csv`
- `binance_spot_SOLUSDT-1d_20240101-20241231_v2.5.0.csv`
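When scanning an output directory, the naming pattern above can be parsed back into its parts. The regular expression and `parse_filename` helper below are a sketch derived from the pattern and examples shown here, not a function exported by the package.

```python
import re
from datetime import date, datetime
from typing import Dict, Optional

FILENAME_PATTERN = re.compile(
    r"^binance_spot_(?P<symbol>[A-Z0-9]+)-(?P<timeframe>\d+[smhd])"
    r"_(?P<start>\d{8})-(?P<end>\d{8})"
    r"_v(?P<version>\d+\.\d+\.\d+)\.(?P<kind>csv|metadata\.json)$"
)


def parse_filename(name: str) -> Optional[Dict[str, object]]:
    """Return the parts of an output filename, or None if it does not match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    parts: Dict[str, object] = match.groupdict()
    for key in ("start", "end"):
        parts[key] = datetime.strptime(str(parts[key]), "%Y%m%d").date()
    return parts


info = parse_filename("binance_spot_BTCUSDT-1h_20240101-20240102_v2.5.0.csv")
print(info["symbol"], info["timeframe"], info["start"], info["end"])
```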
### Error Handling
All classes implement robust error handling with meaningful exceptions:
```python
try:
    collector = BinancePublicDataCollector(symbol="INVALIDPAIR")
    result = collector.collect_timeframe_data("1h")
except ValueError as e:
    print(f"Invalid symbol format: {e}")
except ConnectionError as e:
    print(f"Network error: {e}")
except FileNotFoundError as e:
    print(f"Output directory error: {e}")
```
### Type Hints
All public APIs include comprehensive type hints for better IDE support:
```python
from typing import Dict, List, Optional, Any
from pathlib import Path
import pandas as pd
def collect_timeframe_data(self, trading_timeframe: str) -> Dict[str, Any]:
    # Returns dict with 'dataframe', 'filepath', and 'stats' keys
    pass

def collect_multiple_timeframes(
    self,
    timeframes: Optional[List[str]] = None
) -> Dict[str, Dict[str, Any]]:
    # Returns nested dict by timeframe
    pass
```
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🏢 About Eon Labs
Gapless Crypto Data is developed by [Eon Labs](https://github.com/Eon-Labs), specializing in quantitative trading infrastructure and machine learning for financial markets.
---
**UV-based** - Python dependency management
**📊 11-Column Format** - Microstructure data with order flow metrics
**🔒 Gap Detection** - Data completeness validation and filling
Open a Pull Request\n\n## \ud83d\udcda API Reference\n\n### BinancePublicDataCollector\n\nCryptocurrency spot data collection from Binance's public data repository using pre-generated monthly ZIP files.\n\n#### Key Methods\n\n**`__init__(symbol, start_date, end_date, output_dir)`**\n\nInitialize the collector with trading pair and date range.\n\n```python\ncollector = BinancePublicDataCollector(\n symbol=\"BTCUSDT\", # USDT spot pair\n start_date=\"2023-01-01\", # Start date (YYYY-MM-DD)\n end_date=\"2023-12-31\", # End date (YYYY-MM-DD)\n output_dir=\"./crypto_data\" # Output directory (optional)\n)\n```\n\n**`collect_timeframe_data(trading_timeframe) -> Dict[str, Any]`**\n\nCollect complete historical data for a single timeframe with full 11-column microstructure format.\n\n```python\nresult = collector.collect_timeframe_data(\"1h\")\ndf = result[\"dataframe\"] # pandas DataFrame with OHLCV + microstructure\nfilepath = result[\"filepath\"] # Path to saved CSV file\nstats = result[\"stats\"] # Collection statistics\n\n# Access microstructure data\ntotal_trades = df[\"number_of_trades\"].sum()\ntaker_buy_ratio = df[\"taker_buy_base_asset_volume\"].sum() / df[\"volume\"].sum()\n```\n\n**`collect_multiple_timeframes(timeframes) -> Dict[str, Dict[str, Any]]`**\n\nCollect data for multiple timeframes with comprehensive progress tracking.\n\n```python\nresults = collector.collect_multiple_timeframes([\"1h\", \"4h\"])\nfor timeframe, result in results.items():\n df = result[\"dataframe\"]\n print(f\"{timeframe}: {len(df):,} bars\")\n```\n\n### UniversalGapFiller\n\nGap detection and filling for various timeframes with 11-column microstructure format using Binance API data.\n\n#### Key Methods\n\n**`detect_all_gaps(csv_file) -> List[Dict]`**\n\nAutomatically detect timestamp gaps in CSV files.\n\n```python\ngap_filler = UniversalGapFiller()\ngaps = gap_filler.detect_all_gaps(\"BTCUSDT_1h_data.csv\")\nprint(f\"Found {len(gaps)} gaps to fill\")\n```\n\n**`fill_gap(csv_file, 
gap_info) -> bool`**\n\nFill a specific gap with authentic Binance API data.\n\n```python\n# Fill first detected gap\nsuccess = gap_filler.fill_gap(\"BTCUSDT_1h_data.csv\", gaps[0])\nprint(f\"Gap filled successfully: {success}\")\n```\n\n**`process_file(directory) -> Dict[str, Dict]`**\n\nBatch process all CSV files in a directory for gap detection and filling.\n\n```python\nresults = gap_filler.process_file(\"./crypto_data/\")\nfor filename, result in results.items():\n print(f\"{filename}: {result['gaps_filled']} gaps filled\")\n```\n\n### AtomicCSVOperations\n\nSafe atomic operations for CSV files with header preservation and corruption prevention. Uses temporary files and atomic rename operations to ensure data integrity.\n\n#### Key Methods\n\n**`create_backup() -> Path`**\n\nCreate timestamped backup of original file before modifications.\n\n```python\nfrom pathlib import Path\natomic_ops = AtomicCSVOperations(Path(\"data.csv\"))\nbackup_path = atomic_ops.create_backup()\n```\n\n**`write_dataframe_atomic(df) -> bool`**\n\nAtomically write DataFrame to CSV with integrity validation.\n\n```python\nsuccess = atomic_ops.write_dataframe_atomic(df)\nif not success:\n atomic_ops.rollback_from_backup()\n```\n\n### SafeCSVMerger\n\nSafe CSV data merging with gap filling capabilities and data integrity validation. 
Handles temporal data insertion while maintaining chronological order.\n\n#### Key Methods\n\n**`merge_gap_data_safe(gap_data, gap_start, gap_end) -> bool`**\n\nSafely merge gap data into existing CSV using atomic operations.\n\n```python\nfrom datetime import datetime\nmerger = SafeCSVMerger(Path(\"eth_data.csv\"))\nsuccess = merger.merge_gap_data_safe(\n gap_data, # DataFrame with gap data\n datetime(2024, 1, 1, 12), # Gap start time\n datetime(2024, 1, 1, 15) # Gap end time\n)\n```\n\n## Output Formats\n\n### DataFrame Structure (Python API)\n\nReturns pandas DataFrame with 11-column microstructure format:\n\n| Column | Type | Description | Example |\n|--------|------|-------------|---------|\n| `date` | datetime64[ns] | Open timestamp | `2024-01-01 12:00:00` |\n| `open` | float64 | Opening price | `42150.50` |\n| `high` | float64 | Highest price | `42200.00` |\n| `low` | float64 | Lowest price | `42100.25` |\n| `close` | float64 | Closing price | `42175.75` |\n| `volume` | float64 | Base asset volume | `15.250000` |\n| `close_time` | datetime64[ns] | Close timestamp | `2024-01-01 12:59:59` |\n| `quote_asset_volume` | float64 | Quote asset volume | `643238.125` |\n| `number_of_trades` | int64 | Trade count | `1547` |\n| `taker_buy_base_asset_volume` | float64 | Taker buy base volume | `7.825000` |\n| `taker_buy_quote_asset_volume` | float64 | Taker buy quote volume | `329891.750` |\n\n### CSV File Structure\n\nCSV files include header comments with metadata followed by data:\n\n```csv\n# Binance Spot Market Data v2.5.0\n# Generated: 2025-09-18T23:09:25.391126+00:00Z\n# Source: Binance Public Data Repository\n# Market: SPOT | Symbol: BTCUSDT | Timeframe: 1h\n# Coverage: 48 bars\n# Period: 2024-01-01 00:00:00 to 2024-01-02 23:00:00\n# Collection: direct_download in 0.0s\n# Data Hash: 5fba9d2e5d3db849...\n# Compliance: Zero-Magic-Numbers, Temporal-Integrity, 
Official-Binance-Source\n#\ndate,open,high,low,close,volume,close_time,quote_asset_volume,number_of_trades,taker_buy_base_asset_volume,taker_buy_quote_asset_volume\n2024-01-01 00:00:00,42283.58,42554.57,42261.02,42475.23,1271.68108,2024-01-01 00:59:59,53957248.973789,47134,682.57581,28957416.819645\n```\n\n### Metadata JSON Structure\n\nEach CSV file includes comprehensive metadata in `.metadata.json`:\n\n```json\n{\n \"version\": \"v2.5.0\",\n \"generator\": \"BinancePublicDataCollector\",\n \"data_source\": \"Binance Public Data Repository\",\n \"symbol\": \"BTCUSDT\",\n \"timeframe\": \"1h\",\n \"enhanced_microstructure_format\": {\n \"total_columns\": 11,\n \"analysis_capabilities\": [\n \"order_flow_analysis\",\n \"liquidity_metrics\",\n \"market_microstructure\",\n \"trade_weighted_prices\",\n \"institutional_data_patterns\"\n ]\n },\n \"gap_analysis\": {\n \"total_gaps_detected\": 0,\n \"data_completeness_score\": 1.0,\n \"gap_filling_method\": \"authentic_binance_api\"\n },\n \"data_integrity\": {\n \"chronological_order\": true,\n \"corruption_detected\": false\n }\n}\n```\n\n### Streaming Output (Memory-Efficient)\n\nFor large datasets, Polars streaming provides constant memory usage:\n\n```python\nfrom gapless_crypto_data.streaming import StreamingDataProcessor\n\nprocessor = StreamingDataProcessor(chunk_size=10_000, memory_limit_mb=100)\nfor chunk in processor.stream_csv_chunks(\"large_dataset.csv\"):\n # Process chunk with constant memory usage\n print(f\"Chunk shape: {chunk.shape}\")\n```\n\n### File Naming Convention\n\nOutput files follow consistent naming pattern:\n\n```\nbinance_spot_{SYMBOL}-{TIMEFRAME}_{START_DATE}-{END_DATE}_v{VERSION}.csv\nbinance_spot_{SYMBOL}-{TIMEFRAME}_{START_DATE}-{END_DATE}_v{VERSION}.metadata.json\n```\n\nExamples:\n- `binance_spot_BTCUSDT-1h_20240101-20240102_v2.5.0.csv`\n- `binance_spot_ETHUSDT-4h_20240101-20240201_v2.5.0.csv`\n- `binance_spot_SOLUSDT-1d_20240101-20241231_v2.5.0.csv`\n\n### Error Handling\n\nAll 
classes implement robust error handling with meaningful exceptions:\n\n```python\nfrom gapless_crypto_data import BinancePublicDataCollector\n\ntry:\n    collector = BinancePublicDataCollector(symbol=\"INVALIDPAIR\")\n    result = collector.collect_timeframe_data(\"1h\")\nexcept ValueError as e:\n    print(f\"Invalid symbol format: {e}\")\nexcept ConnectionError as e:\n    print(f\"Network error: {e}\")\nexcept FileNotFoundError as e:\n    print(f\"Output directory error: {e}\")\n```\n\n### Type Hints\n\nAll public APIs include comprehensive type hints for better IDE support:\n\n```python\nfrom typing import Dict, List, Optional, Any\nfrom pathlib import Path\nimport pandas as pd\n\ndef collect_timeframe_data(self, trading_timeframe: str) -> Dict[str, Any]:\n    # Returns dict with 'dataframe', 'filepath', and 'stats' keys\n    pass\n\ndef collect_multiple_timeframes(\n    self,\n    timeframes: Optional[List[str]] = None\n) -> Dict[str, Dict[str, Any]]:\n    # Returns nested dict by timeframe\n    pass\n```\n\n## \ud83d\udcc4 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## \ud83c\udfe2 About Eon Labs\n\nGapless Crypto Data is developed by [Eon Labs](https://github.com/Eon-Labs), specializing in quantitative trading infrastructure and machine learning for financial markets.\n\n---\n\n**UV-based** - Python dependency management\n**\ud83d\udcca 11-Column Format** - Microstructure data with order flow metrics\n**\ud83d\udd12 Gap Detection** - Data completeness validation and filling\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Ultra-fast cryptocurrency data collection with zero gaps guarantee. 22x faster via Binance public repository with complete 13-timeframe support (1s-1d) and intelligent monthly-to-daily fallback. Provides 11-column microstructure format with order flow metrics.",
"version": "2.10.0",
"project_urls": {
"Changelog": "https://github.com/Eon-Labs/gapless-crypto-data/blob/main/CHANGELOG.md",
"Documentation": "https://github.com/Eon-Labs/gapless-crypto-data#readme",
"Homepage": "https://github.com/Eon-Labs/gapless-crypto-data",
"Issues": "https://github.com/Eon-Labs/gapless-crypto-data/issues",
"Repository": "https://github.com/Eon-Labs/gapless-crypto-data.git"
},
"split_keywords": [
"13-timeframes",
" 1s-1d",
" 22x-faster",
" ohlcv",
" api",
" authentic-data",
" backward-compatibility",
" binance",
" ccxt",
" collection",
" crypto",
" cryptocurrency",
" data",
" download",
" dual-parameter",
" fetch-data",
" financial-data",
" function-based",
" gap-filling",
" gapless",
" interval",
" liquidity",
" microstructure",
" monthly-daily-fallback",
" order-flow",
" pandas",
" performance",
" taker-volume",
" time-series",
" timeframe",
" trading",
" ultra-high-frequency",
" zero-gaps"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "5f3b6bc44f93d5f4bff4bccc7ff1683d5df4e3eec086bea632b4b9db5ea4ab80",
"md5": "9a4faf66dbac2df756fc86217c154c4d",
"sha256": "3ffa5fa880a648ff9143682257fd2ca645cfd89b4dee030de71cff09958be167"
},
"downloads": -1,
"filename": "gapless_crypto_data-2.10.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9a4faf66dbac2df756fc86217c154c4d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 106365,
"upload_time": "2025-09-20T07:33:20",
"upload_time_iso_8601": "2025-09-20T07:33:20.681696Z",
"url": "https://files.pythonhosted.org/packages/5f/3b/6bc44f93d5f4bff4bccc7ff1683d5df4e3eec086bea632b4b9db5ea4ab80/gapless_crypto_data-2.10.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "cc114071d5c1310894372287f2a34eb974a717f40fb7f68ee362531a90bac401",
"md5": "662ce9c3880c3813557d457618b4d339",
"sha256": "4fb6187726defcbc90a315b9be60f19de0517c0a7ce284bacd8b83a9fa29b565"
},
"downloads": -1,
"filename": "gapless_crypto_data-2.10.0.tar.gz",
"has_sig": false,
"md5_digest": "662ce9c3880c3813557d457618b4d339",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 3298874,
"upload_time": "2025-09-20T07:33:22",
"upload_time_iso_8601": "2025-09-20T07:33:22.866387Z",
"url": "https://files.pythonhosted.org/packages/cc/11/4071d5c1310894372287f2a34eb974a717f40fb7f68ee362531a90bac401/gapless_crypto_data-2.10.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-20 07:33:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Eon-Labs",
"github_project": "gapless-crypto-data",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "gapless-crypto-data"
}