# Gapless Crypto Data
[PyPI version](https://badge.fury.io/py/gapless-crypto-data) · [PyPI project](https://pypi.org/project/gapless-crypto-data/) · [License: MIT](https://opensource.org/licenses/MIT) · [uv](https://github.com/astral-sh/uv)

Ultra-fast cryptocurrency data collection with a zero-gap guarantee and the full 11-column microstructure format. Downloading from the Binance public data repository is **22x faster** than collecting the same data through API calls.
## ⚡ Features
- 🚀 **22x faster** than API calls via Binance public data repository
- 📊 **Full 11-column microstructure format** with order flow and liquidity metrics
- 🔒 **Zero-gap guarantee** through authentic API-first validation
- ⚡ **UV-first** modern Python tooling
- 🛡️ **Corruption-proof** atomic file operations
- 📊 **Multi-symbol & multi-timeframe support** (1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h)
- 🔧 **Gap detection and filling** with authentic data only
- 📈 **Production-grade** data collection for quantitative trading
## 🚀 Quick Start
### Installation (UV - Recommended)
```bash
# Install via UV (fastest)
uv add gapless-crypto-data
# Or install globally
uv tool install gapless-crypto-data
```
### Installation (pip)
```bash
pip install gapless-crypto-data
```
### CLI Usage
```bash
# Collect data for multiple timeframes (default output location)
gapless-crypto-data --symbol SOLUSDT --timeframes 1m,3m,5m,15m,30m,1h,2h,4h
# Collect multiple symbols at once (native multi-symbol support)
gapless-crypto-data --symbol BTCUSDT,ETHUSDT,SOLUSDT --timeframes 1h,4h
# Collect specific date range with custom output directory
gapless-crypto-data --symbol BTCUSDT --timeframes 1h --start 2023-01-01 --end 2023-12-31 --output-dir ./crypto_data
# Multi-symbol with custom settings
gapless-crypto-data --symbol BTCUSDT,ETHUSDT --timeframes 5m,1h --start 2024-01-01 --end 2024-06-30 --output-dir ./crypto_data
# Fill gaps in existing data
gapless-crypto-data --fill-gaps --directory ./data
# Help
gapless-crypto-data --help
```
### Python API
#### Simple API (Recommended)
```python
import gapless_crypto_data as gcd
# Fetch data for a specific date range
df = gcd.download("BTCUSDT", "1h", start="2024-01-01", end="2024-06-30")
# Or with limit
df = gcd.fetch_data("ETHUSDT", "4h", limit=1000)
# Get available symbols and timeframes
symbols = gcd.get_supported_symbols()
timeframes = gcd.get_supported_timeframes()
# Fill gaps in existing data
results = gcd.fill_gaps("./data")
```
#### Advanced API (Power Users)
```python
from gapless_crypto_data import BinancePublicDataCollector, UniversalGapFiller
# Custom collection with full control
collector = BinancePublicDataCollector(
    symbol="SOLUSDT",
    start_date="2023-01-01",
    end_date="2023-12-31",
)
result = collector.collect_timeframe_data("1h")
df = result["dataframe"]

# Manual gap detection on the CSV file the collector just saved
gap_filler = UniversalGapFiller()
gaps = gap_filler.detect_all_gaps(result["filepath"], "1h")
```
## 🎯 Data Structure
The data-fetching functions (`download`, `fetch_data`) return pandas DataFrames with the complete microstructure data:
```python
import gapless_crypto_data as gcd
# Fetch data
df = gcd.download("BTCUSDT", "1h", start="2024-01-01", end="2024-06-30")
# DataFrame columns (11-column microstructure format)
print(df.columns.tolist())
# ['date', 'open', 'high', 'low', 'close', 'volume',
# 'close_time', 'quote_asset_volume', 'number_of_trades',
# 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume']
# Professional microstructure analysis
buy_pressure = df['taker_buy_base_asset_volume'].sum() / df['volume'].sum()
avg_trade_size = df['volume'].sum() / df['number_of_trades'].sum()
# Coefficient of variation of per-bar quote volume (a dispersion measure)
volume_cv = df['quote_asset_volume'].std() / df['quote_asset_volume'].mean()
print(f"Taker buy pressure: {buy_pressure:.1%}")
print(f"Average trade size: {avg_trade_size:.4f} BTC")
print(f"Quote volume variability (CV): {volume_cv:.3f}")
```
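Because Binance reports taker *buy* volume as a subset of total volume, the taker *sell* side and a per-bar order-flow imbalance can be derived from the same columns. The sketch below uses plain dictionaries instead of a DataFrame so it runs without pandas; `order_flow_imbalance` is a hypothetical helper for illustration, not part of this package.

```python
def order_flow_imbalance(bar: dict) -> float:
    """Return (taker buy - taker sell) / total volume for one bar, in [-1, 1].

    Hypothetical helper: assumes the 11-column field names shown above.
    """
    volume = bar["volume"]
    if volume == 0:
        return 0.0  # no volume in this bar, so no imbalance to report
    taker_buy = bar["taker_buy_base_asset_volume"]
    taker_sell = volume - taker_buy  # the rest of the volume was taker sells
    return (taker_buy - taker_sell) / volume


bar = {"volume": 15.25, "taker_buy_base_asset_volume": 7.825}
imbalance = order_flow_imbalance(bar)
print(f"Order-flow imbalance: {imbalance:+.3f}")
```

With a DataFrame the same formula vectorizes as `(2 * df["taker_buy_base_asset_volume"] - df["volume"]) / df["volume"]`.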
## 📊 Performance Comparison
| Method | Collection Speed | Microstructure Data | Gap Handling | Data Integrity |
|--------|-----------------|-------------------|--------------|----------------|
| **Gapless Crypto Data** | **22x faster** | ✅ Full 11-column format | ✅ Authentic API-first | ✅ Atomic operations |
| Traditional APIs | 1x baseline | ⚠️ Basic OHLCV only | ❌ Manual handling | ⚠️ Corruption risk |
| Other downloaders | 2-5x faster | ❌ Limited format | ❌ Limited coverage | ⚠️ Basic validation |
## 🏗️ Architecture
### Core Components
- **BinancePublicDataCollector**: Ultra-fast data collection with full 11-column microstructure format
- **UniversalGapFiller**: Intelligent gap detection and filling with authentic API-first validation
- **AtomicCSVOperations**: Corruption-proof file operations with atomic writes
- **SafeCSVMerger**: Safe merging of data files with integrity validation
### Data Flow
```
Binance Public Data Repository → BinancePublicDataCollector → 11-Column Microstructure Format
↓
Gap Detection → UniversalGapFiller → Authentic API-First Validation
↓
AtomicCSVOperations → Final Gapless Dataset with Order Flow Metrics
```
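The gap-detection step in this flow comes down to comparing consecutive open timestamps against the expected bar interval. The following stdlib-only sketch shows that idea; `find_gaps` and its return shape are hypothetical and are not `UniversalGapFiller`'s actual implementation.

```python
from datetime import datetime, timedelta


def find_gaps(open_times: list[datetime], interval: timedelta) -> list[tuple[datetime, datetime]]:
    """Return (first_missing, last_missing) open times for each gap in a sorted series.

    Hypothetical helper illustrating timestamp-based gap detection.
    """
    gaps = []
    for prev, curr in zip(open_times, open_times[1:]):
        if curr - prev > interval:
            # Every bar strictly between prev and curr is missing.
            gaps.append((prev + interval, curr - interval))
    return gaps


hourly = [datetime(2024, 1, 1, h) for h in (0, 1, 2, 5, 6)]
print(find_gaps(hourly, timedelta(hours=1)))
# → [(datetime.datetime(2024, 1, 1, 3, 0), datetime.datetime(2024, 1, 1, 4, 0))]
```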
## 📝 CLI Options
### Data Collection
```bash
gapless-crypto-data [OPTIONS]
Options:
--symbol TEXT Trading pair symbol(s) - single symbol or comma-separated list (e.g., SOLUSDT, BTCUSDT,ETHUSDT)
--timeframes TEXT Comma-separated timeframes (1m,3m,5m,15m,30m,1h,2h,4h)
--start TEXT Start date (YYYY-MM-DD)
--end TEXT End date (YYYY-MM-DD)
--output-dir TEXT Output directory for CSV files (default: src/gapless_crypto_data/sample_data/)
--help Show this message and exit
```
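Both `--symbol` and `--timeframes` accept comma-separated lists. A minimal sketch of how such values can be split and checked against the supported timeframes, using hypothetical helper names (`parse_list_option`, `validate_timeframes`); the CLI's real parsing code may differ.

```python
# Timeframes listed in the CLI help above.
SUPPORTED_TIMEFRAMES = {"1m", "3m", "5m", "15m", "30m", "1h", "2h", "4h"}


def parse_list_option(value: str) -> list[str]:
    """Split a comma-separated option value, trimming whitespace and dropping empties."""
    return [item.strip() for item in value.split(",") if item.strip()]


def validate_timeframes(value: str) -> list[str]:
    """Parse --timeframes and reject codes outside SUPPORTED_TIMEFRAMES (hypothetical)."""
    timeframes = parse_list_option(value)
    unknown = [tf for tf in timeframes if tf not in SUPPORTED_TIMEFRAMES]
    if unknown:
        raise ValueError(f"Unsupported timeframe(s): {', '.join(unknown)}")
    return timeframes


print(parse_list_option("BTCUSDT, ETHUSDT"))  # → ['BTCUSDT', 'ETHUSDT']
print(validate_timeframes("5m,1h"))           # → ['5m', '1h']
```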
### Gap Filling
```bash
gapless-crypto-data --fill-gaps [OPTIONS]
Options:
--directory TEXT Data directory to scan for gaps
--symbol TEXT Specific symbol to process (optional)
--timeframe TEXT Specific timeframe to process (optional)
--help Show this message and exit
```
## 🔧 Advanced Usage
### Batch Processing
#### CLI Multi-Symbol (Recommended)
```bash
# Native multi-symbol support (fastest approach)
gapless-crypto-data --symbol BTCUSDT,ETHUSDT,SOLUSDT,ADAUSDT --timeframes 1m,5m,15m,1h,4h --start 2023-01-01 --end 2023-12-31
# Alternative: Multiple separate commands for different settings
gapless-crypto-data --symbol BTCUSDT,ETHUSDT --timeframes 1m,1h --start 2023-01-01 --end 2023-06-30
gapless-crypto-data --symbol SOLUSDT,ADAUSDT --timeframes 5m,4h --start 2023-07-01 --end 2023-12-31
```
#### Simple API (Recommended)
```python
import gapless_crypto_data as gcd
# Process multiple symbols with simple loops
symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT", "ADAUSDT"]
timeframes = ["1h", "4h"]
for symbol in symbols:
    for timeframe in timeframes:
        df = gcd.fetch_data(symbol, timeframe, start="2023-01-01", end="2023-12-31")
        print(f"{symbol} {timeframe}: {len(df)} bars collected")
```
#### Advanced API (Complex Workflows)
```python
from gapless_crypto_data import BinancePublicDataCollector
# Initialize with custom settings
collector = BinancePublicDataCollector(
    start_date="2023-01-01",
    end_date="2023-12-31",
    output_dir="./crypto_data",
)

# Process multiple symbols with detailed control
symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT"]
for symbol in symbols:
    collector.symbol = symbol
    results = collector.collect_multiple_timeframes(["1m", "5m", "1h", "4h"])
    for timeframe, result in results.items():
        print(f"{symbol} {timeframe}: {result['stats']}")
```
### Gap Analysis
#### Simple API (Recommended)
```python
import gapless_crypto_data as gcd
# Quick gap filling for entire directory
results = gcd.fill_gaps("./data")
print(f"Processed {results['files_processed']} files")
print(f"Filled {results['gaps_filled']}/{results['gaps_detected']} gaps")
print(f"Success rate: {results['success_rate']:.1f}%")
# Gap filling for specific symbols only
results = gcd.fill_gaps("./data", symbols=["BTCUSDT", "ETHUSDT"])
```
#### Advanced API (Detailed Control)
```python
from gapless_crypto_data import UniversalGapFiller
gap_filler = UniversalGapFiller()
# Manual gap detection and analysis
gaps = gap_filler.detect_all_gaps("BTCUSDT_1h.csv", "1h")
print(f"Found {len(gaps)} gaps")
for gap in gaps:
    duration_hours = gap['duration'].total_seconds() / 3600
    print(f"Gap: {gap['start_time']} → {gap['end_time']} ({duration_hours:.1f}h)")

# Detect and fill every gap in the file
result = gap_filler.process_file("BTCUSDT_1h.csv", "1h")
```
## 🛠️ Development
### Prerequisites
- **UV Package Manager** (recommended) - [Install UV](https://docs.astral.sh/uv/getting-started/installation/)
- **Python 3.9+** - UV will manage Python versions automatically
- **Git** - For repository cloning and version control
### Development Installation Workflow
**IMPORTANT**: This project uses **mandatory pre-commit hooks** to prevent broken code from being committed. All commits are automatically validated for formatting, linting, and basic quality checks.
#### Step 1: Clone Repository
```bash
git clone https://github.com/Eon-Labs/gapless-crypto-data.git
cd gapless-crypto-data
```
#### Step 2: Development Environment Setup
```bash
# Create isolated virtual environment
uv venv
# Activate virtual environment
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
# Install all dependencies (production + development)
uv sync --dev
```
#### Step 3: Verify Installation
```bash
# Test CLI functionality
uv run gapless-crypto-data --help
# Run test suite
uv run pytest
# Quick data collection test
uv run gapless-crypto-data --symbol BTCUSDT --timeframes 1h --start 2024-01-01 --end 2024-01-01 --output-dir ./test_data
```
#### Step 4: Set Up Pre-Commit Hooks (Mandatory)
```bash
# Install pre-commit hooks (prevents broken code from being committed)
uv run pre-commit install
# Test pre-commit hooks
uv run pre-commit run --all-files
```
#### Step 5: Development Tools
```bash
# Code formatting
uv run ruff format .
# Linting and auto-fixes
uv run ruff check --fix .
# Type checking
uv run mypy src/
# Run specific tests
uv run pytest tests/test_binance_collector.py -v
# Manual pre-commit validation
uv run pre-commit run --all-files
```
### Development Commands Reference
| Task | Command |
|------|---------|
| Install dependencies | `uv sync --dev` |
| Setup pre-commit hooks | `uv run pre-commit install` |
| Add new dependency | `uv add package-name` |
| Add dev dependency | `uv add --dev package-name` |
| Run CLI | `uv run gapless-crypto-data [args]` |
| Run tests | `uv run pytest` |
| Format code | `uv run ruff format .` |
| Lint code | `uv run ruff check --fix .` |
| Type check | `uv run mypy src/` |
| Validate pre-commit | `uv run pre-commit run --all-files` |
| Build package | `uv build` |
### Project Structure for Development
```
gapless-crypto-data/
├── src/gapless_crypto_data/ # Main package
│ ├── __init__.py # Package exports
│ ├── cli.py # CLI interface
│ ├── collectors/ # Data collection modules
│ └── gap_filling/ # Gap detection/filling
├── tests/ # Test suite
├── docs/ # Documentation
├── examples/ # Usage examples
├── pyproject.toml # Project configuration
└── uv.lock # Dependency lock file
```
### Building and Publishing
```bash
# Build package
uv build
# Publish to PyPI (requires API token)
uv publish
```
## 📁 Project Structure
```
gapless-crypto-data/
├── src/
│ └── gapless_crypto_data/
│ ├── __init__.py # Package exports
│ ├── cli.py # Command-line interface
│ ├── collectors/
│ │ ├── __init__.py
│ │ └── binance_public_data_collector.py
│ ├── gap_filling/
│ │ ├── __init__.py
│ │ ├── universal_gap_filler.py
│ │ └── safe_file_operations.py
│ └── utils/
│ └── __init__.py
├── tests/ # Test suite
├── docs/ # Documentation
├── pyproject.toml # Project configuration
├── README.md # This file
└── LICENSE # MIT License
```
## 🔍 Supported Timeframes
| Timeframe | Code | Description |
|-----------|------|-------------|
| 1 minute | `1m` | Highest resolution |
| 3 minutes | `3m` | Short-term analysis |
| 5 minutes | `5m` | Common trading timeframe |
| 15 minutes| `15m`| Medium-term signals |
| 30 minutes| `30m`| Longer-term patterns |
| 1 hour | `1h` | Popular for backtesting |
| 2 hours | `2h` | Extended analysis |
| 4 hours | `4h` | Daily cycle patterns |
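Because each code maps to a fixed bar length, the number of bars a gapless dataset should contain can be computed up front and compared with `len(df)`. A minimal sketch, assuming the date range includes both the start and the end day (the package's own boundary handling may differ); `TIMEFRAME_DURATIONS` and `expected_bar_count` are illustrative names.

```python
from datetime import date, timedelta

# Bar length for each supported timeframe code.
TIMEFRAME_DURATIONS = {
    "1m": timedelta(minutes=1),
    "3m": timedelta(minutes=3),
    "5m": timedelta(minutes=5),
    "15m": timedelta(minutes=15),
    "30m": timedelta(minutes=30),
    "1h": timedelta(hours=1),
    "2h": timedelta(hours=2),
    "4h": timedelta(hours=4),
}


def expected_bar_count(start: date, end: date, timeframe: str) -> int:
    """Bars in [start 00:00, end + 1 day) for a gapless series.

    Assumption: the end date is inclusive.
    """
    span = (end - start) + timedelta(days=1)
    return span // TIMEFRAME_DURATIONS[timeframe]  # timedelta // timedelta -> int


print(expected_bar_count(date(2023, 1, 1), date(2023, 12, 31), "1h"))  # → 8760
```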
## ⚠️ Requirements
- Python 3.9+
- pandas >= 2.0.0
- requests >= 2.25.0
- Stable internet connection for data downloads
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Install development dependencies (`uv sync --dev`)
4. Make your changes
5. Run tests (`uv run pytest`)
6. Format code (`uv run ruff format .`)
7. Commit changes (`git commit -m 'Add amazing feature'`)
8. Push to branch (`git push origin feature/amazing-feature`)
9. Open a Pull Request
## 📚 API Reference
### BinancePublicDataCollector
Ultra-fast cryptocurrency spot data collection from Binance's public data repository. Provides 10-100x faster data collection compared to API calls by downloading pre-generated monthly ZIP files.
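The monthly archives are hosted on Binance's public data site (data.binance.vision). The sketch below lists the monthly spot-kline ZIP URLs that a date range would require. The URL layout reflects the repository's public naming pattern and is an assumption here, not code taken from `BinancePublicDataCollector`; `monthly_kline_urls` is a hypothetical helper.

```python
from datetime import date

# Assumed layout of the Binance public data repository; verify against the live site.
BASE_URL = "https://data.binance.vision/data/spot/monthly/klines"


def monthly_kline_urls(symbol: str, timeframe: str, start: date, end: date) -> list[str]:
    """Return one ZIP URL per calendar month touched by [start, end] (hypothetical)."""
    urls = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        urls.append(
            f"{BASE_URL}/{symbol}/{timeframe}/{symbol}-{timeframe}-{year}-{month:02d}.zip"
        )
        # Advance to the next calendar month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return urls


urls = monthly_kline_urls("BTCUSDT", "1h", date(2023, 11, 15), date(2024, 2, 1))
print(len(urls))  # → 4
print(urls[0])
# → https://data.binance.vision/data/spot/monthly/klines/BTCUSDT/1h/BTCUSDT-1h-2023-11.zip
```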
#### Key Methods
**`__init__(symbol, start_date, end_date, output_dir)`**
Initialize the collector with trading pair and date range.
```python
from gapless_crypto_data import BinancePublicDataCollector

collector = BinancePublicDataCollector(
    symbol="BTCUSDT",            # USDT spot pair
    start_date="2023-01-01",     # Start date (YYYY-MM-DD)
    end_date="2023-12-31",       # End date (YYYY-MM-DD)
    output_dir="./crypto_data",  # Output directory (optional)
)
```
**`collect_timeframe_data(trading_timeframe) -> Dict[str, Any]`**
Collect complete historical data for a single timeframe with full 11-column microstructure format.
```python
result = collector.collect_timeframe_data("1h")
df = result["dataframe"] # pandas DataFrame with OHLCV + microstructure
filepath = result["filepath"] # Path to saved CSV file
stats = result["stats"] # Collection statistics
# Access microstructure data
total_trades = df["number_of_trades"].sum()
taker_buy_ratio = df["taker_buy_base_asset_volume"].sum() / df["volume"].sum()
```
**`collect_multiple_timeframes(timeframes) -> Dict[str, Dict[str, Any]]`**
Collect data for multiple timeframes with comprehensive progress tracking.
```python
results = collector.collect_multiple_timeframes(["1h", "4h"])
for timeframe, result in results.items():
    df = result["dataframe"]
    print(f"{timeframe}: {len(df):,} bars")
```
### UniversalGapFiller
Universal gap detection and filling for all timeframes with authentic 11-column microstructure format. Uses only authentic Binance API data - never synthetic data.
#### Key Methods
**`detect_all_gaps(csv_file, timeframe) -> List[Dict]`**
Automatically detect timestamp gaps in a CSV file for the given timeframe.
```python
from gapless_crypto_data import UniversalGapFiller

gap_filler = UniversalGapFiller()
gaps = gap_filler.detect_all_gaps("BTCUSDT_1h_data.csv", "1h")
print(f"Found {len(gaps)} gaps to fill")
```
**`fill_gap(csv_file, gap_info) -> bool`**
Fill a specific gap with authentic Binance API data.
```python
# Fill first detected gap
success = gap_filler.fill_gap("BTCUSDT_1h_data.csv", gaps[0])
print(f"Gap filled successfully: {success}")
```
**`process_file(csv_file, timeframe) -> Dict`**
Detect and fill all gaps in a single CSV file. To process every CSV file in a directory, use `gcd.fill_gaps(directory)`.
```python
result = gap_filler.process_file("BTCUSDT_1h_data.csv", "1h")
print(f"BTCUSDT_1h_data.csv: {result['gaps_filled']} gaps filled")
```
### AtomicCSVOperations
Safe atomic operations for CSV files with header preservation and corruption prevention. Uses temporary files and atomic rename operations to ensure data integrity.
#### Key Methods
**`create_backup() -> Path`**
Create timestamped backup of original file before modifications.
```python
from pathlib import Path
from gapless_crypto_data import AtomicCSVOperations

atomic_ops = AtomicCSVOperations(Path("data.csv"))
backup_path = atomic_ops.create_backup()
```
**`write_dataframe_atomic(df) -> bool`**
Atomically write DataFrame to CSV with integrity validation.
```python
success = atomic_ops.write_dataframe_atomic(df)
if not success:
    # Restore the original file from the backup created above
    atomic_ops.rollback_from_backup()
```
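The temporary-file-plus-atomic-rename pattern described above can be sketched with the standard library alone. This illustrates the technique and is not `AtomicCSVOperations`' source; `write_csv_atomic` is a hypothetical name.

```python
import csv
import os
import tempfile
from pathlib import Path


def write_csv_atomic(path: Path, header: list[str], rows: list[list]) -> None:
    """Write rows to a temp file beside `path`, then swap it into place in one step.

    Hypothetical helper illustrating the atomic-write technique.
    """
    # Creating the temp file in the same directory keeps it on the same filesystem,
    # which is what lets os.replace() perform an atomic rename on POSIX.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as tmp:
            writer = csv.writer(tmp)
            writer.writerow(header)
            writer.writerows(rows)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # On any failure the original file is untouched; discard the partial temp file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


# Example: readers see either the old file or the complete new one, never a partial write.
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "data.csv"
    write_csv_atomic(target, ["date", "close"], [["2024-01-01 12:00:00", 42175.75]])
    print(target.read_text())
```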
### SafeCSVMerger
Safe CSV data merging with gap filling capabilities and data integrity validation. Handles temporal data insertion while maintaining chronological order.
#### Key Methods
**`merge_gap_data_safe(gap_data, gap_start, gap_end) -> bool`**
Safely merge gap data into existing CSV using atomic operations.
```python
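from datetime import datetime
from pathlib import Path
from gapless_crypto_data import SafeCSVMerger

# gap_data: a DataFrame holding the missing bars, in the same 11-column format
merger = SafeCSVMerger(Path("eth_data.csv"))
success = merger.merge_gap_data_safe(
    gap_data,                  # DataFrame with gap data
    datetime(2024, 1, 1, 12),  # Gap start time
    datetime(2024, 1, 1, 15),  # Gap end time
)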
```
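Merging gap data while keeping chronological order amounts to combining the rows, keeping one row per open timestamp, and sorting. The stdlib sketch below assumes rows are keyed by their `date` string in the `YYYY-MM-DD HH:MM:SS` format shown in the data format table (which sorts correctly as text) and that rows already in the file win on duplicates; `merge_bars` is hypothetical, and `SafeCSVMerger` may resolve conflicts differently.

```python
def merge_bars(existing: list[dict], gap_rows: list[dict]) -> list[dict]:
    """Merge gap rows into existing rows: one row per 'date', sorted chronologically.

    Hypothetical helper. Assumption: existing rows take precedence on duplicate timestamps.
    """
    by_date = {row["date"]: row for row in existing}
    for row in gap_rows:
        by_date.setdefault(row["date"], row)  # only insert timestamps that are missing
    # ISO-like "YYYY-MM-DD HH:MM:SS" strings sort in chronological order.
    return [by_date[key] for key in sorted(by_date)]


existing = [
    {"date": "2024-01-01 12:00:00", "close": 1.0},
    {"date": "2024-01-01 15:00:00", "close": 4.0},
]
gap_rows = [
    {"date": "2024-01-01 14:00:00", "close": 3.0},
    {"date": "2024-01-01 13:00:00", "close": 2.0},
]
merged = merge_bars(existing, gap_rows)
print([row["close"] for row in merged])  # → [1.0, 2.0, 3.0, 4.0]
```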
### Data Format
All classes work with the standardized 11-column microstructure format:
| Column | Description | Example |
|--------|-------------|---------|
| `date` | Open timestamp | `2024-01-01 12:00:00` |
| `open` | Opening price | `42150.50` |
| `high` | Highest price | `42200.00` |
| `low` | Lowest price | `42100.25` |
| `close` | Closing price | `42175.75` |
| `volume` | Base asset volume | `15.250000` |
| `close_time` | Close timestamp | `2024-01-01 12:59:59` |
| `quote_asset_volume` | Quote asset volume | `643238.125` |
| `number_of_trades` | Trade count | `1547` |
| `taker_buy_base_asset_volume` | Taker buy base volume | `7.825000` |
| `taker_buy_quote_asset_volume` | Taker buy quote volume | `329891.750` |
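The columns are not independent, which allows cheap sanity checks on every row: `high` and `low` must bound `open` and `close`, taker buy volume cannot exceed total volume, and the volume-weighted average price (`quote_asset_volume / volume`) must lie inside the bar's range. The sketch below runs these checks on the example row from the table; `check_bar` is a hypothetical helper, not part of the package.

```python
def check_bar(bar: dict) -> list[str]:
    """Return consistency problems found in one 11-column bar (empty list if none).

    Hypothetical helper; assumes numeric values keyed by the column names above.
    """
    problems = []
    if bar["high"] < max(bar["open"], bar["close"]):
        problems.append("high below open/close")
    if bar["low"] > min(bar["open"], bar["close"]):
        problems.append("low above open/close")
    if bar["taker_buy_base_asset_volume"] > bar["volume"]:
        problems.append("taker buy volume exceeds total volume")
    if bar["volume"] > 0:
        # Every trade printed within [low, high], so their VWAP must too.
        vwap = bar["quote_asset_volume"] / bar["volume"]
        if not bar["low"] <= vwap <= bar["high"]:
            problems.append(f"VWAP {vwap:.2f} outside [low, high]")
    return problems


example = {
    "open": 42150.50, "high": 42200.00, "low": 42100.25, "close": 42175.75,
    "volume": 15.250000, "quote_asset_volume": 643238.125,
    "taker_buy_base_asset_volume": 7.825000,
}
print(check_bar(example))  # → []
```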
### Error Handling
All classes implement robust error handling with meaningful exceptions:
```python
from gapless_crypto_data import BinancePublicDataCollector

try:
    collector = BinancePublicDataCollector(symbol="INVALIDPAIR")
    result = collector.collect_timeframe_data("1h")
except ValueError as e:
    print(f"Invalid symbol format: {e}")
except ConnectionError as e:
    print(f"Network error: {e}")
except FileNotFoundError as e:
    print(f"Output directory error: {e}")
```
### Type Hints
All public APIs include comprehensive type hints for better IDE support:
```python
from typing import Any, Dict, List, Optional


# Excerpt of BinancePublicDataCollector's method signatures (bodies elided)
class BinancePublicDataCollector:
    def collect_timeframe_data(self, trading_timeframe: str) -> Dict[str, Any]:
        # Returns dict with 'dataframe', 'filepath', and 'stats' keys
        ...

    def collect_multiple_timeframes(
        self,
        timeframes: Optional[List[str]] = None,
    ) -> Dict[str, Dict[str, Any]]:
        # Returns nested dict keyed by timeframe
        ...
```
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🏢 About Eon Labs
Gapless Crypto Data is developed by [Eon Labs](https://github.com/Eon-Labs), specializing in quantitative trading infrastructure and machine learning for financial markets.

---

**⚡ Powered by UV** - Modern Python dependency management

**🚀 22x Faster** - than traditional API-based collection

**📊 11-Column Format** - Full microstructure data with order flow metrics

**🔒 Zero Gaps** - Guaranteed complete datasets with authentic data only
Raw data
{
"_id": null,
"home_page": null,
"name": "gapless-crypto-data",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "Terry Li <terry@eonlabs.com>",
"keywords": "OHLCV, api, authentic-data, binance, collection, crypto, cryptocurrency, data, download, fetch-data, financial-data, function-based, gap-filling, liquidity, microstructure, order-flow, pandas, simple-api, taker-volume, time-series, trading",
"author": null,
"author_email": "Eon Labs <terry@eonlabs.com>",
"download_url": "https://files.pythonhosted.org/packages/fc/f7/3668d9b8246639d67579106ae98981455be97efe4978099d15df5acd6fba/gapless_crypto_data-2.6.3.tar.gz",
"platform": null,
"description": "# Gapless Crypto Data\n\n[](https://badge.fury.io/py/gapless-crypto-data)\n[](https://pypi.org/project/gapless-crypto-data/)\n[](https://opensource.org/licenses/MIT)\n[](https://github.com/astral-sh/uv)\n\nUltra-fast cryptocurrency data collection with zero gaps guarantee and full 11-column microstructure format - **22x faster** than API calls via Binance public data repository.\n\n## \u26a1 Features\n\n- \ud83d\ude80 **22x faster** than API calls via Binance public data repository\n- \ud83d\udcca **Full 11-column microstructure format** with order flow and liquidity metrics\n- \ud83d\udd12 **Zero gaps guarantee** through authentic API-first validation\n- \u26a1 **UV-first** modern Python tooling\n- \ud83d\udee1\ufe0f **Corruption-proof** atomic file operations\n- \ud83d\udcca **Multi-symbol & multi-timeframe support** (1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h)\n- \ud83d\udd27 **Gap detection and filling** with authentic data only\n- \ud83d\udcc8 **Production-grade** data collection for quantitative trading\n\n## \ud83d\ude80 Quick Start\n\n### Installation (UV - Recommended)\n\n```bash\n# Install via UV (fastest)\nuv add gapless-crypto-data\n\n# Or install globally\nuv tool install gapless-crypto-data\n```\n\n### Installation (pip)\n\n```bash\npip install gapless-crypto-data\n```\n\n### CLI Usage\n\n```bash\n# Collect data for multiple timeframes (default output location)\ngapless-crypto-data --symbol SOLUSDT --timeframes 1m,3m,5m,15m,30m,1h,2h,4h\n\n# Collect multiple symbols at once (native multi-symbol support)\ngapless-crypto-data --symbol BTCUSDT,ETHUSDT,SOLUSDT --timeframes 1h,4h\n\n# Collect specific date range with custom output directory\ngapless-crypto-data --symbol BTCUSDT --timeframes 1h --start 2023-01-01 --end 2023-12-31 --output-dir ./crypto_data\n\n# Multi-symbol with custom settings\ngapless-crypto-data --symbol BTCUSDT,ETHUSDT --timeframes 5m,1h --start 2024-01-01 --end 2024-06-30 --output-dir ./crypto_data\n\n# Fill gaps in existing 
data\ngapless-crypto-data --fill-gaps --directory ./data\n\n# Help\ngapless-crypto-data --help\n```\n\n### Python API\n\n#### Simple API (Recommended)\n\n```python\nimport gapless_crypto_data as gcd\n\n# Fetch recent data with date range\ndf = gcd.download(\"BTCUSDT\", \"1h\", start=\"2024-01-01\", end=\"2024-06-30\")\n\n# Or with limit\ndf = gcd.fetch_data(\"ETHUSDT\", \"4h\", limit=1000)\n\n# Get available symbols and timeframes\nsymbols = gcd.get_supported_symbols()\ntimeframes = gcd.get_supported_timeframes()\n\n# Fill gaps in existing data\nresults = gcd.fill_gaps(\"./data\")\n```\n\n#### Advanced API (Power Users)\n\n```python\nfrom gapless_crypto_data import BinancePublicDataCollector, UniversalGapFiller\n\n# Custom collection with full control\ncollector = BinancePublicDataCollector(\n symbol=\"SOLUSDT\",\n start_date=\"2023-01-01\",\n end_date=\"2023-12-31\"\n)\n\nresult = collector.collect_timeframe_data(\"1h\")\ndf = result[\"dataframe\"]\n\n# Manual gap filling\ngap_filler = UniversalGapFiller()\ngaps = gap_filler.detect_all_gaps(csv_file, \"1h\")\n```\n\n## \ud83c\udfaf Data Structure\n\nAll functions return pandas DataFrames with complete microstructure data:\n\n```python\nimport gapless_crypto_data as gcd\n\n# Fetch data\ndf = gcd.download(\"BTCUSDT\", \"1h\", start=\"2024-01-01\", end=\"2024-06-30\")\n\n# DataFrame columns (11-column microstructure format)\nprint(df.columns.tolist())\n# ['date', 'open', 'high', 'low', 'close', 'volume',\n# 'close_time', 'quote_asset_volume', 'number_of_trades',\n# 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume']\n\n# Professional microstructure analysis\nbuy_pressure = df['taker_buy_base_asset_volume'].sum() / df['volume'].sum()\navg_trade_size = df['volume'].sum() / df['number_of_trades'].sum()\nmarket_impact = df['quote_asset_volume'].std() / df['quote_asset_volume'].mean()\n\nprint(f\"Taker buy pressure: {buy_pressure:.1%}\")\nprint(f\"Average trade size: {avg_trade_size:.4f} BTC\")\nprint(f\"Market 
impact volatility: {market_impact:.3f}\")\n```\n\n## \ud83d\udcca Performance Comparison\n\n| Method | Collection Speed | Microstructure Data | Gap Handling | Data Integrity |\n|--------|-----------------|-------------------|--------------|----------------|\n| **Gapless Crypto Data** | **22x faster** | \u2705 Full 11-column format | \u2705 Authentic API-first | \u2705 Atomic operations |\n| Traditional APIs | 1x baseline | \u26a0\ufe0f Basic OHLCV only | \u274c Manual handling | \u26a0\ufe0f Corruption risk |\n| Other downloaders | 2-5x faster | \u274c Limited format | \u274c Limited coverage | \u26a0\ufe0f Basic validation |\n\n## \ud83c\udfd7\ufe0f Architecture\n\n### Core Components\n\n- **BinancePublicDataCollector**: Ultra-fast data collection with full 11-column microstructure format\n- **UniversalGapFiller**: Intelligent gap detection and filling with authentic API-first validation\n- **AtomicCSVOperations**: Corruption-proof file operations with atomic writes\n- **SafeCSVMerger**: Safe merging of data files with integrity validation\n\n### Data Flow\n\n```\nBinance Public Data Repository \u2192 BinancePublicDataCollector \u2192 11-Column Microstructure Format\n \u2193\nGap Detection \u2192 UniversalGapFiller \u2192 Authentic API-First Validation\n \u2193\nAtomicCSVOperations \u2192 Final Gapless Dataset with Order Flow Metrics\n```\n\n## \ud83d\udcdd CLI Options\n\n### Data Collection\n\n```bash\ngapless-crypto-data [OPTIONS]\n\nOptions:\n --symbol TEXT Trading pair symbol(s) - single symbol or comma-separated list (e.g., SOLUSDT, BTCUSDT,ETHUSDT)\n --timeframes TEXT Comma-separated timeframes (1m,3m,5m,15m,30m,1h,2h,4h)\n --start TEXT Start date (YYYY-MM-DD)\n --end TEXT End date (YYYY-MM-DD)\n --output-dir TEXT Output directory for CSV files (default: src/gapless_crypto_data/sample_data/)\n --help Show this message and exit\n```\n\n### Gap Filling\n\n```bash\ngapless-crypto-data --fill-gaps [OPTIONS]\n\nOptions:\n --directory TEXT Data directory to scan 
for gaps\n --symbol TEXT Specific symbol to process (optional)\n --timeframe TEXT Specific timeframe to process (optional)\n --help Show this message and exit\n```\n\n## \ud83d\udd27 Advanced Usage\n\n### Batch Processing\n\n#### CLI Multi-Symbol (Recommended)\n\n```bash\n# Native multi-symbol support (fastest approach)\ngapless-crypto-data --symbol BTCUSDT,ETHUSDT,SOLUSDT,ADAUSDT --timeframes 1m,5m,15m,1h,4h --start 2023-01-01 --end 2023-12-31\n\n# Alternative: Multiple separate commands for different settings\ngapless-crypto-data --symbol BTCUSDT,ETHUSDT --timeframes 1m,1h --start 2023-01-01 --end 2023-06-30\ngapless-crypto-data --symbol SOLUSDT,ADAUSDT --timeframes 5m,4h --start 2023-07-01 --end 2023-12-31\n```\n\n#### Simple API (Recommended)\n\n```python\nimport gapless_crypto_data as gcd\n\n# Process multiple symbols with simple loops\nsymbols = [\"BTCUSDT\", \"ETHUSDT\", \"SOLUSDT\", \"ADAUSDT\"]\ntimeframes = [\"1h\", \"4h\"]\n\nfor symbol in symbols:\n for timeframe in timeframes:\n df = gcd.fetch_data(symbol, timeframe, start=\"2023-01-01\", end=\"2023-12-31\")\n print(f\"{symbol} {timeframe}: {len(df)} bars collected\")\n```\n\n#### Advanced API (Complex Workflows)\n\n```python\nfrom gapless_crypto_data import BinancePublicDataCollector\n\n# Initialize with custom settings\ncollector = BinancePublicDataCollector(\n start_date=\"2023-01-01\",\n end_date=\"2023-12-31\",\n output_dir=\"./crypto_data\"\n)\n\n# Process multiple symbols with detailed control\nsymbols = [\"BTCUSDT\", \"ETHUSDT\", \"SOLUSDT\"]\nfor symbol in symbols:\n collector.symbol = symbol\n results = collector.collect_multiple_timeframes([\"1m\", \"5m\", \"1h\", \"4h\"])\n for timeframe, result in results.items():\n print(f\"{symbol} {timeframe}: {result['stats']}\")\n```\n\n### Gap Analysis\n\n#### Simple API (Recommended)\n\n```python\nimport gapless_crypto_data as gcd\n\n# Quick gap filling for entire directory\nresults = gcd.fill_gaps(\"./data\")\nprint(f\"Processed 
{results['files_processed']} files\")\nprint(f\"Filled {results['gaps_filled']}/{results['gaps_detected']} gaps\")\nprint(f\"Success rate: {results['success_rate']:.1f}%\")\n\n# Gap filling for specific symbols only\nresults = gcd.fill_gaps(\"./data\", symbols=[\"BTCUSDT\", \"ETHUSDT\"])\n```\n\n#### Advanced API (Detailed Control)\n\n```python\nfrom gapless_crypto_data import UniversalGapFiller\n\ngap_filler = UniversalGapFiller()\n\n# Manual gap detection and analysis\ngaps = gap_filler.detect_all_gaps(\"BTCUSDT_1h.csv\", \"1h\")\nprint(f\"Found {len(gaps)} gaps\")\n\nfor gap in gaps:\n duration_hours = gap['duration'].total_seconds() / 3600\n print(f\"Gap: {gap['start_time']} \u2192 {gap['end_time']} ({duration_hours:.1f}h)\")\n\n# Fill specific gaps\nresult = gap_filler.process_file(\"BTCUSDT_1h.csv\", \"1h\")\n```\n\n## \ud83d\udee0\ufe0f Development\n\n### Prerequisites\n\n- **UV Package Manager** (recommended) - [Install UV](https://docs.astral.sh/uv/getting-started/installation/)\n- **Python 3.9+** - UV will manage Python versions automatically\n- **Git** - For repository cloning and version control\n\n### Development Installation Workflow\n\n**IMPORTANT**: This project uses **mandatory pre-commit hooks** to prevent broken code from being committed. 
All commits are automatically validated for formatting, linting, and basic quality checks.\n\n#### Step 1: Clone Repository\n```bash\ngit clone https://github.com/Eon-Labs/gapless-crypto-data.git\ncd gapless-crypto-data\n```\n\n#### Step 2: Development Environment Setup\n```bash\n# Create isolated virtual environment\nuv venv\n\n# Activate virtual environment\nsource .venv/bin/activate # macOS/Linux\n# .venv\\Scripts\\activate # Windows\n\n# Install all dependencies (production + development)\nuv sync --dev\n```\n\n#### Step 3: Verify Installation\n```bash\n# Test CLI functionality\nuv run gapless-crypto-data --help\n\n# Run test suite\nuv run pytest\n\n# Quick data collection test\nuv run gapless-crypto-data --symbol BTCUSDT --timeframes 1h --start 2024-01-01 --end 2024-01-01 --output-dir ./test_data\n```\n\n#### Step 4: Set Up Pre-Commit Hooks (Mandatory)\n```bash\n# Install pre-commit hooks (prevents broken code from being committed)\nuv run pre-commit install\n\n# Test pre-commit hooks\nuv run pre-commit run --all-files\n```\n\n#### Step 5: Development Tools\n```bash\n# Code formatting\nuv run ruff format .\n\n# Linting and auto-fixes\nuv run ruff check --fix .\n\n# Type checking\nuv run mypy src/\n\n# Run specific tests\nuv run pytest tests/test_binance_collector.py -v\n\n# Manual pre-commit validation\nuv run pre-commit run --all-files\n```\n\n### Development Commands Reference\n\n| Task | Command |\n|------|---------|\n| Install dependencies | `uv sync --dev` |\n| Setup pre-commit hooks | `uv run pre-commit install` |\n| Add new dependency | `uv add package-name` |\n| Add dev dependency | `uv add --dev package-name` |\n| Run CLI | `uv run gapless-crypto-data [args]` |\n| Run tests | `uv run pytest` |\n| Format code | `uv run ruff format .` |\n| Lint code | `uv run ruff check --fix .` |\n| Type check | `uv run mypy src/` |\n| Validate pre-commit | `uv run pre-commit run --all-files` |\n| Build package | `uv build` |\n\n### Project Structure for 
Development\n```\ngapless-crypto-data/\n\u251c\u2500\u2500 src/gapless_crypto_data/ # Main package\n\u2502 \u251c\u2500\u2500 __init__.py # Package exports\n\u2502 \u251c\u2500\u2500 cli.py # CLI interface\n\u2502 \u251c\u2500\u2500 collectors/ # Data collection modules\n\u2502 \u2514\u2500\u2500 gap_filling/ # Gap detection/filling\n\u251c\u2500\u2500 tests/ # Test suite\n\u251c\u2500\u2500 docs/ # Documentation\n\u251c\u2500\u2500 examples/ # Usage examples\n\u251c\u2500\u2500 pyproject.toml # Project configuration\n\u2514\u2500\u2500 uv.lock # Dependency lock file\n```\n\n### Building and Publishing\n\n```bash\n# Build package\nuv build\n\n# Publish to PyPI (requires API token)\nuv publish\n```\n\n## \ud83d\udcc1 Project Structure\n\n```\ngapless-crypto-data/\n\u251c\u2500\u2500 src/\n\u2502 \u2514\u2500\u2500 gapless_crypto_data/\n\u2502 \u251c\u2500\u2500 __init__.py # Package exports\n\u2502 \u251c\u2500\u2500 cli.py # Command-line interface\n\u2502 \u251c\u2500\u2500 collectors/\n\u2502 \u2502 \u251c\u2500\u2500 __init__.py\n\u2502 \u2502 \u2514\u2500\u2500 binance_public_data_collector.py\n\u2502 \u251c\u2500\u2500 gap_filling/\n\u2502 \u2502 \u251c\u2500\u2500 __init__.py\n\u2502 \u2502 \u251c\u2500\u2500 universal_gap_filler.py\n\u2502 \u2502 \u2514\u2500\u2500 safe_file_operations.py\n\u2502 \u2514\u2500\u2500 utils/\n\u2502 \u2514\u2500\u2500 __init__.py\n\u251c\u2500\u2500 tests/ # Test suite\n\u251c\u2500\u2500 docs/ # Documentation\n\u251c\u2500\u2500 pyproject.toml # Project configuration\n\u251c\u2500\u2500 README.md # This file\n\u2514\u2500\u2500 LICENSE # MIT License\n```\n\n## \ud83d\udd0d Supported Timeframes\n\n| Timeframe | Code | Description |\n|-----------|------|-------------|\n| 1 minute | `1m` | Highest resolution |\n| 3 minutes | `3m` | Short-term analysis |\n| 5 minutes | `5m` | Common trading timeframe |\n| 15 minutes| `15m`| Medium-term signals |\n| 30 minutes| `30m`| Longer-term patterns |\n| 1 hour | `1h` | Popular for backtesting 
|\n| 2 hours | `2h` | Extended analysis |\n| 4 hours | `4h` | Daily cycle patterns |\n\n## \u26a0\ufe0f Requirements\n\n- Python 3.9+\n- pandas >= 2.0.0\n- requests >= 2.25.0\n- Stable internet connection for data downloads\n\n## \ud83e\udd1d Contributing\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Install development dependencies (`uv sync --dev`)\n4. Make your changes\n5. Run tests (`uv run pytest`)\n6. Format code (`uv run ruff format .`)\n7. Commit changes (`git commit -m 'Add amazing feature'`)\n8. Push to branch (`git push origin feature/amazing-feature`)\n9. Open a Pull Request\n\n## \ud83d\udcda API Reference\n\n### BinancePublicDataCollector\n\nUltra-fast cryptocurrency spot data collection from Binance's public data repository. Provides 10-100x faster data collection compared to API calls by downloading pre-generated monthly ZIP files.\n\n#### Key Methods\n\n**`__init__(symbol, start_date, end_date, output_dir)`**\n\nInitialize the collector with a trading pair and date range.\n\n```python\nfrom gapless_crypto_data import BinancePublicDataCollector\n\ncollector = BinancePublicDataCollector(\n    symbol=\"BTCUSDT\",  # USDT spot pair\n    start_date=\"2023-01-01\",  # Start date (YYYY-MM-DD)\n    end_date=\"2023-12-31\",  # End date (YYYY-MM-DD)\n    output_dir=\"./crypto_data\"  # Output directory (optional)\n)\n```\n\n**`collect_timeframe_data(trading_timeframe) -> Dict[str, Any]`**\n\nCollect complete historical data for a single timeframe in the full 11-column microstructure format.\n\n```python\nresult = collector.collect_timeframe_data(\"1h\")\ndf = result[\"dataframe\"]  # pandas DataFrame with OHLCV + microstructure\nfilepath = result[\"filepath\"]  # Path to saved CSV file\nstats = result[\"stats\"]  # Collection statistics\n\n# Access microstructure data\ntotal_trades = df[\"number_of_trades\"].sum()\ntaker_buy_ratio = df[\"taker_buy_base_asset_volume\"].sum() / df[\"volume\"].sum()\n```\n\n**`collect_multiple_timeframes(timeframes) -> Dict[str, Dict[str, Any]]`**\n\nCollect data for multiple timeframes with progress tracking.\n\n```python\nresults = collector.collect_multiple_timeframes([\"1h\", \"4h\"])\nfor timeframe, result in results.items():\n    df = result[\"dataframe\"]\n    print(f\"{timeframe}: {len(df):,} bars\")\n```\n\n### UniversalGapFiller\n\nUniversal gap detection and filling for all timeframes in the authentic 11-column microstructure format. Gaps are filled only with authentic Binance API data, never synthetic data.\n\n#### Key Methods\n\n**`detect_all_gaps(csv_file) -> List[Dict]`**\n\nAutomatically detect timestamp gaps in a CSV file.\n\n```python\nfrom gapless_crypto_data import UniversalGapFiller\n\ngap_filler = UniversalGapFiller()\ngaps = gap_filler.detect_all_gaps(\"BTCUSDT_1h_data.csv\")\nprint(f\"Found {len(gaps)} gaps to fill\")\n```\n\n**`fill_gap(csv_file, gap_info) -> bool`**\n\nFill a specific gap with authentic Binance API data.\n\n```python\n# Fill the first detected gap, if any were found\nif gaps:\n    success = gap_filler.fill_gap(\"BTCUSDT_1h_data.csv\", gaps[0])\n    print(f\"Gap filled successfully: {success}\")\n```\n\n**`process_file(directory) -> Dict[str, Dict]`**\n\nBatch process all CSV files in a directory for gap detection and filling.\n\n```python\nresults = gap_filler.process_file(\"./crypto_data/\")\nfor filename, result in results.items():\n    print(f\"{filename}: {result['gaps_filled']} gaps filled\")\n```\n\n### AtomicCSVOperations\n\nSafe atomic operations for CSV files with header preservation and corruption prevention. Uses temporary files and atomic rename operations to ensure data integrity.\n\n#### Key Methods\n\n**`create_backup() -> Path`**\n\nCreate a timestamped backup of the original file before modification.\n\n```python\nfrom pathlib import Path\n\nfrom gapless_crypto_data.gap_filling.safe_file_operations import AtomicCSVOperations\n\natomic_ops = AtomicCSVOperations(Path(\"data.csv\"))\nbackup_path = atomic_ops.create_backup()\n```\n\n**`write_dataframe_atomic(df) -> bool`**\n\nAtomically write a DataFrame to CSV with integrity validation.\n\n```python\nsuccess = atomic_ops.write_dataframe_atomic(df)\nif not success:\n    atomic_ops.rollback_from_backup()\n```\n\n### SafeCSVMerger\n\nSafe CSV data merging with gap filling capabilities and data integrity validation. Handles temporal data insertion while maintaining chronological order.\n\n#### Key Methods\n\n**`merge_gap_data_safe(gap_data, gap_start, gap_end) -> bool`**\n\nSafely merge gap data into an existing CSV using atomic operations.\n\n```python\nfrom datetime import datetime\nfrom pathlib import Path\n\nfrom gapless_crypto_data.gap_filling.safe_file_operations import SafeCSVMerger\n\nmerger = SafeCSVMerger(Path(\"eth_data.csv\"))\nsuccess = merger.merge_gap_data_safe(\n    gap_data,  # DataFrame with gap data\n    datetime(2024, 1, 1, 12),  # Gap start time\n    datetime(2024, 1, 1, 15)  # Gap end time\n)\n```\n\n### Data Format\n\nAll classes work with the standardized 11-column microstructure format:\n\n| Column | Description | Example |\n|--------|-------------|---------|\n| `date` | Open timestamp | `2024-01-01 12:00:00` |\n| `open` | Opening price | `42150.50` |\n| `high` | Highest price | `42200.00` |\n| `low` | Lowest price | `42100.25` |\n| `close` | Closing price | `42175.75` |\n| `volume` | Base asset volume | `15.250000` |\n| `close_time` | Close timestamp | `2024-01-01 12:59:59` |\n| `quote_asset_volume` | Quote asset volume | `643238.125` |\n| `number_of_trades` | Trade count | `1547` |\n| `taker_buy_base_asset_volume` | Taker buy base volume | `7.825000` |\n| `taker_buy_quote_asset_volume` | Taker buy quote volume | `329891.750` |\n\n### Error Handling\n\nAll classes implement robust error handling with meaningful exceptions:\n\n```python\nfrom gapless_crypto_data import BinancePublicDataCollector\n\ntry:\n    collector = BinancePublicDataCollector(symbol=\"INVALIDPAIR\")\n    result = collector.collect_timeframe_data(\"1h\")\nexcept ValueError as e:\n    print(f\"Invalid symbol format: {e}\")\nexcept ConnectionError as e:\n    print(f\"Network error: {e}\")\nexcept FileNotFoundError as e:\n    print(f\"Output directory error: {e}\")\n```\n\n### Type Hints\n\nAll public APIs include comprehensive type hints for better IDE support. For example, the collector's collection methods are declared as:\n\n```python\nfrom typing import Any, Dict, List, Optional\n\n\nclass BinancePublicDataCollector:\n    # Signatures only; method bodies elided\n    def collect_timeframe_data(self, trading_timeframe: str) -> Dict[str, Any]:\n        # Returns dict with 'dataframe', 'filepath', and 'stats' keys\n        ...\n\n    def collect_multiple_timeframes(\n        self,\n        timeframes: Optional[List[str]] = None\n    ) -> Dict[str, Dict[str, Any]]:\n        # Returns nested dict by timeframe\n        ...\n```\n\n## \ud83d\udcc4 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## \ud83c\udfe2 About Eon Labs\n\nGapless Crypto Data is developed by [Eon Labs](https://github.com/Eon-Labs), specializing in quantitative trading infrastructure and machine learning for financial markets.\n\n---\n\n**\u26a1 Powered by UV** - Modern Python dependency management\n**\ud83d\ude80 22x Faster** - Than traditional API-based collection\n**\ud83d\udcca 11-Column Format** - Full microstructure data with order flow metrics\n**\ud83d\udd12 Zero Gaps** - Guaranteed complete datasets with authentic data only\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Ultra-fast cryptocurrency data collection with intuitive function-based API, zero gaps guarantee, and full 11-column microstructure format. Both simple function-based and advanced class-based APIs available.",
"version": "2.6.3",
"project_urls": {
"Changelog": "https://github.com/Eon-Labs/gapless-crypto-data/blob/main/CHANGELOG.md",
"Documentation": "https://github.com/Eon-Labs/gapless-crypto-data#readme",
"Homepage": "https://github.com/Eon-Labs/gapless-crypto-data",
"Issues": "https://github.com/Eon-Labs/gapless-crypto-data/issues",
"Repository": "https://github.com/Eon-Labs/gapless-crypto-data.git"
},
"split_keywords": [
"ohlcv",
" api",
" authentic-data",
" binance",
" collection",
" crypto",
" cryptocurrency",
" data",
" download",
" fetch-data",
" financial-data",
" function-based",
" gap-filling",
" liquidity",
" microstructure",
" order-flow",
" pandas",
" simple-api",
" taker-volume",
" time-series",
" trading"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "630125d925fe9a48749f1db23f82432c33fb084574fd1ffb3ed3bc2737dedf0e",
"md5": "d81fabf3838af448d0a1a26fe69c7503",
"sha256": "8578e5a33319d0a587e239563f6d1e37ce54842580ee48f67bbc19e94289bc1e"
},
"downloads": -1,
"filename": "gapless_crypto_data-2.6.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d81fabf3838af448d0a1a26fe69c7503",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 88901,
"upload_time": "2025-09-18T16:41:12",
"upload_time_iso_8601": "2025-09-18T16:41:12.028816Z",
"url": "https://files.pythonhosted.org/packages/63/01/25d925fe9a48749f1db23f82432c33fb084574fd1ffb3ed3bc2737dedf0e/gapless_crypto_data-2.6.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "fcf73668d9b8246639d67579106ae98981455be97efe4978099d15df5acd6fba",
"md5": "56521190fa2bf2ddad99da53d084ea82",
"sha256": "a33d7c7d67e0787f5f565f3931b44b9ae3122dbef88af227182bcbe71b8e8d5b"
},
"downloads": -1,
"filename": "gapless_crypto_data-2.6.3.tar.gz",
"has_sig": false,
"md5_digest": "56521190fa2bf2ddad99da53d084ea82",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 3354070,
"upload_time": "2025-09-18T16:41:15",
"upload_time_iso_8601": "2025-09-18T16:41:15.418461Z",
"url": "https://files.pythonhosted.org/packages/fc/f7/3668d9b8246639d67579106ae98981455be97efe4978099d15df5acd6fba/gapless_crypto_data-2.6.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-18 16:41:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Eon-Labs",
"github_project": "gapless-crypto-data",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "gapless-crypto-data"
}