# Paged List
[PyPI version](https://badge.fury.io/py/paged-list)
[PyPI](https://pypi.org/project/paged-list/)
[MIT License](https://opensource.org/licenses/MIT)
[CI](https://github.com/christensendaniel/paged-list/actions)
[Code style: black](https://github.com/psf/black)
A Python package that provides a disk-backed list implementation for handling large datasets efficiently. When your data gets too large for memory, paged-list automatically chunks it into pickle files on disk, only loading relevant chunks when needed.
## Links
- **PyPI Package**: [https://pypi.org/project/paged-list/](https://pypi.org/project/paged-list/)
- **Documentation**: [https://paged-list.readthedocs.io/](https://paged-list.readthedocs.io/)
- **Source Code**: [https://github.com/christensendaniel/paged-list](https://github.com/christensendaniel/paged-list)
- **Issues**: [https://github.com/christensendaniel/paged-list/issues](https://github.com/christensendaniel/paged-list/issues)
- **Changelog**: [CHANGELOG.md](CHANGELOG.md)
## Features
- **Memory Efficient**: Only keeps a small portion of data in memory
- **Automatic Chunking**: Transparently splits large datasets into manageable chunks
- **List-like Interface**: Supports indexing, slicing, and iteration like regular Python lists
- **Parallel Processing**: Built-in map and serialization functions with multi-threading support
- **Type Safety**: Designed for dictionaries with comprehensive type hints
- **Context Manager**: Automatic cleanup of temporary files
## Requirements
- Python 3.9 or higher
- No external dependencies for core functionality
## Installation
Install from PyPI:
```bash
pip install paged-list
```
**PyPI Package**: [https://pypi.org/project/paged-list/](https://pypi.org/project/paged-list/)
**Note:** Python 3.9+ is required. If you're using an older Python version, please upgrade before installing.
## Python Version Compatibility
paged-list supports Python 3.9 and later versions:
- ✅ Python 3.9
- ✅ Python 3.10
- ✅ Python 3.11
- ✅ Python 3.12
- ✅ Python 3.13
The package is tested across multiple Python versions and operating systems (Linux, Windows, macOS) to ensure compatibility.
### Testing Compatibility
To test compatibility on your system:
```bash
# Install with development dependencies
pip install paged-list[dev]
# Run compatibility tests
python -m pytest tests/test_python_compatibility.py -v
# Or use the standalone compatibility script
python scripts/test_compatibility.py
```
### Multi-Version Testing
For developers, you can test across multiple Python versions using tox:
```bash
# Install tox
pip install tox
# Test on available Python versions
tox
# Test on specific Python version
tox -e py39
# Run linting and formatting checks
tox -e flake8,black,mypy
```
**Note**: The `tox` command will automatically skip Python versions that aren't installed on your system.
Install from source:
```bash
git clone https://github.com/christensendaniel/paged-list.git
cd paged-list
pip install -e .
```
## Quick Start
```python
from paged_list import PagedList
# Create a disk-backed list
cl = PagedList(chunk_size=50000, disk_path="data")
# Add data - will automatically chunk to disk when needed
for i in range(100000):
    cl.append({"id": i, "value": f"item_{i}", "score": i * 1.5})
# Access data like a regular list
print(cl[0]) # First item
print(cl[-1]) # Last item
print(cl[1000:1010]) # Slice of 10 items
# Update items
cl[5] = {"id": 5, "value": "updated", "score": 99.9}
# Apply transformations to all data (uses threading)
def double_score(record):
    record["score"] *= 2
    return record
cl.map(double_score)
# Serialize complex data types to JSON strings
cl.serialize()
# Clean up when done
cl.cleanup_chunks()
```
## Use Cases
- **Large Dataset Processing**: Handle datasets that don't fit in memory
- **Data Pipelines**: Process streaming data with automatic disk overflow
- **ETL Operations**: Transform large datasets chunk by chunk
- **Data Analysis**: Analyze large datasets without memory constraints
- **Caching**: Implement persistent, memory-efficient caches
## Advanced Usage
### Context Manager (Recommended)
```python
from paged_list import PagedList
with PagedList(chunk_size=10000) as cl:
    # Add lots of data
    for i in range(1000000):
        cl.append({"data": f"item_{i}"})

    # Process data
    result = cl[500000:500010]

# Automatic cleanup on exit
```
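The cleanup-on-exit behavior follows the standard context-manager protocol. The sketch below illustrates the idea with a hypothetical `TinyPagedList` built only from the standard library; it is not the actual paged-list implementation, and all names in it are illustrative.

```python
import os
import pickle
import tempfile


class TinyPagedList:
    """Minimal illustration of context-managed chunk cleanup."""

    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size
        self.buffer = []       # in-memory tail, not yet flushed
        self.chunk_paths = []  # pickle files already written to disk
        self.dir = tempfile.mkdtemp()

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.chunk_size:
            # Flush a full chunk to its own pickle file.
            path = os.path.join(self.dir, f"chunk_{len(self.chunk_paths)}.pkl")
            with open(path, "wb") as f:
                pickle.dump(self.buffer, f)
            self.chunk_paths.append(path)
            self.buffer = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Remove every temporary chunk file on exit.
        for path in self.chunk_paths:
            os.remove(path)
        os.rmdir(self.dir)
        return False  # never suppress exceptions


with TinyPagedList(chunk_size=2) as tl:
    for i in range(5):
        tl.append({"data": i})
    n_chunks = len(tl.chunk_paths)

print(n_chunks)  # 5 items with chunk_size=2 -> 2 full chunks on disk
```

The partial buffer (the fifth item above) never reaches disk in this toy version; the point is only that `__exit__` guarantees the temporary files are removed even if the body raises.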
### Custom Serialization
```python
# Serialize complex Python objects to JSON strings
cl.append(
    {
        "id": 1,
        "metadata": {"tags": ["python", "data"], "active": True},
        "scores": [1.2, 3.4, 5.6],
    }
)
cl.serialize() # Converts lists, dicts, and bools to JSON strings
```
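For intuition, here is a stdlib-only sketch of the kind of conversion described above: nested lists, dicts, and bools become JSON strings while flat scalars pass through. The `serialize_record` helper is hypothetical, not the library's internal function.

```python
import json


def serialize_record(record):
    """Return a copy with complex values encoded as JSON strings."""
    out = {}
    for key, value in record.items():
        if isinstance(value, (list, dict, bool)):
            out[key] = json.dumps(value)  # complex value -> JSON string
        else:
            out[key] = value              # ints, floats, strings unchanged
    return out


record = {
    "id": 1,
    "metadata": {"tags": ["python", "data"], "active": True},
    "scores": [1.2, 3.4, 5.6],
}
flat = serialize_record(record)
print(flat["metadata"])  # '{"tags": ["python", "data"], "active": true}'
```

After this pass every value is a pickle- and JSON-friendly scalar, which keeps the on-disk chunk format simple.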
### Parallel Processing
```python
# Process data in parallel across chunks
def process_record(record):
    record["processed"] = True
    record["timestamp"] = "2024-01-01"
    return record
cl.map(process_record, max_workers=4) # Use 4 threads
```
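A plausible shape for this chunk-parallel mapping, assuming `map()` hands each on-disk chunk to its own worker thread, is sketched below with `ThreadPoolExecutor`. The `map_chunks` function and the in-memory `chunks` list are illustrative stand-ins for the library's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def map_chunks(chunks, fn, max_workers=4):
    """Apply fn to every record, processing one chunk per thread."""

    def process(chunk):
        return [fn(record) for record in chunk]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves chunk order, so results line up with inputs.
        return list(pool.map(process, chunks))


chunks = [[{"id": 0}, {"id": 1}], [{"id": 2}]]
result = map_chunks(chunks, lambda r: {**r, "processed": True}, max_workers=2)
print(result[1][0])  # {'id': 2, 'processed': True}
```

Because each chunk is independent, no locking is needed between workers; only the final reassembly has to respect chunk order.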
## Performance
PagedList is designed for scenarios where:
- Your dataset is too large for memory
- You need random access to data
- You want to process data in chunks
- Memory usage is more important than raw speed
Typical performance characteristics:
- **Memory usage**: O(chunk_size) instead of O(total_items)
- **Access time**: O(1) for sequential access, O(log chunks) for random access
- **Disk usage**: Temporary pickle files (cleaned up automatically)
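The memory and access characteristics above follow from fixed-size chunking: the chunk holding index `i` is just `i // chunk_size`, so a lookup loads at most one pickle file. This stdlib-only sketch (illustrative, not the library's code) shows the addressing arithmetic:

```python
import os
import pickle
import tempfile

chunk_size = 3
data = [{"id": i} for i in range(10)]

# Flush fixed-size chunks to disk; only one chunk is ever held in memory.
tmpdir = tempfile.mkdtemp()
paths = []
for start in range(0, len(data), chunk_size):
    path = os.path.join(tmpdir, f"chunk_{start // chunk_size}.pkl")
    with open(path, "wb") as f:
        pickle.dump(data[start:start + chunk_size], f)
    paths.append(path)


def get(i):
    """Load only the chunk containing index i, then index into it."""
    with open(paths[i // chunk_size], "rb") as f:
        chunk = pickle.load(f)
    return chunk[i % chunk_size]


print(get(7))  # {'id': 7}
```

Peak memory stays proportional to `chunk_size` regardless of total length, at the cost of one file read per random access.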
## Development
This project uses [Hatch](https://hatch.pypa.io/) for development environment management and packaging.
### Development Quick Start
```bash
# Install Hatch
pip install hatch
# Run tests
hatch run test
# Run tests with coverage
hatch run test-cov
# Format and lint code
hatch run format
hatch run lint
# Run everything (format, lint, test with coverage)
hatch run all
```
### Hatch Environments
- **default**: Main development environment with all tools
- **test**: Testing across Python 3.9-3.13
- **docs**: Documentation building
For detailed development setup, see [CONTRIBUTING.md](CONTRIBUTING.md).
### Legacy Commands
```bash
# Also works (legacy pytest)
pytest
# Run examples
python -m paged_list demo # Small demonstration
python -m paged_list example # Full example with 1M items
```
## About the Author
paged-list was created by **Christensen Daniel**, a passionate data engineer who specializes in building tools that make working with large datasets more efficient and enjoyable.
### Connect
- **LinkedIn**: [dbchristensen](https://www.linkedin.com/in/dbchristensen/) - For data engineering insights and project updates
- **GitHub**: [christensendaniel](https://github.com/christensendaniel) - Explore more projects and contributions
- **Email**: [christensen.daniel+pagedlist@outlook.com](mailto:christensen.daniel+pagedlist@outlook.com)
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.