paged-list

Name: paged-list
Version: 0.1.3
Summary: A disk-backed list implementation for handling large datasets efficiently
Upload time: 2025-09-01 20:11:46
Requires Python: >=3.9
License: MIT
Keywords: chunking, data-processing, disk-backed, large-data, memory-efficient, serialization
Requirements: No requirements were recorded.

# Paged List

[![PyPI version](https://badge.fury.io/py/paged-list.svg)](https://badge.fury.io/py/paged-list)
[![Python versions](https://img.shields.io/pypi/pyversions/paged-list.svg)](https://pypi.org/project/paged-list/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/christensendaniel/paged-list/workflows/Tests/badge.svg)](https://github.com/christensendaniel/paged-list/actions)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A Python package that provides a disk-backed list implementation for handling large datasets efficiently. When your data gets too large for memory, paged-list automatically chunks it into pickle files on disk, only loading relevant chunks when needed.

## Links

- **PyPI Package**: [https://pypi.org/project/paged-list/](https://pypi.org/project/paged-list/)
- **Documentation**: [https://paged-list.readthedocs.io/](https://paged-list.readthedocs.io/)
- **Source Code**: [https://github.com/christensendaniel/paged-list](https://github.com/christensendaniel/paged-list)
- **Issues**: [https://github.com/christensendaniel/paged-list/issues](https://github.com/christensendaniel/paged-list/issues)
- **Changelog**: [CHANGELOG.md](CHANGELOG.md)

## Features

- **Memory Efficient**: Only keeps a small portion of data in memory
- **Automatic Chunking**: Transparently splits large datasets into manageable chunks
- **List-like Interface**: Supports indexing, slicing, and iteration like regular Python lists
- **Parallel Processing**: Built-in map and serialization functions with multi-threading support
- **Type Safety**: Designed for dictionaries with comprehensive type hints
- **Context Manager**: Automatic cleanup of temporary files

## Requirements

- Python 3.9 or higher
- No external dependencies for core functionality

## Installation

Install from PyPI:

```bash
pip install paged-list
```

**PyPI Package**: [https://pypi.org/project/paged-list/](https://pypi.org/project/paged-list/)

**Note:** Python 3.9+ is required. If you're using an older Python version, please upgrade before installing.

## Python Version Compatibility

paged-list supports Python 3.9 and later versions:

- ✅ Python 3.9
- ✅ Python 3.10
- ✅ Python 3.11
- ✅ Python 3.12
- ✅ Python 3.13

The package is tested across multiple Python versions and operating systems (Linux, Windows, macOS) to ensure compatibility.

### Testing Compatibility

To test compatibility on your system:

```bash
# Install with development dependencies
pip install paged-list[dev]

# Run compatibility tests
python -m pytest tests/test_python_compatibility.py -v

# Or use the standalone compatibility script
python scripts/test_compatibility.py
```

### Multi-Version Testing

For developers, you can test across multiple Python versions using tox:

```bash
# Install tox
pip install tox

# Test on available Python versions
tox

# Test on specific Python version
tox -e py39

# Run linting and formatting checks
tox -e flake8,black,mypy
```

**Note**: The `tox` command will automatically skip Python versions that aren't installed on your system.

### Installing from Source

You can also install directly from the repository:

```bash
git clone https://github.com/christensendaniel/paged-list.git
cd paged-list
pip install -e .
```

## Quick Start

```python
from paged_list import PagedList

# Create a disk-backed list
cl = PagedList(chunk_size=50000, disk_path="data")

# Add data - will automatically chunk to disk when needed
for i in range(100000):
    cl.append({"id": i, "value": f"item_{i}", "score": i * 1.5})

# Access data like a regular list
print(cl[0])  # First item
print(cl[-1])  # Last item
print(cl[1000:1010])  # Slice of 10 items

# Update items
cl[5] = {"id": 5, "value": "updated", "score": 99.9}


# Apply transformations to all data (uses threading)
def double_score(record):
    record["score"] *= 2
    return record


cl.map(double_score)

# Serialize complex data types to JSON strings
cl.serialize()

# Clean up when done
cl.cleanup_chunks()
```

## Use Cases

- **Large Dataset Processing**: Handle datasets that don't fit in memory
- **Data Pipelines**: Process streaming data with automatic disk overflow
- **ETL Operations**: Transform large datasets chunk by chunk (see the sketch after this list)
- **Data Analysis**: Analyze large datasets without memory constraints
- **Caching**: Implement persistent, memory-efficient caches
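
To make the pipeline and ETL use cases concrete, here is a short sketch built only from calls shown elsewhere in this README (`append`, `map`, iteration, and the context manager); the record fields and the `normalize` transform are made up for illustration.

```python
from paged_list import PagedList


def normalize(record):
    # Hypothetical per-record transform: tidy a text field and flag long values.
    record["value"] = record["value"].strip()
    record["is_long"] = len(record["value"]) > 32
    return record


with PagedList(chunk_size=25000, disk_path="etl_data") as records:
    # Ingest a (simulated) stream; full chunks spill to disk automatically.
    for i in range(200000):
        records.append({"id": i, "value": f"  raw payload {i}  "})

    # Transform every record without holding the whole dataset in memory.
    records.map(normalize)

    # Iterate like a regular list, e.g. to write results downstream.
    flagged = sum(1 for record in records if record["is_long"])
    print(f"Flagged {flagged} long records")
```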

## Advanced Usage

### Context Manager (Recommended)

```python
from paged_list import PagedList

with PagedList(chunk_size=10000) as cl:
    # Add lots of data
    for i in range(1000000):
        cl.append({"data": f"item_{i}"})

    # Process data
    result = cl[500000:500010]

    # Automatic cleanup on exit
```

### Custom Serialization

```python
# Serialize complex Python objects to JSON strings
cl.append(
    {
        "id": 1,
        "metadata": {"tags": ["python", "data"], "active": True},
        "scores": [1.2, 3.4, 5.6],
    }
)

cl.serialize()  # Converts lists, dicts, and bools to JSON strings
```
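
Assuming the serialized fields are stored as plain JSON strings (as the comment above states), they can be turned back into Python objects with `json.loads` when records are read back; the index below refers to the record appended in the previous snippet.

```python
import json

# Assumption: after serialize(), nested values are stored as JSON strings.
record = cl[-1]  # the record appended above
metadata = json.loads(record["metadata"])
print(metadata["tags"])  # ["python", "data"]
```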

### Parallel Processing

```python
# Process data in parallel across chunks
def process_record(record):
    record["processed"] = True
    record["timestamp"] = "2024-01-01"
    return record


cl.map(process_record, max_workers=4)  # Use 4 threads
```

## Performance

PagedList is designed for scenarios where:

- Your dataset is too large for memory
- You need random access to data
- You want to process data in chunks
- Memory usage is more important than raw speed

Typical performance characteristics:

- **Memory usage**: O(chunk_size) instead of O(total_items)
- **Access time**: O(1) for sequential access, O(log chunks) for random access
- **Disk usage**: Temporary pickle files, cleaned up automatically (see the sketch below)
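
The following is a minimal, hypothetical sketch of the paging idea behind these numbers, not paged-list's actual internals: full chunks are pickled to disk, only the current partial chunk stays in memory, and a lookup loads just the chunk that contains the requested index.

```python
import os
import pickle


class TinyPagedList:
    """Illustrative only: holds just the current partial chunk in memory."""

    def __init__(self, chunk_size=50000, disk_path="chunks"):
        self.chunk_size = chunk_size
        self.disk_path = disk_path
        self.buffer = []  # current, not-yet-full chunk
        self.num_chunks = 0  # chunks already written to disk
        os.makedirs(disk_path, exist_ok=True)

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.chunk_size:
            path = os.path.join(self.disk_path, f"chunk_{self.num_chunks}.pkl")
            with open(path, "wb") as f:
                pickle.dump(self.buffer, f)  # spill the full chunk to disk
            self.num_chunks += 1
            self.buffer = []  # memory stays O(chunk_size)

    def __getitem__(self, index):
        chunk_index, offset = divmod(index, self.chunk_size)
        if chunk_index == self.num_chunks:  # index falls in the in-memory buffer
            return self.buffer[offset]
        path = os.path.join(self.disk_path, f"chunk_{chunk_index}.pkl")
        with open(path, "rb") as f:
            return pickle.load(f)[offset]  # load only the chunk that is needed
```

A real implementation additionally needs slicing, negative indexes, updates, threaded `map`, and cleanup, which is what `PagedList` provides.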

## Development

This project uses [Hatch](https://hatch.pypa.io/) for development environment management and packaging.

### Development Quick Start

```bash
# Install Hatch
pip install hatch

# Run tests
hatch run test

# Run tests with coverage
hatch run test-cov

# Format and lint code
hatch run format
hatch run lint

# Run everything (format, lint, test with coverage)
hatch run all
```

### Hatch Environments

- **default**: Main development environment with all tools
- **test**: Testing across Python 3.9-3.13
- **docs**: Documentation building

For detailed development setup, see [CONTRIBUTING.md](CONTRIBUTING.md).

### Legacy Commands

```bash
# Also works (legacy pytest)
pytest

# Run examples
python -m paged_list demo      # Small demonstration
python -m paged_list example   # Full example with 1M items
```

## About the Author

paged-list was created by **Christensen Daniel**, a passionate data engineer who specializes in building tools that make working with large datasets more efficient and enjoyable.

### Connect

- **LinkedIn**: [dbchristensen](https://www.linkedin.com/in/dbchristensen/) - For data engineering insights and project updates
- **GitHub**: [christensendaniel](https://github.com/christensendaniel) - Explore more projects and contributions
- **Email**: [christensen.daniel+pagedlist@outlook.com](mailto:christensen.daniel+pagedlist@outlook.com)

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "paged-list",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "chunking, data-processing, disk-backed, large-data, memory-efficient, serialization",
    "author": null,
    "author_email": "Christensen Daniel <christensen.daniel+pagedlist@outlook.com>",
    "download_url": "https://files.pythonhosted.org/packages/2a/ef/659e7e83c0badea6bbbb5fa6b4cb5ea55ba29531da2f87f4d00ce8bd4f26/paged_list-0.1.3.tar.gz",
    "platform": null,
    "description": "# Paged List\n\n[![PyPI version](https://badge.fury.io/py/paged-list.svg)](https://badge.fury.io/py/paged-list)\n[![Python versions](https://img.shields.io/pypi/pyversions/paged-list.svg)](https://pypi.org/project/paged-list/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Tests](https://github.com/christensendaniel/paged-list/workflows/Tests/badge.svg)](https://github.com/christensendaniel/paged-list/actions)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\nA Python package that provides a disk-backed list implementation for handling large datasets efficiently. When your data gets too large for memory, paged-list automatically chunks it into pickle files on disk, only loading relevant chunks when needed.\n\n## Links\n\n- **PyPI Package**: [https://pypi.org/project/paged-list/](https://pypi.org/project/paged-list/)\n- **Documentation**: [https://paged-list.readthedocs.io/](https://paged-list.readthedocs.io/)\n- **Source Code**: [https://github.com/christensendaniel/paged-list](https://github.com/christensendaniel/paged-list)\n- **Issues**: [https://github.com/christensendaniel/paged-list/issues](https://github.com/christensendaniel/paged-list/issues)\n- **Changelog**: [CHANGELOG.md](CHANGELOG.md)\n\n## Features\n\n- **Memory Efficient**: Only keeps a small portion of data in memory\n- **Automatic Chunking**: Transparently splits large datasets into manageable chunks\n- **List-like Interface**: Supports indexing, slicing, and iteration like regular Python lists\n- **Parallel Processing**: Built-in map and serialization functions with multi-threading support\n- **Type Safety**: Designed for dictionaries with comprehensive type hints\n- **Context Manager**: Automatic cleanup of temporary files\n\n## Requirements\n\n- Python 3.9 or higher\n- No external dependencies for core functionality\n\n## Installation\n\nInstall from PyPI:\n\n```bash\npip install paged-list\n```\n\n**PyPI Package**: [https://pypi.org/project/paged-list/](https://pypi.org/project/paged-list/)\n\n**Note:** Python 3.9+ is required. 
If you're using an older Python version, please upgrade before installing.\n\n## Python Version Compatibility\n\npaged-list supports Python 3.9 and later versions:\n\n- \u2705 Python 3.9+ (recommended)\n- \u2705 Python 3.10+\n- \u2705 Python 3.11+\n- \u2705 Python 3.12+\n\nThe package is tested across multiple Python versions and operating systems (Linux, Windows, macOS) to ensure compatibility.\n\n### Testing Compatibility\n\nTo test compatibility on your system:\n\n```bash\n# Install with development dependencies\npip install paged-list[dev]\n\n# Run compatibility tests\npython -m pytest tests/test_python_compatibility.py -v\n\n# Or use the standalone compatibility script\npython scripts/test_compatibility.py\n```\n\n### Multi-Version Testing\n\nFor developers, you can test across multiple Python versions using tox:\n\n```bash\n# Install tox\npip install tox\n\n# Test on available Python versions\ntox\n\n# Test on specific Python version\ntox -e py39\n\n# Run linting and formatting checks\ntox -e flake8,black,mypy\n```\n\n**Note**: The `tox` command will automatically skip Python versions that aren't installed on your system.\n\nInstall from source:\n\n```bash\ngit clone https://github.com/christensendaniel/paged-list.git\ncd paged-list\npip install -e .\n```\n\n## Quick Start\n\n```python\nfrom paged_list import PagedList\n\n# Create a disk-backed list\ncl = PagedList(chunk_size=50000, disk_path=\"data\")\n\n# Add data - will automatically chunk to disk when needed\nfor i in range(100000):\n    cl.append({\"id\": i, \"value\": f\"item_{i}\", \"score\": i * 1.5})\n\n# Access data like a regular list\nprint(cl[0])  # First item\nprint(cl[-1])  # Last item\nprint(cl[1000:1010])  # Slice of 10 items\n\n# Update items\ncl[5] = {\"id\": 5, \"value\": \"updated\", \"score\": 99.9}\n\n\n# Apply transformations to all data (uses threading)\ndef double_score(record):\n    record[\"score\"] *= 2\n    return record\n\n\ncl.map(double_score)\n\n# Serialize complex data types to JSON strings\ncl.serialize()\n\n# Clean up when done\ncl.cleanup_chunks()\n```\n\n## Use Cases\n\n- **Large Dataset Processing**: Handle datasets that don't fit in memory\n- **Data Pipelines**: Process streaming data with automatic disk overflow\n- **ETL Operations**: Transform large datasets chunk by chunk\n- **Data Analysis**: Analyze large datasets without memory constraints\n- **Caching**: Implement persistent, memory-efficient caches\n\n## Advanced Usage\n\n### Context Manager (Recommended)\n\n```python\nfrom paged_list import PagedList\n\nwith PagedList(chunk_size=10000) as cl:\n    # Add lots of data\n    for i in range(1000000):\n        cl.append({\"data\": f\"item_{i}\"})\n\n    # Process data\n    result = cl[500000:500010]\n\n    # Automatic cleanup on exit\n```\n\n### Custom Serialization\n\n```python\n# Serialize complex Python objects to JSON strings\ncl.append(\n    {\n        \"id\": 1,\n        \"metadata\": {\"tags\": [\"python\", \"data\"], \"active\": True},\n        \"scores\": [1.2, 3.4, 5.6],\n    }\n)\n\ncl.serialize()  # Converts lists, dicts, and bools to JSON strings\n```\n\n### Parallel Processing\n\n```python\n# Process data in parallel across chunks\ndef process_record(record):\n    record[\"processed\"] = True\n    record[\"timestamp\"] = \"2024-01-01\"\n    return record\n\n\ncl.map(process_record, max_workers=4)  # Use 4 threads\n```\n\n## Performance\n\nPagedList is designed for scenarios where:\n\n- Your dataset is too large for memory\n- You need random access to data\n- You want to 
process data in chunks\n- Memory usage is more important than raw speed\n\nTypical performance characteristics:\n\n- **Memory usage**: O(chunk_size) instead of O(total_items)\n- **Access time**: O(1) for sequential access, O(log chunks) for random access\n- **Disk usage**: Temporary pickle files (cleaned up automatically)\n\n## Development\n\nThis project uses [Hatch](https://hatch.pypa.io/) for development environment management and packaging.\n\n### Development Quick Start\n\n```bash\n# Install Hatch\npip install hatch\n\n# Run tests\nhatch run test\n\n# Run tests with coverage\nhatch run test-cov\n\n# Format and lint code\nhatch run format\nhatch run lint\n\n# Run everything (format, lint, test with coverage)\nhatch run all\n```\n\n### Hatch Environments\n\n- **default**: Main development environment with all tools\n- **test**: Testing across Python 3.9-3.13\n- **docs**: Documentation building\n\nFor detailed development setup, see [CONTRIBUTING.md](CONTRIBUTING.md).\n\n### Legacy Commands\n\n```bash\n# Also works (legacy pytest)\npytest\n\n# Run examples\npython -m paged_list demo      # Small demonstration\npython -m paged_list example   # Full example with 1M items\n```\n\n## About the Author\n\npaged-list was created by **Christensen Daniel**, a passionate data engineer who specializes in building tools that make working with large datasets more efficient and enjoyable.\n\n### Connect\n\n- **LinkedIn**: [dbchristensen](https://www.linkedin.com/in/dbchristensen/) - For data engineering insights and project updates\n- **GitHub**: [christensendaniel](https://github.com/christensendaniel) - Explore more projects and contributions\n- **Email**: [christensen.daniel+pagedlist@outlook.com](mailto:christensen.daniel+pagedlist@outlook.com)\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A disk-backed list implementation for handling large datasets efficiently",
    "version": "0.1.3",
    "project_urls": {
        "Bug Tracker": "https://github.com/christensendaniel/paged-list/issues",
        "Documentation": "https://paged-list.readthedocs.io/",
        "Homepage": "https://github.com/christensendaniel/paged-list",
        "PyPI": "https://pypi.org/project/paged-list/",
        "Repository": "https://github.com/christensendaniel/paged-list.git"
    },
    "split_keywords": [
        "chunking",
        " data-processing",
        " disk-backed",
        " large-data",
        " memory-efficient",
        " serialization"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "eb337d625ed733e517d0e2de2f9d7b8b21d00cfa64032b58c965a933babc9931",
                "md5": "5e8060e6c2deb578f32ecbb4e375bb4d",
                "sha256": "ac9aab4ee9780b9afa2ba33c81ac16a3f28e095db1d853e4fb614999401754ba"
            },
            "downloads": -1,
            "filename": "paged_list-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5e8060e6c2deb578f32ecbb4e375bb4d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 12834,
            "upload_time": "2025-09-01T20:11:45",
            "upload_time_iso_8601": "2025-09-01T20:11:45.402186Z",
            "url": "https://files.pythonhosted.org/packages/eb/33/7d625ed733e517d0e2de2f9d7b8b21d00cfa64032b58c965a933babc9931/paged_list-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2aef659e7e83c0badea6bbbb5fa6b4cb5ea55ba29531da2f87f4d00ce8bd4f26",
                "md5": "7503bd95d9a57f28541d8e30e8a33261",
                "sha256": "28218698127d9d1b061e40e28408f73227ae26242006b942417ee0e74c964c58"
            },
            "downloads": -1,
            "filename": "paged_list-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "7503bd95d9a57f28541d8e30e8a33261",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 42751,
            "upload_time": "2025-09-01T20:11:46",
            "upload_time_iso_8601": "2025-09-01T20:11:46.902653Z",
            "url": "https://files.pythonhosted.org/packages/2a/ef/659e7e83c0badea6bbbb5fa6b4cb5ea55ba29531da2f87f4d00ce8bd4f26/paged_list-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-01 20:11:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "christensendaniel",
    "github_project": "paged-list",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "paged-list"
}
        