disjoint-forest


Namedisjoint-forest JSON
Version 1.1 PyPI version JSON
download
home_pagehttps://github.com/yourusername/disjoint-forest
SummaryA high-performance C++ implementation of disjoint-set data structure with Python bindings
upload_time2025-08-11 05:06:55
maintainerNone
docs_urlNone
authorDisjointForest Contributors
requires_python>=3.8
licenseNone
keywords disjoint-set union-find data-structure algorithm graph optimization
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # DisjointForest

A C++ implementation of a disjoint-set data structure (also known as union-find) using the disjoint forest representation with path compression and union by rank optimizations, with Python bindings included as well. This is intended to fill a void in too many existing standard libraries, which don't include a proper implementation of this despite its utility in many applications.

## Features

- **Template-based**: Works with any data type
- **Path Compression**: Automatically compresses paths during find operations
- **Union by Rank**: Optimizes union operations by maintaining tree depth information
- **Dynamic Operations**: Expand capacity, contract nodes, and clear the forest
- **Memory Efficient**: STL vectors with automatic memory management
- **Smart Pointers**: Uses `std::unique_ptr` for automatic memory cleanup
- **Exception Safety**: Proper error handling with `std::invalid_argument` and `std::runtime_error`
- **Comprehensive Testing**: Full unit test suite using Google Test
- **Python Bindings**: Full Python interface with pybind11

## Class Structure

### Node<T>
- `data`: The actual data stored in the node
- `parent`: Pointer to the parent node (self-referential for root nodes)
- `rank`: Depth of the subtree rooted at this node

### DisjointForest<T>
- `makeSet(T data)`: Creates a new set containing the given data
- `find(Node<T>* node)`: Finds the representative of the set containing the node
- `unionSets(Node<T>* node1, Node<T>* node2)`: Merges two sets
- Constructor takes the maximum number of elements as parameter

#### Dynamic Operations
- `expand(int additional_capacity)`: Increases the forest's capacity by the specified amount
- `contract(Node<T>* node)`: Removes a node from the forest, updating parent references
- `clear()`: Removes all nodes and resets the forest to empty state
- `size()`: Returns the current number of nodes in the forest
- `capacity()`: Returns the total capacity allocated for the forest
- `isEmpty()`: Returns true if the forest contains no nodes
- `getAllNodes()`: Returns a vector of all current nodes in the forest

## Building the Project

### Prerequisites
- CMake 3.10 or higher
- C++17 compatible compiler
- Google Test library
- Python 3.6+ (for Python bindings)
- pybind11 (for Python bindings)

### Install Google Test

#### Ubuntu/Debian
```bash
sudo apt-get install libgtest-dev
```

#### macOS
```bash
brew install googletest
```

#### CentOS/RHEL/Fedora
```bash
sudo yum install gtest-devel
# or for newer versions
sudo dnf install gtest-devel
```

#### Arch Linux
```bash
sudo pacman -S gtest
```

### Build Steps

#### Using CMake
```bash
# Create build directory
mkdir build
cd build

# Configure and build
cmake ..
make

# Run tests
make test
# or directly
./disjoint_forest_test

# Run example
./disjoint_forest_example
```

#### Using Makefile (Alternative)
```bash
# Build everything including Python bindings
make all

# Build only C++ components
make cpp

# Build Python bindings
make python-bindings

# Run all tests
make test

# Run Python tests
make python-test

# Run examples
make example
make python-example

# Clean build artifacts
make clean
```

## Running Tests

The test suite covers:

1. **Basic Operations**: Constructor, destructor, makeSet
2. **Find Operations**: Single node find, path compression
3. **Union Operations**: Basic union, union by rank, same set union
4. **Dynamic Operations**: Expand capacity, contract nodes, clear forest, size/capacity queries
5. **Complex Scenarios**: Multiple unions, large datasets, dynamic capacity changes
6. **Edge Cases**: Empty forests, single element forests, contraction edge cases
7. **Memory Management**: Large operations, proper cleanup, dynamic memory allocation
8. **Template Support**: Tests with different data types (int, string)

## Example Usage

### Basic Operations
```cpp
#include "disjoint_forest.h"

// Create a forest with capacity for 100 elements
DisjointForest<int> forest(100);

// Create some sets
Node<int>* node1 = forest.makeSet(1);
Node<int>* node2 = forest.makeSet(2);
Node<int>* node3 = forest.makeSet(3);

// Union sets
forest.unionSets(node1, node2);
forest.unionSets(node1, node3);

// Find representatives
Node<int>* rep1 = forest.find(node1);
Node<int>* rep2 = forest.find(node2);
Node<int>* rep3 = forest.find(node3);

// All should have the same representative
assert(rep1 == rep2 && rep2 == rep3);
```

### Dynamic Operations
```cpp
// Expand capacity when needed
forest.expand(50);  // Add capacity for 50 more elements

// Check current state
std::cout << "Size: " << forest.size() << std::endl;
std::cout << "Capacity: " << forest.capacity() << std::endl;
std::cout << "Is empty: " << (forest.isEmpty() ? "Yes" : "No") << std::endl;

// Get all nodes for iteration
auto all_nodes = forest.getAllNodes();
for (auto* node : all_nodes) {
    std::cout << "Node data: " << node->data << std::endl;
}

// Contract (remove) a node
forest.contract(node2);

// Clear the entire forest
forest.clear();
```

## Algorithm Complexity

- **makeSet**: O(1)
- **find**: O(α(n)) amortized, where α is the inverse Ackermann function
- **unionSets**: O(α(n)) amortized
- **expand**: O(1) amortized (vector doubling strategy)
- **contract**: O(1) (marking nodes as invalid)
- **clear**: O(n) where n is the number of nodes
- **size/capacity/isEmpty**: O(1)
- **getAllNodes**: O(n) where n is the number of nodes
- **Space**: O(n) where n is the number of elements

## Use Cases

Disjoint-set data structures are fundamental to many algorithms and applications:

- **Graph Algorithms**: Connected components, minimum spanning trees (Kruskal's algorithm)
- **Image Processing**: Connected component labeling, flood fill algorithms
- **Network Analysis**: Community detection, clustering algorithms
- **Game Development**: Pathfinding, collision detection systems
- **Database Systems**: Transaction management, dependency resolution
- **Compiler Design**: Register allocation, live variable analysis

## Python Bindings

The project includes Python bindings that provide a unified interface for working with any Python object type. The bindings are built using pybind11 and offer the same functionality as the C++ implementation.

### Features
- **Type Flexibility**: Works with integers, strings, lists, dictionaries, custom objects, etc.
- **Unified API**: Single `DisjointForest` class that handles any Python object type
- **Safe Operations**: Built-in safety checks and error handling
- **Memory Management**: Automatic cleanup of Python references

### Installation
```bash
# Install Python dependencies
cd python
pip3 install -r requirements.txt

# Build the Python module
make build

# Test the bindings
make test

# Run the example
make example
```

### Python Usage
```python
import disjoint_forest

# Create a forest
forest = disjoint_forest.DisjointForest(10)

# Create sets with different types
node1 = forest.make_set(42)
node2 = forest.make_set("hello")
node3 = forest.make_set([1, 2, 3])

# Union operations
forest.union_sets(node1, node2)

# Find representatives
rep = forest.find(node1)
print(f"Representative: {rep.data}")

# Dynamic operations
forest.expand(20)
print(f"Capacity: {forest.capacity()}")
print(f"Size: {forest.size()}")

# Get all nodes
all_nodes = forest.get_all_nodes()
for node in all_nodes:
    print(f"Node: {node.data}")
```

## Implementation Details

### STL Vector Usage
The implementation uses `std::vector` instead of fixed-size arrays, providing several advantages:
- **Dynamic Capacity**: The forest can grow beyond its initial capacity using the `expand()` method
- **Automatic Memory Management**: Vectors handle memory allocation and deallocation automatically
- **Efficient Resizing**: STL vectors use exponential growth strategies for optimal performance
- **Exception Safety**: Vector operations provide strong exception guarantees

### Memory Management
- **Smart Pointers**: `std::unique_ptr` ensures automatic cleanup of Node objects
- **RAII Principles**: Resource acquisition is tied to object lifetime
- **Exception Safety**: Memory is properly cleaned up even when exceptions occur
- **Python Integration**: Python bindings include reference counting and safe memory management

## Notes

- The implementation automatically handles path compression during find operations
- Union by rank ensures balanced trees for optimal performance
- Memory is automatically managed with proper cleanup in the destructor
- Python bindings provide safe access to all C++ functionality
- Dynamic operations maintain the forest's integrity while allowing flexible capacity management

## Troubleshooting

### Common Build Issues

#### Google Test Not Found
If you encounter linking errors with Google Test:
```bash
# The Makefile will automatically install Google Test if needed
make install-deps

# Or manually install for your system (see Prerequisites section)
```

#### Python Import Errors
If Python can't import the module:
```bash
# Ensure the module is built
cd python
make build

# Check if the shared library exists
ls -la *.so

# Verify Python path
python3 -c "import sys; print(sys.path)"
```

#### Memory Issues
The implementation uses smart pointers and RAII principles. If you encounter memory issues:
- Ensure you're not manually deleting Node pointers
- Use the provided methods (`clear()`, `contract()`) instead of manual memory management
- Check that Python bindings are properly handling object lifetimes

## Contributing

We welcome contributions to improve the implementation, test coverage, or documentation. Please:

1. **Fork the repository** and create a feature branch
2. **Follow the existing code style** and naming conventions
3. **Add tests** for new functionality
4. **Update documentation** to reflect any changes
5. **Submit a pull request** with a clear description of your changes

### Development Setup
```bash
# Clone and setup
git clone <your-fork-url>
cd disjoint_set

# Install development dependencies
make install-deps

# Run all tests to ensure everything works
make test
make python-test

# Build everything
make all
``` 

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yourusername/disjoint-forest",
    "name": "disjoint-forest",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "disjoint-set, union-find, data-structure, algorithm, graph, optimization",
    "author": "DisjointForest Contributors",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/a8/82/02995f6d78bd7ae80b711d5761b1824046341c6516282f3208b85aa70992/disjoint_forest-1.1.tar.gz",
    "platform": null,
    "description": "# DisjointForest\n\nA C++ implementation of a disjoint-set data structure (also known as union-find) using the disjoint forest representation with path compression and union by rank optimizations, with Python bindings included as well. This is intended to fill a void in too many existing standard libraries, which don't include a proper implementation of this despite its utility in many applications.\n\n## Features\n\n- **Template-based**: Works with any data type\n- **Path Compression**: Automatically compresses paths during find operations\n- **Union by Rank**: Optimizes union operations by maintaining tree depth information\n- **Dynamic Operations**: Expand capacity, contract nodes, and clear the forest\n- **Memory Efficient**: STL vectors with automatic memory management\n- **Smart Pointers**: Uses `std::unique_ptr` for automatic memory cleanup\n- **Exception Safety**: Proper error handling with `std::invalid_argument` and `std::runtime_error`\n- **Comprehensive Testing**: Full unit test suite using Google Test\n- **Python Bindings**: Full Python interface with pybind11\n\n## Class Structure\n\n### Node<T>\n- `data`: The actual data stored in the node\n- `parent`: Pointer to the parent node (self-referential for root nodes)\n- `rank`: Depth of the subtree rooted at this node\n\n### DisjointForest<T>\n- `makeSet(T data)`: Creates a new set containing the given data\n- `find(Node<T>* node)`: Finds the representative of the set containing the node\n- `unionSets(Node<T>* node1, Node<T>* node2)`: Merges two sets\n- Constructor takes the maximum number of elements as parameter\n\n#### Dynamic Operations\n- `expand(int additional_capacity)`: Increases the forest's capacity by the specified amount\n- `contract(Node<T>* node)`: Removes a node from the forest, updating parent references\n- `clear()`: Removes all nodes and resets the forest to empty state\n- `size()`: Returns the current number of nodes in the forest\n- `capacity()`: Returns the total capacity allocated for the forest\n- `isEmpty()`: Returns true if the forest contains no nodes\n- `getAllNodes()`: Returns a vector of all current nodes in the forest\n\n## Building the Project\n\n### Prerequisites\n- CMake 3.10 or higher\n- C++17 compatible compiler\n- Google Test library\n- Python 3.6+ (for Python bindings)\n- pybind11 (for Python bindings)\n\n### Install Google Test\n\n#### Ubuntu/Debian\n```bash\nsudo apt-get install libgtest-dev\n```\n\n#### macOS\n```bash\nbrew install googletest\n```\n\n#### CentOS/RHEL/Fedora\n```bash\nsudo yum install gtest-devel\n# or for newer versions\nsudo dnf install gtest-devel\n```\n\n#### Arch Linux\n```bash\nsudo pacman -S gtest\n```\n\n### Build Steps\n\n#### Using CMake\n```bash\n# Create build directory\nmkdir build\ncd build\n\n# Configure and build\ncmake ..\nmake\n\n# Run tests\nmake test\n# or directly\n./disjoint_forest_test\n\n# Run example\n./disjoint_forest_example\n```\n\n#### Using Makefile (Alternative)\n```bash\n# Build everything including Python bindings\nmake all\n\n# Build only C++ components\nmake cpp\n\n# Build Python bindings\nmake python-bindings\n\n# Run all tests\nmake test\n\n# Run Python tests\nmake python-test\n\n# Run examples\nmake example\nmake python-example\n\n# Clean build artifacts\nmake clean\n```\n\n## Running Tests\n\nThe test suite covers:\n\n1. **Basic Operations**: Constructor, destructor, makeSet\n2. **Find Operations**: Single node find, path compression\n3. **Union Operations**: Basic union, union by rank, same set union\n4. **Dynamic Operations**: Expand capacity, contract nodes, clear forest, size/capacity queries\n5. **Complex Scenarios**: Multiple unions, large datasets, dynamic capacity changes\n6. **Edge Cases**: Empty forests, single element forests, contraction edge cases\n7. **Memory Management**: Large operations, proper cleanup, dynamic memory allocation\n8. **Template Support**: Tests with different data types (int, string)\n\n## Example Usage\n\n### Basic Operations\n```cpp\n#include \"disjoint_forest.h\"\n\n// Create a forest with capacity for 100 elements\nDisjointForest<int> forest(100);\n\n// Create some sets\nNode<int>* node1 = forest.makeSet(1);\nNode<int>* node2 = forest.makeSet(2);\nNode<int>* node3 = forest.makeSet(3);\n\n// Union sets\nforest.unionSets(node1, node2);\nforest.unionSets(node1, node3);\n\n// Find representatives\nNode<int>* rep1 = forest.find(node1);\nNode<int>* rep2 = forest.find(node2);\nNode<int>* rep3 = forest.find(node3);\n\n// All should have the same representative\nassert(rep1 == rep2 && rep2 == rep3);\n```\n\n### Dynamic Operations\n```cpp\n// Expand capacity when needed\nforest.expand(50);  // Add capacity for 50 more elements\n\n// Check current state\nstd::cout << \"Size: \" << forest.size() << std::endl;\nstd::cout << \"Capacity: \" << forest.capacity() << std::endl;\nstd::cout << \"Is empty: \" << (forest.isEmpty() ? \"Yes\" : \"No\") << std::endl;\n\n// Get all nodes for iteration\nauto all_nodes = forest.getAllNodes();\nfor (auto* node : all_nodes) {\n    std::cout << \"Node data: \" << node->data << std::endl;\n}\n\n// Contract (remove) a node\nforest.contract(node2);\n\n// Clear the entire forest\nforest.clear();\n```\n\n## Algorithm Complexity\n\n- **makeSet**: O(1)\n- **find**: O(\u03b1(n)) amortized, where \u03b1 is the inverse Ackermann function\n- **unionSets**: O(\u03b1(n)) amortized\n- **expand**: O(1) amortized (vector doubling strategy)\n- **contract**: O(1) (marking nodes as invalid)\n- **clear**: O(n) where n is the number of nodes\n- **size/capacity/isEmpty**: O(1)\n- **getAllNodes**: O(n) where n is the number of nodes\n- **Space**: O(n) where n is the number of elements\n\n## Use Cases\n\nDisjoint-set data structures are fundamental to many algorithms and applications:\n\n- **Graph Algorithms**: Connected components, minimum spanning trees (Kruskal's algorithm)\n- **Image Processing**: Connected component labeling, flood fill algorithms\n- **Network Analysis**: Community detection, clustering algorithms\n- **Game Development**: Pathfinding, collision detection systems\n- **Database Systems**: Transaction management, dependency resolution\n- **Compiler Design**: Register allocation, live variable analysis\n\n## Python Bindings\n\nThe project includes Python bindings that provide a unified interface for working with any Python object type. The bindings are built using pybind11 and offer the same functionality as the C++ implementation.\n\n### Features\n- **Type Flexibility**: Works with integers, strings, lists, dictionaries, custom objects, etc.\n- **Unified API**: Single `DisjointForest` class that handles any Python object type\n- **Safe Operations**: Built-in safety checks and error handling\n- **Memory Management**: Automatic cleanup of Python references\n\n### Installation\n```bash\n# Install Python dependencies\ncd python\npip3 install -r requirements.txt\n\n# Build the Python module\nmake build\n\n# Test the bindings\nmake test\n\n# Run the example\nmake example\n```\n\n### Python Usage\n```python\nimport disjoint_forest\n\n# Create a forest\nforest = disjoint_forest.DisjointForest(10)\n\n# Create sets with different types\nnode1 = forest.make_set(42)\nnode2 = forest.make_set(\"hello\")\nnode3 = forest.make_set([1, 2, 3])\n\n# Union operations\nforest.union_sets(node1, node2)\n\n# Find representatives\nrep = forest.find(node1)\nprint(f\"Representative: {rep.data}\")\n\n# Dynamic operations\nforest.expand(20)\nprint(f\"Capacity: {forest.capacity()}\")\nprint(f\"Size: {forest.size()}\")\n\n# Get all nodes\nall_nodes = forest.get_all_nodes()\nfor node in all_nodes:\n    print(f\"Node: {node.data}\")\n```\n\n## Implementation Details\n\n### STL Vector Usage\nThe implementation uses `std::vector` instead of fixed-size arrays, providing several advantages:\n- **Dynamic Capacity**: The forest can grow beyond its initial capacity using the `expand()` method\n- **Automatic Memory Management**: Vectors handle memory allocation and deallocation automatically\n- **Efficient Resizing**: STL vectors use exponential growth strategies for optimal performance\n- **Exception Safety**: Vector operations provide strong exception guarantees\n\n### Memory Management\n- **Smart Pointers**: `std::unique_ptr` ensures automatic cleanup of Node objects\n- **RAII Principles**: Resource acquisition is tied to object lifetime\n- **Exception Safety**: Memory is properly cleaned up even when exceptions occur\n- **Python Integration**: Python bindings include reference counting and safe memory management\n\n## Notes\n\n- The implementation automatically handles path compression during find operations\n- Union by rank ensures balanced trees for optimal performance\n- Memory is automatically managed with proper cleanup in the destructor\n- Python bindings provide safe access to all C++ functionality\n- Dynamic operations maintain the forest's integrity while allowing flexible capacity management\n\n## Troubleshooting\n\n### Common Build Issues\n\n#### Google Test Not Found\nIf you encounter linking errors with Google Test:\n```bash\n# The Makefile will automatically install Google Test if needed\nmake install-deps\n\n# Or manually install for your system (see Prerequisites section)\n```\n\n#### Python Import Errors\nIf Python can't import the module:\n```bash\n# Ensure the module is built\ncd python\nmake build\n\n# Check if the shared library exists\nls -la *.so\n\n# Verify Python path\npython3 -c \"import sys; print(sys.path)\"\n```\n\n#### Memory Issues\nThe implementation uses smart pointers and RAII principles. If you encounter memory issues:\n- Ensure you're not manually deleting Node pointers\n- Use the provided methods (`clear()`, `contract()`) instead of manual memory management\n- Check that Python bindings are properly handling object lifetimes\n\n## Contributing\n\nWe welcome contributions to improve the implementation, test coverage, or documentation. Please:\n\n1. **Fork the repository** and create a feature branch\n2. **Follow the existing code style** and naming conventions\n3. **Add tests** for new functionality\n4. **Update documentation** to reflect any changes\n5. **Submit a pull request** with a clear description of your changes\n\n### Development Setup\n```bash\n# Clone and setup\ngit clone <your-fork-url>\ncd disjoint_set\n\n# Install development dependencies\nmake install-deps\n\n# Run all tests to ensure everything works\nmake test\nmake python-test\n\n# Build everything\nmake all\n``` \n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A high-performance C++ implementation of disjoint-set data structure with Python bindings",
    "version": "1.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/yourusername/disjoint-forest/issues",
        "Homepage": "https://github.com/yourusername/disjoint-forest",
        "Source Code": "https://github.com/yourusername/disjoint-forest"
    },
    "split_keywords": [
        "disjoint-set",
        " union-find",
        " data-structure",
        " algorithm",
        " graph",
        " optimization"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bb65bfd861bf6567c05204ae1a48437851608d7fd8be5ccddf5ebf0c045ca800",
                "md5": "9b06dd700d762404c2a7e5807f7717f2",
                "sha256": "b1c7f01eb6695efd0a66e8707dee3d82c9e1446880090100726985ea040f53b3"
            },
            "downloads": -1,
            "filename": "disjoint_forest-1.1-cp310-cp310-manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "9b06dd700d762404c2a7e5807f7717f2",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.8",
            "size": 1322924,
            "upload_time": "2025-08-11T05:06:53",
            "upload_time_iso_8601": "2025-08-11T05:06:53.734018Z",
            "url": "https://files.pythonhosted.org/packages/bb/65/bfd861bf6567c05204ae1a48437851608d7fd8be5ccddf5ebf0c045ca800/disjoint_forest-1.1-cp310-cp310-manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a88202995f6d78bd7ae80b711d5761b1824046341c6516282f3208b85aa70992",
                "md5": "81160032c4a10451ed7a438d24b92707",
                "sha256": "5e60077c74484c23c8ed48d0a0e18824c5f73c80f2fc5d9e1ddd2e6483389486"
            },
            "downloads": -1,
            "filename": "disjoint_forest-1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "81160032c4a10451ed7a438d24b92707",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 14508,
            "upload_time": "2025-08-11T05:06:55",
            "upload_time_iso_8601": "2025-08-11T05:06:55.068408Z",
            "url": "https://files.pythonhosted.org/packages/a8/82/02995f6d78bd7ae80b711d5761b1824046341c6516282f3208b85aa70992/disjoint_forest-1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-11 05:06:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yourusername",
    "github_project": "disjoint-forest",
    "github_not_found": true,
    "lcname": "disjoint-forest"
}
        
Elapsed time: 1.20926s