# DisjointForest
A C++ implementation of a disjoint-set data structure (also known as union-find) using the disjoint forest representation with path compression and union by rank optimizations, with Python bindings included as well. This is intended to fill a void in too many existing standard libraries, which don't include a proper implementation of this despite its utility in many applications.
## Features
- **Template-based**: Works with any data type
- **Path Compression**: Automatically compresses paths during find operations
- **Union by Rank**: Optimizes union operations by maintaining tree depth information
- **Dynamic Operations**: Expand capacity, contract nodes, and clear the forest
- **Memory Efficient**: STL vectors with automatic memory management
- **Smart Pointers**: Uses `std::unique_ptr` for automatic memory cleanup
- **Exception Safety**: Proper error handling with `std::invalid_argument` and `std::runtime_error`
- **Comprehensive Testing**: Full unit test suite using Google Test
- **Python Bindings**: Full Python interface with pybind11
## Class Structure
### Node<T>
- `data`: The actual data stored in the node
- `parent`: Pointer to the parent node (self-referential for root nodes)
- `rank`: Depth of the subtree rooted at this node
### DisjointForest<T>
- `makeSet(T data)`: Creates a new set containing the given data
- `find(Node<T>* node)`: Finds the representative of the set containing the node
- `unionSets(Node<T>* node1, Node<T>* node2)`: Merges two sets
- Constructor takes the maximum number of elements as parameter
#### Dynamic Operations
- `expand(int additional_capacity)`: Increases the forest's capacity by the specified amount
- `contract(Node<T>* node)`: Removes a node from the forest, updating parent references
- `clear()`: Removes all nodes and resets the forest to empty state
- `size()`: Returns the current number of nodes in the forest
- `capacity()`: Returns the total capacity allocated for the forest
- `isEmpty()`: Returns true if the forest contains no nodes
- `getAllNodes()`: Returns a vector of all current nodes in the forest
## Building the Project
### Prerequisites
- CMake 3.10 or higher
- C++17 compatible compiler
- Google Test library
- Python 3.6+ (for Python bindings)
- pybind11 (for Python bindings)
### Install Google Test
#### Ubuntu/Debian
```bash
sudo apt-get install libgtest-dev
```
#### macOS
```bash
brew install googletest
```
#### CentOS/RHEL/Fedora
```bash
sudo yum install gtest-devel
# or for newer versions
sudo dnf install gtest-devel
```
#### Arch Linux
```bash
sudo pacman -S gtest
```
### Build Steps
#### Using CMake
```bash
# Create build directory
mkdir build
cd build
# Configure and build
cmake ..
make
# Run tests
make test
# or directly
./disjoint_forest_test
# Run example
./disjoint_forest_example
```
#### Using Makefile (Alternative)
```bash
# Build everything including Python bindings
make all
# Build only C++ components
make cpp
# Build Python bindings
make python-bindings
# Run all tests
make test
# Run Python tests
make python-test
# Run examples
make example
make python-example
# Clean build artifacts
make clean
```
## Running Tests
The test suite covers:
1. **Basic Operations**: Constructor, destructor, makeSet
2. **Find Operations**: Single node find, path compression
3. **Union Operations**: Basic union, union by rank, same set union
4. **Dynamic Operations**: Expand capacity, contract nodes, clear forest, size/capacity queries
5. **Complex Scenarios**: Multiple unions, large datasets, dynamic capacity changes
6. **Edge Cases**: Empty forests, single element forests, contraction edge cases
7. **Memory Management**: Large operations, proper cleanup, dynamic memory allocation
8. **Template Support**: Tests with different data types (int, string)
## Example Usage
### Basic Operations
```cpp
#include "disjoint_forest.h"
// Create a forest with capacity for 100 elements
DisjointForest<int> forest(100);
// Create some sets
Node<int>* node1 = forest.makeSet(1);
Node<int>* node2 = forest.makeSet(2);
Node<int>* node3 = forest.makeSet(3);
// Union sets
forest.unionSets(node1, node2);
forest.unionSets(node1, node3);
// Find representatives
Node<int>* rep1 = forest.find(node1);
Node<int>* rep2 = forest.find(node2);
Node<int>* rep3 = forest.find(node3);
// All should have the same representative
assert(rep1 == rep2 && rep2 == rep3);
```
### Dynamic Operations
```cpp
// Expand capacity when needed
forest.expand(50); // Add capacity for 50 more elements
// Check current state
std::cout << "Size: " << forest.size() << std::endl;
std::cout << "Capacity: " << forest.capacity() << std::endl;
std::cout << "Is empty: " << (forest.isEmpty() ? "Yes" : "No") << std::endl;
// Get all nodes for iteration
auto all_nodes = forest.getAllNodes();
for (auto* node : all_nodes) {
std::cout << "Node data: " << node->data << std::endl;
}
// Contract (remove) a node
forest.contract(node2);
// Clear the entire forest
forest.clear();
```
## Algorithm Complexity
- **makeSet**: O(1)
- **find**: O(α(n)) amortized, where α is the inverse Ackermann function
- **unionSets**: O(α(n)) amortized
- **expand**: O(1) amortized (vector doubling strategy)
- **contract**: O(1) (marking nodes as invalid)
- **clear**: O(n) where n is the number of nodes
- **size/capacity/isEmpty**: O(1)
- **getAllNodes**: O(n) where n is the number of nodes
- **Space**: O(n) where n is the number of elements
## Use Cases
Disjoint-set data structures are fundamental to many algorithms and applications:
- **Graph Algorithms**: Connected components, minimum spanning trees (Kruskal's algorithm)
- **Image Processing**: Connected component labeling, flood fill algorithms
- **Network Analysis**: Community detection, clustering algorithms
- **Game Development**: Pathfinding, collision detection systems
- **Database Systems**: Transaction management, dependency resolution
- **Compiler Design**: Register allocation, live variable analysis
## Python Bindings
The project includes Python bindings that provide a unified interface for working with any Python object type. The bindings are built using pybind11 and offer the same functionality as the C++ implementation.
### Features
- **Type Flexibility**: Works with integers, strings, lists, dictionaries, custom objects, etc.
- **Unified API**: Single `DisjointForest` class that handles any Python object type
- **Safe Operations**: Built-in safety checks and error handling
- **Memory Management**: Automatic cleanup of Python references
### Installation
```bash
# Install Python dependencies
cd python
pip3 install -r requirements.txt
# Build the Python module
make build
# Test the bindings
make test
# Run the example
make example
```
### Python Usage
```python
import disjoint_forest
# Create a forest
forest = disjoint_forest.DisjointForest(10)
# Create sets with different types
node1 = forest.make_set(42)
node2 = forest.make_set("hello")
node3 = forest.make_set([1, 2, 3])
# Union operations
forest.union_sets(node1, node2)
# Find representatives
rep = forest.find(node1)
print(f"Representative: {rep.data}")
# Dynamic operations
forest.expand(20)
print(f"Capacity: {forest.capacity()}")
print(f"Size: {forest.size()}")
# Get all nodes
all_nodes = forest.get_all_nodes()
for node in all_nodes:
print(f"Node: {node.data}")
```
## Implementation Details
### STL Vector Usage
The implementation uses `std::vector` instead of fixed-size arrays, providing several advantages:
- **Dynamic Capacity**: The forest can grow beyond its initial capacity using the `expand()` method
- **Automatic Memory Management**: Vectors handle memory allocation and deallocation automatically
- **Efficient Resizing**: STL vectors use exponential growth strategies for optimal performance
- **Exception Safety**: Vector operations provide strong exception guarantees
### Memory Management
- **Smart Pointers**: `std::unique_ptr` ensures automatic cleanup of Node objects
- **RAII Principles**: Resource acquisition is tied to object lifetime
- **Exception Safety**: Memory is properly cleaned up even when exceptions occur
- **Python Integration**: Python bindings include reference counting and safe memory management
## Notes
- The implementation automatically handles path compression during find operations
- Union by rank ensures balanced trees for optimal performance
- Memory is automatically managed with proper cleanup in the destructor
- Python bindings provide safe access to all C++ functionality
- Dynamic operations maintain the forest's integrity while allowing flexible capacity management
## Troubleshooting
### Common Build Issues
#### Google Test Not Found
If you encounter linking errors with Google Test:
```bash
# The Makefile will automatically install Google Test if needed
make install-deps
# Or manually install for your system (see Prerequisites section)
```
#### Python Import Errors
If Python can't import the module:
```bash
# Ensure the module is built
cd python
make build
# Check if the shared library exists
ls -la *.so
# Verify Python path
python3 -c "import sys; print(sys.path)"
```
#### Memory Issues
The implementation uses smart pointers and RAII principles. If you encounter memory issues:
- Ensure you're not manually deleting Node pointers
- Use the provided methods (`clear()`, `contract()`) instead of manual memory management
- Check that Python bindings are properly handling object lifetimes
## Contributing
We welcome contributions to improve the implementation, test coverage, or documentation. Please:
1. **Fork the repository** and create a feature branch
2. **Follow the existing code style** and naming conventions
3. **Add tests** for new functionality
4. **Update documentation** to reflect any changes
5. **Submit a pull request** with a clear description of your changes
### Development Setup
```bash
# Clone and setup
git clone <your-fork-url>
cd disjoint_set
# Install development dependencies
make install-deps
# Run all tests to ensure everything works
make test
make python-test
# Build everything
make all
```
Raw data
{
"_id": null,
"home_page": "https://github.com/yourusername/disjoint-forest",
"name": "disjoint-forest",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "disjoint-set, union-find, data-structure, algorithm, graph, optimization",
"author": "DisjointForest Contributors",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/a8/82/02995f6d78bd7ae80b711d5761b1824046341c6516282f3208b85aa70992/disjoint_forest-1.1.tar.gz",
"platform": null,
"description": "# DisjointForest\n\nA C++ implementation of a disjoint-set data structure (also known as union-find) using the disjoint forest representation with path compression and union by rank optimizations, with Python bindings included as well. This is intended to fill a void in too many existing standard libraries, which don't include a proper implementation of this despite its utility in many applications.\n\n## Features\n\n- **Template-based**: Works with any data type\n- **Path Compression**: Automatically compresses paths during find operations\n- **Union by Rank**: Optimizes union operations by maintaining tree depth information\n- **Dynamic Operations**: Expand capacity, contract nodes, and clear the forest\n- **Memory Efficient**: STL vectors with automatic memory management\n- **Smart Pointers**: Uses `std::unique_ptr` for automatic memory cleanup\n- **Exception Safety**: Proper error handling with `std::invalid_argument` and `std::runtime_error`\n- **Comprehensive Testing**: Full unit test suite using Google Test\n- **Python Bindings**: Full Python interface with pybind11\n\n## Class Structure\n\n### Node<T>\n- `data`: The actual data stored in the node\n- `parent`: Pointer to the parent node (self-referential for root nodes)\n- `rank`: Depth of the subtree rooted at this node\n\n### DisjointForest<T>\n- `makeSet(T data)`: Creates a new set containing the given data\n- `find(Node<T>* node)`: Finds the representative of the set containing the node\n- `unionSets(Node<T>* node1, Node<T>* node2)`: Merges two sets\n- Constructor takes the maximum number of elements as parameter\n\n#### Dynamic Operations\n- `expand(int additional_capacity)`: Increases the forest's capacity by the specified amount\n- `contract(Node<T>* node)`: Removes a node from the forest, updating parent references\n- `clear()`: Removes all nodes and resets the forest to empty state\n- `size()`: Returns the current number of nodes in the forest\n- `capacity()`: Returns the total capacity allocated for the forest\n- `isEmpty()`: Returns true if the forest contains no nodes\n- `getAllNodes()`: Returns a vector of all current nodes in the forest\n\n## Building the Project\n\n### Prerequisites\n- CMake 3.10 or higher\n- C++17 compatible compiler\n- Google Test library\n- Python 3.6+ (for Python bindings)\n- pybind11 (for Python bindings)\n\n### Install Google Test\n\n#### Ubuntu/Debian\n```bash\nsudo apt-get install libgtest-dev\n```\n\n#### macOS\n```bash\nbrew install googletest\n```\n\n#### CentOS/RHEL/Fedora\n```bash\nsudo yum install gtest-devel\n# or for newer versions\nsudo dnf install gtest-devel\n```\n\n#### Arch Linux\n```bash\nsudo pacman -S gtest\n```\n\n### Build Steps\n\n#### Using CMake\n```bash\n# Create build directory\nmkdir build\ncd build\n\n# Configure and build\ncmake ..\nmake\n\n# Run tests\nmake test\n# or directly\n./disjoint_forest_test\n\n# Run example\n./disjoint_forest_example\n```\n\n#### Using Makefile (Alternative)\n```bash\n# Build everything including Python bindings\nmake all\n\n# Build only C++ components\nmake cpp\n\n# Build Python bindings\nmake python-bindings\n\n# Run all tests\nmake test\n\n# Run Python tests\nmake python-test\n\n# Run examples\nmake example\nmake python-example\n\n# Clean build artifacts\nmake clean\n```\n\n## Running Tests\n\nThe test suite covers:\n\n1. **Basic Operations**: Constructor, destructor, makeSet\n2. **Find Operations**: Single node find, path compression\n3. **Union Operations**: Basic union, union by rank, same set union\n4. **Dynamic Operations**: Expand capacity, contract nodes, clear forest, size/capacity queries\n5. **Complex Scenarios**: Multiple unions, large datasets, dynamic capacity changes\n6. **Edge Cases**: Empty forests, single element forests, contraction edge cases\n7. **Memory Management**: Large operations, proper cleanup, dynamic memory allocation\n8. **Template Support**: Tests with different data types (int, string)\n\n## Example Usage\n\n### Basic Operations\n```cpp\n#include \"disjoint_forest.h\"\n\n// Create a forest with capacity for 100 elements\nDisjointForest<int> forest(100);\n\n// Create some sets\nNode<int>* node1 = forest.makeSet(1);\nNode<int>* node2 = forest.makeSet(2);\nNode<int>* node3 = forest.makeSet(3);\n\n// Union sets\nforest.unionSets(node1, node2);\nforest.unionSets(node1, node3);\n\n// Find representatives\nNode<int>* rep1 = forest.find(node1);\nNode<int>* rep2 = forest.find(node2);\nNode<int>* rep3 = forest.find(node3);\n\n// All should have the same representative\nassert(rep1 == rep2 && rep2 == rep3);\n```\n\n### Dynamic Operations\n```cpp\n// Expand capacity when needed\nforest.expand(50); // Add capacity for 50 more elements\n\n// Check current state\nstd::cout << \"Size: \" << forest.size() << std::endl;\nstd::cout << \"Capacity: \" << forest.capacity() << std::endl;\nstd::cout << \"Is empty: \" << (forest.isEmpty() ? \"Yes\" : \"No\") << std::endl;\n\n// Get all nodes for iteration\nauto all_nodes = forest.getAllNodes();\nfor (auto* node : all_nodes) {\n std::cout << \"Node data: \" << node->data << std::endl;\n}\n\n// Contract (remove) a node\nforest.contract(node2);\n\n// Clear the entire forest\nforest.clear();\n```\n\n## Algorithm Complexity\n\n- **makeSet**: O(1)\n- **find**: O(\u03b1(n)) amortized, where \u03b1 is the inverse Ackermann function\n- **unionSets**: O(\u03b1(n)) amortized\n- **expand**: O(1) amortized (vector doubling strategy)\n- **contract**: O(1) (marking nodes as invalid)\n- **clear**: O(n) where n is the number of nodes\n- **size/capacity/isEmpty**: O(1)\n- **getAllNodes**: O(n) where n is the number of nodes\n- **Space**: O(n) where n is the number of elements\n\n## Use Cases\n\nDisjoint-set data structures are fundamental to many algorithms and applications:\n\n- **Graph Algorithms**: Connected components, minimum spanning trees (Kruskal's algorithm)\n- **Image Processing**: Connected component labeling, flood fill algorithms\n- **Network Analysis**: Community detection, clustering algorithms\n- **Game Development**: Pathfinding, collision detection systems\n- **Database Systems**: Transaction management, dependency resolution\n- **Compiler Design**: Register allocation, live variable analysis\n\n## Python Bindings\n\nThe project includes Python bindings that provide a unified interface for working with any Python object type. The bindings are built using pybind11 and offer the same functionality as the C++ implementation.\n\n### Features\n- **Type Flexibility**: Works with integers, strings, lists, dictionaries, custom objects, etc.\n- **Unified API**: Single `DisjointForest` class that handles any Python object type\n- **Safe Operations**: Built-in safety checks and error handling\n- **Memory Management**: Automatic cleanup of Python references\n\n### Installation\n```bash\n# Install Python dependencies\ncd python\npip3 install -r requirements.txt\n\n# Build the Python module\nmake build\n\n# Test the bindings\nmake test\n\n# Run the example\nmake example\n```\n\n### Python Usage\n```python\nimport disjoint_forest\n\n# Create a forest\nforest = disjoint_forest.DisjointForest(10)\n\n# Create sets with different types\nnode1 = forest.make_set(42)\nnode2 = forest.make_set(\"hello\")\nnode3 = forest.make_set([1, 2, 3])\n\n# Union operations\nforest.union_sets(node1, node2)\n\n# Find representatives\nrep = forest.find(node1)\nprint(f\"Representative: {rep.data}\")\n\n# Dynamic operations\nforest.expand(20)\nprint(f\"Capacity: {forest.capacity()}\")\nprint(f\"Size: {forest.size()}\")\n\n# Get all nodes\nall_nodes = forest.get_all_nodes()\nfor node in all_nodes:\n print(f\"Node: {node.data}\")\n```\n\n## Implementation Details\n\n### STL Vector Usage\nThe implementation uses `std::vector` instead of fixed-size arrays, providing several advantages:\n- **Dynamic Capacity**: The forest can grow beyond its initial capacity using the `expand()` method\n- **Automatic Memory Management**: Vectors handle memory allocation and deallocation automatically\n- **Efficient Resizing**: STL vectors use exponential growth strategies for optimal performance\n- **Exception Safety**: Vector operations provide strong exception guarantees\n\n### Memory Management\n- **Smart Pointers**: `std::unique_ptr` ensures automatic cleanup of Node objects\n- **RAII Principles**: Resource acquisition is tied to object lifetime\n- **Exception Safety**: Memory is properly cleaned up even when exceptions occur\n- **Python Integration**: Python bindings include reference counting and safe memory management\n\n## Notes\n\n- The implementation automatically handles path compression during find operations\n- Union by rank ensures balanced trees for optimal performance\n- Memory is automatically managed with proper cleanup in the destructor\n- Python bindings provide safe access to all C++ functionality\n- Dynamic operations maintain the forest's integrity while allowing flexible capacity management\n\n## Troubleshooting\n\n### Common Build Issues\n\n#### Google Test Not Found\nIf you encounter linking errors with Google Test:\n```bash\n# The Makefile will automatically install Google Test if needed\nmake install-deps\n\n# Or manually install for your system (see Prerequisites section)\n```\n\n#### Python Import Errors\nIf Python can't import the module:\n```bash\n# Ensure the module is built\ncd python\nmake build\n\n# Check if the shared library exists\nls -la *.so\n\n# Verify Python path\npython3 -c \"import sys; print(sys.path)\"\n```\n\n#### Memory Issues\nThe implementation uses smart pointers and RAII principles. If you encounter memory issues:\n- Ensure you're not manually deleting Node pointers\n- Use the provided methods (`clear()`, `contract()`) instead of manual memory management\n- Check that Python bindings are properly handling object lifetimes\n\n## Contributing\n\nWe welcome contributions to improve the implementation, test coverage, or documentation. Please:\n\n1. **Fork the repository** and create a feature branch\n2. **Follow the existing code style** and naming conventions\n3. **Add tests** for new functionality\n4. **Update documentation** to reflect any changes\n5. **Submit a pull request** with a clear description of your changes\n\n### Development Setup\n```bash\n# Clone and setup\ngit clone <your-fork-url>\ncd disjoint_set\n\n# Install development dependencies\nmake install-deps\n\n# Run all tests to ensure everything works\nmake test\nmake python-test\n\n# Build everything\nmake all\n``` \n",
"bugtrack_url": null,
"license": null,
"summary": "A high-performance C++ implementation of disjoint-set data structure with Python bindings",
"version": "1.1",
"project_urls": {
"Bug Tracker": "https://github.com/yourusername/disjoint-forest/issues",
"Homepage": "https://github.com/yourusername/disjoint-forest",
"Source Code": "https://github.com/yourusername/disjoint-forest"
},
"split_keywords": [
"disjoint-set",
" union-find",
" data-structure",
" algorithm",
" graph",
" optimization"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "bb65bfd861bf6567c05204ae1a48437851608d7fd8be5ccddf5ebf0c045ca800",
"md5": "9b06dd700d762404c2a7e5807f7717f2",
"sha256": "b1c7f01eb6695efd0a66e8707dee3d82c9e1446880090100726985ea040f53b3"
},
"downloads": -1,
"filename": "disjoint_forest-1.1-cp310-cp310-manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "9b06dd700d762404c2a7e5807f7717f2",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.8",
"size": 1322924,
"upload_time": "2025-08-11T05:06:53",
"upload_time_iso_8601": "2025-08-11T05:06:53.734018Z",
"url": "https://files.pythonhosted.org/packages/bb/65/bfd861bf6567c05204ae1a48437851608d7fd8be5ccddf5ebf0c045ca800/disjoint_forest-1.1-cp310-cp310-manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a88202995f6d78bd7ae80b711d5761b1824046341c6516282f3208b85aa70992",
"md5": "81160032c4a10451ed7a438d24b92707",
"sha256": "5e60077c74484c23c8ed48d0a0e18824c5f73c80f2fc5d9e1ddd2e6483389486"
},
"downloads": -1,
"filename": "disjoint_forest-1.1.tar.gz",
"has_sig": false,
"md5_digest": "81160032c4a10451ed7a438d24b92707",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 14508,
"upload_time": "2025-08-11T05:06:55",
"upload_time_iso_8601": "2025-08-11T05:06:55.068408Z",
"url": "https://files.pythonhosted.org/packages/a8/82/02995f6d78bd7ae80b711d5761b1824046341c6516282f3208b85aa70992/disjoint_forest-1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-11 05:06:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yourusername",
"github_project": "disjoint-forest",
"github_not_found": true,
"lcname": "disjoint-forest"
}