# IKC - Iterative K-Core Clustering
Fast C++ implementation of Iterative K-Core Clustering with Python wrapper.
## Project Structure
```
ikc/
├── app/                  # C++ application
├── lib/                  # C++ libraries
│   ├── algorithms/       # Core IKC algorithms
│   ├── data_structures/  # Graph data structures
│   └── io/               # Graph I/O utilities
├── python/               # Python wrapper
│   ├── ikc/              # Python package
│   ├── bindings.cpp      # pybind11 C++ bindings
│   └── example.py        # Example Python usage
├── data/                 # Test datasets
└── build/                # Build directory
```
## Building the C++ Executable
```bash
cd build
cmake ..
make
```
The compiled executable will be at `build/ikc`.
## C++ Command-Line Usage
```bash
./ikc -e <graph_file.tsv> -o <output.csv> [-k <min_k>] [-t <num_threads>] [-q] [--tsv]
```
### Options
- `-e <graph_file.tsv>` - Path to input graph edge list (TSV format)
- `-o <output.csv>` - Path to output file
- `-k <min_k>` - Minimum k value for valid clusters (default: 0)
- `-t <num_threads>` - Number of threads (default: hardware concurrency)
- `-q` - Quiet mode (suppress verbose output)
- `--tsv` - Output as TSV (node_id cluster_id) without header
### Examples
```bash
# Run with default settings (CSV output with all columns)
./ikc -e data/cit_hepph.tsv -o output.csv
# Run with min_k=10 and TSV output
./ikc -e data/cit_hepph.tsv -o output.tsv -k 10 --tsv
# Run with 8 threads in quiet mode
./ikc -e data/cit_hepph.tsv -o output.csv -k 10 -t 8 -q
```
### Output Formats
**CSV format** (default):
```
node_id,cluster_id,k_value,modularity
3306,1,30,1.0
9803315,1,30,1.0
```
**TSV format** (with `--tsv` flag):
```
3306	1
9803315	1
```
## Python Wrapper
The Python wrapper uses **pybind11** to directly bind the C++ code, providing:
- Fast performance (no subprocess overhead)
- Clean Python API
- No heavy dependencies (no pandas required)
### Installation
**Option 1: Install from PyPI (once published):**
```bash
pip install ikc
```
**Option 2: Install from source:**
Install the build dependencies (optional; pip pulls them in automatically):
```bash
pip install -r requirements.txt
```
Then install the Python package:
```bash
pip install -e .
```
This compiles the C++ extension and installs the Python package in a single step.
**Verify installation:**
```bash
python3 -c "import ikc; print('IKC wrapper installed successfully!')"
```
### Python Usage
```python
import ikc
# Load a graph from a TSV edge list file
g = ikc.load_graph('net.tsv')
# Run the IKC algorithm with min_k=10
c = g.ikc(10)
# Save results as TSV (node_id, cluster_id, no header)
c.save('out.tsv', tsv=True)
# Or save as CSV with all columns (node_id, cluster_id, k_value, modularity)
c.save('out.csv', tsv=False)
# Access clustering information
print(f"Number of clusters: {c.num_clusters}")
print(f"Number of nodes: {c.num_nodes}")
# Access the underlying data as list of tuples
print(c.data[:10]) # First 10 rows
```
### Python API Reference
#### `ikc.load_graph(graph_file, num_threads=None, verbose=False)`
Load a graph from a TSV edge list file.
**Parameters:**
- `graph_file` (str): Path to the graph edge list file (TSV format)
- `num_threads` (int, optional): Number of threads for loading (default: hardware concurrency)
- `verbose` (bool): Print loading progress (default: False)
**Returns:**
- `Graph`: Graph object ready for clustering
#### `Graph.ikc(min_k=0, verbose=False, progress_bar=False)`
Run the Iterative K-Core Clustering algorithm.
**Parameters:**
- `min_k` (int): Minimum k value for valid clusters (default: 0)
- `verbose` (bool): Print algorithm progress (default: False)
- `progress_bar` (bool): Display tqdm progress bar tracking k-core decomposition from initial max k (0%) to min_k (100%) (default: False)
**Returns:**
- `ClusterResult`: Object containing the clustering results
**Progress Bar:**
When `progress_bar=True`, a tqdm progress bar displays the k-core decomposition progress:
- **Initial max k-core value** → **0% progress**
- **Target min_k value** → **100% progress**
For example, if you run `ikc(min_k=10)` and the first core found is k=40:
- k=40 → 0% progress (start)
- k=30 → 33% progress
- k=20 → 66% progress
- k=10 → 100% progress (complete)
The progress bar shows the current k value and processing speed.
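The percentage is a linear mapping from the initial max k down to `min_k`. A small illustrative sketch of that mapping (the helper function is hypothetical; the real bar is driven by tqdm internally):
```python
def progress_fraction(current_k, initial_max_k, min_k):
    """Map the current k-core level to a 0.0-1.0 progress fraction."""
    span = initial_max_k - min_k
    return 1.0 if span <= 0 else (initial_max_k - current_k) / span

# Matches the example above: initial max k=40, min_k=10
for k in (40, 30, 20, 10):
    print(f"k={k} -> {progress_fraction(k, 40, 10):.1%}")
```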
#### `ClusterResult.save(filename, tsv=False)`
Save clustering results to a file.
**Parameters:**
- `filename` (str): Output file path
- `tsv` (bool): If True, save as TSV with only node_id and cluster_id (no header). If False, save as CSV with all columns (default: False)
#### `ClusterResult` Properties
- `num_clusters`: Number of clusters found
- `num_nodes`: Number of nodes in the clustering
- `data`: List of tuples `(node_id, cluster_id, k_value, modularity)` for all clustered nodes
- `clusters`: List of C++ Cluster objects with `nodes`, `k_value`, and `modularity` attributes
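The `clusters` property gives per-cluster access rather than per-node rows. A brief sketch using only the attributes listed above (assuming the bound objects behave like Python lists):
```python
import ikc
g = ikc.load_graph('data/cit_hepph.tsv')
result = g.ikc(min_k=10)
# Summarize the first few clusters via the documented attributes.
for i, cluster in enumerate(result.clusters[:5]):
    print(f"Cluster {i}: {len(cluster.nodes)} nodes, "
          f"k={cluster.k_value}, modularity={cluster.modularity:.3f}")
```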
### Python Example
```python
import ikc
# Load graph with 4 threads
g = ikc.load_graph('data/cit_hepph.tsv', num_threads=4)
print(g) # Graph(file='...', nodes=34546, edges=420877)
# Run IKC with min_k=10 and progress bar
clusters = g.ikc(min_k=10, progress_bar=True)
# Output: IKC Progress: 100%|██████████| 20/20 [00:00<00:00, 84.99k-core levels/s, current_k=10]
# Print summary
print(clusters) # ClusterResult(nodes=34546, clusters=27639)
# Save in TSV format
clusters.save('output.tsv', tsv=True)
# Save in CSV format with all columns
clusters.save('output.csv', tsv=False)
# Access the data (first 5 rows)
for row in clusters.data[:5]:
    node_id, cluster_id, k_value, modularity = row
    print(f"Node {node_id} in cluster {cluster_id}")
# Get cluster statistics
print(f"Total clusters: {clusters.num_clusters}")
print(f"Total nodes: {clusters.num_nodes}")
```
### Run Example Script
```bash
python3 python/example.py
```
## Requirements
### C++
- CMake 3.10+
- C++17 compatible compiler
- OpenMP support (for parallel graph loading)
### Python
- Python 3.7+
- pybind11 >= 2.6.0 (automatically installed with `pip install`)
- tqdm >= 4.0.0 (for progress bar feature, automatically installed with `pip install`)
## Input Format
The input graph file should be a TSV (tab-separated) edge list with two columns:
```
node1	node2
node3	node4
...
```
No header required. Nodes can be any integer IDs.
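Such a file can be produced from Python with plain tab-separated lines; a minimal sketch (the file name and edges are illustrative):
```python
# Write a small, headerless, tab-separated edge list.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
with open('toy_graph.tsv', 'w') as f:
    for u, v in edges:
        f.write(f"{u}\t{v}\n")
# Load it afterwards with: ikc.load_graph('toy_graph.tsv')
```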
---
## Streaming IKC (Incremental Updates)
The streaming IKC implementation allows you to efficiently update clustering as your graph evolves over time, without full recomputation.
### Overview
The streaming algorithm:
- Starts with an initial graph and IKC clustering
- Adds new edges and nodes incrementally
- Efficiently updates clustering using localized recomputation
- Provides 10-100x speedup for sparse updates vs full recomputation
### Algorithm Summary
The streaming algorithm consists of three main phases:
1. **Incremental Core Number Update** - Based on Sariyüce et al. (2013), when edges are added:
- Core numbers can only increase (never decrease)
- Only nodes near added edges need checking
- Uses priority queue for efficient promotion
2. **Cluster Invalidation Detection** - For each existing cluster:
- **Valid**: No affected nodes → cluster unchanged
- **Invalid (k-validity)**: Internal degrees drop below k → needs recomputation
- **Invalid (merge)**: External nodes promoted to same k-core → potential merge
3. **Localized Recomputation** - Extract affected subgraph and run standard IKC on just that region
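To make the k-validity test in phase 2 concrete, here is a conceptual sketch (not the library's actual code): a cluster produced at level k stays valid only if every member still has at least k neighbours inside the cluster after the update.
```python
# Conceptual sketch of the phase-2 k-validity check, not the library's implementation.
# `adjacency` maps node -> set of neighbours in the updated graph;
# `cluster_nodes` is the set of nodes in one existing cluster at level k.
def cluster_still_k_valid(cluster_nodes, adjacency, k):
    for node in cluster_nodes:
        internal_degree = len(adjacency.get(node, set()) & cluster_nodes)
        if internal_degree < k:
            return False  # cluster is invalidated; needs localized recomputation
    return True
```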
### Python API
#### Basic Usage
```python
import ikc
# Load graph and compute initial clustering
g = ikc.StreamingGraph('network.tsv')
result = g.ikc(min_k=10)
print(f"Initial: {result.num_clusters} clusters")
# Add edges incrementally
result = g.add_edges([(100, 200), (101, 202)])
print(f"After update: {result.num_clusters} clusters")
# View update statistics
stats = g.last_update_stats
print(f"Affected nodes: {stats['affected_nodes']}")
print(f"Update time: {stats['total_time_ms']}ms")
```
#### Batch Mode (Recommended for Multiple Updates)
```python
# Accumulate updates without recomputation
g.begin_batch()
g.add_edges([(1, 2), (3, 4)])
g.add_edges([(5, 6), (7, 8)])
g.add_nodes([100, 101, 102])
# Apply all updates at once
result = g.commit_batch()
```
#### Combined Updates
```python
# More efficient than separate calls
result = g.update(
    new_edges=[(1, 2), (3, 4)],
    new_nodes=[1, 2, 3, 4]  # Include all nodes referenced in edges
)
```
**Important Note**: When using `update()`, all nodes referenced in `new_edges` must either:
1. Already exist in the graph, OR
2. Be included in the `new_nodes` list
For example, this will raise an error:
```python
# ERROR: Nodes 100 and 200 don't exist and aren't in new_nodes
result = g.update(new_edges=[(100, 200)]) # ValueError!
```
Correct usage:
```python
# Include all new nodes
result = g.update(new_edges=[(100, 200)], new_nodes=[100, 200]) # ✓
```
### StreamingGraph API Reference
#### Constructor
```python
g = ikc.StreamingGraph(graph_file, num_threads=None, verbose=False)
```
#### Methods
**`ikc(min_k=0, verbose=False, progress_bar=False) -> ClusterResult`**
- Run initial IKC clustering and initialize streaming state
- Must be called before any update operations
**`add_edges(edges, verbose=False) -> ClusterResult`**
- Add edges and update clustering incrementally
- `edges`: List of `(node_id, node_id)` tuples
- Returns updated clustering
**`add_nodes(nodes, verbose=False) -> ClusterResult`**
- Add isolated nodes to the graph
- `nodes`: List of node IDs
- Returns updated clustering
**`update(new_edges=None, new_nodes=None, verbose=False) -> ClusterResult`**
- Add both edges and nodes in a single operation
- More efficient than separate calls
- **Important**: All nodes referenced in `new_edges` must be included in `new_nodes` (if they don't already exist in the graph)
- Raises `ValueError` if an edge references a non-existent node not in `new_nodes`
**`begin_batch()`**
- Enter batch mode - accumulate updates without recomputation
**`commit_batch(verbose=False) -> ClusterResult`**
- Apply all pending updates and exit batch mode
#### Properties
**`current_clustering -> ClusterResult`**
- Get current clustering without triggering recomputation
**`last_update_stats -> dict`**
- Statistics from the last update operation:
- `affected_nodes`: Nodes with changed core numbers
- `invalidated_clusters`: Clusters that needed recomputation
- `valid_clusters`: Clusters that remained valid
- `merge_candidates`: Nodes involved in potential merges
- `recompute_time_ms`: Time spent in localized recomputation
- `total_time_ms`: Total update time
**`num_nodes -> int`**, **`num_edges -> int`**, **`max_core -> int`**, **`is_batch_mode -> bool`**
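These properties can be inspected at any point without triggering recomputation; a brief sketch (the file name is a placeholder):
```python
import ikc
g = ikc.StreamingGraph('network.tsv')
g.ikc(min_k=10)
# Graph-level state exposed by the documented properties.
print(f"Nodes: {g.num_nodes}, edges: {g.num_edges}, max core: {g.max_core}")
print(f"In batch mode: {g.is_batch_mode}")
current = g.current_clustering  # cached result, no recomputation
print(f"Current clustering: {current.num_clusters} clusters")
```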
### Streaming Example
```python
import ikc
# Initialize with existing graph
g = ikc.StreamingGraph('network.tsv')
result = g.ikc(min_k=10, progress_bar=True)
print(f"Initial: {result.num_clusters} clusters")
# Incremental update
result = g.add_edges([(100, 200), (101, 202)])
stats = g.last_update_stats
print(f"Update: {stats['affected_nodes']} nodes affected in {stats['total_time_ms']:.2f}ms")
# Batch mode for multiple updates
g.begin_batch()
g.add_edges([(1, 2), (3, 4)])
g.add_nodes([500, 501])
result = g.commit_batch()
print(f"Batch complete: {result.num_clusters} clusters")
```
See `python/streaming_example.py` for a complete demonstration.
### Performance Characteristics
**Expected Case (Sparse Updates)**
- Per-edge time: O(Δ² · log n) where Δ = max degree
- Most clusters remain valid
- Speedup: 10-100x vs full recomputation
**Worst Case (Dense Core Updates)**
- Per-edge time: O(n + m) (same as full recomputation)
- Occurs when adding edges to maximum k-core
**Batch Mode Benefits**
- Amortizes overhead across multiple updates
- Single core number recomputation for all edges
- Recommended for bulk updates
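One way to gauge the speedup on your own data is to time an incremental update against a fresh batch run; a rough sketch using only the APIs documented above (the file name and edge batch are placeholders, and the batch run is only an approximate baseline since it does not include the new edges):
```python
import time
import ikc

# Streaming path: incremental update after an initial clustering.
sg = ikc.StreamingGraph('network.tsv')
sg.ikc(min_k=10)
t0 = time.perf_counter()
sg.add_edges([(100, 200), (101, 202)])  # placeholder edges
streaming_ms = (time.perf_counter() - t0) * 1000

# Approximate baseline: a full batch IKC run on the original file.
t0 = time.perf_counter()
ikc.load_graph('network.tsv').ikc(min_k=10)
full_ms = (time.perf_counter() - t0) * 1000

print(f"Streaming update: {streaming_ms:.1f} ms vs full IKC: {full_ms:.1f} ms")
```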
### When to Use Streaming vs Batch IKC
**Use Streaming IKC When:**
- Graph evolves incrementally over time
- Need up-to-date clustering after each update
- Updates are sparse (few edges at a time)
- Want to track which clusters changed
**Use Batch IKC (`Graph.ikc()`) When:**
- Have complete graph from the start
- Don't need intermediate clustering results
- Graph structure is static
### Limitations
- **Edge deletions not supported** - Only additions are handled
- **No state persistence** - Cannot save/load streaming state
- Cluster splits never occur with edge additions (proven mathematically)
### References
- Sariyüce, A. E., Gedik, B., Jacques-Silva, G., Wu, K. L., & Çatalyürek, Ü. V. (2013).
"Streaming algorithms for k-core decomposition."
*Proceedings of the VLDB Endowment*, 6(6), 433-444.