# IKC - Iterative K-Core Clustering
Fast C++ implementation of Iterative K-Core Clustering with Python wrapper.
## Project Structure
```
ikc/
├── app/                  # C++ application
├── lib/                  # C++ libraries
│   ├── algorithms/       # Core IKC algorithms
│   ├── data_structures/  # Graph data structures
│   └── io/               # Graph I/O utilities
├── python/               # Python wrapper
│   ├── ikc/              # Python package
│   ├── bindings.cpp      # pybind11 C++ bindings
│   └── example.py        # Example Python usage
├── data/                 # Test datasets
└── build/                # Build directory
```
## Building the C++ Executable
```bash
cd build
cmake ..
make
```
The compiled executable will be at `build/ikc`.
## C++ Command-Line Usage
```bash
./ikc -e <graph_file.tsv> -o <output.csv> [-k <min_k>] [-t <num_threads>] [-q] [--tsv]
```
### Options
- `-e <graph_file.tsv>` - Path to input graph edge list (TSV format)
- `-o <output.csv>` - Path to output file
- `-k <min_k>` - Minimum k value for valid clusters (default: 0)
- `-t <num_threads>` - Number of threads (default: hardware concurrency)
- `-q` - Quiet mode (suppress verbose output)
- `--tsv` - Output as TSV (node_id cluster_id) without header
### Examples
```bash
# Run with default settings (CSV output with all columns)
./ikc -e data/cit_hepph.tsv -o output.csv
# Run with min_k=10 and TSV output
./ikc -e data/cit_hepph.tsv -o output.tsv -k 10 --tsv
# Run with 8 threads in quiet mode
./ikc -e data/cit_hepph.tsv -o output.csv -k 10 -t 8 -q
```
### Output Formats
**CSV format** (default):
```
node_id,cluster_id,k_value,modularity
3306,1,30,1.0
9803315,1,30,1.0
```
**TSV format** (with `--tsv` flag):
```
3306	1
9803315	1
```
## Python Wrapper
The Python wrapper uses **pybind11** to directly bind the C++ code, providing:
- Fast performance (no subprocess overhead)
- Clean Python API
- No heavy dependencies (no pandas required)
### Installation
**Option 1: Install from PyPI (once published):**
```bash
pip install ikc
```
**Option 2: Install from source:**
Install the build dependencies (optional; pip pulls them in automatically):
```bash
pip install -r requirements.txt
```
Then install the Python package:
```bash
pip install -e .
```
This compiles the C++ extension and installs the Python package in a single step.
**Verify installation:**
```bash
python3 -c "import ikc; print('IKC wrapper installed successfully!')"
```
### Python Usage
```python
import ikc
# Load a graph from a TSV edge list file
g = ikc.load_graph('net.tsv')
# Run the IKC algorithm with min_k=10
c = g.ikc(10)
# Save results as TSV (node_id, cluster_id, no header)
c.save('out.tsv', tsv=True)
# Or save as CSV with all columns (node_id, cluster_id, k_value, modularity)
c.save('out.csv', tsv=False)
# Access clustering information
print(f"Number of clusters: {c.num_clusters}")
print(f"Number of nodes: {c.num_nodes}")
# Access the underlying data as list of tuples
print(c.data[:10]) # First 10 rows
```
### Python API Reference
#### `ikc.load_graph(graph_file, num_threads=None, verbose=False)`
Load a graph from a TSV edge list file.
**Parameters:**
- `graph_file` (str): Path to the graph edge list file (TSV format)
- `num_threads` (int, optional): Number of threads for loading (default: hardware concurrency)
- `verbose` (bool): Print loading progress (default: False)
**Returns:**
- `Graph`: Graph object ready for clustering
#### `Graph.ikc(min_k=0, verbose=False, progress_bar=False)`
Run the Iterative K-Core Clustering algorithm.
**Parameters:**
- `min_k` (int): Minimum k value for valid clusters (default: 0)
- `verbose` (bool): Print algorithm progress (default: False)
- `progress_bar` (bool): Display tqdm progress bar tracking k-core decomposition from initial max k (0%) to min_k (100%) (default: False)
**Returns:**
- `ClusterResult`: Object containing the clustering results
**Progress Bar:**
When `progress_bar=True`, a tqdm progress bar displays the k-core decomposition progress:
- **Initial max k-core value** → **0% progress**
- **Target min_k value** → **100% progress**
For example, if you run `ikc(min_k=10)` and the first core found is k=40:
- k=40 → 0% progress (start)
- k=30 → 33% progress
- k=20 → 66% progress
- k=10 → 100% progress (complete)
The progress bar shows the current k value and processing speed.
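The percentage is a linear mapping from the initial max k down to `min_k`. A small illustrative sketch of that mapping (the helper function is hypothetical; the real bar is driven by tqdm internally):
```python
def progress_fraction(current_k, initial_max_k, min_k):
    """Map the current k-core level to a 0.0-1.0 progress fraction."""
    span = initial_max_k - min_k
    return 1.0 if span <= 0 else (initial_max_k - current_k) / span

# Matches the example above: initial max k=40, min_k=10
for k in (40, 30, 20, 10):
    print(f"k={k} -> {progress_fraction(k, 40, 10):.1%}")
```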
#### `ClusterResult.save(filename, tsv=False)`
Save clustering results to a file.
**Parameters:**
- `filename` (str): Output file path
- `tsv` (bool): If True, save as TSV with only node_id and cluster_id (no header). If False, save as CSV with all columns (default: False)
#### `ClusterResult` Properties
- `num_clusters`: Number of clusters found
- `num_nodes`: Number of nodes in the clustering
- `data`: List of tuples `(node_id, cluster_id, k_value, modularity)` for all clustered nodes
- `clusters`: List of C++ Cluster objects with `nodes`, `k_value`, and `modularity` attributes
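The `clusters` property gives per-cluster access rather than per-node rows. A brief sketch using only the attributes listed above (assuming the bound objects behave like Python lists):
```python
import ikc
g = ikc.load_graph('data/cit_hepph.tsv')
result = g.ikc(min_k=10)
# Summarize the first few clusters via the documented attributes.
for i, cluster in enumerate(result.clusters[:5]):
    print(f"Cluster {i}: {len(cluster.nodes)} nodes, "
          f"k={cluster.k_value}, modularity={cluster.modularity:.3f}")
```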
### Python Example
```python
import ikc
# Load graph with 4 threads
g = ikc.load_graph('data/cit_hepph.tsv', num_threads=4)
print(g) # Graph(file='...', nodes=34546, edges=420877)
# Run IKC with min_k=10 and progress bar
clusters = g.ikc(min_k=10, progress_bar=True)
# Output: IKC Progress: 100%|██████████| 20/20 [00:00<00:00, 84.99k-core levels/s, current_k=10]
# Print summary
print(clusters) # ClusterResult(nodes=34546, clusters=27639)
# Save in TSV format
clusters.save('output.tsv', tsv=True)
# Save in CSV format with all columns
clusters.save('output.csv', tsv=False)
# Access the data (first 5 rows)
for row in clusters.data[:5]:
    node_id, cluster_id, k_value, modularity = row
    print(f"Node {node_id} in cluster {cluster_id}")
# Get cluster statistics
print(f"Total clusters: {clusters.num_clusters}")
print(f"Total nodes: {clusters.num_nodes}")
```
### Run Example Script
```bash
python3 python/example.py
```
## Requirements
### C++
- CMake 3.10+
- C++17 compatible compiler
- OpenMP support (for parallel graph loading)
### Python
- Python 3.7+
- pybind11 >= 2.6.0 (automatically installed with `pip install`)
- tqdm >= 4.0.0 (for progress bar feature, automatically installed with `pip install`)
## Input Format
The input graph file should be a TSV (tab-separated) edge list with two columns:
```
node1	node2
node3	node4
...
```
No header required. Nodes can be any integer IDs.
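Such a file can be produced from Python with plain tab-separated lines; a minimal sketch (the file name and edges are illustrative):
```python
# Write a small, headerless, tab-separated edge list.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
with open('toy_graph.tsv', 'w') as f:
    for u, v in edges:
        f.write(f"{u}\t{v}\n")
# Load it afterwards with: ikc.load_graph('toy_graph.tsv')
```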
---
## Streaming IKC (Incremental Updates)
The streaming IKC implementation allows you to efficiently update clustering as your graph evolves over time, without full recomputation.
### Overview
The streaming algorithm:
- Starts with an initial graph and IKC clustering
- Adds new edges and nodes incrementally
- Efficiently updates clustering using localized recomputation
- Provides 10-100x speedup for sparse updates vs full recomputation
### Algorithm Summary
The streaming algorithm consists of three main phases:
1. **Incremental Core Number Update** - Based on Sariyüce et al. (2013), when edges are added:
- Core numbers can only increase (never decrease)
- Only nodes near added edges need checking
- Uses priority queue for efficient promotion
2. **Cluster Invalidation Detection** - For each existing cluster:
- **Valid**: No affected nodes → cluster unchanged
- **Invalid (k-validity)**: Internal degrees drop below k → needs recomputation
- **Invalid (merge)**: External nodes promoted to same k-core → potential merge
3. **Localized Recomputation** - Extract affected subgraph and run standard IKC on just that region
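To make the k-validity test in phase 2 concrete, here is a conceptual sketch (not the library's actual code): a cluster produced at level k stays valid only if every member still has at least k neighbours inside the cluster after the update.
```python
# Conceptual sketch of the phase-2 k-validity check, not the library's implementation.
# `adjacency` maps node -> set of neighbours in the updated graph;
# `cluster_nodes` is the set of nodes in one existing cluster at level k.
def cluster_still_k_valid(cluster_nodes, adjacency, k):
    for node in cluster_nodes:
        internal_degree = len(adjacency.get(node, set()) & cluster_nodes)
        if internal_degree < k:
            return False  # cluster is invalidated; needs localized recomputation
    return True
```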
### Python API
#### Basic Usage
```python
import ikc
# Load graph and compute initial clustering
g = ikc.StreamingGraph('network.tsv')
result = g.ikc(min_k=10)
print(f"Initial: {result.num_clusters} clusters")
# Add edges incrementally
result = g.add_edges([(100, 200), (101, 202)])
print(f"After update: {result.num_clusters} clusters")
# View update statistics
stats = g.last_update_stats
print(f"Affected nodes: {stats['affected_nodes']}")
print(f"Update time: {stats['total_time_ms']}ms")
```
#### Batch Mode (Recommended for Multiple Updates)
```python
# Accumulate updates without recomputation
g.begin_batch()
g.add_edges([(1, 2), (3, 4)])
g.add_edges([(5, 6), (7, 8)])
g.add_nodes([100, 101, 102])
# Apply all updates at once
result = g.commit_batch()
```
#### Combined Updates
```python
# More efficient than separate calls
result = g.update(
    new_edges=[(1, 2), (3, 4)],
    new_nodes=[1, 2, 3, 4]  # Include all nodes referenced in edges
)
```
**Important Note**: When using `update()`, all nodes referenced in `new_edges` must either:
1. Already exist in the graph, OR
2. Be included in the `new_nodes` list
For example, this will raise an error:
```python
# ERROR: Nodes 100 and 200 don't exist and aren't in new_nodes
result = g.update(new_edges=[(100, 200)]) # ValueError!
```
Correct usage:
```python
# Include all new nodes
result = g.update(new_edges=[(100, 200)], new_nodes=[100, 200]) # ✓
```
### StreamingGraph API Reference
#### Constructor
```python
g = ikc.StreamingGraph(graph_file, num_threads=None, verbose=False)
```
#### Methods
**`ikc(min_k=0, verbose=False, progress_bar=False) -> ClusterResult`**
- Run initial IKC clustering and initialize streaming state
- Must be called before any update operations
**`add_edges(edges, verbose=False) -> ClusterResult`**
- Add edges and update clustering incrementally
- `edges`: List of `(node_id, node_id)` tuples
- Returns updated clustering
**`add_nodes(nodes, verbose=False) -> ClusterResult`**
- Add isolated nodes to the graph
- `nodes`: List of node IDs
- Returns updated clustering
**`update(new_edges=None, new_nodes=None, verbose=False) -> ClusterResult`**
- Add both edges and nodes in a single operation
- More efficient than separate calls
- **Important**: All nodes referenced in `new_edges` must be included in `new_nodes` (if they don't already exist in the graph)
- Raises `ValueError` if an edge references a non-existent node not in `new_nodes`
**`begin_batch()`**
- Enter batch mode - accumulate updates without recomputation
**`commit_batch(verbose=False) -> ClusterResult`**
- Apply all pending updates and exit batch mode
#### Properties
**`current_clustering -> ClusterResult`**
- Get current clustering without triggering recomputation
**`last_update_stats -> dict`**
- Statistics from the last update operation:
- `affected_nodes`: Nodes with changed core numbers
- `invalidated_clusters`: Clusters that needed recomputation
- `valid_clusters`: Clusters that remained valid
- `merge_candidates`: Nodes involved in potential merges
- `recompute_time_ms`: Time spent in localized recomputation
- `total_time_ms`: Total update time
**`num_nodes -> int`**, **`num_edges -> int`**, **`max_core -> int`**, **`is_batch_mode -> bool`**
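These properties can be inspected at any point without triggering recomputation; a brief sketch (the file name is a placeholder):
```python
import ikc
g = ikc.StreamingGraph('network.tsv')
g.ikc(min_k=10)
# Graph-level state exposed by the documented properties.
print(f"Nodes: {g.num_nodes}, edges: {g.num_edges}, max core: {g.max_core}")
print(f"In batch mode: {g.is_batch_mode}")
current = g.current_clustering  # cached result, no recomputation
print(f"Current clustering: {current.num_clusters} clusters")
```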
### Streaming Example
```python
import ikc
# Initialize with existing graph
g = ikc.StreamingGraph('network.tsv')
result = g.ikc(min_k=10, progress_bar=True)
print(f"Initial: {result.num_clusters} clusters")
# Incremental update
result = g.add_edges([(100, 200), (101, 202)])
stats = g.last_update_stats
print(f"Update: {stats['affected_nodes']} nodes affected in {stats['total_time_ms']:.2f}ms")
# Batch mode for multiple updates
g.begin_batch()
g.add_edges([(1, 2), (3, 4)])
g.add_nodes([500, 501])
result = g.commit_batch()
print(f"Batch complete: {result.num_clusters} clusters")
```
See `python/streaming_example.py` for a complete demonstration.
### Performance Characteristics
**Expected Case (Sparse Updates)**
- Per-edge time: O(Δ² · log n) where Δ = max degree
- Most clusters remain valid
- Speedup: 10-100x vs full recomputation
**Worst Case (Dense Core Updates)**
- Per-edge time: O(n + m) (same as full recomputation)
- Occurs when adding edges to maximum k-core
**Batch Mode Benefits**
- Amortizes overhead across multiple updates
- Single core number recomputation for all edges
- Recommended for bulk updates
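One way to gauge the speedup on your own data is to time an incremental update against a fresh batch run; a rough sketch using only the APIs documented above (the file name and edge batch are placeholders, and the batch run is only an approximate baseline since it does not include the new edges):
```python
import time
import ikc

# Streaming path: incremental update after an initial clustering.
sg = ikc.StreamingGraph('network.tsv')
sg.ikc(min_k=10)
t0 = time.perf_counter()
sg.add_edges([(100, 200), (101, 202)])  # placeholder edges
streaming_ms = (time.perf_counter() - t0) * 1000

# Approximate baseline: a full batch IKC run on the original file.
t0 = time.perf_counter()
ikc.load_graph('network.tsv').ikc(min_k=10)
full_ms = (time.perf_counter() - t0) * 1000

print(f"Streaming update: {streaming_ms:.1f} ms vs full IKC: {full_ms:.1f} ms")
```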
### When to Use Streaming vs Batch IKC
**Use Streaming IKC When:**
- Graph evolves incrementally over time
- Need up-to-date clustering after each update
- Updates are sparse (few edges at a time)
- Want to track which clusters changed
**Use Batch IKC (`Graph.ikc()`) When:**
- Have complete graph from the start
- Don't need intermediate clustering results
- Graph structure is static
### Limitations
- **Edge deletions not supported** - Only additions are handled
- **No state persistence** - Cannot save/load streaming state
- Cluster splits never occur with edge additions (proven mathematically)
### References
- Sariyüce, A. E., Gedik, B., Jacques-Silva, G., Wu, K. L., & Çatalyürek, Ü. V. (2013).
"Streaming algorithms for k-core decomposition."
*Proceedings of the VLDB Endowment*, 6(6), 433-444.