# ORCA: Python Wrapper for Efficient Graphlet Counting
A Python implementation of the ORCA (ORbit Counting Algorithm) for efficient counting of graphlet orbits in networks. This is a pure Python port of the original C++ implementation by [thocevar/orca](https://github.com/thocevar/orca).
## What is ORCA?
ORCA is an efficient algorithm for counting graphlet orbits in networks. For 4-node and 5-node graphlets, it computes node orbits for each node and edge orbits for each edge in the network. Graphlets are small connected induced subgraphs that serve as fundamental building blocks for network analysis.
## Features
- **Pure Python implementation** - No external C++ dependencies required
- **Node and edge orbit counting** - Count orbits for both nodes and edges
- **4-node and 5-node graphlets** - Support for graphlets of size 4 and 5
- **NumPy integration** - Efficient array-based operations
- **Validated accuracy** - Test suite ensures outputs match the original C++ implementation
- **Easy to use** - Simple API with sensible defaults
## Installation
```bash
pip install orca-graphlets
```
## Quick Start
```python
import numpy as np
from orca import orca_nodes, orca_edges
# Define a simple graph as an edge list
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 3],
    [3, 0],
    [1, 3]
])
# Count node orbits for 4-node graphlets
node_orbits = orca_nodes(edges, graphlet_size=4)
print("Node orbits shape:", node_orbits.shape)
# Count edge orbits for 4-node graphlets
edge_orbits = orca_edges(edges, graphlet_size=4)
print("Edge orbits shape:", edge_orbits.shape)
```
## API Reference
### `orca_nodes(edge_list, num_nodes=None, graphlet_size=4, debug=False)`
Count node orbits for each node in the graph.
**Parameters:**
- `edge_list` (np.ndarray): Array of shape (E, 2) containing edges as pairs of node indices
- `num_nodes` (int, optional): Number of nodes in the graph. If None, inferred from edge_list
- `graphlet_size` (int): Size of graphlets to count (4 or 5)
- `debug` (bool): Enable debug output
**Returns:**
- `np.ndarray`: Array of shape (N, K) where N is the number of nodes and K is the number of orbit types for the given graphlet size
### `orca_edges(edge_list, num_nodes=None, graphlet_size=4, debug=False)`
Count edge orbits for each edge in the graph.
**Parameters:**
- Same as `orca_nodes`
**Returns:**
- `np.ndarray`: Array of shape (E, K) where E is the number of edges and K is the number of orbit types for the given graphlet size
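Since each row of the result belongs to one node (or, for `orca_edges`, to one edge, presumably in input order, which is an assumption here), the output can be condensed into a sparse, readable mapping. The helper below is a hypothetical sketch, not part of the package; it works on any 2D array-like of counts, and the sample data is made up for illustration rather than taken from real output.

```python
def nonzero_orbits(counts):
    """Return {row_index: {orbit_index: count}} keeping only nonzero counts."""
    return {
        i: {k: int(c) for k, c in enumerate(row) if c != 0}
        for i, row in enumerate(counts)
    }


# Made-up counts for 3 nodes and 4 orbit types (illustrative only)
sample = [
    [2, 0, 1, 0],
    [1, 3, 0, 0],
    [0, 0, 0, 0],
]
summary = nonzero_orbits(sample)
print(summary)
```

The same function should accept the NumPy array returned by `orca_nodes` or `orca_edges` directly, because it only iterates over rows.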
## Graphlet Orbits
### 4-node graphlets
- **Node orbits**: 15 different orbit types (0-14)
- **Edge orbits**: 11 different orbit types (0-10)
### 5-node graphlets
- **Node orbits**: 73 different orbit types (0-72)
- **Edge orbits**: 58 different orbit types (0-57)
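As a sanity check on small graphs, the first four node orbits can be counted by brute force. The sketch below is not the ORCA algorithm and does not use this package. It assumes the conventional orbit numbering from the graphlet literature: orbit 0 is the endpoint of an edge (the degree), orbit 1 the end of an induced 3-node path, orbit 2 the middle of an induced 3-node path, and orbit 3 a member of a triangle. The name `small_orbits` is hypothetical, and whether the first four columns of the `orca_nodes` output follow this numbering is an assumption worth verifying.

```python
from itertools import combinations


def small_orbits(edges, num_nodes):
    """Brute-force counts of node orbits 0-3 (graphlets on 2 and 3 nodes).

    Returns one [orbit0, orbit1, orbit2, orbit3] row per node.
    """
    adj = [set() for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    counts = []
    for v in range(num_nodes):
        orbit0 = len(adj[v])  # degree
        # End of an induced path v-u-w: w is a neighbor of u, distinct from v,
        # and not adjacent to v.
        orbit1 = sum(
            1 for u in adj[v] for w in adj[u] if w != v and w not in adj[v]
        )
        orbit2 = orbit3 = 0
        for a, b in combinations(sorted(adj[v]), 2):
            if b in adj[a]:
                orbit3 += 1  # neighbors a and b are adjacent: triangle
            else:
                orbit2 += 1  # v is the middle of the induced path a-v-b
        counts.append([orbit0, orbit1, orbit2, orbit3])
    return counts


# The Quick Start graph: a 4-cycle with the chord (1, 3)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
for node, row in enumerate(small_orbits(edges, 4)):
    print(node, row)
```

The cost grows quickly with node degree, so this is only suitable for checking toy graphs by hand.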
## Input Format
The edge list should be a NumPy array where:
- Each row represents an undirected edge
- Columns contain the node indices (0-based)
- Node indices should be integers from 0 to N-1 where N is the number of nodes
Example:
```python
import numpy as np

# Triangle graph: nodes 0, 1, 2 fully connected
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 0]
])
```
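Real data often uses arbitrary node labels rather than indices 0 to N-1. The following sketch, which is not part of the package, relabels nodes in order of first appearance and drops self-loops and duplicate undirected edges. How `orca_nodes` treats such edges is not stated here, so removing them is only a precaution; `relabel_edges` is a hypothetical helper name.

```python
def relabel_edges(raw_edges):
    """Map arbitrary hashable node labels to 0..N-1 and clean the edge list.

    Returns (edges, labels) where edges is a list of [u, v] index pairs and
    labels[i] is the original label of node i.
    """
    index = {}   # original label -> contiguous index
    labels = []  # contiguous index -> original label
    seen = set()
    edges = []
    for a, b in raw_edges:
        for label in (a, b):
            if label not in index:
                index[label] = len(labels)
                labels.append(label)
        u, v = index[a], index[b]
        if u == v:
            continue  # skip self-loops
        key = (min(u, v), max(u, v))
        if key in seen:
            continue  # skip duplicates, including reversed pairs
        seen.add(key)
        edges.append([u, v])
    return edges, labels


edges, labels = relabel_edges(
    [("a", "b"), ("b", "c"), ("c", "a"), ("b", "a"), ("c", "c")]
)
print(edges)   # [[0, 1], [1, 2], [2, 0]]
print(labels)  # ['a', 'b', 'c']
```

The cleaned list can then be wrapped with `np.array(edges)` and passed along with `num_nodes=len(labels)`; `labels` maps result rows back to the original node names.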
## Examples
### Basic Usage
```python
import numpy as np
from orca import orca_nodes, orca_edges
# Create a small graph
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 3],
    [0, 3]
])
# Count 4-node graphlet orbits for nodes
node_counts = orca_nodes(edges, graphlet_size=4)
print(f"Node 0 orbit counts: {node_counts[0]}")
# Count 5-node graphlet orbits for edges
edge_counts = orca_edges(edges, graphlet_size=5)
print(f"Edge (0,1) orbit counts: {edge_counts[0]}")
```
### Loading from File
```python
import numpy as np
from orca import orca_nodes
# Load graph from file (format: first line = "num_nodes num_edges",
# following lines = "node1 node2")
def load_graph(filename):
    with open(filename, 'r') as f:
        lines = f.readlines()
    # Header line: node count and edge count (the edge count is not needed here)
    num_nodes, num_edges = map(int, lines[0].strip().split())
    edges = np.array([
        list(map(int, line.strip().split()))
        for line in lines[1:]
        if line.strip()  # skip blank lines, e.g. a trailing newline
    ])
    return edges, num_nodes

edges, num_nodes = load_graph('graph.in')
orbits = orca_nodes(edges, num_nodes=num_nodes, graphlet_size=4)
```
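The inverse operation, writing an edge list in the same text format, can be sketched as follows. `save_graph` is a hypothetical helper, not a function of the package; it uses only the standard library and accepts any iterable of index pairs, including the rows of a NumPy array.

```python
import os
import tempfile


def save_graph(filename, edges, num_nodes):
    """Write "num_nodes num_edges" followed by one "node1 node2" line per edge."""
    rows = [(int(u), int(v)) for u, v in edges]
    with open(filename, "w") as f:
        f.write(f"{num_nodes} {len(rows)}\n")
        for u, v in rows:
            f.write(f"{u} {v}\n")


# Round trip through a temporary file
path = os.path.join(tempfile.mkdtemp(), "graph.in")
save_graph(path, [(0, 1), (1, 2), (2, 0)], num_nodes=3)
with open(path) as f:
    content = f.read()
print(content)
```

A file written this way has the header and edge lines that `load_graph` above expects.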
## Performance
This pure Python implementation prioritizes:
- **Correctness**: Exact same results as the original C++ version
- **Ease of use**: No compilation or external dependencies required
- **Maintainability**: Clean, readable Python code
For maximum performance on very large graphs, consider using the original C++ implementation.
## Testing
The package includes comprehensive tests that verify the outputs match the original C++ implementation:
```bash
pytest tests/
```
## Original Work
This is a Python port of the original ORCA algorithm:
- **Original repository**: [thocevar/orca](https://github.com/thocevar/orca)
- **Algorithm paper**: Hočevar, T., & Demšar, J. (2014). A combinatorial approach to graphlet counting. Bioinformatics, 30(4), 559-565.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
The original ORCA algorithm is licensed under GPL-3.0.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Citation
If you use this software in your research, please cite the original paper:
```bibtex
@article{hocevar2014combinatorial,
title={A combinatorial approach to graphlet counting},
author={Ho{\v{c}}evar, Toma{\v{z}} and Dem{\v{s}}ar, Janez},
journal={Bioinformatics},
volume={30},
number={4},
pages={559--565},
year={2014},
publisher={Oxford University Press}
}
```