orca-graphlets


- Name: orca-graphlets
- Version: 0.1.4
- Summary: ORCA: Python wrapper for efficient graphlet counting
- Upload time: 2025-07-22 09:21:54
- Requires Python: >=3.8
- License: MIT
- Keywords: graph, graphlets, network, analysis, orca
- Requirements: No requirements were recorded.

# ORCA: Python Wrapper for Efficient Graphlet Counting

[![PyPI version](https://badge.fury.io/py/orca-graphlets.svg)](https://badge.fury.io/py/orca-graphlets)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python implementation of ORCA (the ORbit Counting Algorithm) for efficient counting of graphlet orbits in networks. This is a pure Python port of the original C++ implementation, [thocevar/orca](https://github.com/thocevar/orca).

## What is ORCA?

ORCA is an efficient algorithm for counting graphlets in networks. Graphlets are small connected subgraphs that serve as fundamental building blocks for network analysis, and an orbit is a distinct position that a node (or edge) can occupy within a graphlet. For 4-node and 5-node graphlets, ORCA computes node orbits for each node and edge orbits for each edge in the network.
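
To make the idea of an orbit concrete, the sketch below counts the four smallest node orbits (0 to 3, which belong to the 2-node and 3-node graphlets) by inspecting each node's neighbourhood directly. It is a plain-Python illustration under the standard orbit numbering, not the ORCA algorithm and not part of this package; `small_orbits` is a hypothetical helper name, and the input is assumed to contain no self-loops.

```python
from itertools import combinations


def small_orbits(edges):
    """Count node orbits 0-3 for an undirected simple graph.

    Orbit 0: endpoint of an edge (the node degree).
    Orbit 1: end of an induced 3-node path.
    Orbit 2: middle of an induced 3-node path.
    Orbit 3: member of a triangle.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {}
    for u, nbrs in adj.items():
        degree = len(nbrs)
        # Triangles through u: pairs of neighbours that are themselves adjacent.
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Neighbour pairs that are NOT adjacent make u the middle of a path.
        path_middle = degree * (degree - 1) // 2 - triangles
        # Walks u-v-w with w != u, minus those closed into a triangle
        # (each triangle is seen twice, once per ordering of v and w).
        path_end = sum(len(adj[v]) - 1 for v in nbrs) - 2 * triangles
        counts[u] = [degree, path_end, path_middle, triangles]
    return counts


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
for node, row in sorted(small_orbits(edges).items()):
    print(node, row)
```

If the port keeps the standard numbering, the first four columns of the node-orbit result for the same graph should agree with these rows. ORCA's contribution is obtaining the larger orbits without this kind of exhaustive enumeration.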

## Features

- **Pure Python implementation** - No external C++ dependencies required
- **Node and edge orbit counting** - Count orbits for both nodes and edges
- **4-node and 5-node graphlets** - Support for graphlets of size 4 and 5
- **NumPy integration** - Efficient array-based operations
- **Validated accuracy** - Test suite ensures outputs match the original C++ implementation
- **Easy to use** - Simple API with sensible defaults

## Installation

```bash
pip install orca-graphlets
```

## Quick Start

```python
import numpy as np
from orca import orca_nodes, orca_edges

# Define a simple graph as an edge list
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 3],
    [3, 0],
    [1, 3]
])

# Count node orbits for 4-node graphlets
node_orbits = orca_nodes(edges, graphlet_size=4)
print("Node orbits shape:", node_orbits.shape)

# Count edge orbits for 4-node graphlets  
edge_orbits = orca_edges(edges, graphlet_size=4)
print("Edge orbits shape:", edge_orbits.shape)
```

## API Reference

### `orca_nodes(edge_list, num_nodes=None, graphlet_size=4, debug=False)`

Count node orbits for each node in the graph.

**Parameters:**

- `edge_list` (np.ndarray): Array of shape (E, 2) containing edges as pairs of node indices
- `num_nodes` (int, optional): Number of nodes in the graph. If None, inferred from edge_list
- `graphlet_size` (int): Size of graphlets to count (4 or 5)
- `debug` (bool): Enable debug output

**Returns:**

- `np.ndarray`: Array of shape (N, K) where N is the number of nodes and K is the number of orbit types for the given graphlet size
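
The parameter description does not spell out how `num_nodes` is inferred. A natural reading, and an assumption here rather than documented behaviour, is "largest index plus one". The sketch below (with the hypothetical helper `infer_num_nodes`) shows why passing `num_nodes` explicitly may matter when the highest-numbered nodes have no edges.

```python
def infer_num_nodes(edges):
    """Guess the node count as (largest index + 1); 0 for an empty edge list."""
    return max((max(int(u), int(v)) for u, v in edges), default=-1) + 1


edges = [(0, 1), (1, 2)]
print(infer_num_nodes(edges))  # 3

# A graph with 5 nodes where nodes 3 and 4 have no edges produces the same
# edge list, so the guess is still 3. Under this assumption, passing
# num_nodes=5 explicitly is the only way to get rows for the isolated nodes.
```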

### `orca_edges(edge_list, num_nodes=None, graphlet_size=4, debug=False)`

Count edge orbits for each edge in the graph.

**Parameters:**

- Same as `orca_nodes`

**Returns:**

- `np.ndarray`: Array of shape (E, K) where E is the number of edges and K is the number of orbit types for the given graphlet size

## Graphlet Orbits

### 4-node graphlets

- **Node orbits**: 15 different orbit types (0-14)
- **Edge orbits**: 11 different orbit types (0-10)

### 5-node graphlets

- **Node orbits**: 73 different orbit types (0-72)
- **Edge orbits**: 58 different orbit types (0-57)
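
Under the standard node-orbit numbering, the 15 and 73 node-orbit counts are cumulative: they include the orbits of smaller graphlets as well. The sketch below splits one row of node-orbit counts into per-graphlet-size chunks. It assumes this port keeps that numbering for its columns; `NODE_ORBIT_RANGES` and `split_by_graphlet_size` are hypothetical names, and the row is a stand-in rather than real output.

```python
# Node-orbit index ranges under the standard numbering, keyed by the number
# of nodes in the graphlet the orbit belongs to.
NODE_ORBIT_RANGES = {
    2: range(0, 1),    # orbit 0: endpoint of an edge
    3: range(1, 4),    # orbits 1-3: 3-node path and triangle
    4: range(4, 15),   # orbits 4-14
    5: range(15, 73),  # orbits 15-72
}


def split_by_graphlet_size(row):
    """Split one node's orbit counts into per-graphlet-size chunks."""
    return {
        size: [row[i] for i in orbits]
        for size, orbits in NODE_ORBIT_RANGES.items()
        if orbits.stop <= len(row)
    }


# Stand-in for one row of a graphlet_size=4 result (15 columns).
row = list(range(15))
parts = split_by_graphlet_size(row)
print({size: len(chunk) for size, chunk in parts.items()})
```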

## Input Format

The edge list should be a NumPy array where:

- Each row represents an undirected edge
- Columns contain the node indices (0-based)
- Node indices should be integers from 0 to N-1 where N is the number of nodes

Example:

```python
import numpy as np

# Triangle graph: nodes 0, 1, 2 fully connected
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 0]
])
```
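
The rules above can be checked before calling the library. The sketch below is a hypothetical helper, `check_edge_list`, that accepts a list of pairs or the rows of an (E, 2) NumPy array. This README does not say how self-loops or repeated edges are handled, so flagging them is a cautious assumption rather than a documented requirement.

```python
def check_edge_list(edges, num_nodes=None):
    """Return a list of problems found in an edge list (empty if none)."""
    problems = []
    seen = set()
    for row, edge in enumerate(edges):
        if len(edge) != 2:
            problems.append(f"row {row}: expected 2 columns, got {len(edge)}")
            continue
        u, v = int(edge[0]), int(edge[1])
        if u < 0 or v < 0:
            problems.append(f"row {row}: negative node index")
        if num_nodes is not None and max(u, v) >= num_nodes:
            problems.append(f"row {row}: index out of range for {num_nodes} nodes")
        if u == v:
            problems.append(f"row {row}: self-loop on node {u}")
        # Undirected edges (u, v) and (v, u) are the same edge.
        key = (min(u, v), max(u, v))
        if key in seen:
            problems.append(f"row {row}: duplicate of undirected edge {key}")
        seen.add(key)
    return problems


print(check_edge_list([(0, 1), (1, 2), (2, 0)]))  # []
print(check_edge_list([(0, 1), (1, 0), (2, 2)]))
```

The second call reports two problems: row 1 repeats the undirected edge (0, 1), and row 2 is a self-loop.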

## Examples

### Basic Usage

```python
import numpy as np
from orca import orca_nodes, orca_edges

# Create a small graph
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 3],
    [0, 3]
])

# Count 4-node graphlet orbits for nodes
node_counts = orca_nodes(edges, graphlet_size=4)
print(f"Node 0 orbit counts: {node_counts[0]}")

# Count 5-node graphlet orbits for edges
edge_counts = orca_edges(edges, graphlet_size=5)
print(f"Edge (0,1) orbit counts: {edge_counts[0]}")
```

### Loading from File

```python
import numpy as np
from orca import orca_nodes

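# Load a graph from a text file. Expected format:
#   first line:      "num_nodes num_edges"
#   following lines: "node1 node2" (one edge per line)
def load_graph(filename):
    with open(filename, 'r') as f:
        # Drop blank lines so a trailing newline does not produce an empty row
        lines = [line.split() for line in f if line.strip()]
    num_nodes, num_edges = map(int, lines[0])
    edges = np.array([list(map(int, pair)) for pair in lines[1:]], dtype=int)
    if len(edges) != num_edges:
        raise ValueError(f"header declares {num_edges} edges, found {len(edges)}")
    return edges, num_nodes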

edges, num_nodes = load_graph('graph.in')
orbits = orca_nodes(edges, num_nodes=num_nodes, graphlet_size=4)
```
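
Files often identify nodes by arbitrary labels rather than 0-based integers. A minimal sketch for mapping such labels to the indices described under Input Format follows; `relabel_edges` is a hypothetical helper, not part of the package.

```python
def relabel_edges(labelled_edges):
    """Map arbitrary hashable node labels to 0-based integer indices.

    Returns (edges, labels) where labels[i] is the original label of node i.
    Indices are assigned in order of first appearance.
    """
    index = {}
    edges = []
    for a, b in labelled_edges:
        pair = []
        for label in (a, b):
            if label not in index:
                index[label] = len(index)
            pair.append(index[label])
        edges.append(pair)
    labels = list(index)  # dicts preserve insertion order
    return edges, labels


edges, labels = relabel_edges([("alice", "bob"), ("bob", "carol"), ("carol", "alice")])
print(edges)   # [[0, 1], [1, 2], [2, 0]]
print(labels)  # ['alice', 'bob', 'carol']
```

The resulting `edges` can be wrapped in `np.array(...)` before being passed on, and `labels[i]` then names the node that row `i` of a node-orbit result refers to.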

## Performance

This pure Python implementation prioritizes:

- **Correctness**: Exactly the same results as the original C++ version
- **Ease of use**: No compilation step or C++ toolchain required
- **Maintainability**: Clean, readable Python code

For maximum performance on very large graphs, consider using the original C++ implementation.

## Testing

The package includes comprehensive tests that verify the outputs match the original C++ implementation:

```bash
pytest tests/
```
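
Independently of the C++ comparison, orbit 0 is by definition a node's degree, so one column of a node-orbit result can be checked without a reference implementation. The sketch below assumes column 0 holds orbit 0 and takes the result matrix as an argument so that it runs without the package installed; `degree_mismatches` is a hypothetical helper and `fake_orbits` is a stand-in, not real output.

```python
from collections import Counter


def degree_mismatches(edges, node_orbits):
    """Return nodes whose orbit-0 count differs from their degree."""
    degree = Counter()
    for u, v in edges:
        degree[int(u)] += 1
        degree[int(v)] += 1
    return [
        node
        for node, row in enumerate(node_orbits)
        if int(row[0]) != degree[node]
    ]


edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
# Stand-in for a node-orbit matrix: only column 0 matters for this check.
fake_orbits = [[2] + [0] * 14 for _ in range(4)]
print(degree_mismatches(edges, fake_orbits))  # []
```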

## Original Work

This is a Python port of the original ORCA algorithm:

- **Original repository**: [thocevar/orca](https://github.com/thocevar/orca)
- **Algorithm paper**: Hočevar, T., & Demšar, J. (2014). A combinatorial approach to graphlet counting. Bioinformatics, 30(4), 559-565.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

The original ORCA implementation is licensed under GPL-3.0.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Citation

If you use this software in your research, please cite the original paper:

```bibtex
@article{hocevar2014combinatorial,
  title={A combinatorial approach to graphlet counting},
  author={Ho{\v{c}}evar, Toma{\v{z}} and Dem{\v{s}}ar, Janez},
  journal={Bioinformatics},
  volume={30},
  number={4},
  pages={559--565},
  year={2014},
  publisher={Oxford University Press}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "orca-graphlets",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "graph, graphlets, network, analysis, orca",
    "author": null,
    "author_email": "Ole Petersen <peteole2707@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/10/53/bc0f611d31d75ac2334a7fa107cdd46232e6bcd2a240e7d274f63b49a6f4/orca_graphlets-0.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "ORCA: Python wrapper for efficient graphlet counting",
    "version": "0.1.4",
    "project_urls": {
        "Homepage": "https://github.com/yourusername/orca_v2",
        "Issues": "https://github.com/yourusername/orca_v2/issues",
        "Repository": "https://github.com/yourusername/orca_v2"
    },
    "split_keywords": [
        "graph",
        " graphlets",
        " network",
        " analysis",
        " orca"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ac9e65643942fd0bc68e1c526c759735b62c9263a3ecdeb8adcd1a1c45ff6e3f",
                "md5": "23c40bbcb3bb95ee8dc3436fb3090a3e",
                "sha256": "bd411bd04400167e446ce450d77f71a29690f73a45a859af5097c48094700296"
            },
            "downloads": -1,
            "filename": "orca_graphlets-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "23c40bbcb3bb95ee8dc3436fb3090a3e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 15452,
            "upload_time": "2025-07-22T09:21:53",
            "upload_time_iso_8601": "2025-07-22T09:21:53.343440Z",
            "url": "https://files.pythonhosted.org/packages/ac/9e/65643942fd0bc68e1c526c759735b62c9263a3ecdeb8adcd1a1c45ff6e3f/orca_graphlets-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1053bc0f611d31d75ac2334a7fa107cdd46232e6bcd2a240e7d274f63b49a6f4",
                "md5": "6c0848fc92c2b4697204284342de7d39",
                "sha256": "f47d553184ae7bed2b117c6a860fe5de585ea45cc245bc490e29710c89e43e70"
            },
            "downloads": -1,
            "filename": "orca_graphlets-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "6c0848fc92c2b4697204284342de7d39",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 17988,
            "upload_time": "2025-07-22T09:21:54",
            "upload_time_iso_8601": "2025-07-22T09:21:54.568082Z",
            "url": "https://files.pythonhosted.org/packages/10/53/bc0f611d31d75ac2334a7fa107cdd46232e6bcd2a240e7d274f63b49a6f4/orca_graphlets-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-22 09:21:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yourusername",
    "github_project": "orca_v2",
    "github_not_found": true,
    "lcname": "orca-graphlets"
}
        