Name | torch-graph-force |
Version | 0.1.1 |
Summary | Force-directed layouts for Large Graphs with GPU acceleration |
upload_time | 2022-12-21 16:13:46 |
docs_url | None |
requires_python | >=3.8 |
keywords | graph, layout, force-directed, pytorch |
requirements | No requirements were recorded. |
# torch-graph-force [WIP]
A PyTorch-based library for embedding large graphs into a low-dimensional space using force-directed layouts with GPU acceleration.
The aim of this project is to speed up the process of obtaining low-dimensional layouts for large graphs, especially with GPU acceleration.
## Install
- Install PyTorch (follow [official instructions](https://pytorch.org/get-started/locally))
- Install `torch-graph-force`:
```shell
pip install git+https://github.com/tintn/torch-graph-force.git
```
## Usage
### Create a `GraphDataset` for the Graph
The dataset can be created from a dataframe, an edge list, or a NetworkX graph using `from_pandas_dataframe`, `from_edgelist`, or `from_networkx` respectively. `from_pandas_dataframe` is the recommended way, as it is more efficient than the other methods.
If the node IDs are consecutive integers starting from 0, the dataset can be constructed with a dataframe for edges and the number of nodes:
```python
import pandas as pd
import torch_graph_force
# The first argument is a dataframe of edges with at least two columns for source and target nodes.
# By default, column names "source", "target" and "weight" are taken as source nodes, target nodes and edge weights.
df = pd.DataFrame([[0, 1], [1, 2], [2, 3]], columns=['source', 'target'])
# Having a column for edge weights is optional. If the column for edge weights does not exist, 1.0 will be used for all edges.
# The second argument is the number of nodes in case the node IDs are consecutive integers starting from 0.
n_nodes = 4
# Create a GraphDataset for the graph
ds = torch_graph_force.from_pandas_dataframe(
df, n_nodes
)
```
If the node IDs are not consecutive integers, a list of node IDs must be provided:
```python
import pandas as pd
import torch_graph_force
df = pd.DataFrame([["A", "B"], ["B", "C"], ["C", "D"]], columns=['source', 'target'])
# Order of the nodes in "nodes" is used to map the node IDs to node indices.
nodes = ["A", "B", "C", "D"]
ds = torch_graph_force.from_pandas_dataframe(
df, nodes
)
# The dataset's order follows the order of the provided list of nodes.
# In this example, ds[0] returns the data for node "A" and ds[1] the data for node "B".
# The list of nodes can be accessed with ds.nodes
print(ds.nodes)
```
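The mapping from node IDs to node indices described in the comments above can be pictured with a few lines of plain Python. This is only an illustration of the idea, not the library's internal code; `index_edges` is a hypothetical helper introduced for this sketch.

```python
def index_edges(edges, nodes):
    """Translate edges given as node IDs into edges given as node indices.

    The position of each ID in `nodes` becomes its index.
    """
    index_of = {node_id: idx for idx, node_id in enumerate(nodes)}
    return [(index_of[source], index_of[target]) for source, target in edges]


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
indexed = index_edges(edges, nodes)
print(indexed)  # [(0, 1), (1, 2), (2, 3)]
```

Reordering `nodes` changes the indices, which is why the order of the list matters when reading the layout back.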
### Compute Graph Layout
Once the graph dataset is ready, pass it to `spring_layout` to compute the graph layout.
```python
pos = torch_graph_force.spring_layout(
ds
)
# pos is a numpy array of size (n_nodes, n_dim)
# each row represents the position of a node with corresponding index
print(pos)
# if node IDs are not consecutive integers, the nodes' positions can be obtained from the node list
node_pos = {nid: pos[idx] for idx, nid in enumerate(ds.nodes)}
```
Optional arguments for `spring_layout`:
- `batch_size`: number of nodes to process in a batch. A larger batch size usually speeds up processing but consumes more memory. (default: 64)
- `iterations`: maximum number of iterations. (default: 50)
- `num_workers`: number of workers used to fetch data from the `GraphDataset`. If the device is "cuda", `num_workers` must be 0. (default: 0)
- `device`: the device that stores the graph and the layout model. If None, "cuda" is used when CUDA is available and "cpu" otherwise. (default: None)
- `iteration_progress`: show progress within each iteration, which is useful for large graphs. (default: False)
- `layout_config`: additional configuration for the layout model. (default: {})
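As a sketch, the options above can be collected in a dictionary and passed as keyword arguments. The values are arbitrary illustrations, `compute_layout` is a hypothetical wrapper, and the snippet assumes `spring_layout` accepts these options by keyword under the names listed.

```python
# Illustrative values only; tune them for your graph and hardware.
layout_kwargs = {
    "batch_size": 1024,          # larger batches: faster, but more memory
    "iterations": 100,           # upper bound on the number of iterations
    "num_workers": 0,            # must stay 0 when the device is "cuda"
    "device": None,              # None: "cuda" if available, otherwise "cpu"
    "iteration_progress": True,  # show progress within each iteration
}


def compute_layout(ds):
    """Hypothetical wrapper: run spring_layout on a GraphDataset with the options above."""
    # Imported here so the options can be inspected without the library installed.
    import torch_graph_force

    return torch_graph_force.spring_layout(ds, **layout_kwargs)
```

Calling `compute_layout(ds)` with the dataset built earlier would then return the positions array.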
The layout model has some parameters with default values:
```python
default_layout_config = {
# Tensor of shape (n_nodes, ndim) for initial positions
"pos": None,
# Optimal distance between nodes
"k": None,
# Dimension of the layout
"ndim": 2,
# Threshold for relative error in node position changes.
"threshold": 1e-4,
}
```
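To make the roles of `k`, `ndim` and `threshold` more concrete, here is a plain-Python sketch of a Fruchterman-Reingold style spring layout, the scheme used by NetworkX's `spring_layout`. It assumes this library follows the same force model, which the text above does not state. The function name, the `1/sqrt(n)` fallback for `k`, the starting temperature and the stopping rule are all assumptions of the sketch, not the library's code.

```python
import math
import random


def spring_layout_sketch(edges, n_nodes, k=None, ndim=2, iterations=50,
                         threshold=1e-4, seed=0):
    """Illustrative force-directed layout; O(n^2) per iteration, small graphs only."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(ndim)] for _ in range(n_nodes)]
    if k is None:
        # Assumed fallback: the optimal distance shrinks as the graph grows.
        k = math.sqrt(1.0 / n_nodes)
    # Undirected edges, each with weight 1.0.
    weight = {}
    for u, v in edges:
        weight[(u, v)] = weight[(v, u)] = 1.0

    t = 0.1                    # "temperature": length of each node's step
    dt = t / (iterations + 1)  # linear cooling schedule
    for _ in range(iterations):
        new_pos = []
        total_sq = 0.0
        for i in range(n_nodes):
            disp = [0.0] * ndim
            for j in range(n_nodes):
                if i == j:
                    continue
                delta = [a - b for a, b in zip(pos[i], pos[j])]
                dist = max(math.sqrt(sum(c * c for c in delta)), 0.01)
                # Every pair repels (k^2 / d^2); connected pairs also attract (w * d / k).
                coeff = k * k / dist ** 2 - weight.get((i, j), 0.0) * dist / k
                for axis in range(ndim):
                    disp[axis] += delta[axis] * coeff
            length = math.sqrt(sum(c * c for c in disp))
            if length < 0.01:
                length = 0.1  # avoid dividing by a near-zero displacement
            step = [c * t / length for c in disp]
            total_sq += sum(s * s for s in step)
            new_pos.append([p + s for p, s in zip(pos[i], step)])
        pos = new_pos
        t -= dt
        # Stop early once the average movement per node drops below the threshold.
        if math.sqrt(total_sq) / n_nodes < threshold:
            break
    return pos


layout = spring_layout_sketch([(0, 1), (1, 2), (2, 3)], n_nodes=4)
```

In this sketch a larger `k` spreads nodes further apart, and a larger `threshold` lets the loop stop sooner at the cost of a rougher layout.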
Use the `layout_config` argument to change these parameters if needed. The example below provides initial positions for the layout model:
```python
import numpy as np
import torch_graph_force

# `ds` is the GraphDataset created earlier
n_nodes = len(ds)
n_dim = 2
# Generate initial positions for the nodes
init_pos = np.random.rand(n_nodes, n_dim)
pos = torch_graph_force.spring_layout(
ds,
layout_config={"pos": init_pos}
)
```
## Benchmarks
Even **without GPU acceleration**, `torch-graph-force` is 1.5x faster than NetworkX's implementation.
![CPU Benchmark](/assets/cpu-benchmark.jpg)
GPU-accelerated `torch-graph-force` can compute layouts of graphs with 100k nodes within minutes. The benchmark was run on a Tesla P100.
![GPU Benchmark](/assets/gpu-benchmark.jpg)
Code for the benchmarks can be found [here](/torch_graph_force/benchmark.py).
Raw data
{
"_id": null,
"home_page": "",
"name": "torch-graph-force",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "graph,layout,force-directed,pytorch",
"author": "",
"author_email": "Tin Nguyen <trung.tin.nguyen0309@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/15/7c/f9d2f1c8ad55311181de5b7fe46e4e45218618b3a1eab1bccef095193a2a/torch_graph_force-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Force-directed layouts for Large Graphs with GPU acceleration",
"version": "0.1.1",
"split_keywords": [
"graph",
"layout",
"force-directed",
"pytorch"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "07dd4374484daf3d8acef2e0a1d61851",
"sha256": "125f06f8e915f93ad77cf7fcc6463e17f4b70530baf36f465118a29f9e355ed5"
},
"downloads": -1,
"filename": "torch_graph_force-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "07dd4374484daf3d8acef2e0a1d61851",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 10387,
"upload_time": "2022-12-21T16:13:45",
"upload_time_iso_8601": "2022-12-21T16:13:45.481185Z",
"url": "https://files.pythonhosted.org/packages/02/24/56f29ab8e1b896d9fdc09096ce4698a86001504fc2c31376a30c0907690b/torch_graph_force-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "bdf7d47fed2ea05c5998ae719e3d610a",
"sha256": "373ac784fc1b0a0956e94182532d894a14c34e020fa7e78976f3c6b30db11225"
},
"downloads": -1,
"filename": "torch_graph_force-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "bdf7d47fed2ea05c5998ae719e3d610a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 10397,
"upload_time": "2022-12-21T16:13:46",
"upload_time_iso_8601": "2022-12-21T16:13:46.813765Z",
"url": "https://files.pythonhosted.org/packages/15/7c/f9d2f1c8ad55311181de5b7fe46e4e45218618b3a1eab1bccef095193a2a/torch_graph_force-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-12-21 16:13:46",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "torch-graph-force"
}