torch-graph-force

Name	torch-graph-force
Version	0.1.1
home_page
Summary	Force-directed layouts for Large Graphs with GPU acceleration
upload_time	2022-12-21 16:13:46
maintainer
docs_url	None
author
requires_python	>=3.8
license
keywords	graph layout force-directed pytorch
VCS
bugtrack_url
requirements	No requirements were recorded.

# torch-graph-force [WIP]

A PyTorch-based library for embedding large graphs in low-dimensional space using force-directed layouts with GPU acceleration.

The aim of this project is to speed up the computation of low-dimensional layouts for large graphs, especially on GPUs.

## Install

- Install PyTorch (follow [official instructions](https://pytorch.org/get-started/locally))
- Install `torch-graph-force`:
```shell
pip install git+https://github.com/tintn/torch-graph-force.git
```

## Usage

### Create a `GraphDataset` for the Graph

The dataset can be created from a pandas dataframe, an edge list, or a NetworkX graph using `from_pandas_dataframe`, `from_edgelist`, or `from_networkx`, respectively. `from_pandas_dataframe` is recommended because it is more efficient than the other two methods.

If the node IDs are consecutive integers starting from 0, the dataset can be constructed with a dataframe for edges and the number of nodes:

```python
import pandas as pd
import torch_graph_force

# The first argument is a dataframe of edges with at least two columns for source and target nodes.
# By default, column names "source", "target" and "weight" are taken as source nodes, target nodes and edge weights.
df = pd.DataFrame([[0, 1], [1, 2], [2, 3]], columns=['source', 'target'])
# Having a column for edge weights is optional. If the column for edge weights does not exist, 1.0 will be used for all edges.
# The second argument is the number of nodes in case the node IDs are consecutive integers starting from 0.
n_nodes = 4
# Create a GraphDataset for the graph
ds = torch_graph_force.from_pandas_dataframe(
    df, n_nodes
)
```
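When the edges come from an external file, it may be unclear whether the integer form applies. A small helper along these lines (the name `infer_n_nodes` is hypothetical and not part of `torch-graph-force`) checks that every ID is a non-negative integer and derives the node count as the largest ID plus one. It assumes that IDs below the maximum which never appear in an edge are meant to be isolated nodes.

```python
def infer_n_nodes(edges):
    """Return the node count for integer IDs starting at 0, or None otherwise.

    `edges` is a list of (source, target) or (source, target, weight) tuples.
    """
    ids = {node for edge in edges for node in edge[:2]}
    if not ids:
        return None
    # bool is a subclass of int, so exclude it explicitly
    if not all(isinstance(i, int) and not isinstance(i, bool) and i >= 0 for i in ids):
        return None
    return max(ids) + 1


print(infer_n_nodes([(0, 1), (1, 2), (2, 3)]))  # 4
print(infer_n_nodes([("A", "B"), ("B", "C")]))  # None
```

If the helper returns `None`, the node-list form described next is the one to use.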

If the node IDs are not consecutive integers, a list of node IDs must be provided:
```python
import pandas as pd
import torch_graph_force

df = pd.DataFrame([["A", "B"], ["B", "C"], ["C", "D"]], columns=['source', 'target'])
# Order of the nodes in "nodes" is used to map the node IDs to node indices.
nodes = ["A", "B", "C", "D"]

ds = torch_graph_force.from_pandas_dataframe(
    df, nodes
)
# The dataset's order follows the order of the provided list of nodes. In this example, ds[0] returns the data for node "A" and ds[1] for node "B".
# The list of nodes can be accessed with ds.nodes
print(ds.nodes)
```
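
The ID-to-index mapping described in the comments above can be pictured with plain Python. This is only an illustration of the idea, not the library's internal code; `build_index` and `remap_edges` are hypothetical names.

```python
def build_index(nodes):
    """Map each node ID to its position in the node list."""
    index = {nid: idx for idx, nid in enumerate(nodes)}
    if len(index) != len(nodes):
        raise ValueError("node IDs must be unique")
    return index


def remap_edges(edges, index):
    """Translate (source, target) ID pairs into integer index pairs."""
    return [(index[s], index[t]) for s, t in edges]


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
index = build_index(nodes)
print(remap_edges(edges, index))  # [(0, 1), (1, 2), (2, 3)]
```

An edge that mentions an ID missing from `nodes` raises a `KeyError` in this sketch, which is a convenient way to catch an incomplete node list early.
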
### Compute Graph Layout

Once the graph dataset is ready, pass it to `spring_layout` to compute the graph layout.

```python
pos = torch_graph_force.spring_layout(
    ds
)
# pos is a numpy array of size (n_nodes, n_dim)
# each row represents the position of a node with corresponding index
print(pos)
# if node IDs are not consecutive integers, the nodes' positions can be obtained from the node list
node_pos = {nid: pos[idx] for idx, nid in enumerate(ds.nodes)}
```
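
A quick sanity check on a layout is to measure how long the edges are. The sketch below uses a small hand-written list of coordinates as a stand-in for `pos` so that it runs without the library; `edge_lengths` is a hypothetical helper and the coordinates are made up for illustration.

```python
import math


def edge_lengths(pos, edges):
    """Euclidean length of each edge, given per-node coordinates indexed by node index."""
    return [math.dist(pos[s], pos[t]) for s, t in edges]


# Stand-in for the array returned by spring_layout: one row per node
pos = [[0.0, 0.0], [3.0, 4.0], [3.0, 0.0], [0.0, 4.0]]
edges = [(0, 1), (1, 2), (2, 3)]
print(edge_lengths(pos, edges))  # [5.0, 4.0, 5.0]
```

Very uneven edge lengths can hint that the layout has not settled yet and that more iterations may help.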

Optional arguments for `spring_layout`:
- `batch_size`: number of nodes to process in a batch. A larger batch size usually speeds up processing but consumes more memory. (default: 64)
- `iterations`: maximum number of iterations to run. (default: 50)
- `num_workers`: number of workers used to fetch data from the `GraphDataset`. If the device is "cuda", `num_workers` must be 0. (default: 0)
- `device`: the device on which to store the graph and the layout model. If None, "cuda" is used when CUDA is available, otherwise "cpu". (default: None)
- `iteration_progress`: show progress within each iteration, which is useful for large graphs. (default: False)
- `layout_config`: additional configuration for the layout model. (default: {})
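
To get a feel for `batch_size`, it helps to count how many batches one pass over the nodes takes. The sketch assumes batches are consecutive slices of the node indices, which is the usual behaviour of a PyTorch `DataLoader` without shuffling; how `torch-graph-force` batches internally is not documented here, and `batch_ranges` is a hypothetical helper.

```python
import math


def batch_ranges(n_nodes, batch_size=64):
    """Yield (start, stop) index ranges covering all nodes, one per batch."""
    for start in range(0, n_nodes, batch_size):
        yield start, min(start + batch_size, n_nodes)


n_nodes = 100_000
print(math.ceil(n_nodes / 64))    # 1563 batches per pass with the default
print(math.ceil(n_nodes / 4096))  # 25 batches per pass with a larger batch
print(list(batch_ranges(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```

Fewer, larger batches mean fewer round trips per iteration at the cost of more memory per batch, which is the trade-off noted for `batch_size` above.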

The layout model has some parameters with default values:
```python
default_layout_config = {
    # Tensor of shape (n_nodes, ndim) for initial positions
    "pos": None,
    # Optimal distance between nodes
    "k": None,
    # Dimension of the layout
    "ndim": 2,
    # Threshold for relative error in node position changes.
    "threshold": 1e-4,
}
```
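
For orientation, `k`, `ndim`, and `threshold` are also the parameters of the Fruchterman-Reingold algorithm behind NetworkX's `spring_layout`. The sketch below is a plain-Python version of that classic scheme, not the code of `torch-graph-force`. It assumes that `k` defaults to `1/sqrt(n_nodes)` as in NetworkX (this README does not say what the library uses when `k` is None) and it stops early when the average movement per node falls below `threshold`. The function name `spring_layout_sketch` and the `seed` argument are hypothetical.

```python
import math
import random


def spring_layout_sketch(n_nodes, edges, k=None, ndim=2, iterations=50,
                         threshold=1e-4, seed=0):
    """Plain-Python Fruchterman-Reingold layout, modelled on NetworkX's spring_layout."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(ndim)] for _ in range(n_nodes)]
    if k is None:
        k = math.sqrt(1.0 / n_nodes)  # assumed default, as in NetworkX
    # Symmetric edge weights; a missing weight defaults to 1.0
    weight = {}
    for u, v, *w in edges:
        weight[(u, v)] = weight[(v, u)] = w[0] if w else 1.0
    temp = 0.1                        # "temperature": length of each step
    cooling = temp / (iterations + 1)
    for _ in range(iterations):
        moved = 0.0
        new_pos = []
        for i in range(n_nodes):
            disp = [0.0] * ndim
            for j in range(n_nodes):
                if i == j:
                    continue
                delta = [a - b for a, b in zip(pos[i], pos[j])]
                dist = max(math.hypot(*delta), 0.01)
                # Repulsion k^2/d^2 between all pairs, attraction w*d/k along edges
                scale = k * k / dist ** 2 - weight.get((i, j), 0.0) * dist / k
                disp = [d + scale * x for d, x in zip(disp, delta)]
            length = max(math.hypot(*disp), 0.01)
            step = [d * temp / length for d in disp]
            new_pos.append([p + s for p, s in zip(pos[i], step)])
            moved += math.hypot(*step)
        pos = new_pos
        temp -= cooling
        # Stop early once the average movement per node is below the threshold
        if moved / n_nodes < threshold:
            break
    return pos


layout = spring_layout_sketch(4, [(0, 1), (1, 2), (2, 3)])
for idx, p in enumerate(layout):
    print(idx, [round(x, 3) for x in p])
```

Every node interacts with every other node, so one iteration costs on the order of `n_nodes ** 2` pair evaluations. This is the kind of work that batching and GPU execution are meant to speed up.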

Use the `layout_config` argument to change these parameters if needed. The example below provides initial positions for the layout model:
```python
import numpy as np

n_nodes = len(ds)
n_dim = 2
# Generate initial positions for the nodes
init_pos = np.random.rand(n_nodes, n_dim)
pos = torch_graph_force.spring_layout(
    ds,
    layout_config={"pos": init_pos}
)
```
## Benchmarks

Even **without GPU acceleration**, `torch-graph-force` is 1.5x faster than NetworkX's implementation.

![CPU Benchmark](/assets/cpu-benchmark.jpg)

GPU-accelerated `torch-graph-force` can compute layouts of graphs with 100k nodes within minutes. The benchmark was run on a Tesla P100.

![GPU Benchmark](/assets/gpu-benchmark.jpg)

The code for the benchmarks can be found [here](/torch_graph_force/benchmark.py).

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "torch-graph-force",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "graph,layout,force-directed,pytorch",
    "author": "",
    "author_email": "Tin Nguyen <trung.tin.nguyen0309@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/15/7c/f9d2f1c8ad55311181de5b7fe46e4e45218618b3a1eab1bccef095193a2a/torch_graph_force-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Force-directed layouts for Large Graphs with GPU acceleration",
    "version": "0.1.1",
    "split_keywords": [
        "graph",
        "layout",
        "force-directed",
        "pytorch"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "07dd4374484daf3d8acef2e0a1d61851",
                "sha256": "125f06f8e915f93ad77cf7fcc6463e17f4b70530baf36f465118a29f9e355ed5"
            },
            "downloads": -1,
            "filename": "torch_graph_force-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "07dd4374484daf3d8acef2e0a1d61851",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 10387,
            "upload_time": "2022-12-21T16:13:45",
            "upload_time_iso_8601": "2022-12-21T16:13:45.481185Z",
            "url": "https://files.pythonhosted.org/packages/02/24/56f29ab8e1b896d9fdc09096ce4698a86001504fc2c31376a30c0907690b/torch_graph_force-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "bdf7d47fed2ea05c5998ae719e3d610a",
                "sha256": "373ac784fc1b0a0956e94182532d894a14c34e020fa7e78976f3c6b30db11225"
            },
            "downloads": -1,
            "filename": "torch_graph_force-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "bdf7d47fed2ea05c5998ae719e3d610a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 10397,
            "upload_time": "2022-12-21T16:13:46",
            "upload_time_iso_8601": "2022-12-21T16:13:46.813765Z",
            "url": "https://files.pythonhosted.org/packages/15/7c/f9d2f1c8ad55311181de5b7fe46e4e45218618b3a1eab1bccef095193a2a/torch_graph_force-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-21 16:13:46",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "torch-graph-force"
}