# GraphRole
[![Build Status](https://travis-ci.com/dkaslovsky/GraphRole.svg?branch=master)](https://travis-ci.com/dkaslovsky/GraphRole)
[![Coverage Status](https://coveralls.io/repos/github/dkaslovsky/GraphRole/badge.svg?branch=master)](https://coveralls.io/github/dkaslovsky/GraphRole?branch=master)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/GraphRole)
Automatic feature extraction and node role assignment for transfer learning on graphs; based on the ReFeX/RolX algorithms [1, 2] of Henderson, et al.
<p align="center">
<img src="./examples/karate_graph.png" width=600>
</p>
### Overview
A fundamental problem for learning on graphs is extracting meaningful features. `GraphRole` provides the `RecursiveFeatureExtractor` class to automate this process by extracting recursive features that capture local and neighborhood ("regional") structural properties of a given graph. The implementation follows the ReFeX algorithm [1]. Node features (e.g., degree) and ego-net features (e.g., neighbors, number of internal vs. external edges) are extracted and then recursively aggregated over the features of each node's neighbors until no additional information is encoded. As shown in [1], these recursive, "regional" features facilitate node classification and perform well in transfer learning tasks.
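The recursive aggregation described above can be illustrated with a small, self-contained sketch. It is not the `GraphRole` implementation: the toy graph, the function names, and the choice of sum and mean as aggregation functions are assumptions made for illustration, and the pruning step that decides when "no additional information is encoded" is omitted.

```python
from statistics import mean

# Toy undirected graph as an adjacency mapping (hypothetical example data).
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}


def local_features(adj):
    """Base features: here only the degree of each node."""
    return {node: {"degree": float(len(nbrs))} for node, nbrs in adj.items()}


def aggregate_once(adj, features):
    """One recursive generation: sum and mean of each feature over a node's neighbors.

    Assumes every node has at least one neighbor (mean of an empty list is undefined).
    """
    new_features = {}
    for node, nbrs in adj.items():
        row = dict(features[node])  # keep the existing features
        for name in features[node]:
            values = [features[nbr][name] for nbr in nbrs]
            row[f"{name}(sum)"] = sum(values)
            row[f"{name}(mean)"] = mean(values)
        new_features[node] = row
    return new_features


base = local_features(adjacency)
generation_1 = aggregate_once(adjacency, base)
for node in sorted(generation_1):
    print(node, generation_1[node])
```

Calling `aggregate_once` again on `generation_1` would aggregate the aggregated features in turn, so each generation looks one hop further into the graph.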
`GraphRole` also provides the `RoleExtractor` class for node role assignment (a form of classification). Different nodes play different structural roles in a graph, and using recursive regional features, these roles can be identified and assigned to collections of nodes. As they are structural in nature, node roles differ from and are often more intuitive than the commonly used communities of nodes. In particular, roles can generalize across graphs whereas the notion of communities cannot [2]. Identification and assignment of node roles has been shown to facilitate many graph learning tasks.
Please see [1, 2] for more technical details.
### Installation
This package is hosted on PyPI and can be installed via `pip`:
```
$ pip install graphrole
```
To instead install from source:
```
$ git clone https://github.com/dkaslovsky/GraphRole.git
$ cd GraphRole
$ python setup.py install
```
### Example
An example of `GraphRole` usage is found in the `examples` directory. The notebook
[example.ipynb](./examples/example.ipynb)
(also available via [nbviewer](https://nbviewer.jupyter.org/github/dkaslovsky/GraphRole/blob/master/examples/example.ipynb))
walks through feature extraction and role assignment for the well-known `karate_club_graph` that is included with `NetworkX`. Recursive features are extracted and used to learn role assignments for each node in the graph. The graph is shown above with each node colored according to its role.
The extracted roles reflect structural properties of the graph at the node level. The nodes `0` and `33` (dark green) are central to the graph and are connected to many other nodes. Nodes `1`, `2`, `3`, and `32` are assigned to a similar role (red). In contrast, the roles colored as dark blue, light blue, and pink are found at the periphery of the graph. Notably, nodes need not be near one another to be assigned to the same role; instead nodes with similar properties are grouped together across the graph by their role assignments.
Although not reflected by this example, weighted and directed graphs are also supported and will yield weighted and directed variants of the extracted features.
### Usage
For general usage, begin by importing the two feature and role extraction classes:
```python
>>> from graphrole import RecursiveFeatureExtractor, RoleExtractor
```
Features are then extracted from a graph `G` (for example, a `networkx` graph) into a `pandas.DataFrame`:
```python
>>> feature_extractor = RecursiveFeatureExtractor(G)
>>> features = feature_extractor.extract_features()
```
Next, these features are used to learn roles. The number of roles is automatically determined by
a model selection procedure when `n_roles=None` is passed to the `RoleExtractor` constructor.
Alternatively, `n_roles` can be set to the desired number of roles to be extracted.
```python
>>> role_extractor = RoleExtractor(n_roles=None)
>>> role_extractor.extract_role_factors(features)
```
The role assignment for each node can be retrieved as a dictionary:
```python
>>> role_extractor.roles
```
Alternatively, roles can be viewed as a soft assignment, and each node's percent membership in each role
can be retrieved as a `pandas.DataFrame`:
```python
>>> role_extractor.role_percentage
```
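To make the relationship between the soft and hard views concrete, the following sketch derives one role per node from a table of memberships. It uses plain dictionaries rather than the library's objects; the membership values and role labels are made up, and the rule that the hard assignment is the role with the largest membership is an assumption for illustration, not something this README confirms.

```python
# Hypothetical soft memberships: node -> {role label: fraction of membership}.
role_percentage = {
    0: {"role_0": 0.80, "role_1": 0.20},
    1: {"role_0": 0.35, "role_1": 0.65},
    2: {"role_0": 0.10, "role_1": 0.90},
}


def hard_roles(memberships):
    """Pick, for every node, the role with the largest membership (assumed rule)."""
    return {node: max(shares, key=shares.get) for node, shares in memberships.items()}


roles = hard_roles(role_percentage)
print(roles)
```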
### Node Attributes as User-Defined Features
`GraphRole` uses predefined structural graph properties for constructing features. As of version 1.1.0, numeric node attributes can optionally be included as features as well. Providing a graph annotated with node attributes to `GraphRole` allows a user to seed the recursive feature extraction process with user-defined features for each node.
Node attributes are enabled by passing `attributes=True` as a kwarg to the `RecursiveFeatureExtractor`:
```python
>>> feature_extractor = RecursiveFeatureExtractor(G, attributes=True)
```
Providing this kwarg automatically includes all numeric node attributes as features in the recursive feature calculations. Attributes with non-numeric values are always skipped (set to zero). Note that the feature name associated with a node attribute is the provided attribute name with the prefix `attribute_` prepended.
Instead of defaulting to all numeric attributes, a list of attributes to include in the feature calculation can be provided:
```python
>>> feature_extractor = RecursiveFeatureExtractor(G, attributes=True, attributes_include=['attr1', 'attr3'])
```
which will specify the use of only node attributes `attr1` and `attr3`.
A list of attributes to exclude from the default of all numeric attributes can be provided:
```python
>>> feature_extractor = RecursiveFeatureExtractor(G, attributes=True, attributes_exclude=['attr2'])
```
which will specify the use of all numeric node attributes other than `attr2`.
For safety, the `attributes_exclude` list takes priority over the `attributes_include` list when conflicting specifications are provided.
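The selection rules above can be summarized in a short sketch. The function `select_attributes` is hypothetical and is not part of the `GraphRole` API; it only mirrors the documented behavior (numeric attributes by default, an optional include list, an exclude list that takes priority, and the `attribute_` prefix). Dropping non-numeric attributes entirely is a simplification made here.

```python
def select_attributes(node_attrs, include=None, exclude=None):
    """Return the prefixed feature names of the attributes that would be used.

    Hypothetical helper for illustration; not a GraphRole function.
    """
    # Default: every attribute with a numeric value.
    candidates = [k for k, v in node_attrs.items() if isinstance(v, (int, float))]
    # An include list narrows the default selection.
    if include is not None:
        candidates = [k for k in candidates if k in include]
    # The exclude list is applied last, so it wins on conflicts.
    excluded = set(exclude or [])
    kept = [k for k in candidates if k not in excluded]
    return [f"attribute_{k}" for k in kept]


# Hypothetical attributes of a single node.
attrs = {"attr1": 1.0, "attr2": 3, "attr3": 0.5, "label": "x"}

print(select_attributes(attrs))
print(select_attributes(attrs, include=["attr1", "attr3"]))
print(select_attributes(attrs, include=["attr1", "attr2"], exclude=["attr2"]))
```

In the last call `attr2` appears in both lists, and the sketch drops it because the exclusion is applied after the inclusion.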
Note: `igraph` uses the attribute `name` to store an identifier for all nodes and therefore the corresponding attribute value is never used for feature calculations. The attribute `name`, even if overwritten by the user, is always skipped for `igraph` graph instances.
### Graph Interfaces
An interface for graph data structures is provided in the `graphrole.graph.interface` module. Implementations for `networkx` and `igraph` are included.
The `igraph` package is not included in `requirements.txt` and thus will need to be manually installed
if desired. This is due to additional installation requirements beyond `pip install python-igraph`; see
the [igraph documentation](https://igraph.org/python/#pyinstall) for more details. Note that all tests
that require `igraph` will be skipped if it is not installed.
To add an implementation of an additional graph library or data structure:
1. Subclass the `BaseGraphInterface` ABC in `graphrole/graph/interface/base.py` and implement the required methods
1. Update the `INTERFACES` dict in `graphrole/graph/interface/__init__.py` so that the new subclass is discoverable
1. Add tests by implementing a `setUpClass()` classmethod in a subclass of `BaseGraphInterfaceTest.BaseGraphInterfaceTestCases` in `tests/test_graph/test_interface.py`
1. If desired, run the feature extraction tests against the new interface in the same way, by implementing a `setUpClass()` classmethod in a subclass of `BaseRecursiveFeatureExtractorTest.TestCases` in `tests/test_features/test_extract.py`
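The first two steps follow a common adapter-plus-registry pattern, sketched below with the standard library only. Everything in this sketch is hypothetical: `SketchGraphInterface` stands in for `BaseGraphInterface`, its two abstract methods are illustrative rather than the library's actual required methods, and `SKETCH_INTERFACES` only imitates the idea of the `INTERFACES` dict. Consult the base module for the real method names and signatures.

```python
from abc import ABC, abstractmethod


class SketchGraphInterface(ABC):
    """Hypothetical stand-in for BaseGraphInterface; method names are illustrative."""

    def __init__(self, graph):
        self.graph = graph

    @abstractmethod
    def get_nodes(self):
        """Return an iterable of node identifiers."""

    @abstractmethod
    def get_neighbors(self, node):
        """Return an iterable of the neighbors of `node`."""


class DictGraphInterface(SketchGraphInterface):
    """Adapter for a plain dict-of-sets adjacency mapping."""

    def get_nodes(self):
        return list(self.graph)

    def get_neighbors(self, node):
        return list(self.graph[node])


# Hypothetical registry in the spirit of the INTERFACES dict: type name -> adapter class.
SKETCH_INTERFACES = {"dict": DictGraphInterface}


def get_interface(graph):
    """Look up the adapter registered under the name of the graph object's type."""
    return SKETCH_INTERFACES[type(graph).__name__](graph)


adapter = get_interface({"a": {"b"}, "b": {"a"}})
print(sorted(adapter.get_nodes()), adapter.get_neighbors("a"))
```

Because the base class is abstract, a subclass that forgets one of the required methods cannot be instantiated, which surfaces an incomplete interface early.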
### Future Development
Model explanation ("sense making") will be added to the `RoleExtractor` class in a future release.
### Tests
To run tests:
```
$ python -m unittest discover -v
```
As noted above, the tests for the `igraph` interface are skipped when `igraph` is not installed. Because `igraph` is intentionally not a required dependency, the test coverage reported above is much lower than the coverage obtained when `igraph` is installed and its interface tests are run (__97% coverage__ to date).
### References
[1] Henderson, et al. [It’s Who You Know: Graph Mining Using Recursive Structural Features](http://www.cs.cmu.edu/~leili/pubs/henderson-kdd2011.pdf).
[2] Henderson, et al. [RolX: Structural Role Extraction & Mining in Large Graphs](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46591.pdf).
Raw data
{
"_id": null,
"home_page": "https://github.com/dkaslovsky/GraphRole",
"name": "graphrole",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "graph,feature extraction,transfer learning,network,graph analysis,network analysis,refex,rolx",
"author": "Daniel Kaslovsky",
"author_email": "dkaslovsky@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/84/d9/608f5f46b54652a41cf7b38aaad784c369418c242f34826724636a121a55/graphrole-1.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Automatic feature extraction and node role assignment for transfer learning on graphs",
"version": "1.1.1",
"project_urls": {
"Homepage": "https://github.com/dkaslovsky/GraphRole"
},
"split_keywords": [
"graph",
"feature extraction",
"transfer learning",
"network",
"graph analysis",
"network analysis",
"refex",
"rolx"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a8e1f185324ecf9a64174f9d8198b63a3f6a5b4ecb21ba61cbf81708780509b4",
"md5": "9f4de2ce1f8e9a3faf5c0be5feb16814",
"sha256": "d8f97bd8d96c6936391935047813417ddceb1a8a9f80083e4aeb328d8ca23e77"
},
"downloads": -1,
"filename": "graphrole-1.1.1-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "9f4de2ce1f8e9a3faf5c0be5feb16814",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 68411,
"upload_time": "2023-10-09T01:16:47",
"upload_time_iso_8601": "2023-10-09T01:16:47.787003Z",
"url": "https://files.pythonhosted.org/packages/a8/e1/f185324ecf9a64174f9d8198b63a3f6a5b4ecb21ba61cbf81708780509b4/graphrole-1.1.1-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "84d9608f5f46b54652a41cf7b38aaad784c369418c242f34826724636a121a55",
"md5": "247ab9911e1aa71d9911350f415ace3e",
"sha256": "578e3fdd4a84548d93d81dc7ac350dccd6b31ba592122691875ca36bdffde814"
},
"downloads": -1,
"filename": "graphrole-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "247ab9911e1aa71d9911350f415ace3e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 24523,
"upload_time": "2023-10-09T01:16:50",
"upload_time_iso_8601": "2023-10-09T01:16:50.527142Z",
"url": "https://files.pythonhosted.org/packages/84/d9/608f5f46b54652a41cf7b38aaad784c369418c242f34826724636a121a55/graphrole-1.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-09 01:16:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dkaslovsky",
"github_project": "GraphRole",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "graphrole"
}