residual2vec


- Name: residual2vec
- Version: 0.0.12
- Home page: https://github.com/skojaku/residual2vec
- Summary: residual2vec: debiasing graph embedding with random graphs
- Upload time: 2023-04-07 16:38:27
- Author: Sadamori Kojaku
- License: MIT
- Keywords: graph embedding

[![Unit Test & Deploy](https://github.com/skojaku/residual2vec/actions/workflows/main.yml/badge.svg)](https://github.com/skojaku/residual2vec/actions/workflows/main.yml)

# Python package for residual2vec graph embedding algorithm

residual2vec is an algorithm that embeds networks into a vector space while controlling for structural properties such as degree. If you use this package, please cite:

- S. Kojaku, J. Yoon, I. Constantino, and Y.-Y. Ahn, Residual2Vec: Debiasing graph embedding using random graphs. NeurIPS (2021). [link will be added when available]

- [Preprint (arXiv)](https://arxiv.org/abs/2110.07654)

- BibTeX entry:
```latex
@inproceedings{kojaku2021neurips,
 title={Residual2Vec: Debiasing graph embedding using random graphs},
 author={Sadamori Kojaku and Jisung Yoon and Isabel Constantino and Yong-Yeol Ahn},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {},
 pages = {},
 publisher = {Curran Associates, Inc.},
 volume = {},
 year = {2021}
}
```

## Install

```bash
pip install residual2vec
```

### Requirements

This code is tested with Python 3.7 and 3.8 and depends on
the following packages:

```
- numpy==1.19.0
- scipy==1.7.1
- scikit-learn==1.0
- faiss-cpu==1.7.0
- numba==0.50.0
- torch==1.10.0
- tqdm==4.48.2
```


## Example

residual2vec has two versions: one optimized with matrix factorization, and the other optimized with a stochastic gradient descent algorithm.

The matrix-factorization version is the one used in the original paper and runs faster than the other version for networks of up to 100k nodes.

```python
import residual2vec as rv

model = rv.residual2vec_matrix_factorization(window_length = 10, group_membership = None)
model.fit(G)
emb = model.transform(dim = 64)
# or equivalently emb = model.fit(G).transform(dim = 64)
```
- `G`: adjacency matrix of the input graph. Accepts a [numpy.array](https://numpy.org/doc/stable/reference/generated/numpy.array.html) or a [scipy.sparse.csr_matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html).
- `window_length`: the length of the context window.
- `group_membership`: an array of node labels. Used to remove the structural bias correlated with the node labels.
- `dim`: dimension of the embedding.
- `emb`: 2D numpy array of shape (`N`, `dim`), where `N` is the number of nodes. The `i`th row in the array (i.e., `emb[i, :]`) represents the embedding vector of the `i`th node in the given adjacency matrix `G`.
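
The examples here assume that `G` already exists. As a minimal sketch of one way to build it, the snippet below constructs a dense adjacency matrix from an edge list with NumPy. The toy edge list and the variable names `edges` and `n_nodes` are made up for illustration and are not part of the package.

```python
import numpy as np

# Hypothetical toy graph: an undirected edge list over 5 nodes.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
n_nodes = 5

# Dense adjacency matrix: G[i, j] = 1 if nodes i and j are connected.
G = np.zeros((n_nodes, n_nodes), dtype=float)
for i, j in edges:
    G[i, j] = 1.0
    G[j, i] = 1.0  # mirror the entry because the graph is undirected

# Row sums give the node degrees, a quick sanity check on the matrix.
degree = G.sum(axis=1)
print(degree)
```

For large sparse graphs, converting the result with `scipy.sparse.csr_matrix(G)`, or building the sparse matrix directly, is likely the better choice, since both input types are accepted.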


A limitation of the matrix-factorization-based implementation is that it is memory-intensive, especially for dense or large networks.
The other version circumvents this problem by using the stochastic gradient descent (SGD) algorithm, which
incrementally updates the embedding with small chunks of data instead of deriving the whole embedding in one go.

```python
import residual2vec as rv

noise_sampler = rv.ConfigModelNodeSampler() # sampler for the negative sampling

model = rv.residual2vec_sgd(noise_sampler, window_length = 10)
model.fit(G)
emb = model.transform(dim = 64)
# or equivalently emb = model.fit(G).transform(dim = 64)
```

`residual2vec_sgd` takes an additional argument, `noise_sampler`, which is an object that samples context nodes for a given center node.
Several samplers are implemented in this package:
- `ErdosRenyiNodeSampler`: Sampler based on the Erdős-Rényi random graph (i.e., every context node is sampled with the same probability)
- `ConfigModelNodeSampler`: Sampler based on the configuration model (i.e., a context node is sampled with probability proportional to its degree)
- `SBMNodeSampler`: Sampler based on the stochastic block model (i.e., context nodes are sampled using the stochastic block model)
- `ConditionalContextSampler`: Samples a random context node conditioned on the group to which a given context node belongs. The group membership needs to be given when creating this instance (experimental).
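
To make the difference between the first two samplers concrete, here is a small NumPy sketch of the two noise distributions they are described as using. It is an illustration only, not the package's implementation, and the toy matrix `A` and the names `p_uniform` and `p_degree` are assumptions.

```python
import numpy as np

# Hypothetical toy adjacency matrix: an undirected graph with 4 nodes.
A = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
], dtype=float)
n_nodes = A.shape[0]
degree = A.sum(axis=1)

# Erdos-Renyi-style noise: every node is equally likely to be drawn.
p_uniform = np.full(n_nodes, 1.0 / n_nodes)

# Configuration-model-style noise: probability proportional to degree.
p_degree = degree / degree.sum()

# Draw 10 context nodes from the degree-proportional distribution.
rng = np.random.default_rng(0)
samples = rng.choice(n_nodes, size=10, p=p_degree)
print(p_uniform, p_degree, samples)
```

With degree-proportional noise, high-degree nodes appear more often as negative samples, which is the sense in which the embedding controls for degree.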

The `SBMNodeSampler` is useful for removing the bias due to group structure in networks (i.e., structure correlated with a discrete node label):

```python
import residual2vec as rv

group_membership = [0,0,0,0,1,1,1,1]
noise_sampler = rv.SBMNodeSampler(window_length = 10, group_membership=group_membership) # sampler for the negative sampling

model = rv.residual2vec_sgd(noise_sampler, window_length = 10)
model.fit(G)
emb = model.transform(dim = 64)
# or equivalently emb = model.fit(G).transform(dim = 64)
```
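
As a rough illustration of what "group structure" means here, the sketch below counts edges within and between groups for a toy graph with the same `group_membership` as above. A stochastic block model is parameterized by block-level statistics of this kind. How the package computes or uses them internally is not shown in this README, so treat the code as an assumption-laden sketch; the toy graph and the names `U` and `B` are made up.

```python
import numpy as np

group_membership = np.array([0, 0, 0, 0, 1, 1, 1, 1])
n = len(group_membership)

# Hypothetical toy graph: two 4-node cliques joined by a single edge (3, 4).
A = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

# One-hot membership matrix U of shape (n_nodes, n_groups).
n_groups = group_membership.max() + 1
U = np.eye(n_groups)[group_membership]

# B[r, s] sums A[i, j] over i in group r and j in group s.
# Within-group edges are counted twice because A is symmetric.
B = U.T @ A @ U
print(B)
```

The large diagonal entries of `B` relative to the off-diagonal ones are the kind of label-correlated structure the sampler is meant to account for.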

An added benefit of the SGD-based approach is that the noise distribution can be customized, which is useful for removing a particular bias from the embedding.
To do so, implement a class that inherits from `rv.NodeSampler`:

```python
import residual2vec as rv
class CustomNodeSampler(rv.NodeSampler):
    def fit(self, A):
        """Fit the sampler.

        :param A: adjacency matrix
        :type A: scipy.sparse.csr_matrix
        """
        pass

    def sampling(self, center_node, n_samples):
        """Sample context nodes from the graph for a center node.

        :param center_node: ID of center node
        :type center_node: int
        :param n_samples: number of samples per center node
        :type n_samples: int
        """
        pass
```

See `residual2vec/node_samplers` for examples.
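
As one possible way to fill in the template, the sketch below samples context nodes with probability proportional to degree raised to a power `alpha`. The class name `DegreePowerSampler`, the `alpha` and `seed` parameters, and the assumption that `sampling` returns an array of node IDs are all hypothetical. The class is standalone so that it runs without the package; in practice it would inherit from `rv.NodeSampler`.

```python
import numpy as np


class DegreePowerSampler:
    """Hypothetical sampler: P(context = j) is proportional to degree(j) ** alpha.

    Standalone for illustration; in practice it would inherit from
    rv.NodeSampler as in the template above.
    """

    def __init__(self, alpha=0.75, seed=0):
        self.alpha = alpha
        self.rng = np.random.default_rng(seed)
        self.p = None

    def fit(self, A):
        # Works for dense arrays and scipy sparse matrices alike.
        degree = np.asarray(A.sum(axis=1), dtype=float).ravel()
        weights = degree ** self.alpha
        self.p = weights / weights.sum()
        return self

    def sampling(self, center_node, n_samples):
        # This noise distribution ignores the center node.
        # Assumption: the return value is an array of n_samples node IDs.
        return self.rng.choice(len(self.p), size=n_samples, p=self.p)


# Toy usage on a 3-node star graph.
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
sampler = DegreePowerSampler(alpha=0.75).fit(A)
samples = sampler.sampling(center_node=0, n_samples=5)
print(samples)
```

Setting `alpha = 1` gives degree-proportional sampling and `alpha = 0` gives uniform sampling, so this one parameter interpolates between the two simplest noise models described earlier.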

            
