pelote

Name	pelote
Version	0.8.2
home_page	http://github.com/medialab/pelote
Summary	Collection of network-related utilities for python.
upload_time	2023-09-08 12:04:04
author	Guillaume Plique
requires_python	>=3.6
license	MIT
keywords	network
requirements	ebbe llist pyllist networkx black docstring-parser importchecker pandas pytest ipysigma jupyterlab matplotlib tqdm setuptools twine wheel

[![Build Status](https://github.com/medialab/pelote/workflows/Tests/badge.svg)](https://github.com/medialab/pelote/actions)

# Pelote

Pelote is a Python library of graph-related functions that complement [networkx](https://networkx.org/) for higher-level tasks.

It mainly helps with the following things:

- Conversion of tabular data to graphs (bipartites, citation etc. in the spirit of [Table2Net](https://medialab.github.io/table2net/))
- Conversion of graphs to tabular data
- Monopartite projections of bipartite graphs
- Miscellaneous graph helper functions (filtering out nodes, edges etc.)
- Sparsification of graphs
- Reading & writing of graph formats not found in `networkx` (such as [graphology](https://graphology.github.io/) JSON)

As such it is the perfect companion to [ipysigma](https://github.com/Yomguithereal/ipysigma), our Jupyter widget that can render interactive graphs directly within your notebooks.

## Installation

You can install `pelote` with pip using the following command:

```
pip install pelote
```

If you want to use the library with `pandas`, you will also need to install it:

```
pip install pandas
```

## Usage

* [Tabular data to graphs](#tabular-data-to-graphs)
  * [table_to_bipartite_graph](#table_to_bipartite_graph)
  * [tables_to_graph](#tables_to_graph)
  * [edges_table_to_graph](#edges_table_to_graph)
* [Graphs to tabular data](#graphs-to-tabular-data)
  * [graph_to_nodes_dataframe](#graph_to_nodes_dataframe)
  * [graph_to_edges_dataframe](#graph_to_edges_dataframe)
  * [graph_to_dataframes](#graph_to_dataframes)
* [Graph projection](#graph-projection)
  * [monopartite_projection](#monopartite_projection)
* [Graph sparsification](#graph-sparsification)
  * [global_threshold_sparsification](#global_threshold_sparsification)
  * [multiscale_backbone](#multiscale_backbone)
* [Miscellaneous graph-related metrics](#miscellaneous-graph-related-metrics)
  * [edge_disparity](#edge_disparity)
  * [triangular_strength](#triangular_strength)
* [Graph utilities](#graph-utilities)
  * [union_of_maximum_spanning_trees](#union_of_maximum_spanning_trees)
  * [largest_connected_component](#largest_connected_component)
  * [crop_to_largest_connected_component](#crop_to_largest_connected_component)
  * [largest_connected_component_subgraph](#largest_connected_component_subgraph)
  * [remove_edges](#remove_edges)
  * [filter_edges](#filter_edges)
  * [remove_nodes](#remove_nodes)
  * [filter_nodes](#filter_nodes)
  * [remove_leaves](#remove_leaves)
  * [filter_leaves](#filter_leaves)
* [Learning](#learning)
  * [floatsam_threshold_learner](#floatsam_threshold_learner)
* [Reading & Writing](#reading--writing)
  * [read_graphology_json](#read_graphology_json)
  * [write_graphology_json](#write_graphology_json)

---

### Tabular data to graphs

#### table_to_bipartite_graph

Function creating a bipartite graph from the given tabular data.

*Arguments*

* **table** *Iterable[Indexable] or pd.DataFrame* - input tabular data. It can
be a large variety of things as long as it is 1. iterable and 2.
yields indexable values such as dicts or lists. This can for instance
be a list of dicts, a csv.DictReader stream etc. It also supports
pandas DataFrame if the library is installed.
* **first_part_col** *Hashable* - the name of the column containing the
value representing a node in the resulting graph's first part.
It could be the index if your rows are lists or a key if your rows
are dicts instead.
* **second_part_col** *Hashable* - the name of the column containing the
value representing a node in the resulting graph's second part.
It could be the index if your rows are lists or a key if your rows
are dicts instead.
* **node_part_attr** *str, optional* `"part"` - name of the node attribute containing
the part it belongs to.
* **edge_weight_attr** *str, optional* `"weight"` - name of the edge attribute containing
its weight, i.e. the number of times it was found in the table.
* **first_part_data** *Sequence or Callable or Mapping, optional* `None` - sequence (e.g. list, tuple etc.)
of columns from rows to keep as node attributes for the graph's first part.
Can also be a mapping (i.e. dict) from row column to node attribute
name to create.
Can also be a function returning a dict of those attributes.
Note that the first row containing a given node will take precedence over
subsequent ones regarding data to include.
* **second_part_data** *Sequence or Callable or Mapping, optional* `None` - sequence (e.g. list, tuple etc.)
of columns from rows to keep as node attributes for the graph's second part.
Can also be a mapping (i.e. dict) from row column to node attribute
name to create.
Can also be a function returning a dict of those attributes.
Note that the first row containing a given node will take precedence over
subsequent ones regarding data to include.
* **first_part_name** *Hashable, optional* `None` - can be given to rename the first part.
* **second_part_name** *Hashable, optional* `None` - can be given to rename the second part.
* **disjoint_keys** *bool, optional* `False` - set this to True as an optimization
mechanism if you know your part keys are disjoint, i.e. if no
value for `first_part_col` can also be found in `second_part_col`.
If you enable this option wrongly, the result can be incorrect.

*Returns*

*nx.AnyGraph* - the bipartite graph.
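To illustrate what this conversion produces, the sketch below rebuilds the same idea with plain dicts rather than calling `pelote`: each row links a first-part node to a second-part node, and repeated rows increase the edge weight. The rows, the column names and the `bipartite_edges_from_table` helper are made up for the example, and this is not the library's implementation.

```python
from collections import Counter

# Hypothetical rows: each one links an account to a color.
rows = [
    {"account": "alice", "color": "red"},
    {"account": "alice", "color": "blue"},
    {"account": "bob", "color": "red"},
    {"account": "alice", "color": "red"},
]


def bipartite_edges_from_table(table, first_part_col, second_part_col):
    """Return node parts and weighted edges for a two-column table."""
    parts = {}
    weights = Counter()
    for row in table:
        # Sketch-only choice: nodes are namespaced by their column so that a
        # value found in both columns still yields two distinct nodes.
        first = (first_part_col, row[first_part_col])
        second = (second_part_col, row[second_part_col])
        parts[first] = first_part_col
        parts[second] = second_part_col
        # The weight is the number of times the pair was found in the table.
        weights[first, second] += 1
    return parts, dict(weights)


parts, edges = bipartite_edges_from_table(rows, "account", "color")
print(edges[("account", "alice"), ("color", "red")])  # 2
```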

#### tables_to_graph

Function creating a graph from two tables: a table of nodes and a table of edges.

```python
from pelote import tables_to_graph

table_nodes = [
    {"name": "alice", "age": 50},
    {"name": "bob", "age": 12}
]

table_edges = [
    {"source": "alice", "target": "bob", "weight": 0.8},
    {"source": "bob", "target": "alice", "weight": 0.2}
]

g = tables_to_graph(
    table_nodes, table_edges, node_col="name", node_data=["age"], edge_data=["weight"], directed=True
)
```

*Arguments*

* **nodes_table** *Iterable[Indexable] or pd.DataFrame* - input nodes in tabular
format. It can be a large variety of things as long as it is 1. iterable
and 2. yields indexable values such as dicts or lists. This can for
instance be a list of dicts, a csv.DictReader stream etc. It also supports
pandas DataFrame if the library is installed.
* **edges_table** *Iterable[Indexable] or pd.DataFrame* - input edges in tabular
format.
* **node_col** *Hashable, optional* `"key"` - the name of the column containing the nodes in the nodes_table.
It could be the index if your rows are lists or a key if your rows
are dicts instead.
* **edge_source_col** *Hashable, optional* `"source"` - the name of the column containing the edges' source
nodes in the edges_table.
* **edge_target_col** *Hashable, optional* `"target"` - the name of the column containing the edges' target
nodes in the edges_table.
* **node_data** *Sequence, optional* `[]` - sequence (i.e. list, tuple etc.)
of columns' names from the nodes_table to keep as node attributes in the resulting graph.
* **edge_data** *Sequence, optional* `[]` - sequence (i.e. list, tuple etc.) of columns' names
from the edges_table to keep as edge attributes in the resulting graph, e.g. ["weight"].
* **count_rows_as_weight** *bool, optional* `False` - set this to True to compute a weight
attribute for each edge, corresponding to the number of times it was
found in the table. The name of this attribute is defined by the
`edge_weight_attr` parameter. If set to False, only the last occurrence of
each edge will be kept in the graph.
* **edge_weight_attr** *str, optional* `"weight"` - name of the edge attribute containing
its weight, i.e. the number of times it was found in the table, if
`count_rows_as_weight` is set to True.
* **add_missing_nodes** *bool, optional* `True` - set this to True to check that the edges' sources and targets
in the edges_table are all defined in the nodes_table.
* **directed** *bool, optional* `False` - whether the resulting graph must be directed.

*Returns*

*nx.AnyGraph* - the resulting graph.

#### edges_table_to_graph

Function creating a graph from a table of edges.

*Arguments*

* **edges_table** *Iterable[Indexable] or pd.DataFrame* - input edges in tabular
format. It can be a large variety of things as long as it is 1. iterable
and 2. yields indexable values such as dicts or lists. This can for
instance be a list of dicts, a csv.DictReader stream etc. It also supports
pandas DataFrame if the library is installed.
* **edge_source_col** *Hashable, optional* `"source"` - the name of the column containing the edges' source
nodes in the edges_table.
* **edge_target_col** *Hashable, optional* `"target"` - the name of the column containing the edges' target
nodes in the edges_table.
* **edge_data** *Sequence, optional* `[]` - sequence (i.e. list, tuple etc.) of columns' names
from the edges_table to keep as edge attributes in the resulting graph, e.g. ["weight"].
* **count_rows_as_weight** *bool, optional* `False` - set this to True to compute a weight
attribute for each edge, corresponding to the number of times it was
found in the table. The name of this attribute is defined by the
`edge_weight_attr` parameter. If set to False, only the last occurrence of
each edge will be kept in the graph.
* **edge_weight_attr** *str, optional* `"weight"` - name of the edge attribute containing
its weight, i.e. the number of times it was found in the table, if
`count_rows_as_weight` is set to True.
* **directed** *bool, optional* `False` - whether the resulting graph must be directed.

*Returns*

*nx.AnyGraph* - the resulting graph.
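The effect of `count_rows_as_weight` is easiest to see on a table containing the same edge twice. The following pure-Python sketch mimics the documented behavior with a dict of edges; `edges_from_table` and the sample rows are hypothetical, and the way attributes of repeated rows are merged here (last row wins) is an assumption.

```python
def edges_from_table(table, source_col="source", target_col="target",
                     edge_data=(), count_rows_as_weight=False,
                     edge_weight_attr="weight"):
    """Collect directed edges from rows, optionally counting repeated rows."""
    edges = {}
    for row in table:
        key = (row[source_col], row[target_col])
        attrs = {col: row[col] for col in edge_data}
        if count_rows_as_weight:
            # Increment the count already stored for this edge, if any.
            previous = edges.get(key, {}).get(edge_weight_attr, 0)
            attrs[edge_weight_attr] = previous + 1
        # Without counting, a later row simply overwrites an earlier one.
        edges[key] = attrs
    return edges


table = [
    {"source": "a", "target": "b", "label": "x"},
    {"source": "a", "target": "b", "label": "y"},
    {"source": "b", "target": "c", "label": "z"},
]

last_only = edges_from_table(table, edge_data=["label"])
counted = edges_from_table(table, edge_data=["label"], count_rows_as_weight=True)
print(last_only["a", "b"])  # {'label': 'y'}
print(counted["a", "b"])    # {'label': 'y', 'weight': 2}
```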

---

### Graphs to tabular data

#### graph_to_nodes_dataframe

Function converting the given networkx graph into a pandas DataFrame of
its nodes.

```python
from pelote import graph_to_nodes_dataframe

df = graph_to_nodes_dataframe(graph)
```

*Arguments*

* **graph** *nx.AnyGraph* - a networkx graph instance.
* **node_key_col** *str, optional* `"key"` - name of the DataFrame column containing
the node keys. If None, the node keys will be used as the DataFrame
index.

*Returns*

*pd.DataFrame* - A pandas DataFrame

#### graph_to_edges_dataframe

Function converting the given networkx graph into a pandas DataFrame of
its edges.

*Arguments*

* **graph** *nx.AnyGraph* - a networkx graph instance.
* **edge_source_col** *str, optional* `"source"` - name of the DataFrame column containing
the edge source.
* **edge_target_col** *str, optional* `"target"` - name of the DataFrame column containing
the edge target.
* **source_node_data** *Iterable or Mapping, optional* `None` - iterable of attribute names
or mapping from attribute names to column name to be used to add
columns to the resulting dataframe based on source node data.
* **target_node_data** *Iterable or Mapping, optional* `None` - iterable of attribute names
or mapping from attribute names to column name to be used to add
columns to the resulting dataframe based on target node data.

*Returns*

*pd.DataFrame* - A pandas DataFrame

#### graph_to_dataframes

Function converting the given networkx graph into two pandas DataFrames:
one for its nodes, one for its edges.

*Arguments*

* **graph** *nx.AnyGraph* - a networkx graph instance.
* **node_key_col** *str, optional* `"key"` - name of the node DataFrame column containing
the node keys. If None, the node keys will be used as the DataFrame
index.
* **edge_source_col** *str, optional* `"source"` - name of the edge DataFrame column containing
the edge source.
* **edge_target_col** *str, optional* `"target"` - name of the edge DataFrame column containing
the edge target.
* **source_node_data** *Iterable or Mapping, optional* `None` - iterable of attribute names
or mapping from attribute names to column name to be used to add
columns to the edge dataframe based on source node data.
* **target_node_data** *Iterable or Mapping, optional* `None` - iterable of attribute names
or mapping from attribute names to column name to be used to add
columns to the edge dataframe based on target node data.

*Returns*

*tuple* - (pd.DataFrame, pd.DataFrame), i.e. the nodes DataFrame and the edges DataFrame.

---

### Graph projection

#### monopartite_projection

Function returning the monopartite projection of a given bipartite graph
with respect to one of the two partitions of the graph.

That is to say the resulting graph will keep a single type of nodes sharing
weighted edges based on the neighbors they shared in the bipartite graph.

```python
import networkx as nx
from pelote import monopartite_projection

bipartite = nx.Graph()
bipartite.add_nodes_from([1, 2, 3], part='account')
bipartite.add_nodes_from([4, 5, 6], part='color')
bipartite.add_edges_from([
    (1, 4),
    (1, 5),
    (2, 6),
    (3, 4),
    (3, 6)
])

# Resulting graph will only contain nodes [1, 2, 3]
# with edges: (1, 3) and (2, 3)
monopartite = monopartite_projection(bipartite, 'account')
```

*Arguments*

* **bipartite_graph** *nx.AnyGraph* - target graph. The function will raise
if given graph is not truly bipartite.
* **part_to_keep** *Hashable or Collection* - partition to keep in the projected
graph. It can either be the value of the part node attribute in the
given graph (a string, most commonly), or a collection (a set, list etc.)
holding the nodes composing the part to keep.
* **node_part_attr** *str, optional* `"part"` - name of the node attribute containing
the part the node belongs to.
* **edge_weight_attr** *str, optional* `"weight"` - name of the edge attribute containing
the edge's weight.
* **metric** *str, optional* `None` - one of "jaccard", "overlap", "cosine", "dice",
"binary_cosine", "pmi" or "dot_product". If not given, resulting weight
will be set to the size of neighbor intersection.
* **bipartition_check** *bool, optional* `True` - whether to check if given graph
is truly bipartite. You can disable this as an optimization
strategy if you know what you are doing.
* **weight_threshold** *float, optional* `None` - edges whose weight is less
than this threshold will not be added to the projected
monopartite graph.

*Returns*

*nx.Graph* - the projected monopartite graph.
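To make the default weight and the `metric` option more tangible, here is a pure-Python sketch working on the neighbor sets of the example above. It only covers the default (size of the neighbor intersection) and the textbook Jaccard index; `project` is a hypothetical helper, and the library's own handling of weighted bipartite edges may differ.

```python
from itertools import combinations

# Neighbors of the "account" nodes in the bipartite example above.
neighbors = {1: {4, 5}, 2: {6}, 3: {4, 6}}


def project(neighbors, metric=None):
    """Project onto the kept part by comparing neighbor sets pairwise."""
    edges = {}
    for u, v in combinations(sorted(neighbors), 2):
        shared = neighbors[u] & neighbors[v]
        if not shared:
            continue  # no common neighbor means no projected edge
        if metric is None:
            weight = len(shared)
        elif metric == "jaccard":
            weight = len(shared) / len(neighbors[u] | neighbors[v])
        else:
            raise ValueError("metric not covered by this sketch")
        edges[u, v] = weight
    return edges


print(project(neighbors))  # {(1, 3): 1, (2, 3): 1}
print(project(neighbors, metric="jaccard")[2, 3])  # 0.5
```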

---

### Graph sparsification

#### global_threshold_sparsification

Function returning a copy of the given graph without edges whose weight
is less than a given threshold.

*Arguments*

* **graph** *nx.AnyGraph* - target graph.
* **weight_threshold** *float* - weight threshold.
* **edge_weight_attr** *str, optional* - name of the edge weight attribute.
* **reverse** *bool, optional* - whether to reverse the threshold condition.
That is to say an edge would be removed if its weight is greater
than the threshold.
* **keep_connected** *bool, optional* `False` - whether to keep the graph as connected
as it originally was, using the UMST (union of maximum spanning trees) method.

*Returns*

*nx.AnyGraph* - the sparse graph.
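The threshold condition and its `reverse` variant can be summarized with a few lines of plain Python. This is only a sketch of the rule stated above, applied to a dict of edge weights; `threshold_edges` is a hypothetical name and the `keep_connected` option is not modeled.

```python
def threshold_edges(edges, weight_threshold, reverse=False):
    """Return a copy of the edges that pass the threshold condition."""
    kept = {}
    for edge, weight in edges.items():
        if reverse:
            # Reversed condition: drop edges heavier than the threshold.
            removed = weight > weight_threshold
        else:
            # Default condition: drop edges lighter than the threshold.
            removed = weight < weight_threshold
        if not removed:
            kept[edge] = weight
    return kept


edges = {("a", "b"): 0.9, ("b", "c"): 0.4, ("a", "c"): 0.1}

print(sorted(threshold_edges(edges, 0.4)))                # [('a', 'b'), ('b', 'c')]
print(sorted(threshold_edges(edges, 0.4, reverse=True)))  # [('a', 'c'), ('b', 'c')]
```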

#### multiscale_backbone

Function returning the multiscale backbone of the given graph, i.e. a copy
of the graph where we only keep "relevant" edges, as defined by a
statistical test comparing the likelihood of a weighted edge existing
against the null model.

*Article*
> Serrano, M. Ángeles, Marián Boguñá, and Alessandro Vespignani. "Extracting the multiscale backbone of complex weighted networks." Proceedings of the National Academy of Sciences 106.16 (2009): 6483-6488.

*References*

- https://www.pnas.org/content/pnas/106/16/6483.full.pdf
- https://en.wikipedia.org/wiki/Disparity_filter_algorithm_of_weighted_network

*Arguments*

* **graph** *nx.AnyGraph* - target graph.
* **alpha** *float, optional* `0.05` - alpha value for the statistical test. It can
be intuitively thought of as a p-value score for an edge to be
kept in the resulting graph.
* **edge_weight_attr** *str, optional* `"weight"` - name of the edge attribute holding
the edge's weight.
* **keep_connected** *bool, optional* `False` - whether to keep the graph as connected
as it originally was, using the UMST (union of maximum spanning trees) method.

*Returns*

*nx.AnyGraph* - the sparse graph.

---

### Miscellaneous graph-related metrics

#### edge_disparity

Function computing the disparity score of each edge in the given graph. This
score is typically used to extract the multiscale backbone of a weighted
graph.

The formula from the paper (relying on integral calculus) can be simplified
to become:

```
disparity(u, v) = min(
    (1 - normalizedWeight(u, v)) ^ (degree(u) - 1),
    (1 - normalizedWeight(v, u)) ^ (degree(v) - 1)
)
```

where

```
normalizedWeight(u, v) = weight(u, v) / weightedDegree(u)
weightedDegree(u) = sum(weight(u, v) for v in neighbors(u))
```

This score is sometimes found reversed, as follows:

```
disparity(u, v) = max(
    1 - (1 - normalizedWeight(u, v)) ^ (degree(u) - 1),
    1 - (1 - normalizedWeight(v, u)) ^ (degree(v) - 1)
)
```

so that higher score means better edges. We chose to keep the metric close
to the paper to keep the statistical test angle. This means that, in this
implementation at least, a low score for an edge means a high relevance and
increases its chances to be kept in the backbone.

Note that this algorithm has no proper definition for directed graphs and
is only useful if edges have varying weights. This said, it could be
possible to compute the disparity score only based on edge direction, if
we drop the min part.
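The simplified formula can be checked by hand on a tiny graph. The sketch below is a pure-Python transcription of it, not the library's implementation; the `edge_disparity_sketch` name and the example weights are hypothetical, and it assumes an undirected graph without self-loops.

```python
def edge_disparity_sketch(weights):
    """weights: dict mapping undirected edges (u, v) to positive weights."""
    degree = {}
    weighted_degree = {}
    for (u, v), w in weights.items():
        for node in (u, v):
            degree[node] = degree.get(node, 0) + 1
            weighted_degree[node] = weighted_degree.get(node, 0.0) + w

    def score(node, w):
        # (1 - normalizedWeight) ^ (degree - 1), seen from one endpoint.
        return (1 - w / weighted_degree[node]) ** (degree[node] - 1)

    return {
        (u, v): min(score(u, w), score(v, w))
        for (u, v), w in weights.items()
    }


# A star around "a" where one edge carries most of the weight.
weights = {("a", "b"): 8, ("a", "c"): 1, ("a", "d"): 1}
scores = edge_disparity_sketch(weights)

print(round(scores["a", "b"], 2))  # 0.04
print(round(scores["a", "c"], 2))  # 0.81
```

The heavy edge gets the lowest score, which matches the convention described above: a low disparity score means a relevant edge.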

*Article*
> Serrano, M. Ángeles, Marián Boguñá, and Alessandro Vespignani. "Extracting the multiscale backbone of complex weighted networks." Proceedings of the National Academy of Sciences 106.16 (2009): 6483-6488.

*References*

- https://www.pnas.org/content/pnas/106/16/6483.full.pdf
- https://en.wikipedia.org/wiki/Disparity_filter_algorithm_of_weighted_network

*Arguments*

* **graph** *nx.AnyGraph* - target graph.
* **edge_weight_attr** *str, optional* `"weight"` - name of the edge attribute containing
its weight.
* **reverse** *bool, optional* `False` - whether to reverse the metric, so that a higher score
means a more relevant edge.

*Returns*

*dict* - Dictionary with edges, as (source, target) tuples, as keys and the disparity scores as values.

#### triangular_strength

Function returning the triangular strength of a graph's edges, sometimes also called
Simmelian strength, i.e. the number of triangles each edge is a part of.

*Arguments*

* **graph** *nx.AnyGraph* - target graph.
* **full** *bool, optional* `False` - whether to return strength for every edge,
including those with strength = 0.

*Returns*

*dict* - mapping of edges to their triangular strength.
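Counting the triangles an edge belongs to amounts to counting the neighbors shared by its two endpoints. Here is a minimal pure-Python sketch of that idea, including the `full` option; `triangular_strength_sketch` is a hypothetical name and the code assumes a simple undirected graph without self-loops.

```python
def triangular_strength_sketch(edges, full=False):
    """Map each edge to the number of triangles it takes part in."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    strengths = {}
    for u, v in edges:
        # Each common neighbor of u and v closes one triangle over (u, v).
        count = len(neighbors[u] & neighbors[v])
        if count > 0 or full:
            strengths[u, v] = count
    return strengths


# One triangle (1, 2, 3) plus a dangling edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]

print(triangular_strength_sketch(edges))
# {(1, 2): 1, (2, 3): 1, (1, 3): 1}
print(triangular_strength_sketch(edges, full=True)[3, 4])  # 0
```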

---

### Graph utilities

#### union_of_maximum_spanning_trees

Generator yielding the edges belonging to any Maximum Spanning Tree (MST) of
the given networkx graph.

Note that this function will give to each edge with no weight a default
weight of 1.

*Article*
> Arlind Nocaj, Mark Ortmann, and Ulrik Brandes "Untangling Hairballs. From 3 to 14 Degrees of Separation." Computer & Information Science, University of Konstanz, Germany, 2014, https://dx.doi.org/10.1007/978-3-662-45803-7_9.

*References*

- https://kops.uni-konstanz.de/bitstream/handle/123456789/30583/Nocaj_0-284485.pdf

*Arguments*

* **graph** *nx.AnyGraph* - target graph.
* **edge_weight_attr** *str, optional* `"weight"` - name of the edge weight attribute.

*Yields*

*tuple* - source, target, attributes

#### largest_connected_component

Function returning the largest connected component of given networkx graph
as a set of nodes.

Note that this function will consider any given graph as undirected and
will therefore work with weakly connected components in the directed case.

*Arguments*

* **graph** *nx.AnyGraph* - target graph.

*Returns*

*set* - set of nodes representing the largest connected component.
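The note about directed graphs means that edge direction is ignored when components are computed. The following pure-Python sketch shows this with a breadth-first search over a symmetrized adjacency map; `largest_component` and the sample edges are hypothetical, and this is not the library's code.

```python
from collections import deque

# Directed edges: 1 -> 2 and 3 -> 2 are only weakly connected.
directed_edges = [(1, 2), (3, 2), (4, 5)]

# Ignore direction by registering each edge both ways.
adjacency = {}
for source, target in directed_edges:
    adjacency.setdefault(source, set()).add(target)
    adjacency.setdefault(target, set()).add(source)


def largest_component(adjacency):
    """Return the largest connected component as a set of nodes."""
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


print(sorted(largest_component(adjacency)))  # [1, 2, 3]
```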

#### crop_to_largest_connected_component

Function mutating the given networkx graph in order to keep only the
largest connected component.

Note that this function will consider any given graph as undirected and
will therefore work with weakly connected components in the directed case.

*Arguments*

* **graph** *nx.AnyGraph* - target graph.

#### largest_connected_component_subgraph

Function returning the largest connected component subgraph of the given
networkx graph.

Note that this function will consider any given graph as undirected and
will therefore work with weakly connected components in the directed case.

*Arguments*

* **graph** *nx.AnyGraph* - target graph.
* **as_view** *bool, optional* `False` - whether to return the subgraph as a view.

*Returns*

*nx.AnyGraph* - the subgraph.

#### remove_edges

Function removing all edges that do not pass a predicate function from a
given networkx graph.

Note that this function mutates the given graph.

*Arguments*

* **graph** *nx.AnyGraph* - a networkx graph.
* **predicate** *callable* - a function taking each edge source, target and
attributes and returning True if you want to keep the edge or False
if you want to remove it.

#### filter_edges

Function returning a copy of the given networkx graph without the edges
filtered out by the given predicate function.

*Arguments*

* **graph** *nx.AnyGraph* - a networkx graph.
* **predicate** *callable* - a function taking each edge source, target and
attributes and returning True if you want to keep the edge or False
if you want to remove it.

*Returns*

*nx.AnyGraph* - the filtered graph.
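The difference between `remove_edges` (mutates) and `filter_edges` (returns a copy), as well as the shape of the predicate, can be sketched with a plain dict of edges. The two helpers below are hypothetical stand-ins written for this illustration and do not call `pelote`.

```python
def remove_edges_sketch(edges, predicate):
    """Delete in place every edge for which the predicate returns False."""
    for (source, target), attrs in list(edges.items()):
        if not predicate(source, target, attrs):
            del edges[source, target]


def filter_edges_sketch(edges, predicate):
    """Return a new dict holding only the edges passing the predicate."""
    return {
        (source, target): attrs
        for (source, target), attrs in edges.items()
        if predicate(source, target, attrs)
    }


def keep_heavy(source, target, attrs):
    # The predicate receives the edge source, target and attributes.
    return attrs["weight"] >= 10


edges = {(1, 2): {"weight": 22}, (2, 3): {"weight": 4}}

filtered = filter_edges_sketch(edges, keep_heavy)
print(len(filtered), len(edges))  # 1 2

remove_edges_sketch(edges, keep_heavy)
print(len(edges))  # 1
```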

#### remove_nodes

Function removing all nodes that do not pass a predicate function from a
given networkx graph.

Note that this function mutates the given graph.

```python
import networkx as nx
from pelote import remove_nodes

g = nx.Graph()
g.add_node(1, weight=22)
g.add_node(2, weight=4)
g.add_edge(1, 2)

remove_nodes(g, lambda n, a: a["weight"] >= 10)
```

*Arguments*

* **graph** *nx.AnyGraph* - a networkx graph.
* **predicate** *callable* - a function taking each node and node attributes
and returning True if you want to keep the node or False if you want
to remove it.

#### filter_nodes

Function returning a copy of the given networkx graph without the nodes
filtered out by the given predicate function.

```python
import networkx as nx
from pelote import filter_nodes

g = nx.Graph()
g.add_node(1, weight=22)
g.add_node(2, weight=4)
g.add_edge(1, 2)

h = filter_nodes(g, lambda n, a: a["weight"] >= 10)
```

*Arguments*

* **graph** *nx.AnyGraph* - a networkx graph.
* **predicate** *callable* - a function taking each node and node attributes
and returning True if you want to keep the node or False if you want
to remove it.

*Returns*

*nx.AnyGraph* - the filtered graph.

#### remove_leaves

Function removing all leaves of the graph, i.e. the nodes incident to a
single edge, i.e. the nodes with degree 1.

This function is not recursive and will only remove one layer of leaves.

Note that this function mutates the given graph.

```python
import networkx as nx
from pelote import remove_leaves

g = nx.Graph()
g.add_edge(1, 2)
g.add_edge(2, 3)

remove_leaves(g)

list(g.nodes)
>>> [2]
```

*Arguments*

* **graph** *nx.AnyGraph* - a networkx graph.

#### filter_leaves

Function returning a copy of the given networkx graph but without its leaves,
i.e. the nodes incident to a single edge, i.e. the nodes with degree 1.

This function is not recursive and will only filter one layer of leaves.

```python
import networkx as nx
from pelote import filter_leaves

g = nx.Graph()
g.add_edge(1, 2)
g.add_edge(2, 3)

h = filter_leaves(g)

list(h.nodes)
>>> [2]
```

*Arguments*

* **graph** *nx.AnyGraph* - a networkx graph.

---

### Learning

#### floatsam_threshold_learner

Function using an iterative algorithm to try and find the best weight
threshold to apply to trim the given graph's edges while keeping the
underlying community structure.

It works by iteratively increasing the threshold and stopping as soon as
a significant connected component starts to drift away from the principal
one.

This is basically an optimization algorithm applied to a complex nonlinear
function using a very naive cost heuristic, but it works decently for typical
cases as it emulates the method used by hand by some researchers when they
perform this kind of task on Gephi, for instance.

When working on metrics where lower is better (e.g. edge disparity), you
can reverse the logic of the algorithm by tweaking `starting_threshold`
and giving a negative `learning_rate`.

*Arguments*

* **graph** *nx.Graph* - Graph to sparsify.
* **starting_threshold** *float, optional* `0.0` - Starting similarity threshold.
* **learning_rate** *float, optional* `0.05` - How much to increase the threshold
at each step of the algorithm.
* **max_drifter_order** *int, optional* - Max order of component to detach itself
from the principal one before stopping the algorithm. If not
provided it will default to the logarithm of the graph's largest
connected component's order.
* **edge_weight_attr** *str, optional* `"weight"` - Name of the weight attribute.
* **on_epoch** *callable, optional* - Function called on each epoch of the
algorithm with some metadata about iteration state.

*Returns*

*float* - The found threshold
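The iterative idea described above can be sketched in pure Python as follows. This is a rough reading of the description, not the library's algorithm: the helper names are hypothetical, the default `max_drifter_order` is assumed to be the natural logarithm, the sketch returns the last threshold reached before a large component detached, and it only handles a positive `learning_rate`.

```python
import math


def connected_components(nodes, edges):
    """Yield the connected components of an undirected graph as sets."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            for neighbor in adjacency[stack.pop()]:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        seen |= component
        yield component


def threshold_learner_sketch(nodes, weights, starting_threshold=0.0,
                             learning_rate=0.05, max_drifter_order=None):
    orders = [len(c) for c in connected_components(nodes, weights)]
    if max_drifter_order is None:
        # Assumption: natural logarithm of the largest component's order.
        max_drifter_order = math.log(max(orders))
    best = starting_threshold
    step = 0
    while True:
        threshold = starting_threshold + step * learning_rate
        if threshold > max(weights.values()):
            return best  # every edge would be removed beyond this point
        kept = [edge for edge, w in weights.items() if w >= threshold]
        components = sorted(
            connected_components(nodes, kept), key=len, reverse=True
        )
        # Stop as soon as a component larger than allowed drifts away.
        if any(len(c) > max_drifter_order for c in components[1:]):
            return best
        best = threshold
        step += 1


# Two tight triangles joined by one weak bridge.
nodes = list("abcdef")
weights = {
    ("a", "b"): 0.9, ("b", "c"): 0.9, ("a", "c"): 0.9,
    ("d", "e"): 0.9, ("e", "f"): 0.9, ("d", "f"): 0.9,
    ("c", "d"): 0.22,
}

result = threshold_learner_sketch(nodes, weights)
print(round(result, 2))  # 0.2
```

In this toy case the search stops just before the bridge would be cut, since cutting it would detach a whole triangle from the principal component.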

---

### Reading & Writing

#### read_graphology_json

Function reading and parsing the given json file representing a serialized
[graphology](https://graphology.github.io/) graph as a networkx graph.

Note that this function cannot parse a true mixed graph since this is not
supported by networkx.

*Arguments*

* **target** *str or Path or file or dict* - target to read and parse. Can
be a string path, a Path instance, a file buffer or already
parsed JSON data as a dict.

*Returns*

*nx.AnyGraph* - a networkx graph instance.

#### write_graphology_json

Function serializing the given networkx graph as JSON, using the
[graphology](https://graphology.github.io/) format.

Note that both node keys and attribute names will be cast to string so
they can safely be represented in JSON. As such in some cases (where
your node keys and/or attribute names are not strings), this function
will not be bijective when used with `read_graphology_json`.

*Arguments*

* **graph** *nx.AnyGraph* - graph to serialize.
* **allow_mixed_keys** *bool, optional* `False` - whether to allow graph with mixed
node key types to be serialized nonetheless. Keys will always be
cast to string so keys might clash and produce an invalid
serialization. Only use this if you know what you are doing.
* **allow_invalid_attr_names** *bool, optional* `False` - whether to allow non-string
attribute names. Note that if you choose to allow them, some might
clash and produce an invalid serialization. Only use this if you
know what you are doing.

*Returns*

*dict* - JSON data
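The loss of key types mentioned above can be illustrated without the library. The sketch below builds a dict following the general layout of graphology's serialization format (`options`, `attributes`, `nodes`, `edges`) as an assumption; `to_graphology_dict` is a hypothetical helper and the exact fields emitted by `write_graphology_json` may differ.

```python
import json

# A tiny graph whose node keys are integers.
nodes = {1: {"label": "one"}, 2: {"label": "two"}}
edges = [(1, 2, {"weight": 3})]


def to_graphology_dict(nodes, edges, directed=False):
    """Serialize nodes and edges, casting node keys to strings."""
    return {
        "options": {
            "type": "directed" if directed else "undirected",
            "multi": False,
        },
        "attributes": {},
        "nodes": [
            {"key": str(key), "attributes": dict(attrs)}
            for key, attrs in nodes.items()
        ],
        "edges": [
            {"source": str(s), "target": str(t), "attributes": dict(attrs)}
            for s, t, attrs in edges
        ],
    }


data = to_graphology_dict(nodes, edges)
restored = json.loads(json.dumps(data))

# Integer keys come back as strings, hence the non-bijective round trip.
print([node["key"] for node in restored["nodes"]])  # ['1', '2']
```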

Raw data

            {
    "_id": null,
    "home_page": "http://github.com/medialab/pelote",
    "name": "pelote",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "network",
    "author": "Guillaume Plique",
    "author_email": "guillaume.plique@sciencespo.fr",
    "download_url": "https://files.pythonhosted.org/packages/cd/11/e29f7c3be2d457e9b96193a05158caa8d76e9009b66e56a927ae6d6ecf78/pelote-0.8.2.tar.gz",
    "platform": null,
    "description": "[![Build Status](https://github.com/medialab/pelote/workflows/Tests/badge.svg)](https://github.com/medialab/pelote/actions)\n\n# Pelote\n\nPelote is a python library full of graph-related functions that can be used to complement [networkx](https://networkx.org/) for higher-level tasks.\n\nIt mainly helps with the following things:\n\n- Conversion of tabular data to graphs (bipartites, citation etc. in the spirit of [Table2Net](https://medialab.github.io/table2net/))\n- Conversion of graphs to tabular data\n- Monopartite projections of bipartite graphs\n- Miscellaneous graph helper functions (filtering out nodes, edges etc.)\n- Sparsification of graphs\n- Reading & writing of graph formats not found in `networkx` (such as [graphology](https://graphology.github.io/) JSON)\n\nAs such it is the perfect companion to [ipysigma](https://github.com/Yomguithereal/ipysigma), our Jupyter widget that can render interactive graphs directly within your notebooks.\n\n## Installation\n\nYou can install `pelote` with pip with the following command:\n\n```\npip install pelote\n```\n\nIf you want to be able to use the library with `pandas`, you will need to install it also:\n\n```\npip install pandas\n```\n\n## Usage\n\n* [Tabular data to graphs](#tabular-data-to-graphs)\n  * [table_to_bipartite_graph](#table_to_bipartite_graph)\n  * [tables_to_graph](#tables_to_graph)\n  * [edges_table_to_graph](#edges_table_to_graph)\n* [Graphs to tabular data](#graphs-to-tabular-data)\n  * [graph_to_nodes_dataframe](#graph_to_nodes_dataframe)\n  * [graph_to_edges_dataframe](#graph_to_edges_dataframe)\n  * [graph_to_dataframes](#graph_to_dataframes)\n* [Graph projection](#graph-projection)\n  * [monopartite_projection](#monopartite_projection)\n* [Graph sparsification](#graph-sparsification)\n  * [global_threshold_sparsification](#global_threshold_sparsification)\n  * [multiscale_backbone](#multiscale_backbone)\n* [Miscellaneous graph-related 
metrics](#miscellaneous-graph-related-metrics)\n  * [edge_disparity](#edge_disparity)\n  * [triangular_strength](#triangular_strength)\n* [Graph utilities](#graph-utilities)\n  * [union_of_maximum_spanning_trees](#union_of_maximum_spanning_trees)\n  * [largest_connected_component](#largest_connected_component)\n  * [crop_to_largest_connected_component](#crop_to_largest_connected_component)\n  * [largest_connected_component_subgraph](#largest_connected_component_subgraph)\n  * [remove_edges](#remove_edges)\n  * [filter_edges](#filter_edges)\n  * [remove_nodes](#remove_nodes)\n  * [filter_nodes](#filter_nodes)\n  * [remove_leaves](#remove_leaves)\n  * [filter_leaves](#filter_leaves)\n* [Learning](#learning)\n  * [floatsam_threshold_learner](#floatsam_threshold_learner)\n* [Reading & Writing](#reading-&-writing)\n  * [read_graphology_json](#read_graphology_json)\n  * [write_graphology_json](#write_graphology_json)\n\n---\n\n### Tabular data to graphs\n\n#### table_to_bipartite_graph\n\nFunction creating a bipartite graph from the given tabular data.\n\n*Arguments*\n\n* **table** *Iterable[Indexable] or pd.DataFrame* - input tabular data. It can\nbe a large variety of things as long as it is 1. iterable and 2.\nyields indexable values such as dicts or lists. This can for instance\nbe a list of dicts, a csv.DictReader stream etc. 
It also supports\npandas DataFrame if the library is installed.\n* **first_part_col** *Hashable* - the name of the column containing the\nvalue representing a node in the resulting graph's first part.\nIt could be the index if your rows are lists or a key if your rows\nare dicts instead.\n* **second_par_col** *Hashable* - the name of the column containing the\nvalue representing a node in the resulting graph's second part.\nIt could be the index if your rows are lists or a key if your rows\nare dicts instead.\n* **node_part_attr** *str, optional* `\"part\"` - name of the node attribute containing\nthe part it belongs to.\n* **edge_weight_attr** *str, optional* `\"weight\"` - name of the edge attribute containing\nits weight, i.e. the number of times it was found in the table.\n* **first_part_data** *Sequence or Callable or Mapping, optional* `None` - sequence (i.e. list, tuple etc.)\nof column from rows to keep as node attributes for the graph's first part.\nCan also be a mapping (i.e. dict) from row column to node attribute\nname to create.\nCan also be a function returning a dict of those attributes.\nNote that the first row containing a given node will take precedence over\nsubsequent ones regarding data to include.\n* **second_part_data** *Sequence or Callable or Mapping, optional* `None` - sequence (i.e. list, tuple etc.)\nof column from rows to keep as node attributes for the graph's second part.\nCan also be a mapping (i.e. 
dict) from row column to node attribute\nname to create.\nCan also be a function returning a dict of those attributes.\nNote that the first row containing a given node will take precedence over\nsubsequent ones regarding data to include.\n* **first_part_name** *Hashable, optional* `None` - can be given to rename the first part.\n* **second_part_name** *Hashable, optional* `None` - can be given to rename the second part.\n* **disjoint_keys** *bool, optional* `False` - set this to True as an optimization\nmechanism if you know your part keys are disjoint, i.e. if no\nvalue for `first_part_col` can also be found in `second_part_col`.\nIf you enable this option wrongly, the result can be incorrect.\n\n*Returns*\n\n*nx.AnyGraph* - the bipartite graph.\n\n#### tables_to_graph\n\nFunction creating a graph from two tables: a table of nodes and a table of edges.\n\n```python\nfrom pelote import tables_to_graph\n\ntable_nodes = [\n    {\"name\": \"alice\", \"age\": 50},\n    {\"name\": \"bob\", \"age\": 12}\n]\n\ntable_edges = [\n    {\"source\": \"alice\", \"target\": \"bob\", \"weight\": 0.8},\n    {\"source\": \"bob\", \"target\": \"alice\", \"weight\": 0.2}\n]\n\ng = tables_to_graph(\n    table_nodes, table_edges, node_col=\"name\", node_data=[\"age\"], edge_data=[\"weight\"], directed=True\n)\n```\n\n*Arguments*\n\n* **nodes_table** *Iterable[Indexable] or pd.DataFrame* - input nodes in tabular\nformat. It can be a large variety of things as long as it is 1. iterable\nand 2. yields indexable values such as dicts or lists. This can for\ninstance be a list of dicts, a csv.DictReader stream etc. 
It also supports\npandas DataFrame if the library is installed.\n* **edges_table** *Iterable[Indexable] or pd.DataFrame* - input edges in tabular\nformat.\n* **node_col** *Hashable, optional* `\"key\"` - the name of the column containing the nodes in the nodes_table.\nIt could be the index if your rows are lists or a key if your rows\nare dicts instead.\n* **edge_source_col** *Hashable, optional* `\"source\"` - the name of the column containing the edges' source\nnodes in the edges_table.\n* **edge_target_col** *Hashable, optional* `\"target\"` - the name of the column containing the edges' target\nnodes in the edges_table.\n* **node_data** *Sequence, optional* `[]` - sequence (i.e. list, tuple etc.)\nof columns' names from the nodes_table to keep as node attributes in the resulting graph.\n* **edge_data** *Sequence, optional* `[]` - sequence (i.e. list, tuple etc.) of columns' names\nfrom the edges_table to keep as edge attributes in the resulting graph, e.g. [\"weight\"].\n* **count_rows_as_weight** *bool, optional* `False` - set this to True to compute a weight\nattribute for each edge, corresponding to the number of times it was\nfound in the table. The name of this attribute is defined by the\n`edge_weight_attr` parameter. If set to False, only the last occurrence of\neach edge will be kept in the graph.\n* **edge_weight_attr** *str, optional* `\"weight\"` - name of the edge attribute containing\nits weight, i.e. 
the number of times it was found in the table, if\n`count_rows_as_weight` is set to True.\n* **add_missing_nodes** *bool, optional* `True` - set this to True to check that the edges' sources and targets\nin the edges_table are all defined in the nodes_table.\n* **directed** *bool, optional* `False` - whether the resulting graph must be directed.\n\n*Returns*\n\n*nx.AnyGraph* - the resulting graph.\n\n#### edges_table_to_graph\n\nFunction creating a graph from a table of edges.\n\n*Arguments*\n\n* **edges_table** *Iterable[Indexable] or pd.DataFrame* - input edges in tabular\nformat. It can be a large variety of things as long as it is 1. iterable\nand 2. yields indexable values such as dicts or lists. This can for\ninstance be a list of dicts, a csv.DictReader stream etc. It also supports\npandas DataFrame if the library is installed.\n* **edge_source_col** *Hashable, optional* `\"source\"` - the name of the column containing the edges' source\nnodes in the edges_table.\n* **edge_target_col** *Hashable, optional* `\"target\"` - the name of the column containing the edges' target\nnodes in the edges_table.\n* **edge_data** *Sequence, optional* `[]` - sequence (i.e. list, tuple etc.) of columns' names\nfrom the edges_table to keep as edge attributes in the resulting graph, e.g. [\"weight\"].\n* **count_rows_as_weight** *bool, optional* `False` - set this to True to compute a weight\nattribute for each edge, corresponding to the number of times it was\nfound in the table. The name of this attribute is defined by the\n`edge_weight_attr` parameter. If set to False, only the last occurrence of\neach edge will be kept in the graph.\n* **edge_weight_attr** *str, optional* `\"weight\"` - name of the edge attribute containing\nits weight, i.e. 
the number of times it was found in the table, if\n`count_rows_as_weight` is set to True.\n* **directed** *bool, optional* `False` - whether the resulting graph must be directed.\n\n*Returns*\n\n*nx.AnyGraph* - the resulting graph.\n\n---\n\n### Graphs to tabular data\n\n#### graph_to_nodes_dataframe\n\nFunction converting the given networkx graph into a pandas DataFrame of\nits nodes.\n\n```python\nfrom pelote import graph_to_nodes_dataframe\n\ndf = graph_to_nodes_dataframe(graph)\n```\n\n*Arguments*\n\n* **nx.AnyGraph**  - a networkx graph instance\n* **node_key_col** *str, optional* `\"key\"` - name of the DataFrame column containing\nthe node keys. If None, the node keys will be used as the DataFrame\nindex.\n\n*Returns*\n\n*pd.DataFrame* - A pandas DataFrame\n\n#### graph_to_edges_dataframe\n\nFunction converting the given networkx graph into a pandas DataFrame of\nits edges.\n\n*Arguments*\n\n* **nx.AnyGraph**  - a networkx graph instance\n* **edge_source_col** *str, optional* `\"source\"` - name of the DataFrame column containing\nthe edge source.\n* **edge_target_col** *str, optional* `\"target\"` - name of the DataFrame column containing\nthe edge target.\n* **source_node_data** *Iterable or Mapping, optional* `None` - iterable of attribute names\nor mapping from attribute names to column name to be used to add\ncolumns to the resulting dataframe based on source node data.\n* **target_node_data** *Iterable or Mapping, optional* `None` - iterable of attribute names\nor mapping from attribute names to column name to be used to add\ncolumns to the resulting dataframe based on target node data.\n\n*Returns*\n\n*pd.DataFrame* - A pandas DataFrame\n\n#### graph_to_dataframes\n\nFunction converting the given networkx graph into two pandas DataFrames:\none for its nodes, one for its edges.\n\n*Arguments*\n\n* **nx.AnyGraph**  - a networkx graph instance\n* **node_key_col** *str, optional* `\"key\"` - name of the node DataFrame column containing\nthe node keys. 
If None, the node keys will be used as the DataFrame\nindex.\n* **edge_source_col** *str, optional* `\"source\"` - name of the edge DataFrame column containing\nthe edge source.\n* **edge_target_col** *str, optional* `\"target\"` - name of the edge DataFrame column containing\nthe edge target.\n* **source_node_data** *Iterable or Mapping, optional* `None` - iterable of attribute names\nor mapping from attribute names to column name to be used to add\ncolumns to the edge dataframe based on source node data.\n* **target_node_data** *Iterable or Mapping, optional* `None` - iterable of attribute names\nor mapping from attribute names to column name to be used to add\ncolumns to the edge dataframe based on target node data.\n\n*Returns*\n\n*tuple* - (pd.DataFrame, pd.DataFrame)\n\n---\n\n### Graph projection\n\n#### monopartite_projection\n\nFunction returning the monopartite projection of a given bipartite graph\nwith respect to one of the two partitions of the graph.\n\nThat is to say the resulting graph will keep a single type of nodes sharing\nweighted edges based on the neighbors they shared in the bipartite graph.\n\n```python\nimport networkx as nx\nfrom pelote import monopartite_projection\n\nbipartite = nx.Graph()\nbipartite.add_nodes_from([1, 2, 3], part='account')\nbipartite.add_nodes_from([4, 5, 6], part='color')\nbipartite.add_edges_from([\n    (1, 4),\n    (1, 5),\n    (2, 6),\n    (3, 4),\n    (3, 6)\n])\n\n# Resulting graph will only contain nodes [1, 2, 3]\n# with edges: (1, 3) and (2, 3)\nmonopartite = monopartite_projection(bipartite, 'account')\n```\n\n*Arguments*\n\n* **bipartite_graph** *nx.AnyGraph* - target graph. The function will raise\nif given graph is not truly bipartite.\n* **part_to_keep** *Hashable or Collection* - partition to keep in the projected\ngraph. 
It can either be the value of the part node attribute in the\ngiven graph (a string, most commonly), or a collection (a set, list etc.)\nholding the nodes composing the part to keep.\n* **node_part_attr** *str, optional* `\"part\"` - name of the node attribute containing\nthe part the node belongs to.\n* **edge_weight_attr** *str, optional* `\"weight\"` - name of the edge attribute containing\nthe edge's weight.\n* **metric** *str, optional* `None` - one of \"jaccard\", \"overlap\", \"cosine\", \"dice\",\n\"binary_cosine\", \"pmi\" or \"dot_product\". If not given, resulting weight\nwill be set to the size of neighbor intersection.\n* **bipartition_check** *bool, optional* `True` - whether to check if given graph\nis truly bipartite. You can disable this as an optimization\nstrategy if you know what you are doing.\n* **weight_threshold** *float, optional* `None` - if an edge weight should be less\nthan this threshold we would not add it to the projected\nmonopartite graph.\n\n*Returns*\n\n*nx.Graph* - the projected monopartite graph.\n\n---\n\n### Graph sparsification\n\n#### global_threshold_sparsification\n\nFunction returning a copy of the given graph without edges whose weight\nis less than a given threshold.\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - target graph.\n* **weight_threshold** *float* - weight threshold.\n* **edge_weight_attr** *str, optional* - name of the edge weight attribute.\n* **reverse** *bool, optional* - whether to reverse the threshold condition.\nThat is to say an edge would be removed if its weight is greater\nthan the threshold.\n* **keep_connected** *bool, optional* `False` - whether to keep the graph connected\nas it is using the UMST method.\n\n*Returns*\n\n*nx.AnyGraph* - the sparse graph.\n\n#### multiscale_backbone\n\nFunction returning the multiscale backbone of the given graph, i.e. 
a copy\nof the graph where we only kept \"relevant\" edges, as defined by a\nstatistical test where we compare the likelihood of a weighted edge existing\nvs. the null model.\n\n*Article*\n> Serrano, M. \u00c1ngeles, Mari\u00e1n Bogun\u00e1, and Alessandro Vespignani. \"Extracting the multiscale backbone of complex weighted networks.\" Proceedings of the National Academy of Sciences 106.16 (2009): 6483-6488.\n\n*References*\n\n- https://www.pnas.org/content/pnas/106/16/6483.full.pdf\n- https://en.wikipedia.org/wiki/Disparity_filter_algorithm_of_weighted_network\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - target graph.\n* **alpha** *float, optional* `0.05` - alpha value for the statistical test. It can\nbe intuitively thought of as a p-value score for an edge to be\nkept in the resulting graph.\n* **edge_weight_attr** *str, optional* `\"weight\"` - name of the edge attribute holding\nthe edge's weight.\n* **keep_connected** *bool, optional* `False` - whether to keep the graph connected\nas it is using the UMST method.\n\n*Returns*\n\n*nx.AnyGraph* - the sparse graph.\n\n---\n\n### Miscellaneous graph-related metrics\n\n#### edge_disparity\n\nFunction computing the disparity score of each edge in the given graph. This\nscore is typically used to extract the multiscale backbone of a weighted\ngraph.\n\nThe formula from the paper (relying on integral calculus) can be simplified\nto become:\n\n```\ndisparity(u, v) = min(\n    (1 - normalizedWeight(u, v)) ^ (degree(u) - 1),\n    (1 - normalizedWeight(v, u)) ^ (degree(v) - 1)\n)\n```\n\nwhere\n\n```\nnormalizedWeight(u, v) = weight(u, v) / weightedDegree(u)\nweightedDegree(u) = sum(weight(u, v) for v in neighbors(u))\n```\n\nThis score can sometimes be found reversed, as follows:\n\n```\ndisparity(u, v) = max(\n    1 - (1 - normalizedWeight(u, v)) ^ (degree(u) - 1),\n    1 - (1 - normalizedWeight(v, u)) ^ (degree(v) - 1)\n)\n```\n\nso that a higher score means better edges. 
We chose to keep the metric close\nto the paper to keep the statistical test angle. This means that, in this\nimplementation at least, a low score for an edge means a high relevance and\nincreases its chances to be kept in the backbone.\n\nNote that this algorithm has no proper definition for directed graphs and\nis only useful if edges have varying weights. This said, it could be\npossible to compute the disparity score only based on edge direction, if\nwe drop the min part.\n\n*Article*\n> Serrano, M. \u00c1ngeles, Mari\u00e1n Bogun\u00e1, and Alessandro Vespignani. \"Extracting the multiscale backbone of complex weighted networks.\" Proceedings of the National Academy of Sciences 106.16 (2009): 6483-6488.\n\n*References*\n\n- https://www.pnas.org/content/pnas/106/16/6483.full.pdf\n- https://en.wikipedia.org/wiki/Disparity_filter_algorithm_of_weighted_network\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - target graph.\n* **edge_weight_attr** *str, optional* `\"weight\"` - name of the edge attribute containing\nits weight.\n* **reverse** *bool, optional* `False` - whether to reverse the metric, i.e. a higher score\nmeans a more relevant edge.\n\n*Returns*\n\n*dict* - Dictionary with edges - (source, target) tuples - as keys and the disparity scores as values.\n\n#### triangular_strength\n\nFunction returning the triangular strength of a graph's edges, sometimes also called\nSimmelian strength, i.e. 
the number of triangles each edge is a part of.\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - target graph.\n* **full** *bool, optional* `False` - whether to return strength for every edge,\nincluding those with strength = 0.\n\n*Returns*\n\n*dict* - mapping of edges to their triangular strength.\n\n---\n\n### Graph utilities\n\n#### union_of_maximum_spanning_trees\n\nGenerator yielding the edges belonging to any Maximum Spanning Tree (MST) of\nthe given networkx graph.\n\nNote that this function will give to each edge with no weight a default\nweight of 1.\n\n*Article*\n> Arlind Nocaj, Mark Ortmann, and Ulrik Brandes \"Untangling Hairballs. From 3 to 14 Degrees of Separation.\" Computer & Information Science, University of Konstanz, Germany, 2014, https://dx.doi.org/10.1007/978-3-662-45803-7_9.\n\n*References*\n\n- https://kops.uni-konstanz.de/bitstream/handle/123456789/30583/Nocaj_0-284485.pdf\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - target graph.\n* **edge_weight_attr** *str, optional* `\"weight\"` - name of the edge weight attribute.\n\n*Yields*\n\n*tuple* - source, target, attributes\n\n#### largest_connected_component\n\nFunction returning the largest connected component of given networkx graph\nas a set of nodes.\n\nNote that this function will consider any given graph as undirected and\nwill therefore work with weakly connected components in the directed case.\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - target graph.\n\n*Returns*\n\n*set* - set of nodes representing the largest connected component.\n\n#### crop_to_largest_connected_component\n\nFunction mutating the given networkx graph in order to keep only the\nlargest connected component.\n\nNote that this function will consider any given graph as undirected and\nwill therefore work with weakly connected components in the directed case.\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - target graph.\n\n#### largest_connected_component_subgraph\n\nFunction returning the largest connected component 
subgraph of the given\nnetworkx graph.\n\nNote that this function will consider any given graph as undirected and\nwill therefore work with weakly connected components in the directed case.\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - target graph.\n* **as_view** *bool, optional* `False` - whether to return the subgraph as a view.\n\n*Returns*\n\n*nx.AnyGraph* - the subgraph.\n\n#### remove_edges\n\nFunction removing all edges that do not pass a predicate function from a\ngiven networkx graph.\n\nNote that this function mutates the given graph.\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - a networkx graph.\n* **predicate** *callable* - a function taking each edge source, target and\nattributes and returning True if you want to keep the edge or False\nif you want to remove it.\n\n#### filter_edges\n\nFunction returning a copy of the given networkx graph but without the edges\nfiltered out by the given predicate function.\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - a networkx graph.\n* **predicate** *callable* - a function taking each edge source, target and\nattributes and returning True if you want to keep the edge or False\nif you want to remove it.\n\n*Returns*\n\n*nx.AnyGraph* - the filtered graph.\n\n#### remove_nodes\n\nFunction removing all nodes that do not pass a predicate function from a\ngiven networkx graph.\n\nNote that this function mutates the given graph.\n\n```python\nimport networkx as nx\nfrom pelote import remove_nodes\n\ng = nx.Graph()\ng.add_node(1, weight=22)\ng.add_node(2, weight=4)\ng.add_edge(1, 2)\n\nremove_nodes(g, lambda n, a: a[\"weight\"] >= 10)\n```\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - a networkx graph.\n* **predicate** *callable* - a function taking each node and node attributes\nand returning True if you want to keep the node or False if you want\nto remove it.\n\n#### filter_nodes\n\nFunction returning a copy of the given networkx graph but without the nodes\nfiltered out by the given predicate function.\n\n```python\nimport networkx as nx\nfrom pelote import 
filter_nodes\n\ng = nx.Graph()\ng.add_node(1, weight=22)\ng.add_node(2, weight=4)\ng.add_edge(1, 2)\n\nh = filter_nodes(g, lambda n, a: a[\"weight\"] >= 10)\n```\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - a networkx graph.\n* **predicate** *callable* - a function taking each node and node attributes\nand returning True if you want to keep the node or False if you want\nto remove it.\n\n*Returns*\n\n*nx.AnyGraph* - the filtered graph.\n\n#### remove_leaves\n\nFunction removing all leaves of the graph, i.e. the nodes incident to a\nsingle edge, i.e. the nodes with degree 1.\n\nThis function is not recursive and will only remove one layer of leaves.\n\nNote that this function mutates the given graph.\n\n```python\nimport networkx as nx\nfrom pelote import remove_leaves\n\ng = nx.Graph()\ng.add_edge(1, 2)\ng.add_edge(2, 3)\n\nremove_leaves(g)\n\nlist(g.nodes)\n>>> [2]\n```\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - a networkx graph.\n\n#### filter_leaves\n\nFunction returning a copy of the given networkx graph but without its leaves,\ni.e. the nodes incident to a single edge, i.e. 
the nodes with degree 1.\n\nThis function is not recursive and will only filter one layer of leaves.\n\n```python\nimport networkx as nx\nfrom pelote import filter_leaves\n\ng = nx.Graph()\ng.add_edge(1, 2)\ng.add_edge(2, 3)\n\nh = filter_leaves(g)\n\nlist(h.nodes)\n>>> [2]\n```\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - a networkx graph.\n\n---\n\n### Learning\n\n#### floatsam_threshold_learner\n\nFunction using an iterative algorithm to try and find the best weight\nthreshold to apply to trim the given graph's edges while keeping the\nunderlying community structure.\n\nIt works by iteratively increasing the threshold and stopping as soon as\na significant connected component starts to drift away from the principal\none.\n\nThis is basically an optimization algorithm applied to a complex nonlinear\nfunction using a very naive cost heuristic, but it works decently for typical\ncases as it emulates the method used by hand by some researchers when they\nperform this kind of task on Gephi, for instance.\n\nWhen working on metrics where lower is better (e.g. edge disparity), you\ncan reverse the logic of the algorithm by tweaking `starting_threshold`\nand giving a negative `learning_rate`.\n\n*Arguments*\n\n* **graph** *nx.Graph* - Graph to sparsify.\n* **starting_threshold** *float, optional* `0.0` - Starting similarity threshold.\n* **learning_rate** *float, optional* `0.05` - How much to increase the threshold\nat each step of the algorithm.\n* **max_drifter_order** *int, optional* - Max order of component to detach itself\nfrom the principal one before stopping the algorithm. 
If not\nprovided it will default to the logarithm of the graph's largest\nconnected component's order.\n* **edge_weight_attr** *str, optional* `\"weight\"` - Name of the weight attribute.\n* **on_epoch** *callable, optional* - Function called on each epoch of the\nalgorithm with some metadata about iteration state.\n\n*Returns*\n\n*float* - The found threshold\n\n---\n\n### Reading & Writing\n\n#### read_graphology_json\n\nFunction reading and parsing the given json file representing a serialized\n[graphology](https://graphology.github.io/) graph as a networkx graph.\n\nNote that this function cannot parse a true mixed graph since this is not\nsupported by networkx.\n\n*Arguments*\n\n* **target** *str or Path or file or dict* - target to read and parse. Can\nbe a string path, a Path instance, a file buffer or already\nparsed JSON data as a dict.\n\n*Returns*\n\n*nx.AnyGraph* - a networkx graph instance.\n\n#### write_graphology_json\n\nFunction serializing the given networkx graph as JSON, using the\n[graphology](https://graphology.github.io/) format.\n\nNote that both node keys and attribute names will be cast to string so\nthey can safely be represented in JSON. As such in some cases (where\nyour node keys and/or attribute names are not strings), this function\nwill not be bijective when used with `read_graphology_json`.\n\n*Arguments*\n\n* **graph** *nx.AnyGraph* - graph to serialize.\n* **allow_mixed_keys** *bool, optional* `False` - whether to allow graph with mixed\nnode key types to be serialized nonetheless. Keys will always be\ncast to string so keys might clash and produce an invalid\nserialization. Only use this if you know what you are doing.\n* **allow_invalid_attr_names** *bool, optional* `False` - whether to allow non-string\nattribute names. Note that if you chose to allow them, some might\nclash and produce an invalid serialization. Only use this if you\nknow what you are doing.\n\n*Returns*\n\n*dict* - JSON data\n\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Collection of network-related utilities for python.",
    "version": "0.8.2",
    "project_urls": {
        "Homepage": "http://github.com/medialab/pelote"
    },
    "split_keywords": [
        "network"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0fa5d93fb4c83b9f7760b5be941e74e82356d242bfc4f4c7b621d05b26ac712c",
                "md5": "a29f485a5a1ace36724ef8f6408471f8",
                "sha256": "134fdcddf25f4566dffb69de6d26262195df658370b1e2f0d7aa93e5f74bf067"
            },
            "downloads": -1,
            "filename": "pelote-0.8.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a29f485a5a1ace36724ef8f6408471f8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 41378,
            "upload_time": "2023-09-08T12:04:02",
            "upload_time_iso_8601": "2023-09-08T12:04:02.553040Z",
            "url": "https://files.pythonhosted.org/packages/0f/a5/d93fb4c83b9f7760b5be941e74e82356d242bfc4f4c7b621d05b26ac712c/pelote-0.8.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cd11e29f7c3be2d457e9b96193a05158caa8d76e9009b66e56a927ae6d6ecf78",
                "md5": "741346dc868e5316893827ad15e92cfe",
                "sha256": "45b9d0aafaf3a45b33bcb764ec52c9cf166f4647b8d5bb4d11100d55dec4fd1e"
            },
            "downloads": -1,
            "filename": "pelote-0.8.2.tar.gz",
            "has_sig": false,
            "md5_digest": "741346dc868e5316893827ad15e92cfe",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 38644,
            "upload_time": "2023-09-08T12:04:04",
            "upload_time_iso_8601": "2023-09-08T12:04:04.255805Z",
            "url": "https://files.pythonhosted.org/packages/cd/11/e29f7c3be2d457e9b96193a05158caa8d76e9009b66e56a927ae6d6ecf78/pelote-0.8.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-08 12:04:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "medialab",
    "github_project": "pelote",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "ebbe",
            "specs": [
                [
                    "==",
                    "1.9.0"
                ]
            ]
        },
        {
            "name": "llist",
            "specs": [
                [
                    "==",
                    "0.7.1"
                ]
            ]
        },
        {
            "name": "pyllist",
            "specs": [
                [
                    "==",
                    "0.3"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    "==",
                    "2.5.1"
                ]
            ]
        },
        {
            "name": "black",
            "specs": []
        },
        {
            "name": "docstring-parser",
            "specs": [
                [
                    "==",
                    "0.13"
                ]
            ]
        },
        {
            "name": "importchecker",
            "specs": [
                [
                    "==",
                    "2.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "1.1.5"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "7.0.1"
                ]
            ]
        },
        {
            "name": "ipysigma",
            "specs": []
        },
        {
            "name": "jupyterlab",
            "specs": []
        },
        {
            "name": "matplotlib",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "setuptools",
            "specs": []
        },
        {
            "name": "twine",
            "specs": []
        },
        {
            "name": "wheel",
            "specs": []
        }
    ],
    "lcname": "pelote"
}

Guillaume Plique