Name | networkdisk JSON |
Version |
1.1.9
JSON |
| download |
home_page | None |
Summary | NetworkDisk: On disk graph manipulation |
upload_time | 2024-12-10 13:09:19 |
maintainer | None |
docs_url | None |
author | None |
requires_python | None |
license | BSD-3-Clause |
keywords |
database
graph
networkx
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
NetworkDisk: On-disk graph manipulation
=======================================
NetworkDisk provides a way to manipulate graphs on disk.
The goal is to be as compatible as possible with (Di)Graph objects of the `NetworkX <https://networkx.org/>`_ package,
but lifting memory requirements and providing persistence of the Graph.
We aim to achieve full retro-compatibility with NetworkX core methods to ensure
that most algorithms will work as-is.
Some algorithms are not suited for on-disk graphs and will thus have poor performance.
Some will be fast enough to be used as-is.
Audience
--------
The target audience for NetworkDisk are users of NetworkX that want to manipulate on-disk graphs
without worrying about database-specific technology or learning a new database language.
No knowledge of databases is required for simple use of this module. A good knowledge of the backend
is only required for advanced usage and performance tuning.
Pro and cons
------------
There are several motivations to use on-disk graphs rather than in-memory graphs.
1. Lack of resources: The graph is too large to fit in RAM.
2. Persistence requirements and graph sharing: The graph must be saved on disk. You could combine the `networkx` and `pickle` modules to do that instead, but *NetworkDisk* saves the graph automatically, avoiding version conflict and allowing a DB user to directly access the graph DB. Also, loading a graph does not require any parsing or pre-processing.
3. Transaction support: If an algorithm changes the graph and then fails, you wish to rollback the partial changes instead of keeping them.
4. Concurrency control: You want many users to access the same graph.
5. Symbolic manipulation of subgraphs: You may want to transform the Graph implicitly without actually computing the transformation.
The main drawback of using `NetworkDisk` rather than `NetworkX` is the performance loss. For direct graph manipulation, the expected penalty
is of one order of magnitude (at least ten times slower) and for complex graph algorithms the penalty can be much worse.
Recommended Usage
-----------------
For very large graphs, we recommend to use NetworkDisk to extract a subgraph small enough to fit in RAM and then to manipulate
the subgraph as a classical NetworkX graph. For this reason, NetworkDisk can efficiently extract subgraphs
and can efficiently export a NetworkDisk graph to NetworkX.
Designing algorithms for NetworkDisk
------------------------------------
It is possible to adapt standard graph algorithms to be used efficiently on disk. It requires a specific attention
to memory management as well as avoiding repeated calls to the Backend as much
as possible.
You should avoid graph algorithms which must either copy or go through the entire graph to build a dedicated DataStructure in RAM. Graph exploration algorithms (shortest path, random walk for instance) should have usable performance.
Schemas
-------
NetworkDisk can be thought of as a dedicated `ORM <https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping>`_ for NetworkX.
That is, it is a mapping from the structure underlying NetworkX graphs to a relational database. The goal of this mapping is to reduce the cost penalty of performing many disk accesses.
In the codebase of NetworkDisk, we do not enforce one specific schema mapping.
It can be adapted to an already existing database. It is however a complicated task to design a schema mapping that
provides a correct implementation Backend. Therefore, some default fixed schemas are proposed and ready to use.
RoadMap
-------
MultiGraph and MultiDiGraph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The NetworkX implementation provides classes for MultiDiGraph and MultiGraph.
We do not provide them yet. Some specific work has to be done to adapt the current
schema of Graph and DiGraph to these situations.
Other Database Engines
^^^^^^^^^^^^^^^^^^^^^^
So far, NetworkDisk only supports SQLite local databases. A `PostgreSQL <https://www.postgresql.org/>`_
version is high on the "TODO" list of the project. Other database engines are possible and could
be easily adapted.
At its core, NetworkDisk was built with flexibility of the backend technology in mind. Adapting
from SQLite to other backends requires some routine work. The core difficulty is to provide
adjusted graph mappings that take advantage of the specificities of the engine.
For instance, PostgreSQL supports the JSON datatype, which could be useful to
improve performance.
Cache Policy
^^^^^^^^^^^^
A crude Cache policy is provided in NetworkDisk without any control either on the memory consumption
nor on any time limit to discard cached elements.
Abstract cache policies with some reasonable implementations could be added to improve this state of affairs.
Bulk cache warmup could also be of interest to start without an empty cache without having to perform
many small queries.
NoSQL relational Engines
^^^^^^^^^^^^^^^^^^^^^^^^
Storing graphs in a Graph Database Engine is also a possibility. If the data-model
fits, it could take advantage of many graph-dedicated optimizations. Other type of DataBases
could help as well (for instance distributed Key-Value stores paired with indexation engines).
A full study system per system is required and will be the subject of further works.
Indexable data fields
^^^^^^^^^^^^^^^^^^^^^
NetworkX does not provide any indexation properties nor any data-graph operator.
Some dedicated work even on NetworkX base classes could be performed.
Graph transformation and Regular Path Queries
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
One of the key features of NetworkDisk (compared to standard graph databases) is the possibility
of performing graph transformations through query rewriting.
The performance of such a transformation depends of the inner complexity of the rewriting,
the efficiency of the query optimizer, and the possibility of complex indexation approach.
A completely flexible approach to these operations is thus doomed and it should be consider separately
for each backend.
Free software
-------------
NetworkDisk is free software; you can redistribute it and/or modify it under the
terms of the `3-clause BSD License`. We welcome contributions.
Join us on `GitLab <https://gitlab.inria.fr/guillonb/networkdisk>`_.
Raw data
{
"_id": null,
"home_page": null,
"name": "networkdisk",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "database, graph, networkx",
"author": null,
"author_email": "Bruno Guillon <bruno.guillon@uca.fr>, Charles Paperman <charles.paperman@univ-lille.fr>",
"download_url": "https://files.pythonhosted.org/packages/12/2e/3be34c18400c92684394d92a3deefe769365d766a779ce467061569e671a/networkdisk-1.1.9.tar.gz",
"platform": null,
"description": "NetworkDisk: On-disk graph manipulation\n=======================================\n\nNetworkDisk provides a way to manipulate graphs on disk.\nThe goal is to be as compatible as possible with (Di)Graph objects of the `NetworkX <https://networkx.org/>`_ package,\nbut lifting memory requirements and providing persistence of the Graph.\n\nWe aim to achieve full retro-compatibility with NetworkX core methods to ensure\nthat most algorithms will work as-is.\nSome algorithms are not suited for on-disk graphs and will thus have poor performance.\nSome will be fast enough to be used as-is.\n\n\nAudience\n--------\n\nThe target audience for NetworkDisk are users of NetworkX that want to manipulate on-disk graphs\nwithout worrying about database-specific technology or learning a new database language.\nNo knowledge of databases is required for simple use of this module. A good knowledge of the backend\nis only required for advanced usage and performance tuning.\n\nPro and cons\n------------\n\nThere are several motivations to use on-disk graphs rather than in-memory graphs.\n\n1. Lack of resources: The graph is too large to fit in RAM.\n2. Persistence requirements and graph sharing: The graph must be saved on disk. You could combine the `networkx` and `pickle` modules to do that instead, but *NetworkDisk* saves the graph automatically, avoiding version conflict and allowing a DB user to directly access the graph DB. Also, loading a graph does not require any parsing or pre-processing.\n3. Transaction support: If an algorithm changes the graph and then fails, you wish to rollback the partial changes instead of keeping them.\n4. Concurrency control: You want many users to access the same graph.\n5. Symbolic manipulation of subgraphs: You may want to transform the Graph implicitly without actually computing the transformation.\n\nThe main drawback of using `NetworkDisk` rather than `NetworkX` is the performance loss. For direct graph manipulation, the expected penalty\nis of one order of magnitude (at least ten times slower) and for complex graph algorithms the penalty can be much worse.\n\nRecommended Usage\n-----------------\n\nFor very large graphs, we recommend to use NetworkDisk to extract a subgraph small enough to fit in RAM and then to manipulate\nthe subgraph as a classical NetworkX graph. For this reason, NetworkDisk can efficiently extract subgraphs \nand can efficiently export a NetworkDisk graph to NetworkX.\n\n\nDesigning algorithms for NetworkDisk\n------------------------------------\n\nIt is possible to adapt standard graph algorithms to be used efficiently on disk. It requires a specific attention\nto memory management as well as avoiding repeated calls to the Backend as much\nas possible.\n\nYou should avoid graph algorithms which must either copy or go through the entire graph to build a dedicated DataStructure in RAM. Graph exploration algorithms (shortest path, random walk for instance) should have usable performance.\n\nSchemas\n-------\n\nNetworkDisk can be thought of as a dedicated `ORM <https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping>`_ for NetworkX.\nThat is, it is a mapping from the structure underlying NetworkX graphs to a relational database. The goal of this mapping is to reduce the cost penalty of performing many disk accesses.\n\nIn the codebase of NetworkDisk, we do not enforce one specific schema mapping.\nIt can be adapted to an already existing database. It is however a complicated task to design a schema mapping that\nprovides a correct implementation Backend. Therefore, some default fixed schemas are proposed and ready to use.\n\nRoadMap\n-------\n\nMultiGraph and MultiDiGraph\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe NetworkX implementation provides classes for MultiDiGraph and MultiGraph.\nWe do not provide them yet. Some specific work has to be done to adapt the current\nschema of Graph and DiGraph to these situations.\n\nOther Database Engines\n^^^^^^^^^^^^^^^^^^^^^^\n\nSo far, NetworkDisk only supports SQLite local databases. A `PostgreSQL <https://www.postgresql.org/>`_\nversion is high on the \"TODO\" list of the project. Other database engines are possible and could\nbe easily adapted.\n\nAt its core, NetworkDisk was built with flexibility of the backend technology in mind. Adapting\nfrom SQLite to other backends requires some routine work. The core difficulty is to provide\nadjusted graph mappings that take advantage of the specificities of the engine.\nFor instance, PostgreSQL supports the JSON datatype, which could be useful to\nimprove performance.\n\nCache Policy\n^^^^^^^^^^^^\n\nA crude Cache policy is provided in NetworkDisk without any control either on the memory consumption\nnor on any time limit to discard cached elements.\n\nAbstract cache policies with some reasonable implementations could be added to improve this state of affairs.\n\nBulk cache warmup could also be of interest to start without an empty cache without having to perform\nmany small queries. \n\nNoSQL relational Engines\n^^^^^^^^^^^^^^^^^^^^^^^^\nStoring graphs in a Graph Database Engine is also a possibility. If the data-model\nfits, it could take advantage of many graph-dedicated optimizations. Other type of DataBases\ncould help as well (for instance distributed Key-Value stores paired with indexation engines).\n\nA full study system per system is required and will be the subject of further works.\n\nIndexable data fields\n^^^^^^^^^^^^^^^^^^^^^\n\nNetworkX does not provide any indexation properties nor any data-graph operator.\nSome dedicated work even on NetworkX base classes could be performed.\n\nGraph transformation and Regular Path Queries\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOne of the key features of NetworkDisk (compared to standard graph databases) is the possibility\nof performing graph transformations through query rewriting.\nThe performance of such a transformation depends of the inner complexity of the rewriting,\nthe efficiency of the query optimizer, and the possibility of complex indexation approach.\n\nA completely flexible approach to these operations is thus doomed and it should be consider separately\nfor each backend.\n\n\n\nFree software\n-------------\n\nNetworkDisk is free software; you can redistribute it and/or modify it under the\nterms of the `3-clause BSD License`. We welcome contributions.\nJoin us on `GitLab <https://gitlab.inria.fr/guillonb/networkdisk>`_.\n",
"bugtrack_url": null,
"license": "BSD-3-Clause",
"summary": "NetworkDisk: On disk graph manipulation",
"version": "1.1.9",
"project_urls": {
"Homepage": "https://networkdisk.inria.fr",
"documentation": "https://networkdisk.inria.fr/documentation",
"repository": "https://gitlab.inria.fr/guillonb/networkdisk"
},
"split_keywords": [
"database",
" graph",
" networkx"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "45f03f3352a79a186a2712ddcf5ce47c064c3bddbc836702c0e8c7a911a79811",
"md5": "b73ff1357e40bf565748318c3fa98b68",
"sha256": "489468c1f9863ecdbfee8f5207b05610ffce8b17cb4460c6764557e7dde5dee1"
},
"downloads": -1,
"filename": "networkdisk-1.1.9-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "b73ff1357e40bf565748318c3fa98b68",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 176746,
"upload_time": "2024-12-10T13:09:11",
"upload_time_iso_8601": "2024-12-10T13:09:11.752545Z",
"url": "https://files.pythonhosted.org/packages/45/f0/3f3352a79a186a2712ddcf5ce47c064c3bddbc836702c0e8c7a911a79811/networkdisk-1.1.9-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "122e3be34c18400c92684394d92a3deefe769365d766a779ce467061569e671a",
"md5": "61d16bdbb9bb6a3e2b40f7b8e0fa7d24",
"sha256": "e618ac101fdce677a1beae935f0924c451608706f8b9925895d76ac3743ca648"
},
"downloads": -1,
"filename": "networkdisk-1.1.9.tar.gz",
"has_sig": false,
"md5_digest": "61d16bdbb9bb6a3e2b40f7b8e0fa7d24",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 143471,
"upload_time": "2024-12-10T13:09:19",
"upload_time_iso_8601": "2024-12-10T13:09:19.457521Z",
"url": "https://files.pythonhosted.org/packages/12/2e/3be34c18400c92684394d92a3deefe769365d766a779ce467061569e671a/networkdisk-1.1.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-10 13:09:19",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "networkdisk"
}