neo4j-backup


Name: neo4j-backup
Version: 0.4.1
Summary: A simple way to backup and restore Neo4j databases without using dump files.
Upload time: 2023-03-13 19:07:41
Requires Python: >=3.6
Keywords: neo4j
# neo4j-backup
This project downloads and uploads data in Neo4j Knowledge Graphs without requiring Neo4j 
Enterprise edition. It is useful when Dump files cannot be used, or when you want to download the data in
a Neo4j database in a human-readable format.

# Overview

This repo is not intended to replace the native Neo4j backup Dump files,
but rather to be used in instances where a Dump file is not an option,
such as moving data from Neo4j to a different type of database.

This repository also aims to be as simple as possible, with two main purposes:
to download a Neo4j graph without using a Dump file, and to upload that data to a different Neo4j graph.
Only simple Cypher statements are used to import and extract data from Neo4j.
The data is downloaded as JSON files.
The JSON files are compressed with gzip by default,
but you can choose to export the data uncompressed.

No Enterprise tools were used when creating this tool. 
Neither APOC nor any other Enterprise/Desktop-exclusive tool is needed, 
so it can be used on the Community edition of Neo4j. 

This repo differs from most other Neo4j backup repos:
the Neo4j graph does not need to be a specific kind of instance. 
The code works with a Neo4j database running in Aura, Docker, Desktop, the command line, a server, etc. 
The only requirements are that the Python neo4j-driver can connect to the database,
and that your user has read and show-constraints privileges for downloading data, and write privileges for importing data.

# Packages required

`python: >= 3.6`

`neo4j: >= 4.3.0`

`tqdm: >= 4.10.0`

# Installation

`pip install neo4j-backup`

# Supported Neo4j Database Versions

`Neo4j >= 4.1`

# Usage

The exact parameters needed to access the database depend on the version of the Neo4j graph that you
are trying to access. The Python neo4j-driver documentation can be found at 
https://neo4j.com/docs/api/python-driver/current/api.html.

At times the script asks the user (y/N) questions; 
you can set `input_yes=True` to answer yes to all of them.

# Constraints

The only constraints supported in all instances of Neo4j are `Unique node property constraints`.
Currently, this is the only type of constraint supported in this codebase.
If you need to transfer the extracted data to an Enterprise edition database,
the other constraints can be added after importing the data
(assuming the data is structured in compliance with such constraints).

## Extracting

```python
from neo4j import GraphDatabase
from neo4j_backup import Extractor

if __name__ == "__main__":
    uri = "neo4j://localhost:7687"
    username = "neo4j"
    password = "password"
    encrypted = False
    trust = "TRUST_ALL_CERTIFICATES"
    driver = GraphDatabase.driver(uri, auth=(username, password), encrypted=encrypted, trust=trust)

    database = "neo4j"

    project_dir = "data_dump"
    input_yes = False
    compress = True
    indent_size = 4  # Indent of json files
    json_file_size: int = int("0xFFFF", 16)  # Size of data in memory before dumping (defined but not passed below)
    extractor = Extractor(project_dir=project_dir, driver=driver, database=database,
                          input_yes=input_yes, compress=compress, indent_size=indent_size,
                          pull_uniqueness_constraints=True)
    extractor.extract_data()
```

## Importing

```python
from neo4j import GraphDatabase
from neo4j_backup import Importer

if __name__ == "__main__":
    uri = "neo4j://localhost:7687"
    username = "neo4j"
    password = "password"
    encrypted = False
    trust = "TRUST_ALL_CERTIFICATES"
    driver = GraphDatabase.driver(uri, auth=(username, password), encrypted=encrypted, trust=trust)

    database = "dev"

    project_dir = "data_dump"
    input_yes = False
    importer = Importer(project_dir=project_dir, driver=driver, database=database, input_yes=input_yes)
    importer.import_data()
```

# Data Storage

This example shows saved data from a Node with complex data types.

```json
{
    "node_id": 72,
    "node_labels": "Person:XX",
    "node_props": {
        "bool_example": false,
        "born": 1956,
        "int_example": 1,
        "point_3d_example": "$point({x: 3.0, y: 0.0, z: 2.0, crs: 'cartesian-3d'})",
        "localdatetime_example": "$datetime('2015-07-04T19:32:24.000000000+00:00')",
        "date_example": "$date('1999-01-01')",
        "point_2d_example": "$point({x: 3.0, y: 0.0, crs: 'cartesian'})",
        "datetime_example": "$datetime('2015-06-24T12:50:35.556000000+01:00')",
        "point_geo_3d_example": "$point({x: 56.0, y: 12.0, z: 2, crs: 'wgs-84-3d'})",
        "duration_example": "$duration('P5M1DT12H')",
        "odd_prop": "time('21:40:32.142000000+01:00')",
        "example_hash_prop": "e425c2703cf3d6d063f3de705c8f55a9be5e6fdee62bba4d95ce209352da2833",
        "name": "Tom Hanks",
        "localtime_example": "$time('12:50:35.556000000+00:00')",
        "point_geo_2d_example": "$point({x: 56.0, y: 12.0, crs: 'wgs-84'})",
        "float_example": 0.334,
        "time_example": "$time('21:40:32.142000000+01:00')",
        "array_example": [
            true,
            false
        ]
    },
    "hash_props": {
        "example_hash_prop": "$time('21:40:32.142000000+01:00')"
    }
}
```
Note that `odd_prop` was originally stored as a string in the database.
Any string stored in the backup that starts with one of ["$point(", "$date(", "$time(", "$datetime(", "$duration("] 
represents a temporal or spatial value and is stored as a literal value.
Any string that starts with one of these literal prefixes is assumed to always be either a temporal or spatial value. 

If a string in the database happens to start with one of these literal prefixes,
a sha256 hash is generated from the property string and stored in `hash_props`.
For most Neo4j databases, `hash_props` will be an empty dict for all nodes and relationships.
However, this step ensures that database backups cannot be intentionally or accidentally broken by 
oddly formatted strings.
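
The prefix rule above can be sketched in a few lines of Python. This is only an illustration of how a reader of the backup files might tell the cases apart, not the library's own code. The function names are hypothetical, and two points are assumptions: that a key present in `hash_props` marks a property whose original value was an ordinary string, and that the hash is computed over the UTF-8 encoding of that string.

```python
import hashlib

# Prefixes listed above that mark a stored temporal or spatial literal.
LITERAL_PREFIXES = ("$point(", "$date(", "$time(", "$datetime(", "$duration(")


def is_literal(value):
    """Return True if a stored value looks like a temporal or spatial literal."""
    return isinstance(value, str) and value.startswith(LITERAL_PREFIXES)


def classify_props(props, hash_props):
    """Label each property as 'hashed_string', 'literal', or 'plain'.

    Assumption: a key present in hash_props marks a property whose original
    database value was an ordinary string that merely looked like a literal.
    """
    kinds = {}
    for key, value in props.items():
        if key in hash_props:
            kinds[key] = "hashed_string"
        elif is_literal(value):
            kinds[key] = "literal"
        else:
            kinds[key] = "plain"
    return kinds


def sha256_hex(text):
    """SHA-256 hex digest of a string (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

With this rule, a value such as `time('21:40:32.142000000+01:00')` (no leading `$`) is classified as a plain string, which matches the `odd_prop` entry in the example.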

Relationships are stored in a very similar fashion.
The following example shows a Relationship with complex property values.

```json
{
    "rel_id": 224,
    "start_node_id": 71,
    "end_node_id": 150,
    "rel_type": "ACTED_IN",
    "rel_props": {
        "bool_example": false,
        "roles": [
            "Chuck Noland"
        ],
        "born": 1956,
        "int_example": 1,
        "point_3d_example": "$point({x: 3.0, y: 0.0, z: 2.0, crs: 'cartesian-3d'})",
        "localdatetime_example": "$datetime('2015-07-04T19:32:24.000000000+00:00')",
        "date_example": "$date('1999-01-01')",
        "point_2d_example": "$point({x: 3.0, y: 0.0, crs: 'cartesian'})",
        "datetime_example": "$datetime('2015-06-24T12:50:35.556000000+01:00')",
        "point_geo_3d_example": "$point({x: 56.0, y: 12.0, z: 2, crs: 'wgs-84-3d'})",
        "duration_example": "$duration('P5M1DT12H')",
        "odd_prop": "time('21:40:32.142000000+01:00')",
        "example_hash_prop": "e425c2703cf3d6d063f3de705c8f55a9be5e6fdee62bba4d95ce209352da2833",
        "name": "Tom Hanks",
        "localtime_example": "$time('12:50:35.556000000+00:00')",
        "point_geo_2d_example": "$point({x: 56.0, y: 12.0, crs: 'wgs-84'})",
        "float_example": 0.334,
        "time_example": "$time('21:40:32.142000000+01:00')",
        "array_example": [
            true,
            false
        ]
    },
    "hash_props": {
        "example_hash_prop": "$time('21:40:32.142000000+01:00')"
    }
}
```

The full list of property types that can be extracted is:
Integer, Float, String, Boolean, Point, Date, Time, LocalTime, DateTime, LocalDateTime, and Duration.
Arrays are also supported, but arrays are treated as second-class properties and have many restrictions in Neo4j.

Temporal values can be saved, but the Python neo4j-driver makes no distinction between
- Time and LocalTime
- DateTime and LocalDateTime

The only difference is that a time zone cannot be specified in local times.

The following point SRID types are supported and saved as:
- 7203 : 2d-cartesian-point
- 9157 : 3d-cartesian-point
- 4326 : 2d-WGS-84-point
- 4979 : 3d-WGS-84-point
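
As a small illustration, the SRID list above can be paired with the `crs` names that appear in the stored point literals. The `point_literal` helper below is hypothetical and only mimics the literal format shown in the JSON examples; it is not part of neo4j-backup.

```python
# SRID -> crs name, using the crs spellings that appear in the stored point literals.
SRID_TO_CRS = {
    7203: "cartesian",
    9157: "cartesian-3d",
    4326: "wgs-84",
    4979: "wgs-84-3d",
}


def point_literal(srid, coords):
    """Format coordinates as a '$point({...})' string (hypothetical helper)."""
    crs = SRID_TO_CRS[srid]  # KeyError for an unsupported SRID
    expected = 3 if crs.endswith("-3d") else 2
    if len(coords) != expected:
        raise ValueError(f"SRID {srid} needs {expected} coordinates")
    axes = ("x", "y", "z")[:expected]
    body = ", ".join(f"{axis}: {float(c)}" for axis, c in zip(axes, coords))
    return f"$point({{{body}, crs: '{crs}'}})"
```

For example, `point_literal(7203, (3, 0))` produces `$point({x: 3.0, y: 0.0, crs: 'cartesian'})`, the same form as `point_2d_example` above.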

All the data is extracted to the tree structure:
- data
  - nodes_<index>.json.gz -> list of nodes
  - nodes_<index>.json.gz
  - nodes_<index>.json.gz
  - ...
  - relationships_<index>.json.gz -> list of relationships
  - relationships_<index>.json.gz
  - relationships_<index>.json.gz
  - ...
- compressed.json -> A boolean indicating whether data is compressed or not
- db_id.json -> ID of db
- node_labels.json -> List of all Node labels
- property_keys.json -> List of all property keys
- rel_types.json -> List of all Relationship types
- uniqueness_constraints.json -> List of uniqueness constraints in Neo4j db
- unique_prop_key.json -> Some unique property that does not exist in db
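
Because the output is plain JSON (optionally gzipped), it can be read without this package. The sketch below assumes the layout listed above and that each data file holds a JSON list of records; `load_records` is a hypothetical helper name, and compression is inferred from the `.gz` suffix rather than from `compressed.json`.

```python
import gzip
import json
from pathlib import Path


def load_records(project_dir, kind="nodes"):
    """Yield every record from <project_dir>/data/<kind>_<index>.json[.gz].

    Files are visited in lexicographic name order, which is not necessarily
    numeric index order.
    """
    data_dir = Path(project_dir) / "data"
    for path in sorted(data_dir.glob(f"{kind}_*.json*")):
        # Pick the opener from the file suffix; both support text mode.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8") as handle:
            yield from json.load(handle)
```

A call such as `list(load_records("data_dump", kind="relationships"))` would then return the relationship dictionaries in the format shown earlier.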

# Notes About Importing Data into Neo4j

A temporary internal ID property is added when creating Nodes and properties. 
Since this script does not read the underlying files of the Neo4j database, 
some unique identifier is needed to MATCH nodes on,
and forcing the user to pass a map of unique keys for each Node is not reasonable.
This temporary internal ID property is removed from each Node at the very end.
However, the Neo4j database still records that this property key existed at some point,
so the key will show up on the left side of Neo4j Desktop and when running "CALL db.propertyKeys()".
The temporary property key cannot be removed from the list of internal property keys; the issue is reported at
https://github.com/neo4j/neo4j/issues/10941.
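
The temporary-ID approach described above might look roughly like the following sketch. It only builds Cypher strings and does not show the statements neo4j-backup actually runs: the key name `_tmp_backup_id`, both function names, and the exact Cypher are assumptions for illustration. Labels and relationship types are interpolated because Cypher does not accept them as query parameters.

```python
def pick_temp_key(existing_keys, base="_tmp_backup_id"):
    """Return a property key that is not in existing_keys (hypothetical naming scheme)."""
    key, suffix = base, 0
    while key in existing_keys:
        suffix += 1
        key = f"{base}_{suffix}"
    return key


def build_statements(temp_key, labels, rel_type):
    """Build illustrative Cypher strings; property values are passed as query parameters."""
    k = f"`{temp_key}`"
    return {
        # Create a node and tag it with the ID it had in the backup.
        "create_node": f"CREATE (n:{labels}) SET n = $props, n.{k} = $node_id",
        # Find both endpoints by the temporary ID, then connect them.
        "create_rel": (
            f"MATCH (a {{{k}: $start_id}}), (b {{{k}: $end_id}}) "
            f"CREATE (a)-[r:`{rel_type}`]->(b) SET r = $props"
        ),
        # Strip the temporary ID once all relationships exist.
        "cleanup": f"MATCH (n) WHERE n.{k} IS NOT NULL REMOVE n.{k}",
    }
```

Choosing a key that is absent from the existing property keys plays the role of the `unique_prop_key.json` entry: the temporary property cannot collide with real data, so the cleanup step only touches values the import itself created.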

If you need to extract data from a database with this tool 
and are importing to a database where you have access to the neo4j-admin console,
one option is to use the Importer from neo4j_backup to import the initial data into a database,
then dump that database and restore it to a new database. 
Alternatively, you can use a tool like store-utils https://github.com/jexp/store-utils.

This is less an issue than an inconvenience.
If you are using Neo4j Desktop,
it is highly recommended to simply use the built-in Dump tool.

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "neo4j-backup",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "neo4j",
    "author": "",
    "author_email": "Andres Hyer <andreshyer@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/d7/71/e0ae2b52b594b56d0af2c9724623902355e5680a4c04a467ff207271c260/neo4j_backup-0.4.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A simple way to backup and restore Neo4j databases without using dump files.",
    "version": "0.4.1",
    "split_keywords": [
        "neo4j"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "17b0df3ddee47c32b95657aa8d7bedf2a86384e336dfe3fc233d28c1c43c65d4",
                "md5": "bfdecec3942449a56f2dfb7fcd530180",
                "sha256": "1976b7602a580e2e89ea6b7406a911c3ac16ab18cda9697209a135e4c80733b8"
            },
            "downloads": -1,
            "filename": "neo4j_backup-0.4.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bfdecec3942449a56f2dfb7fcd530180",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 16384,
            "upload_time": "2023-03-13T19:07:39",
            "upload_time_iso_8601": "2023-03-13T19:07:39.641064Z",
            "url": "https://files.pythonhosted.org/packages/17/b0/df3ddee47c32b95657aa8d7bedf2a86384e336dfe3fc233d28c1c43c65d4/neo4j_backup-0.4.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d771e0ae2b52b594b56d0af2c9724623902355e5680a4c04a467ff207271c260",
                "md5": "1e91bf914abe095bdc4e51e5a2f18375",
                "sha256": "d7038771b7362cf88b57f0b61ac39855bbeb8e1dcaf39a74006c622b39d7d3be"
            },
            "downloads": -1,
            "filename": "neo4j_backup-0.4.1.tar.gz",
            "has_sig": false,
            "md5_digest": "1e91bf914abe095bdc4e51e5a2f18375",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 20232,
            "upload_time": "2023-03-13T19:07:41",
            "upload_time_iso_8601": "2023-03-13T19:07:41.833174Z",
            "url": "https://files.pythonhosted.org/packages/d7/71/e0ae2b52b594b56d0af2c9724623902355e5680a4c04a467ff207271c260/neo4j_backup-0.4.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-13 19:07:41",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "neo4j-backup"
}
        