Name | neo4j-runway |
Version | 0.1.0 |
home_page | None |
Summary | A Python library that contains tools for data discovery, data model generation and ingestion for the Neo4j graph database. |
upload_time | 2024-05-06 15:23:26 |
maintainer | None |
docs_url | None |
author | Alex Gilmore |
requires_python | <4.0,>=3.10 |
license | MIT |
keywords | graph, neo4j, data model |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
# Neo4j Runway
Neo4j Runway is a Python library that simplifies the process of migrating your relational data into a graph. It provides tools that abstract communication with OpenAI to run discovery on your data and generate a data model, as well as tools to generate ingestion code and load your data into a Neo4j instance.
## Key Features
- **Data Discovery**: Harness OpenAI LLMs to provide valuable insights from your data
- **Graph Data Modeling**: Utilize OpenAI and the [Instructor](https://github.com/jxnl/instructor) Python library to create valid graph data models
- **Code Generation**: Generate ingestion code for your preferred method of loading data
- **Data Ingestion**: Load your data using Runway's built-in implementation of [PyIngest](https://github.com/neo4j-field/pyingest), Neo4j's popular ingestion tool
## Requirements
Runway uses graphviz to visualize data models. To use this feature, install [graphviz](https://www.graphviz.org/download/).
You'll need a Neo4j instance to fully utilize Runway. Start up a free cloud-hosted [Aura](https://console.neo4j.io) instance or download the [Neo4j Desktop app](https://neo4j.com/download/).
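One quick way to confirm that graphviz is installed before trying to visualize a model is to look for its `dot` executable on the `PATH`. This is a minimal sketch using only the standard library; the helper name `graphviz_available` is ours and is not part of Runway.

```python
import shutil


def graphviz_available(executable: str = "dot") -> bool:
    """Return True if the graphviz command-line tool is found on PATH."""
    return shutil.which(executable) is not None


if graphviz_available():
    print("graphviz found: model visualization should work")
else:
    print("graphviz not found: install it to visualize data models")
```

Note that this only checks for the system binary, not for any Python bindings.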
## Get Running in Minutes
```
pip install neo4j-runway
```
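The package metadata declares `requires_python` as `<4.0,>=3.10`, so the install will be refused on older interpreters. If you want to check your interpreter up front, a small sketch like the following may help (the function name `python_supported` is hypothetical):

```python
import sys


def python_supported(version=sys.version_info) -> bool:
    """Check a (major, minor, ...) version against the range >=3.10,<4.0."""
    return (3, 10) <= tuple(version[:2]) < (4, 0)


if not python_supported():
    print("neo4j-runway 0.1.0 needs Python >=3.10,<4.0; "
          f"this interpreter is {sys.version_info.major}.{sys.version_info.minor}")
```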
Now let's walk through a basic example.
Here we import the modules we'll be using.
```Python
import pandas as pd
from neo4j_runway import Discovery, GraphDataModeler, IngestionGenerator, LLM, PyIngest
```
### Discovery
Now we define a General Description of our data, provide brief descriptions of the columns of interest and load the data with Pandas.
```Python
USER_GENERATED_INPUT = {
    'General Description': 'This is data on different countries.',
    'id': 'unique id for a country.',
    'name': 'the country name.',
    'phone_code': 'country area code.',
    'capital': 'the capital of the country.',
    'currency_name': "name of the country's currency.",
    'region': 'primary region of the country.',
    'subregion': 'subregion location of the country.',
    'timezones': 'timezones contained within the country borders.',
    'latitude': 'the latitude coordinate of the country center.',
    'longitude': 'the longitude coordinate of the country center.'
}

data = pd.read_csv("data/csv/countries.csv")
```
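Since the dictionary keys describe CSV columns, a misspelled key is easy to miss. The sketch below compares the description keys with a CSV header using only the standard library. It assumes that keys are meant to match column names exactly; the helper `check_descriptions` and the sample rows are hypothetical and not part of Runway.

```python
import csv
import io


def check_descriptions(csv_file, descriptions, skip=("General Description",)):
    """Compare description keys with the CSV header row.

    Returns (missing_in_csv, not_described):
    - missing_in_csv: description keys with no matching CSV column
    - not_described: CSV columns that have no description
    """
    header = next(csv.reader(csv_file))
    keys = [k for k in descriptions if k not in skip]
    missing_in_csv = [k for k in keys if k not in header]
    not_described = [c for c in header if c not in keys]
    return missing_in_csv, not_described


# Tiny in-memory stand-in for countries.csv (hypothetical rows).
sample = io.StringIO("id,name,capital,iso2\n1,Afghanistan,Kabul,AF\n")
descriptions = {
    "General Description": "This is data on different countries.",
    "id": "unique id for a country.",
    "name": "the country name.",
    "captial": "the capital of the country.",  # deliberate typo
}
missing, extra = check_descriptions(sample, descriptions)
print(missing)  # ['captial']
print(extra)    # ['capital', 'iso2']
```

Columns without a description are not necessarily a problem, since only the columns of interest need one; the first list is the one that usually signals a typo.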
We then initialize our LLM. By default, Runway uses GPT-4 and reads the OpenAI API key from an environment variable.
```Python
llm = LLM()
```
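It can help to fail early with a clear message when the key is missing. The sketch below assumes the variable is named `OPENAI_API_KEY`, which is the name the OpenAI Python client reads by default; this README does not state the name, so treat it as an assumption. The helper `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Intended use before constructing the LLM (variable name is an assumption):
# require_env("OPENAI_API_KEY")
# llm = LLM()
```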
And we run discovery on our data.
```Python
disc = Discovery(llm=llm, user_input=USER_GENERATED_INPUT, data=data)
discovery = disc.run()
```
### Data Modeling
We can now pass our Discovery object to a GraphDataModeler to generate our initial data model. A Discovery object isn't required here, but it provides rich context to the LLM to achieve the best results.
```Python
gdm = GraphDataModeler(llm=llm, discovery=disc)
initial_model = gdm.create_initial_model()
```
If we have graphviz installed, we can take a look at our model.
```Python
gdm.current_model.visualize()
```
![countries-first-model.svg](./images/countries-first-model.svg)
Let's make some corrections to our model and view the results.
```Python
gdm.iterate_model(user_corrections="""
Make Region node have a HAS_SUBREGION relationship with Subregion node.
Remove the relationship between Country and Region.
""")
gdm.current_model.visualize()
```
![countries-second-model.svg](./images/countries-second-model.svg)
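To make the requested change concrete, here is a plain-Python illustration of what the two corrections mean structurally. It is not how Runway applies them (the LLM regenerates the model from the natural-language corrections), and the starting relationship types `IN_REGION` and `IN_SUBREGION` are invented for illustration; only `HAS_SUBREGION` comes from the corrections above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    type: str
    target: str


def apply_corrections(relationships, add=(), remove_between=()):
    """Return a new relationship list with the requested edits applied.

    remove_between holds (label_a, label_b) pairs; any relationship that
    connects the two labels, in either direction, is dropped.
    """
    blocked = {frozenset(pair) for pair in remove_between}
    kept = [r for r in relationships
            if frozenset((r.source, r.target)) not in blocked]
    for rel in add:
        if rel not in kept:
            kept.append(rel)
    return kept


# Hypothetical first model: these relationship types are assumptions.
first_model = [
    Relationship("Country", "IN_REGION", "Region"),
    Relationship("Country", "IN_SUBREGION", "Subregion"),
]
second_model = apply_corrections(
    first_model,
    add=[Relationship("Region", "HAS_SUBREGION", "Subregion")],
    remove_between=[("Country", "Region")],
)
for rel in second_model:
    print(f"(:{rel.source})-[:{rel.type}]->(:{rel.target})")
```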
### Code Generation
We can now use our data model to generate some ingestion code.
```Python
gen = IngestionGenerator(data_model=gdm.current_model,
                         username="neo4j", password="password",
                         uri="bolt://localhost:7687", database="neo4j",
                         csv_dir="data/csv/", csv_name="countries.csv")
pyingest_yaml = gen.generate_pyingest_yaml_string()
```
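The example hard-codes the connection details for brevity. For anything beyond a local test instance, you may prefer to read them from the environment. The sketch below builds the `username`, `password`, `uri` and `database` keyword arguments shown above; the environment variable names (`NEO4J_USERNAME` and so on) and the helper `neo4j_settings` are assumptions of this sketch, not something Runway defines.

```python
import os


def neo4j_settings(env=os.environ) -> dict:
    """Collect connection settings, falling back to the local defaults above."""
    return {
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env["NEO4J_PASSWORD"],  # no default: fail loudly if missing
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "database": env.get("NEO4J_DATABASE", "neo4j"),
    }


# Intended use, with the objects from the walkthrough above:
# gen = IngestionGenerator(data_model=gdm.current_model,
#                          csv_dir="data/csv/", csv_name="countries.csv",
#                          **neo4j_settings())
```

Keep in mind that the generated PyIngest YAML presumably embeds these credentials, so treat the resulting string or file as sensitive.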
### Ingestion
We will use the generated PyIngest YAML config to ingest our CSV into our Neo4j instance.
```Python
PyIngest(yaml_string=pyingest_yaml, dataframe=data)
```
We can also save this as a .yaml file and use it with the original [PyIngest](https://github.com/neo4j-field/pyingest).
```Python
gen.generate_pyingest_yaml_file(file_name="countries")
```
Here's a snapshot of our new graph!
![countries-graph.png](./images/countries-graph-white-background.png)
## Limitations
The current project is in beta and has the following limitations:
- Single CSV input only
- Nodes may only have a single label
- Only uniqueness constraints are supported
- Relationships may not have uniqueness constraints
- CSV columns that refer to the same node property are not supported
- Only OpenAI models may be used at this time
- The modified PyIngest function included with Runway only supports loading a local Pandas DataFrame
Raw data
{
"_id": null,
"home_page": null,
"name": "neo4j-runway",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": "graph, neo4j, data model",
"author": "Alex Gilmore",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/de/17/7ef182320a8472fb4a035474bc1f3182dd392a4e98f923f37f54bb001a5c/neo4j_runway-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python library that contains tools for data discovery, data model generation and ingestion for the Neo4j graph database.",
"version": "0.1.0",
"project_urls": null,
"split_keywords": [
"graph",
" neo4j",
" data model"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6a48168402de2a657424f6a1b372d072ae92f617ddaee1436a1a958a0aa7739d",
"md5": "14139c14152cc6400b33993bca5a39b4",
"sha256": "a09027bdbbef289e175f5221a70fc3dc4d55e093b98aa8844ac7e449104537bb"
},
"downloads": -1,
"filename": "neo4j_runway-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "14139c14152cc6400b33993bca5a39b4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 28718,
"upload_time": "2024-05-06T15:23:25",
"upload_time_iso_8601": "2024-05-06T15:23:25.359737Z",
"url": "https://files.pythonhosted.org/packages/6a/48/168402de2a657424f6a1b372d072ae92f617ddaee1436a1a958a0aa7739d/neo4j_runway-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "de177ef182320a8472fb4a035474bc1f3182dd392a4e98f923f37f54bb001a5c",
"md5": "6bfcaef739271ac3eb8f26dd2807e2c6",
"sha256": "a88a87c4fa2128c9d09a47c47ab7f9ea94388f1bacdcb66ccd8f2adf6b4fc455"
},
"downloads": -1,
"filename": "neo4j_runway-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "6bfcaef739271ac3eb8f26dd2807e2c6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 23711,
"upload_time": "2024-05-06T15:23:26",
"upload_time_iso_8601": "2024-05-06T15:23:26.809070Z",
"url": "https://files.pythonhosted.org/packages/de/17/7ef182320a8472fb4a035474bc1f3182dd392a4e98f923f37f54bb001a5c/neo4j_runway-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-06 15:23:26",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "neo4j-runway"
}