neo4j-runway

- **Name**: neo4j-runway
- **Version**: 0.1.0
- **Summary**: A Python library that contains tools for data discovery, data model generation and ingestion for the Neo4j graph database.
- **Upload time**: 2024-05-06 15:23:26
- **Author**: Alex Gilmore
- **Requires Python**: `<4.0,>=3.10`
- **License**: MIT
- **Keywords**: graph, neo4j, data model
- **Requirements**: No requirements were recorded.

# Neo4j Runway
Neo4j Runway is a Python library that simplifies the process of migrating your relational data into a graph. It provides tools that abstract communication with OpenAI to run discovery on your data and generate a data model, as well as tools to generate ingestion code and load your data into a Neo4j instance.

## Key Features

- **Data Discovery**: Harness OpenAI LLMs to provide valuable insights from your data
- **Graph Data Modeling**: Utilize OpenAI and the [Instructor](https://github.com/jxnl/instructor) Python library to create valid graph data models
- **Code Generation**: Generate ingestion code for your preferred method of loading data
- **Data Ingestion**: Load your data using Runway's built-in implementation of [PyIngest](https://github.com/neo4j-field/pyingest) - Neo4j's popular ingestion tool

## Requirements
Runway uses graphviz to visualize data models. To use this feature, download and install [graphviz](https://www.graphviz.org/download/).

You'll need a Neo4j instance to fully utilize Runway. Start up a free cloud-hosted [Aura](https://console.neo4j.io) instance or download the [Neo4j Desktop app](https://neo4j.com/download/).

## Get Running in Minutes

```
pip install neo4j-runway
```

Now let's walk through a basic example.

Here we import the modules we'll be using.
```Python
import pandas as pd

from neo4j_runway import Discovery, GraphDataModeler, IngestionGenerator, LLM, PyIngest

```
### Discovery
Now we define a `General Description` of our data, provide a brief description of each column of interest, and load the data with Pandas.
```Python
USER_GENERATED_INPUT = {
    'General Description': 'This is data on different countries.',
    'id': 'unique id for a country.',
    'name': 'the country name.',
    'phone_code': 'country area code.',
    'capital': 'the capital of the country.',
    'currency_name': "name of the country's currency.",
    'region': 'primary region of the country.',
    'subregion': 'subregion location of the country.',
    'timezones': 'timezones contained within the country borders.',
    'latitude': 'the latitude coordinate of the country center.',
    'longitude': 'the longitude coordinate of the country center.'
}

data = pd.read_csv("data/csv/countries.csv")
```
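Because the keys of `USER_GENERATED_INPUT` are meant to match CSV column names, a typo in a key is easy to miss. The sketch below is not part of Runway: `undescribed_columns` and `unknown_descriptions` are hypothetical helpers, and the sample CSV text is made up for illustration. It compares the dictionary keys against a CSV header using only the standard library.

```python
import csv
import io

GENERAL_KEY = "General Description"  # the one key that is not a column name


def _header(csv_text: str) -> list[str]:
    """Return the column names from the first row of a CSV string."""
    return next(csv.reader(io.StringIO(csv_text)))


def undescribed_columns(csv_text: str, user_input: dict[str, str]) -> list[str]:
    """Columns present in the CSV that have no description in user_input."""
    return [col for col in _header(csv_text) if col not in user_input]


def unknown_descriptions(csv_text: str, user_input: dict[str, str]) -> list[str]:
    """Keys in user_input that do not correspond to any CSV column."""
    header = set(_header(csv_text))
    return [key for key in user_input if key != GENERAL_KEY and key not in header]


# Made-up sample data; note the misspelled 'capitol' key.
sample_csv = "id,name,capital,population\n1,France,Paris,68000000\n"
sample_input = {
    "General Description": "This is data on different countries.",
    "id": "unique id for a country.",
    "name": "the country name.",
    "capitol": "the capital of the country.",
}

print(undescribed_columns(sample_csv, sample_input))
print(unknown_descriptions(sample_csv, sample_input))
```

Leaving a column undescribed may be intentional, since only the columns of interest need descriptions. A description key with no matching column is more likely a mistake.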

We then initialize our LLM. By default, Runway uses GPT-4 and expects our OpenAI API key to be defined in an environment variable.
```Python
llm = LLM()
```
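The README does not name the environment variable. The sketch below assumes it is `OPENAI_API_KEY`, which is the variable the official `openai` Python client reads by default. `require_env` is a hypothetical helper, not part of Runway, that fails early with a clear message if the variable is missing.

```python
import os


def require_env(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    The default name is an assumption; adjust it if Runway expects a different one.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before creating the LLM."
        )
    return value


# Possible usage before the LLM() call shown above:
#   require_env()
#   llm = LLM()
```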

And we run discovery on our data.
```Python
disc = Discovery(llm=llm, user_input=USER_GENERATED_INPUT, data=data)
discovery = disc.run()
```

### Data Modeling
We can now pass our Discovery object to a GraphDataModeler to generate our initial data model. A Discovery object isn't required here, but it provides rich context to the LLM to achieve the best results.
```Python
gdm = GraphDataModeler(llm=llm, discovery=disc)
initial_model = gdm.create_initial_model()
```
If we have graphviz installed, we can take a look at our model.
```Python
gdm.current_model.visualize()
```
![countries-first-model.svg](./images/countries-first-model.svg)

Let's make some corrections to our model and view the results.
```Python
gdm.iterate_model(user_corrections="""
Make Region node have a HAS_SUBREGION relationship with Subregion node.
Remove the relationship between Country and Region.
""")
gdm.current_model.visualize()
```
![countries-second-model.svg](./images/countries-second-model.svg)

### Code Generation
We can now use our data model to generate some ingestion code.

```Python
gen = IngestionGenerator(data_model=gdm.current_model, 
                         username="neo4j", password="password", 
                         uri="bolt://localhost:7687", database="neo4j", 
                         csv_dir="data/csv/", csv_name="countries.csv")

pyingest_yaml = gen.generate_pyingest_yaml_string()

```
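To give a feel for what ingestion code for a node can look like, here is a minimal sketch that builds a batched Cypher `MERGE` statement from a label, a unique key and a list of properties. This is not Runway's generator and not its actual output: `merge_node_cypher` is a hypothetical helper, and it assumes CSV column names equal property names. The `WITH $dict.rows AS rows UNWIND rows AS row` prefix follows the batching convention used in PyIngest configs.

```python
def merge_node_cypher(label: str, key: str, properties: list[str]) -> str:
    """Build a batched MERGE statement for one node label.

    Assumes each property is stored under the same name as its CSV column.
    """
    lines = [
        "WITH $dict.rows AS rows",
        "UNWIND rows AS row",
        # MERGE on the unique key so re-running the load does not duplicate nodes.
        f"MERGE (n:{label} {{{key}: row.{key}}})",
    ]
    if properties:
        assignments = ", ".join(f"n.{prop} = row.{prop}" for prop in properties)
        lines.append(f"SET {assignments}")
    return "\n".join(lines)


print(merge_node_cypher("Country", "id", ["name", "capital"]))
```

Merging on the key alone and setting the remaining properties afterwards keeps the statement idempotent: loading the same rows twice updates existing nodes instead of creating new ones.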
### Ingestion
We will use the generated PyIngest YAML config to ingest our CSV into our Neo4j instance.
```Python
PyIngest(yaml_string=pyingest_yaml, dataframe=data)
```
We can also save this as a .yaml file and use it with the original [PyIngest](https://github.com/neo4j-field/pyingest).
```Python
gen.generate_pyingest_yaml_file(file_name="countries")
```
Here's a snapshot of our new graph!

![countries-graph.png](./images/countries-graph-white-background.png)

## Limitations
The current project is in beta and has the following limitations:
- Single CSV input only
- Nodes may only have a single label
- Only uniqueness constraints are supported
- Relationships may not have uniqueness constraints
- CSV columns that refer to the same node property are not supported
- Only OpenAI models may be used at this time
- The modified PyIngest function included with Runway only supports loading a local Pandas DataFrame
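For reference, a node uniqueness constraint, the only constraint type listed above as supported, can be written in Cypher as shown in this sketch. `uniqueness_constraint_cypher` is a hypothetical helper, not part of Runway, and the constraint naming scheme is an assumption. The statement uses the Neo4j 5 `FOR ... REQUIRE ... IS UNIQUE` syntax.

```python
def uniqueness_constraint_cypher(label: str, prop: str) -> str:
    """Build a Cypher statement that makes `prop` unique across nodes with `label`."""
    # Hypothetical naming scheme: lowercase label and property joined by an underscore.
    name = f"{label.lower()}_{prop.lower()}"
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
    )


print(uniqueness_constraint_cypher("Country", "id"))
```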



            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "neo4j-runway",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "graph, neo4j, data model",
    "author": "Alex Gilmore",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/de/17/7ef182320a8472fb4a035474bc1f3182dd392a4e98f923f37f54bb001a5c/neo4j_runway-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python library that contains tools for data discovery, data model generation and ingestion for the Neo4j graph database.",
    "version": "0.1.0",
    "project_urls": null,
    "split_keywords": [
        "graph",
        " neo4j",
        " data model"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6a48168402de2a657424f6a1b372d072ae92f617ddaee1436a1a958a0aa7739d",
                "md5": "14139c14152cc6400b33993bca5a39b4",
                "sha256": "a09027bdbbef289e175f5221a70fc3dc4d55e093b98aa8844ac7e449104537bb"
            },
            "downloads": -1,
            "filename": "neo4j_runway-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "14139c14152cc6400b33993bca5a39b4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 28718,
            "upload_time": "2024-05-06T15:23:25",
            "upload_time_iso_8601": "2024-05-06T15:23:25.359737Z",
            "url": "https://files.pythonhosted.org/packages/6a/48/168402de2a657424f6a1b372d072ae92f617ddaee1436a1a958a0aa7739d/neo4j_runway-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "de177ef182320a8472fb4a035474bc1f3182dd392a4e98f923f37f54bb001a5c",
                "md5": "6bfcaef739271ac3eb8f26dd2807e2c6",
                "sha256": "a88a87c4fa2128c9d09a47c47ab7f9ea94388f1bacdcb66ccd8f2adf6b4fc455"
            },
            "downloads": -1,
            "filename": "neo4j_runway-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "6bfcaef739271ac3eb8f26dd2807e2c6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 23711,
            "upload_time": "2024-05-06T15:23:26",
            "upload_time_iso_8601": "2024-05-06T15:23:26.809070Z",
            "url": "https://files.pythonhosted.org/packages/de/17/7ef182320a8472fb4a035474bc1f3182dd392a4e98f923f37f54bb001a5c/neo4j_runway-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-06 15:23:26",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "neo4j-runway"
}
        