agefreighter


Nameagefreighter JSON
Version 0.8.1 PyPI version JSON
download
home_pageNone
SummaryAgeFreighter is a Python package that helps you to create a graph database using Azure Database for PostgreSQL.
upload_time2025-02-14 13:07:54
maintainerNone
docs_urlNone
authorNone
requires_python>=3.9.21
licenseMIT License Copyright (c) 2024 Rio Fujita Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords
VCS
bugtrack_url
requirements asyncio azure-identity azure-mgmt-postgresqlflexibleservers azure-mgmt-storage azure-storage-blob fastavro gremlinpython neo4j nest-asyncio networkx pandas psycopg pyarrow
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # AGEFreighter

a Python package that helps you to create a graph database using Azure Database for PostgreSQL.

[Apache AGEā„¢](https://age.apache.org/) is a PostgreSQL Graph database compatible with PostgreSQL's distributed assets and leverages graph data structures to analyze and use relationships and patterns in data.

[Azure Database for PostgreSQL](https://azure.microsoft.com/en-us/services/postgresql/) is a managed database service that is based on the open-source Postgres database engine.

[Introducing support for Graph data in Azure Database for PostgreSQL (Preview)](https://techcommunity.microsoft.com/blog/adforpostgresql/introducing-support-for-graph-data-in-azure-database-for-postgresql-preview/4275628).

## Table of Contents

- [Features](#features)
- [Benchmark](#benchmark)
- [Prerequisites](#prerequisites)
- [Install](#install)
- [Which class to use](#which-class-to-use)
- [Usage of CSVFreighter](#usage-of-csvfreighter)
- [Usage of MultiCSVFreighter (1)](#usage-of-multicsvfreighter-1)
- [Usage of MultiCSVFreighter (2)](#usage-of-multicsvfreighter-2)
- [Usage of AvroFreighter](#usage-of-avrofreighter)
- [Usage of ParquetFreighter](#usage-of-parquetfreighter)
- [Usage of AzureStorageFreighter](#usage-of-azurestoragefreighter)
- [Usage of MultiAzureStorageFreighter](#usage-of-multiazurestoragefreighter)
- [Usage of NetworkxFreighter](#usage-of-networkxfreighter)
- [Usage of CosmosGremlinFreighter](#usage-of-cosmosgremlinfreighter)
- [Usage of CosmosNoSQLFreighter](#usage-of-cosmosnosqlfreighter)
- [Usage of Neo4jFreighter](#usage-of-neo4jfreighter)
- [Usage of PGFreighter](#usage-of-pgfreighter)
- [How to edit the CSV files to load them to the graph database with PGFreighter](#how-to-edit-the-csv-files-to-load-them-to-the-graph-database-with-pgfreighter)
- [Classes](#classes)
- [Method](#method)
- [Arguments](#arguments)
- [Release Notes](#release-notes)
- [License](#license)

## Features

- Asynchronous connection pool support for psycopg PostgreSQL driver
- 'direct_loading' option for loading data directly into the graph. If 'direct_loading' is True, the data is loaded into the graph using the 'INSERT' statement, not Cypher queries.
- 'COPY' protocol support for loading data into the graph. If 'use_copy' is True, the data is loaded into the graph using the 'COPY' protocol.
- AzureStorageFreighter and MultiAzureStorageFreighter classes to load vast amounts of graph data from Azure Storage. Typically, the number of rows in the CSV files exceeds from a million to a billion.

## Benchmark

The result with Azure Database for PostgreSQL, General Purpose, D16ds_v4, 16 vCores, 64 GiB RAM, 512 GiB storage (7,500 IOPS)
See, [tests/agefreightertester.py](https://github.com/rioriost/agefreighter/blob/main/tests/agefreightertester.py)

```bash
for d in `ls data`; do echo $d; wc -l data/$d/* | grep total; done
airroute
   23500 total
countries
   20200 total
payment_large
 96520015 total
payment_small
   96520 total
transaction
   43003 total
```

```bash
AgeFreighter version: 0.8.0
Summary of all tests are as followings:
Test for AzureStorageFreighter, chunk_size(96), direct_loading(False), use_copy(False): SUCCEEDED,  42.95 seconds
Test for MultiAzureStorageFreighter, chunk_size(96), direct_loading(False), use_copy(False): SUCCEEDED,  72.49 seconds
Test for AvroFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.94 seconds
Test for CosmosGremlinFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  5.11 seconds
Test for CSVFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.84 seconds
Test for MultiCSVFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.66 seconds
Test for MultiCSVFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.64 seconds
Test for Neo4jFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  3.45 seconds
Test for NetworkXFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.88 seconds
Test for ParquetFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.91 seconds
Test for PGFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  1.07 seconds
```

## Prerequisites

- over Python 3.9
- This module runs on [psycopg](https://www.psycopg.org/) and [psycopg_pool](https://www.psycopg.org/)
- Enable the Apache AGE extension in your Azure Database for PostgreSQL instance. Login Azure Portal, go to 'server parameters' blade, and check 'AGE" on within 'azure.extensions' and 'shared_preload_libraries' parameters. See, above blog post for more information.
- Load the AGE extension in your PostgreSQL database.

```sql
CREATE EXTENSION IF NOT EXISTS age CASCADE;
```

## Install

- with python venv

```bash
mkdir your_project
cd your_project
python3 -m venv .venv
source .venv/bin/activate
pip install agefreighter
```

- with uv

```bash
uv init your_project
cd your_project
uv venv
source .venv/bin/activate
uv add agefreighter
```

## Which class to use

![Decision Tree](https://github.com/rioriost/agefreighter/raw/main/images/Decision_tree.png)

## Usage of CSVFreighter

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("CSVFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )

    await instance.load(
        graph_name="Transaction",
        start_v_label="Customer",
        start_id="CustomerID",
        start_props=["Name", "Address", "Email", "Phone"],
        edge_type="BOUGHT",
        edge_props=[],
        end_v_label="Product",
        end_id="ProductID",
        end_props=["Phrase", "SKU", "Price", "Color", "Size", "Weight"],
        csv_path="data/transaction/customer_product_bought.csv",
        use_copy=True,
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

### File Format for CSVFreighter

CSVFreighter class loads data from single CSV file. The CSV file should have the following format.

customer_product_bought.csv: The CSV file should have 'id', 'start_vertex_type', 'end_vertex_type', two id columns, 'CustomerID' and 'ProductID' in the following sample, to be used as start and end vertex IDs, and other columns as properties.

```csv
"id","CustomerID","start_vertex_type","Name","Address","Email","Phone","ProductID","end_vertex_type","Phrase","SKU","Price","Color","Size","Weight"
"1","1967","Customer","Jeffrey Joyce","26888 Brett Streets Apt. 325 South Meganberg, CA 80228","madison05@example.com","881-538-6881x35597","120","Product","Networked 3rdgeneration data-warehouse","7246676575258","834.33","DarkKhaki","S","586"
"2","8674","Customer","Craig Burton","280 Sellers Lock North Scott, AR 15307","andersonalexander@example.com","+1-677-235-8289","557","Product","Profit-focused attitude-oriented emulation","6102707440852","953.89","MediumSeaGreen","L","665"
```

See, [data/transaction/customer_product_bought.csv](https://github.com/rioriost/agefreighter/blob/main/data/transaction/customer_product_bought.csv).

## Usage of MultiCSVFreighter (1)

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("MultiCSVFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )

    await instance.load(
        graph_name="Countries",
        vertex_csv_paths=[
            "data/countries/country.csv",
            "data/countries/city.csv",
        ],
        vertex_labels=["Country", "City"],
        edge_csv_paths=["data/countries/has_country_city.csv"],
        edge_types=["has"],
        use_copy=True,
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

### File Format for MultiCSVFreighter (1)

MultiCSVFreighter class loads data from multiple CSV files. The CSV files should have the following format.
MultiCSVFreighter class handles all the columns in CSV except 'id' column as properties and all the columns in CSV except 'start_id' / 'start_vertex_type' / 'end_id' / 'end_vertex_type' columns as properties for the edges.

country.csv: The node CSV file should have 'id' column and other columns as properties.

```csv
"id","Name","Capital","Population","ISO","TLD","FlagURL"
"1","El Salvador","Kristybury","355169921","TN","wxu","https://dummyimage.com/777x133"
"2","Lebanon","New William","413929227","UK","akj","https://picsum.photos/772/459"
```

city.csv: The node CSV file should have 'id' column and other columns as properties.

```csv
"id","Name","Latitude","Longitude"
"1","Michaelmouth","-56.4217435","-44.924586"
"2","Bryantton","-62.714695","-162.083092"
```

has_country_city.csv: The edge CSV file should have 'id', 'start_id', 'start_vertex_type', 'end_id', 'end_vertex_type', and other columns as properties.

```csv
"id","start_id","start_vertex_type","end_id","end_vertex_type","since"
"1","86","Country","3633","City","1975-12-07 04:45:00.790431"
"2","22","Country","6194","City","1984-06-05 13:23:51.858147"
```

See, [data/countries/](https://github.com/rioriost/agefreighter/blob/main/data/countries/).

## Usage of MultiCSVFreighter (2)

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("MultiCSVFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )
    await instance.load(
        graph_name="AirRoute",
        vertex_csv_paths=[
            "data/airroute/airport.csv",
        ],
        vertex_labels=["AirPort"],
        edge_csv_paths=["data/airroute/airroute_airport_airport.csv"],
        edge_types=["ROUTE"],
        edge_props = ["distance"],
        use_copy=True,
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

### File Format for MultiCSVFreighter (2)

If the edge connects the same type of vertices, the CSV files should have the following format.

airport.csv: The node CSV file should have 'id' column and other columns as properties.

```csv
"id","Name","City","Country","IATA","ICAO","Latitude","Longitude","Altitude","Timezone","DST","Tz"
"1","East Annatown Airport","East Annatown","Eritrea","SHZ","XTIK","-2.783983","-100.199060","823","Africa/Luanda","E","Europe/Skopje"
"2","Port Laura Airport","Port Laura","Montenegro","TQY","WDLC","4.331082","-72.411319","121","Asia/Dhaka","Z","Africa/Kigali"
```

airroute_airport_airport.csv: The edge CSV file should have 'id', 'start_id', 'start_vertex_type', 'end_id', 'end_vertex_type', and other columns as properties.

```csv
"id","start_id","start_vertex_type","end_id","end_vertex_type","distance"
"1","1388","AirPort","794","AirPort","1373"
"2","2998","AirPort","823","AirPort","11833"
```

See, [data/airroute/](https://github.com/rioriost/agefreighter/blob/main/data/airroute/).

## Usage of AvroFreighter

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("AvroFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )

    await instance.load(
        graph_name="Transaction",
        start_v_label="Customer",
        start_id="CustomerID",
        start_props=["Name", "Address", "Email", "Phone"],
        edge_type="BOUGHT",
        edge_props=[],
        end_v_label="Product",
        end_id="ProductID",
        end_props=["Phrase", "SKU", "Price", "Color", "Size", "Weight"],
        avro_path="data/transaction/customer_product_bought.avro",
        use_copy=True,
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

### File Format for AvroFreighter

AvroFreighter class loads data from Avro file. The Avro file should have the following format.

```json
{
    "type": "record",
    "name": "customer_product_bought",
    "fields": [
        {
            "name": "id",
            "type": "int"
        },
        {
            "name": "CustomerID",
            "type": "int"
        },
        {
            "name": "start_vertex_type",
            "type": "string"
        },
        {
            "name": "Name",
            "type": "string"
        },
        {
            "name": "Address",
            "type": "string"
        },
        {
            "name": "Email",
            "type": "string"
        },
        {
            "name": "Phone",
            "type": "string"
        },
        {
            "name": "ProductID",
            "type": "int"
        },
        {
            "name": "end_vertex_type",
            "type": "string"
        },
        {
            "name": "Phrase",
            "type": "string"
        },
        {
            "name": "SKU",
            "type": "string"
        },
        {
            "name": "Price",
            "type": "float"
        },
        {
            "name": "Color",
            "type": "string"
        },
        {
            "name": "Size",
            "type": "string"
        },
        {
            "name": "Weight",
            "type": "int"
        }
    ]
}
{
    "id": 1,
    "CustomerID": 1967,
    "start_vertex_type": "Customer",
    "Name": "Jeffrey Joyce",
    "Address": "26888 Brett Streets Apt. 325 South Meganberg, CA 80228",
    "Email": "madison05@example.com",
    "Phone": "881-538-6881x35597",
    "ProductID": 120,
    "end_vertex_type": "Product",
    "Phrase": "Networked 3rdgeneration data-warehouse",
    "SKU": "7246676575258",
    "Price": 834.3300170898438,
    "Color": "DarkKhaki",
    "Size": "S",
    "Weight": 586
}
{
    "id": 2,
    "CustomerID": 8674,
    "start_vertex_type": "Customer",
    "Name": "Craig Burton",
    "Address": "280 Sellers Lock North Scott, AR 15307",
    "Email": "andersonalexander@example.com",
    "Phone": "+1-677-235-8289",
    "ProductID": 557,
    "end_vertex_type": "Product",
    "Phrase": "Profit-focused attitude-oriented emulation",
    "SKU": "6102707440852",
    "Price": 953.8900146484375,
    "Color": "MediumSeaGreen",
    "Size": "L",
    "Weight": 665
}
```

See, [data/transaction/customer_product_bought.avro](https://github.com/rioriost/agefreighter/blob/main/data/transaction/customer_product_bought.avro).

## Usage of ParquetFreighter

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("ParquetFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )

    await instance.load(
        graph_name="Transaction",
        start_v_label="Customer",
        start_id="CustomerID",
        start_props=["Name", "Address", "Email", "Phone"],
        edge_type="BOUGHT",
        edge_props=[],
        end_v_label="Product",
        end_id="ProductID",
        end_props=["Phrase", "SKU", "Price", "Color", "Size", "Weight"],
        parquet_path="data/transaction/customer_product_bought.parquet",
        use_copy=True,
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

### File Format for ParquetFreighter

ParquetFreighter class loads data from Parquet file. The Parquet file should have the following format.

```
required group field_id=-1 schema {
  optional int64 field_id=-1 id;
  optional int64 field_id=-1 CustomerID;
  optional binary field_id=-1 start_vertex_type (String);
  optional binary field_id=-1 Name (String);
  optional binary field_id=-1 Address (String);
  optional binary field_id=-1 Email (String);
  optional binary field_id=-1 Phone (String);
  optional int64 field_id=-1 ProductID;
  optional binary field_id=-1 end_vertex_type (String);
  optional binary field_id=-1 Phrase (String);
  optional int64 field_id=-1 SKU;
  optional double field_id=-1 Price;
  optional binary field_id=-1 Color (String);
  optional binary field_id=-1 Size (String);
  optional int64 field_id=-1 Weight;
}

   id  CustomerID start_vertex_type           Name                                            Address  ...            SKU   Price           Color Size Weight
0   1        1967          Customer  Jeffrey Joyce  26888 Brett Streets Apt. 325 South Meganberg, ...  ...  7246676575258  834.33       DarkKhaki    S    586
1   2        8674          Customer   Craig Burton             280 Sellers Lock North Scott, AR 15307  ...  6102707440852  953.89  MediumSeaGreen    L    665
```

See, [data/transaction/customer_product_bought.parquet](https://github.com/rioriost/agefreighter/blob/main/data/transaction/customer_product_bought.parquet).

## Usage of AzureStorageFreighter

### Prerequisites

Install the Azure CLI and login with your Azure account.

macOS

```bash
brew update && brew install azure-cli
```

Windows

```shell
winget install -e --id Microsoft.AzureCLI
```

Linux (RHEL)

```bash
sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc

# for RHEL 9
sudo dnf install -y https://packages.microsoft.com/config/rhel/9.0/packages-microsoft-prod.rpm
# for RHEL 8
sudo dnf install -y https://packages.microsoft.com/config/rhel/8/packages-microsoft-prod.rpm

sudo dnf install azure-cli
```

Linux (Ubuntu)

```bash
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
```

Afrer installing the Azure CLI, login with your Azure account.

```bash
az login
```

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("AzureStorageFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )

    await instance.load(
        graph_name="Transaction",
        start_v_label="Customer",
        start_id="CustomerID",
        start_props=["Name", "Address", "Email", "Phone"],
        edge_type="BOUGHT",
        edge_props=[],
        end_v_label="Product",
        end_id="ProductID",
        end_props=["Phrase", "SKU", "Price", "Color", "Size", "Weight"],
        csv_path="data/transaction/customer_product_bought.csv",
        drop_graph=True,
        create_graph=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

### File Format for AzureStorageFreighter

AzureStorageFreighter class loads data from Azure Storage and expects the exactly same format as CSVFreighter.

## Usage of MultiAzureStorageFreighter

Banchmark: loading 965 million rows of data from Azure Storage to Azure Database for PostgreSQL with MultiAzureStorageFreighter class.

```bash
% wc -l data/payment_large/*
  900001 data/payment_large/bitcoinaddress.csv
 2700001 data/payment_large/cookie.csv
 1200001 data/payment_large/creditcard.csv
 1600001 data/payment_large/cryptoaddress.csv
  960001 data/payment_large/email.csv
 2200001 data/payment_large/ip.csv
 4000001 data/payment_large/partnerenduser.csv
 7000001 data/payment_large/payment.csv
 1000000 data/payment_large/performedby_cookie_payment.csv
 1000000 data/payment_large/performedby_creditcard_payment.csv
 1000000 data/payment_large/performedby_cryptoaddress_payment.csv
 1000000 data/payment_large/performedby_email_payment.csv
 1000000 data/payment_large/performedby_phone_payment.csv
  960001 data/payment_large/phone.csv
 8000001 data/payment_large/usedby_cookie_payment.csv
 8000001 data/payment_large/usedby_creditcard_payment.csv
 8000001 data/payment_large/usedby_cryptoaddress_payment.csv
 8000001 data/payment_large/usedby_email_payment.csv
 8000001 data/payment_large/usedby_phone_payment.csv
 6000000 data/payment_large/usedin_cookie_payment.csv
 6000001 data/payment_large/usedin_creditcard_payment.csv
 6000000 data/payment_large/usedin_cryptoaddress_payment.csv
 6000000 data/payment_large/usedin_email_payment.csv
 6000000 data/payment_large/usedin_phone_payment.csv
 96520015 total
```

The result with Azure Database for PostgreSQL, General Purpose, D16ds_v4, 16 vCores, 64 GiB RAM, 512 GiB storage (7,500 IOPS)

```bash
Finding Subscription ID...
Enabling extension...
Creating storage account...
Uploading files...
Creating temporary tables...
Loading files to temporary tables...
Creating a graph...
Creating a graph: Done!
AgeFreighter version: 0.8.0
Summary of all tests are as followings:
Test for MultiAzureStorageFreighter, chunk_size(96), direct_loading(False), use_copy(False): SUCCEEDED,  2179.69 seconds
```

### Prerequisites

Install the Azure CLI and login with your Azure account.

macOS

```bash
brew update && brew install azure-cli
```

Windows

```shell
winget install -e --id Microsoft.AzureCLI
```

Linux (RHEL)

```bash
sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc

# for RHEL 9
sudo dnf install -y https://packages.microsoft.com/config/rhel/9.0/packages-microsoft-prod.rpm
# for RHEL 8
sudo dnf install -y https://packages.microsoft.com/config/rhel/8/packages-microsoft-prod.rpm

sudo dnf install azure-cli
```

Linux (Ubuntu)

```bash
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
```

Afrer installing the Azure CLI, login with your Azure account.

```bash
az login
```

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("MultiAzureStorageFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )

    data_dir = "data/payment_small/"

    await instance.load(
        graph_name="AgeTester",
        vertex_args=[
            {
                "csv_path": f"{data_dir}bitcoinaddress.csv",
                "label": "BitcoinAddress",
                "id": "id",
                "props": [
                    "available_since",
                    "inserted_at",
                    "address",
                    "schema_version",
                ],
            },
            {
                "csv_path": f"{data_dir}cookie.csv",
                "label": "Cookie",
                "id": "id",
                "props": [
                    "available_since",
                    "inserted_at",
                    "uaid",
                    "schema_version",
                ],
            },
            {
                "csv_path": f"{data_dir}ip.csv",
                "label": "IP",
                "id": "id",
                "props": [
                    "available_since",
                    "inserted_at",
                    "address",
                    "schema_version",
                ],
            },
            {
                "csv_path": f"{data_dir}phone.csv",
                "label": "Phone",
                "id": "id",
                "props": [
                    "available_since",
                    "inserted_at",
                    "address",
                    "schema_version",
                ],
            },
            {
                "csv_path": f"{data_dir}email.csv",
                "label": "Email",
                "id": "id",
                "props": [
                    "available_since",
                    "inserted_at",
                    "email",
                    "domain",
                    "handle",
                    "schema_version",
                ],
            },
            {
                "csv_path": f"{data_dir}payment.csv",
                "label": "Payment",
                "id": "id",
                "props": [
                    "available_since",
                    "inserted_at",
                    "payment_id",
                    "schema_version",
                ],
            },
            {
                "csv_path": f"{data_dir}creditcard.csv",
                "label": "CreditCard",
                "id": "id",
                "props": [
                    "available_since",
                    "inserted_at",
                    "expiry_month",
                    "expiry_year",
                    "masked_number",
                    "creditcard_identifier",
                    "schema_version",
                ],
            },
            {
                "csv_path": f"{data_dir}partnerenduser.csv",
                "label": "PartnerEndUser",
                "id": "id",
                "props": [
                    "available_since",
                    "inserted_at",
                    "partner_end_user_id",
                    "schema_version",
                ],
            },
            {
                "csv_path": f"{data_dir}cryptoaddress.csv",
                "label": "CryptoAddress",
                "id": "id",
                "props": [
                    "available_since",
                    "inserted_at",
                    "address",
                    "currency",
                    "full_address",
                    "schema_version",
                    "tag",
                ],
            },
        ],
        edge_args=[
            {
                "csv_paths": [
                    f"{data_dir}usedin_cookie_payment.csv",
                    f"{data_dir}usedin_creditcard_payment.csv",
                    f"{data_dir}usedin_cryptoaddress_payment.csv",
                    f"{data_dir}usedin_email_payment.csv",
                    f"{data_dir}usedin_phone_payment.csv",
                ],
                "type": "UsedIn",
            },
            {
                "csv_paths": [
                    f"{data_dir}usedby_cookie_payment.csv",
                    f"{data_dir}usedby_creditcard_payment.csv",
                    f"{data_dir}usedby_cryptoaddress_payment.csv",
                    f"{data_dir}usedby_email_payment.csv",
                    f"{data_dir}usedby_phone_payment.csv",
                ],
                "type": "UsedBy",
            },
            {
                "csv_paths": [
                    f"{data_dir}performedby_cookie_payment.csv",
                    f"{data_dir}performedby_creditcard_payment.csv",
                    f"{data_dir}performedby_cryptoaddress_payment.csv",
                    f"{data_dir}performedby_email_payment.csv",
                    f"{data_dir}performedby_phone_payment.csv",
                ],
                "type": "PerformedBy",
            },
        ],
        drop_graph=True,
        create_graph=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

### File Format for MultiAzureStorageFreighter

MultiAzureStorageFreighter class loads data from Azure Storage and expects the exactly same format as MultiCSVFreighter.

See, [data/payment_small/](https://github.com/rioriost/agefreighter/blob/main/data/payment_small/).

### What AzureStorageFreighter / MultiAzureStorageFreighter do

1. Find the Subscription ID of your Azure account.
2. Enable Azure Storage Extension in your Azure Database for PostgreSQL instance.
3. Create a Storage Account in the resource group where your Azure Database for PostgreSQL instance is located.
4. Create a container in the Storage Account.
5. Upload CSV files to the container.
6. Create temporary tables where the CSV files are loaded into.
7. Load the data from the temporary tables to the graph.

## Usage of NetworkxFreighter

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory

import networkx as nx


async def main():
    instance = Factory.create_instance("NetworkXFreighter")

    networkx_graph = nx.DiGraph()
    networkx_graph.add_node(
        1,
        name="Jeffrey Joyce",
        address="26888 Brett Streets Apt. 325 South Meganberg, CA 80228",
        email="madison05@example.com",
        phone="881-538-6881x35597",
        customerid=1967,
        label="Customer",
    )
    networkx_graph.add_node(
        2,
        name="Craig Burton",
        address="280 Sellers Lock North Scott, AR 15307",
        email="andersonalexander@example.com",
        phone="+1-677-235-8289",
        customerid=8674,
        label="Customer",
    )
    networkx_graph.add_node(
        3,
        phrase="Networked 3rdgeneration data-warehouse",
        sku="7246676575258",
        price=834.33,
        color="DarkKhaki",
        size="S",
        weight=586,
        productid=120,
        label="Product",
    )
    networkx_graph.add_node(
        4,
        phrase="Profit-focused attitude-oriented emulation",
        sku="6102707440852",
        price=953.89,
        color="MediumSeaGreen",
        size="L",
        weight=665,
        productid=557,
        label="Product",
    )

    networkx_graph.add_edge(1, 3, since="1975-12-07 04:45:00.790431", label="BOUGHT")
    networkx_graph.add_edge(2, 4, since="1984-06-05 13:23:51.858147", label="BOUGHT")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )
    await instance.load(
        graph_name="Transaction",
        networkx_graph=networkx_graph,
        id_map={
            "Customer": "CustomerID",
            "Product": "ProductID",
        },
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

You can also load a networkx graph from a pickle file.

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory

import pickle


async def main():
    instance = Factory.create_instance("NetworkXFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )
    await instance.load(
        graph_name="Transaction",
        networkx_graph=pickle.load(
            open("data/transaction/customer_product_bought.pickle", "rb")
        ),
        id_map={
            "Customer": "CustomerID",
            "Product": "ProductID",
        },
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

See, [data/transaction/customer_product_bought.pickle](https://github.com/rioriost/agefreighter/blob/main/data/transaction/customer_product_bought.pickle).

## Usage of CosmosNoSQLFreighter

CosmosNoSQLFreighter uses NoSQL API to get data loaded via the Gremlin API for better performance.

```bash
AgeFreighter version: 0.8.1
Summary of all tests are as followings:
Test for CosmosGremlinFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  4.53 seconds
Test for CosmosNoSQLFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  2.25 seconds
```

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("CosmosNoSQLFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )

    await instance.load(
        graph_name="Transaction",
        cosmos_endpoint=os.environ["COSMOS_ENDPOINT"],
        cosmos_key=os.environ["COSMOS_KEY"],
        cosmos_database="db1",
        cosmos_container="transaction",
        id_map={
            "Customer": "CustomerID",
            "Product": "ProductID",
        },
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

The above code suppose that you have a Cosmos DB account with a Gremlin API database named 'db1' with a container named 'transaction' in the Azure Portal.
You can find the '.NET SDK URI' starting with 'https://', not 'wss://' for 'cosmos_endpoint' env variable and 'PRIMARY KEY' / 'SECONDARY KEY' for 'cosmos_key' env variable in the 'Keys' blade of the Cosmos DB account.

### Document Format for CosmosNoSQLFreighter

CosmosNoSQLFreighter class loads data from Cosmos DB. The Cosmos DB should have the following format.

```json
{
  "label": "Customer",
  "id": "272e8291-d1a4-4238-ad27-92ec2101425d",
  "Name": [
    {
      "id": "ee24bc0d-1821-4854-bd00-aafba2419a1f",
      "_value": "Alicia Herrera"
    }
  ],
  "CustomerID": [
    {
      "id": "f1aafc14-1ecd-4437-b0b4-ce9926ec0924",
      "_value": "1828"
    }
  ],
  "Address": [
    {
      "id": "c611e8b5-a33a-4669-8fbb-36c0737d7ab8",
      "_value": "906 Shannon Views Apt. 370 Ryanbury, CA 73165"
    }
  ],
  "Email": [
    {
      "id": "1f951bf3-47bf-4af8-951c-7915ba810500",
      "_value": "jillian49@example.com"
    }
  ],
  "Phone": [
    {
      "id": "75ca0b65-b5a5-4fdd-8fa5-35db476fede6",
      "_value": "+1-351-871-4405x226"
    }
  ],
  "pk": "1",
  "_rid": "tfxpAMsq0qXfkwEAAAAADA==",
  "_self": "dbs/tfxpAA==/colls/tfxpAMsq0qU=/docs/tfxpAMsq0qXfkwEAAAAADA==/",
  "_etag": "\"2f0055e1-0000-2300-0000-67909dd00000\"",
  "_attachments": "attachments/",
  "_ts": 1737530832
}

{
  "label": "Product",
  "id": "b342f7ac-0170-4d10-973c-e01910e79320",
  "Phrase": [
    {
      "id": "3da120da-c934-4c31-95b2-973e5a5c88ee",
      "_value": "Reverse-engineered asymmetric leverage"
    }
  ],
  "ProductID": [
    {
      "id": "dab94ea0-dfe4-4e30-b660-cc2dd9cfd9b6",
      "_value": "113"
    }
  ],
  "SKU": [
    {
      "id": "2df5bc68-2764-4d5e-a531-eccbc9049f98",
      "_value": "280217698898"
    }
  ],
  "Price": [
    {
      "id": "4c10a970-8b00-4f6c-9ae0-244946dffcb7",
      "_value": "559.27"
    }
  ],
  "Color": [
    {
      "id": "69961024-4807-4ac9-a126-2e71f5f885b7",
      "_value": "White"
    }
  ],
  "Size": [
    {
      "id": "39fad2fd-f879-49f4-9712-473be9d578f5",
      "_value": "L"
    }
  ],
  "Weight": [
    {
      "id": "41e10b86-32c9-4599-b985-6b97ae0719be",
      "_value": "633"
    }
  ],
  "pk": "1",
  "_rid": "tfxpAMsq0qV0uAEAAAAADA==",
  "_self": "dbs/tfxpAA==/colls/tfxpAMsq0qU=/docs/tfxpAMsq0qV0uAEAAAAADA==/",
  "_etag": "\"30009b06-0000-2300-0000-67909ddd0000\"",
  "_attachments": "attachments/",
  "_ts": 1737530845
}

{
  "label": "BOUGHT",
  "id": "674441a4-457a-4d59-a9a8-5c6e2d57ea16",
  "_sink": "510a9b04-d351-4279-b42a-0ea51f7f1a8c",
  "_sinkLabel": "Product",
  "_sinkPartition": "1",
  "_vertexId": "3549c350-59ec-46c7-8e49-589bc0cc6da6",
  "_vertexLabel": "Customer",
  "_isEdge": true,
  "pk": "1",
  "_rid": "tfxpAMsq0qX3ygEAAAAADA==",
  "_self": "dbs/tfxpAA==/colls/tfxpAMsq0qU=/docs/tfxpAMsq0qX3ygEAAAAADA==/",
  "_etag": "\"3000ab19-0000-2300-0000-67909de90000\"",
  "_attachments": "attachments/",
  "_ts": 1737530857
}
```

## Usage of CosmosGremlinFreighter

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("CosmosGremlinFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )

    await instance.load(
        graph_name="Transaction",
        cosmos_gremlin_endpoint=os.environ["COSMOS_GREMLIN_ENDPOINT"],
        cosmos_gremlin_key=os.environ["COSMOS_GREMLIN_KEY"],
        cosmos_username="/dbs/db1/colls/transaction",
        id_map={
            "Customer": "CustomerID",
            "Product": "ProductID",
        },
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

The above code suppose that you have a Cosmos DB account with a Gremlin API database named 'db1' with a container named 'transaction' in the Azure Portal.
You can find the 'GREMLIN URI' for 'cosmos_gremlin_endpoint' env variable and 'PRIMARY KEY' / 'SECONDARY KEY' for 'cosmos_gremlin_key' env variable in the 'Keys' blade of the Cosmos DB account.

### Document Format for CosmosGremlinFreighter

CosmosGremlinFreighter class loads data from Cosmos DB. The Cosmos DB should have the following format.

node: g.V().limit(1)

```json
[
  {
    "id": "272e8291-d1a4-4238-ad27-92ec2101425d",
    "label": "Customer",
    "type": "vertex",
    "properties": {
      "Name": [
        {
          "id": "ee24bc0d-1821-4854-bd00-aafba2419a1f",
          "value": "Alicia Herrera"
        }
      ],
      "CustomerID": [
        {
          "id": "f1aafc14-1ecd-4437-b0b4-ce9926ec0924",
          "value": "1828"
        }
      ],
      "Address": [
        {
          "id": "c611e8b5-a33a-4669-8fbb-36c0737d7ab8",
          "value": "906 Shannon Views Apt. 370 Ryanbury, CA 73165"
        }
      ],
      "Email": [
        {
          "id": "1f951bf3-47bf-4af8-951c-7915ba810500",
          "value": "jillian49@example.com"
        }
      ],
      "Phone": [
        {
          "id": "75ca0b65-b5a5-4fdd-8fa5-35db476fede6",
          "value": "+1-351-871-4405x226"
        }
      ],
      "pk": [
        {
          "id": "272e8291-d1a4-4238-ad27-92ec2101425d|pk",
          "value": "1"
        }
      ]
    }
  }
]
```

edge: g.E().limit(1)

```json
[
  {
    "id": "efc34e39-d674-40df-9c6a-f8a24c8b8d77",
    "label": "BOUGHT",
    "type": "edge",
    "inVLabel": "Product",
    "outVLabel": "Customer",
    "inV": "1430cbf2-7d25-4c38-8ff7-f2b215bfdcad",
    "outV": "390565dc-67d7-4179-a241-8cd2f8df82b2"
  }
]
```

## Usage of Neo4jFreighter

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("Neo4jFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )

    await instance.load(
        graph_name="Transaction",
        neo4j_uri=os.environ["NEO4J_URI"],
        neo4j_user=os.environ["NEO4J_USER"],
        neo4j_password=os.environ["NEO4J_PASSWORD"],
        id_map={
            "Customer": "CustomerID",
            "Product": "ProductID",
        },
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

The above code suppose that you have a Neo4j or compatible graph DB.

### Data Format for Neo4jFreighter

node: MATCH (n) RETURN n LIMIT 1

```json
{
  "identity": 0,
  "labels": ["Customer"],
  "properties": {
    "Email": "madison05@example.com",
    "Address": "26888 Brett Streets Apt. 325 South Meganberg, CA 80228",
    "Customer": "Customer",
    "Phone": "881-538-6881x35597",
    "CustomerID": 1967,
    "Name": "Jeffrey Joyce"
  },
  "elementId": "4:148f2025-c6d2-4e47-8661-b2f1a28f24aa:0"
}
```

edge: MATCH ()-[r]->() RETURN r LIMIT 1

```json
{
  "identity": 20000,
  "start": 0,
  "end": 17272,
  "type": "BOUGHT",
  "properties": {
    "from": 1967,
    "to": 120
  },
  "elementId": "5:148f2025-c6d2-4e47-8661-b2f1a28f24aa:20000",
  "startNodeElementId": "4:148f2025-c6d2-4e47-8661-b2f1a28f24aa:0",
  "endNodeElementId": "4:148f2025-c6d2-4e47-8661-b2f1a28f24aa:17272"
}
```

## Usage of PGFreighter

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    instance = Factory.create_instance("PGFreighter")

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )

    await instance.load(
        graph_name="Transaction",
        source_pg_con_string=os.environ["SRC_PG_CONNECTION_STRING"],
        source_tables={
            "start": "Customer",
            "end": "Product",
            "edges": "BOUGHT",
        },
        id_map={
            "Customer": "CustomerID",
            "Product": "ProductID",
        },
        drop_graph=True,
        create_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

### Table schemas for PGFreighter

Customer table schema

```sql
postgres=# \d+ "Customer"
                                                                  Table "public.Customer"
     Column     |  Type   | Collation | Nullable |                      Default                       | Storage  | Compression | Stats target | Description
----------------+---------+-----------+----------+----------------------------------------------------+----------+-------------+--------------+-------------
 CustomerSerial | integer |           | not null | nextval('"Customer_CustomerSerial_seq"'::regclass) | plain    |             |              |
 CustomerID     | text    |           |          |                                                    | extended |             |              |
 Name           | text    |           |          |                                                    | extended |             |              |
 Address        | text    |           |          |                                                    | extended |             |              |
 Email          | text    |           |          |                                                    | extended |             |              |
 Phone          | text    |           |          |                                                    | extended |             |              |
Indexes:
    "Customer_CustomerID_idx" btree ("CustomerID")
Access method: heap
```

Product table schema

```sql
postgres=# \d+ "Product"
                                                                 Table "public.Product"
    Column     |  Type   | Collation | Nullable |                     Default                      | Storage  | Compression | Stats target | Description
---------------+---------+-----------+----------+--------------------------------------------------+----------+-------------+--------------+-------------
 ProductSerial | integer |           | not null | nextval('"Product_ProductSerial_seq"'::regclass) | plain    |             |              |
 ProductID     | text    |           |          |                                                  | extended |             |              |
 Phrase        | text    |           |          |                                                  | extended |             |              |
 SKU           | text    |           |          |                                                  | extended |             |              |
 Price         | real    |           |          |                                                  | plain    |             |              |
 Color         | text    |           |          |                                                  | extended |             |              |
 Size          | text    |           |          |                                                  | extended |             |              |
 Weight        | integer |           |          |                                                  | plain    |             |              |
Indexes:
    "Product_ProductID_idx" btree ("ProductID")
Access method: heap
```

BOUGHT table schema

```sql
postgres=# \d+ "BOUGHT"
                                                                Table "public.BOUGHT"
    Column    |  Type   | Collation | Nullable |                    Default                     | Storage  | Compression | Stats target | Description
--------------+---------+-----------+----------+------------------------------------------------+----------+-------------+--------------+-------------
 BoughtSerial | integer |           | not null | nextval('"BOUGHT_BoughtSerial_seq"'::regclass) | plain    |             |              |
 CustomerID   | text    |           |          |                                                | extended |             |              |
 ProductID    | text    |           |          |                                                | extended |             |              |
Indexes:
    "BOUGHT_CustomerID_idx" btree ("CustomerID")
    "BOUGHT_ProductID_idx" btree ("ProductID")
Access method: heap
```

## How to edit the CSV files to load them to the graph database with PGFreighter

Example: [krlawrence graph](https://github.com/krlawrence/graph/tree/master/sample-data)

1. Download air-routes-latest-edges.csv and air-routes-latest-nodes.csv

2. Edit air-routes-latest-nodes.csv

- original

```csv
~id,~label,type:string,code:string,icao:string,desc:string,region:string,runways:int,longest:int,elev:int,country:string,city:string,lat:double,lon:double,author:string,date:string
0,version,version,0.89,,Air Routes Data - Version: 0.89 Generated: 2022-08-29 14:10:18 UTC; Graph created by Kelvin R. Lawrence; Please let me know of any errors you find in the graph or routes that should be added.,,,,,,,,,Kelvin R. Lawrence,2022-08-29 14:10:18 UTC
```

- edited

```csv
id,label,type,code,icao,desc,region,runways,longest,elev,country,city,lat,lon,author,date
1,airport,airport,ATL,KATL,Hartsfield - Jackson Atlanta International Airport,US-GA,5,12390,1026,US,Atlanta,33.6366996765137,-84.4281005859375,,
```

- remove the second line
- edit the first line (CSV Header)

3. Edit air-routes-latest-edges.csv

- original

```csv
~id,~from,~to,~label,dist:int
3749,1,3,route,809
```

- edited

```csv
id,start_id,end_id,label,dist,start_vertex_type,end_vertex_type
3749,1,3,route,809,airport,airport
```

- edit the first line (CSV Header)
- add start_vertex_type and end_vertex_type columns to each lines

4. Install agefreighter

- with python venv

```bash
mkdir your_project
cd your_project
python3 -m venv .venv
source .venv/bin/activate
pip install agefreighter
```

- with uv

```bash
uv init your_project
cd your_project
uv venv
source .venv/bin/activate
uv add agefreighter
```

5. Make a Python script as below and locate the script in the same directory with the CSV files

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
from agefreighter import Factory


async def main():
    loader = Factory.create_instance("MultiCSVFreighter")
    await loader.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
        min_connections=4,
    )
    await loader.load(
        vertex_csv_paths=["air-routes-latest-nodes.csv"],
        vertex_labels=["airport"],
        edge_csv_paths=["air-routes-latest-edges.csv"],
        edge_types=["route"],
        graph_name="air_route",
        use_copy=True,
        drop_graph=True,
        progress=True,
    )


if __name__ == "__main__":
    import asyncio
    import sys

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    asyncio.run(main())
```

6. Deploy Azure Database for PostgreSQL and enable Apache AGE extension on Azure Portal
   [Introducing support for Graph data in Azure Database for PostgreSQL (Preview)](https://techcommunity.microsoft.com/blog/adforpostgresql/introducing-support-for-graph-data-in-azure-database-for-postgresql-preview/4275628).

7. Set the PostgreSQL connection string as an environment variable

```shell
export PG_CONNECTION_STRING="host=xxxxxx.postgres.database.azure.com port=5432 dbname=postgres user=......"
```

8. Run the script

```shell
python3 <script_name>.py
```

9. Check the graph created in the PostgreSQL database

```sql
% psql $PG_CONNECTION_STRING
psql (16.6 (Homebrew), server 16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=> SET search_path = ag_catalog, "$user", public;
SET
postgres=> select * from air_route.airport limit 1;
       id        |                                                                                                                                                                       properties
-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 844424930131969 | {"id": "1", "lat": "33.6366996765137", "lon": "-84.4281005859375", "city": "Atlanta", "code": "ATL", "date": "nan", "desc": "Hartsfield - Jackson Atlanta International Airport", "elev": "1026.0", "icao": "KATL", "type": "airport", "label": "airport", "author": "nan", "region": "US-GA", "country": "US", "longest": "12390.0", "runways": "5.0"}
(1 row)

postgres=> select * from air_route.route limit 1;
        id        |    start_id     |     end_id      |                    properties
------------------+-----------------+-----------------+---------------------------------------------------
 1125899906842625 | 844424930131969 | 844424930131971 | {"id": "3749", "dist": "809.0", "label": "route"}
(1 row)
```

## Classes

- [AGEFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/agefreighter.txt)
- [AzureStorageFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/azurestoragefreighter.txt)
- [AvroFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/avrofreighter.txt)
- [CosmosGremlinFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/cosmosgremlinfreighter.txt)
- [CosmosNoSQLFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/cosmosnosqlfreighter.txt)
- [CSVFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/csvfreighter.txt)
- [MultiAzureStorageFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/multiazurestoragefreighter.txt)
- [MultiCSVFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/multicsvfreighter.txt)
- [Neo4jFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/neo4jfreighter.txt)
- [NetworkXFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/networkxfreighter.txt)
- [ParquetFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/parguetfreighter.txt)
- [PGFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/pgfreighter.txt)

## Method

All the classes have the same load() method. The method loads data into a graph database.

## Arguments

- Common arguments

  - graph_name (str) : the name of the graph
  - chunk_size (int) : the number of rows to be loaded at once
  - direct_loading (bool) : if True, the data is loaded into the graph using the 'INSERT' statement, not Cypher queries
  - use_copy (bool) : if True, the data is loaded into the graph using the 'COPY' protocol
  - create_graph (bool) : if True, the graph will be created after the existing graph is dropped
  - progress (bool) : if True, the progress of the loading is shown

- Common arguments for 'Single Source' classes

  - AvroFreighter
  - AzureStorageFreighter
  - CosmosGremlinFreighter
  - Neo4jFreighter
  - NetworkXFreighter
  - ParquetFreighter
  - PGFreighter
    - start_v_label (str): Start Vertex Label
    - start_id (str): Start Vertex ID
    - start_props (list): Start Vertex Properties
    - end_v_label (str): End Vertex Label
    - end_id (str): End Vertex ID
    - end_props (list): End Vertex Properties
    - edge_type (str): Edge Type
    - edge_props (list): Edge Properties

- Class specific arguments

  - AzureStorageFreighter

    - csv_path (str): The path to the CSV file.

  - AvroFreighter

    - avro_path (str): The path to the Avro file.

  - CosmosGremlinFreighter

    - cosmos_gremlin_endpoint (str): The Cosmos Gremlin endpoint.
    - cosmos_gremlin_key (str): The Cosmos Gremlin key.
    - cosmos_username (str): The Cosmos username.
    - id_map (dict): ID Mapping

  - CosmosGremlinFreighter

    - cosmos_endpoint (str): The Cosmos endpoint.
    - cosmos_key (str): The Cosmos key.
    - cosmos_database (str): The Cosmos database.
    - cosmos_container (str): The Cosmos container.
    - id_map (dict): ID Mapping

  - CSVFreighter

    - csv_path (str): The path to the CSV file.

  - MultiAzureStorageFreighter

    - vertex_args (list): Vertex Arguments.
    - edge_args (list): Edge Arguments.

  - MultiCSVFreighter

    - vertex_csv_paths (list): The paths to the vertex CSV files.
    - vertex_labels (list): The labels of the vertices.
    - edge_csv_paths (list): The paths to the edge CSV files.
    - edge_types (list): The types of the edges.

  - Neo4jFreighter

    - neo4j_uri (str): The URI of the Neo4j database.
    - neo4j_user (str): The username of the Neo4j database.
    - neo4j_password (str): The password of the Neo4j database.
    - neo4j_database (str): The database of the Neo4j database.
    - id_map (dict): ID Mapping

  - NetworkXFreighter

    - networkx_graph (nx.Graph): The NetworkX graph.
    - id_map (dict): ID Mapping

  - ParquetFreighter

    - parquet_path (str): The path to the Parquet file.

  - PGFreighter
    - source_pg_con_string (str): The connection string of the source PostgreSQL database.
    - source_schema (str): The source schema.
    - source_tables (list): The source tables.
    - id_map (dict): ID Mapping

## Release Notes

### 0.8.1 Release

- Added CosmosNoSQLFreighter class.
- CosmosGremlinFreighter class will be obsoleted in the future.

### 0.8.0 Release

- Introduced unit tests for the classes to improve the quality of the package. Currently, the tests are only for a few classes.
- Fixed code to improve the robustness of the package.

### 0.7.5 Release

- Added 'progress' argument to the load() method. It's implemented as an optional argument for all the classes. Thanks to @cjoakim for the suggestion.

### 0.7.4 Release

- Changed the required module from psycopg / psycopg_pool to psycopg[binary,pool]

### 0.7.3 Release

- Added min_connections argument to the connect() method. Added the limitation of UNIX environment to import 'resource' module.

### 0.7.2 Release

- Added Mermaid diagram for the document.

### 0.7.1 Release

- Tuned MultiAzureStorageFreighter.

### 0.7.0 Release

- Added MultiAzureStorageFreighter.

### 0.6.1 Release

- Refactored the documents. Added sample data. Fixed some bugs.

### 0.6.0 Release

- Added edge properties support.
  - 'edge_props' argument (list) is added to the 'load()' method.
- 'drop_graph' argument is obsoleted. 'create_graph' argument is added.
  - 'create_graph' is set to True by default. CAUTION: If the graph already exists, the graph is dropped before loading the data.
  - If 'create_graph' is False, the data is loaded into the existing graph.

### 0.5.3 Release -AzureStorageFreighter-

- AzureStorageFreighter class is totally refactored for better performance and scalability.
  - 0.5.2 didn't work well for large files.
  - Now, it works well for large files.
    Checked with a 5.4GB CSV file consisting of 10M of start vertices, 10K of end vertices, and 25M edges,
    it took 512 seconds to load the data into the graph database with PostgreSQL Flex,
    Standard_D32ds_v4 (32 vcpus, 128 GiB memory) and 512TB / 7500 iops of storage.
  - Tested data was generated with tests/generate_dummy_data.py.
  - UDF to load the data to graph is no longer used.
- However, please note that it is still in the early stages of implementation, so there is room for optimization and potential issues due to insufficient testing.

### 0.5.2 Release -AzureStorageFreighter-

- AzureStorageFreighter class is used to load data from Azure Storage into the graph database. It's totally different from other classes. The class works as follows:
  - If the argument, 'subscription_id' is not set, the class tries to find the Azure Subscription ID from your local environment using the 'az' command.
  - Creates an Azure Storage account and a blob container under the resource group where the PostgreSQL server runs in.
  - Enables the 'azure_storage' extension in the PostgreSQL server, if it's not enabled.
  - Uploads the CSV file to the blob container.
  - Creates a UDF (User Defined Function) named 'load_from_azure_storage' in the PostgreSQL server. The UDF loads data from the Azure Storage into the graph database.
  - Executes the UDF.
- The above process takes time to prepare for loading data, making it unsuitable for loading small files, but effective for loading large files. For instance, it takes under 3 seconds to load 'actorfilms.csv' after uploading.
- However, please note that it is still in the early stages of implementation, so there is room for optimization and potential issues due to insufficient testing.

### 0.5.0 Release

Refactored the code to make it more readable and maintainable with the separated classes for factory model.
Please note how to use the new version of the package is tottally different from the previous versions.

## For more information about [Apache AGE](https://age.apache.org/)

- Apache AGE : https://age.apache.org/
- GitHub : https://github.com/apache/age
- Document : https://age.apache.org/age-manual/master/index.html

## License

MIT License

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "agefreighter",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9.21",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Rio Fujita <rifujita@microsoft.com>",
    "download_url": "https://files.pythonhosted.org/packages/a1/5a/c661c231b8501ee70e53271d30eff1be7357d55997bae3c7ed8068e19f2c/agefreighter-0.8.1.tar.gz",
    "platform": null,
    "description": "# AGEFreighter\n\na Python package that helps you to create a graph database using Azure Database for PostgreSQL.\n\n[Apache AGE\u2122](https://age.apache.org/) is a PostgreSQL Graph database compatible with PostgreSQL's distributed assets and leverages graph data structures to analyze and use relationships and patterns in data.\n\n[Azure Database for PostgreSQL](https://azure.microsoft.com/en-us/services/postgresql/) is a managed database service that is based on the open-source Postgres database engine.\n\n[Introducing support for Graph data in Azure Database for PostgreSQL (Preview)](https://techcommunity.microsoft.com/blog/adforpostgresql/introducing-support-for-graph-data-in-azure-database-for-postgresql-preview/4275628).\n\n## Table of Contents\n\n- [Features](#features)\n- [Benchmark](#benchmark)\n- [Prerequisites](#prerequisites)\n- [Install](#install)\n- [Which class to use](#which-class-to-use)\n- [Usage of CSVFreighter](#usage-of-csvfreighter)\n- [Usage of MultiCSVFreighter (1)](#usage-of-multicsvfreighter-1)\n- [Usage of MultiCSVFreighter (2)](#usage-of-multicsvfreighter-2)\n- [Usage of AvroFreighter](#usage-of-avrofreighter)\n- [Usage of ParquetFreighter](#usage-of-parquetfreighter)\n- [Usage of AzureStorageFreighter](#usage-of-azurestoragefreighter)\n- [Usage of MultiAzureStorageFreighter](#usage-of-multiazurestoragefreighter)\n- [Usage of NetworkxFreighter](#usage-of-networkxfreighter)\n- [Usage of CosmosGremlinFreighter](#usage-of-cosmosgremlinfreighter)\n- [Usage of CosmosNoSQLFreighter](#usage-of-cosmosnosqlfreighter)\n- [Usage of Neo4jFreighter](#usage-of-neo4jfreighter)\n- [Usage of PGFreighter](#usage-of-pgfreighter)\n- [How to edit the CSV files to load them to the graph database with PGFreighter](#how-to-edit-the-csv-files-to-load-them-to-the-graph-database-with-pgfreighter)\n- [Classes](#classes)\n- [Method](#method)\n- [Arguments](#arguments)\n- [Release Notes](#release-notes)\n- [License](#license)\n\n## Features\n\n- Asynchronous connection pool support for psycopg PostgreSQL driver\n- 'direct_loading' option for loading data directly into the graph. If 'direct_loading' is True, the data is loaded into the graph using the 'INSERT' statement, not Cypher queries.\n- 'COPY' protocol support for loading data into the graph. If 'use_copy' is True, the data is loaded into the graph using the 'COPY' protocol.\n- AzureStorageFreighter and MultiAzureStorageFreighter classes to load vast amounts of graph data from Azure Storage. Typically, the number of rows in the CSV files exceeds from a million to a billion.\n\n## Benchmark\n\nThe result with Azure Database for PostgreSQL, General Purpose, D16ds_v4, 16 vCores, 64 GiB RAM, 512 GiB storage (7,500 IOPS)\nSee, [tests/agefreightertester.py](https://github.com/rioriost/agefreighter/blob/main/tests/agefreightertester.py)\n\n```bash\nfor d in `ls data`; do echo $d; wc -l data/$d/* | grep total; done\nairroute\n   23500 total\ncountries\n   20200 total\npayment_large\n 96520015 total\npayment_small\n   96520 total\ntransaction\n   43003 total\n```\n\n```bash\nAgeFreighter version: 0.8.0\nSummary of all tests are as followings:\nTest for AzureStorageFreighter, chunk_size(96), direct_loading(False), use_copy(False): SUCCEEDED,  42.95 seconds\nTest for MultiAzureStorageFreighter, chunk_size(96), direct_loading(False), use_copy(False): SUCCEEDED,  72.49 seconds\nTest for AvroFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.94 seconds\nTest for CosmosGremlinFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  5.11 seconds\nTest for CSVFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.84 seconds\nTest for MultiCSVFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.66 seconds\nTest for MultiCSVFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.64 seconds\nTest for Neo4jFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  3.45 seconds\nTest for NetworkXFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.88 seconds\nTest for ParquetFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  0.91 seconds\nTest for PGFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  1.07 seconds\n```\n\n## Prerequisites\n\n- over Python 3.9\n- This module runs on [psycopg](https://www.psycopg.org/) and [psycopg_pool](https://www.psycopg.org/)\n- Enable the Apache AGE extension in your Azure Database for PostgreSQL instance. Login Azure Portal, go to 'server parameters' blade, and check 'AGE\" on within 'azure.extensions' and 'shared_preload_libraries' parameters. See, above blog post for more information.\n- Load the AGE extension in your PostgreSQL database.\n\n```sql\nCREATE EXTENSION IF NOT EXISTS age CASCADE;\n```\n\n## Install\n\n- with python venv\n\n```bash\nmkdir your_project\ncd your_project\npython3 -m venv .venv\nsource .venv/bin/activate\npip install agefreighter\n```\n\n- with uv\n\n```bash\nuv init your_project\ncd your_project\nuv venv\nsource .venv/bin/activate\nuv add agefreighter\n```\n\n## Which class to use\n\n![Decision Tree](https://github.com/rioriost/agefreighter/raw/main/images/Decision_tree.png)\n\n## Usage of CSVFreighter\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"CSVFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n\n    await instance.load(\n        graph_name=\"Transaction\",\n        start_v_label=\"Customer\",\n        start_id=\"CustomerID\",\n        start_props=[\"Name\", \"Address\", \"Email\", \"Phone\"],\n        edge_type=\"BOUGHT\",\n        edge_props=[],\n        end_v_label=\"Product\",\n        end_id=\"ProductID\",\n        end_props=[\"Phrase\", \"SKU\", \"Price\", \"Color\", \"Size\", \"Weight\"],\n        csv_path=\"data/transaction/customer_product_bought.csv\",\n        use_copy=True,\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\n### File Format for CSVFreighter\n\nCSVFreighter class loads data from single CSV file. The CSV file should have the following format.\n\ncustomer_product_bought.csv: The CSV file should have 'id', 'start_vertex_type', 'end_vertex_type', two id columns, 'CustomerID' and 'ProductID' in the following sample, to be used as start and end vertex IDs, and other columns as properties.\n\n```csv\n\"id\",\"CustomerID\",\"start_vertex_type\",\"Name\",\"Address\",\"Email\",\"Phone\",\"ProductID\",\"end_vertex_type\",\"Phrase\",\"SKU\",\"Price\",\"Color\",\"Size\",\"Weight\"\n\"1\",\"1967\",\"Customer\",\"Jeffrey Joyce\",\"26888 Brett Streets Apt. 325 South Meganberg, CA 80228\",\"madison05@example.com\",\"881-538-6881x35597\",\"120\",\"Product\",\"Networked 3rdgeneration data-warehouse\",\"7246676575258\",\"834.33\",\"DarkKhaki\",\"S\",\"586\"\n\"2\",\"8674\",\"Customer\",\"Craig Burton\",\"280 Sellers Lock North Scott, AR 15307\",\"andersonalexander@example.com\",\"+1-677-235-8289\",\"557\",\"Product\",\"Profit-focused attitude-oriented emulation\",\"6102707440852\",\"953.89\",\"MediumSeaGreen\",\"L\",\"665\"\n```\n\nSee, [data/transaction/customer_product_bought.csv](https://github.com/rioriost/agefreighter/blob/main/data/transaction/customer_product_bought.csv).\n\n## Usage of MultiCSVFreighter (1)\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"MultiCSVFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n\n    await instance.load(\n        graph_name=\"Countries\",\n        vertex_csv_paths=[\n            \"data/countries/country.csv\",\n            \"data/countries/city.csv\",\n        ],\n        vertex_labels=[\"Country\", \"City\"],\n        edge_csv_paths=[\"data/countries/has_country_city.csv\"],\n        edge_types=[\"has\"],\n        use_copy=True,\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\n### File Format for MultiCSVFreighter (1)\n\nMultiCSVFreighter class loads data from multiple CSV files. The CSV files should have the following format.\nMultiCSVFreighter class handles all the columns in CSV except 'id' column as properties and all the columns in CSV except 'start_id' / 'start_vertex_type' / 'end_id' / 'end_vertex_type' columns as properties for the edges.\n\ncountry.csv: The node CSV file should have 'id' column and other columns as properties.\n\n```csv\n\"id\",\"Name\",\"Capital\",\"Population\",\"ISO\",\"TLD\",\"FlagURL\"\n\"1\",\"El Salvador\",\"Kristybury\",\"355169921\",\"TN\",\"wxu\",\"https://dummyimage.com/777x133\"\n\"2\",\"Lebanon\",\"New William\",\"413929227\",\"UK\",\"akj\",\"https://picsum.photos/772/459\"\n```\n\ncity.csv: The node CSV file should have 'id' column and other columns as properties.\n\n```csv\n\"id\",\"Name\",\"Latitude\",\"Longitude\"\n\"1\",\"Michaelmouth\",\"-56.4217435\",\"-44.924586\"\n\"2\",\"Bryantton\",\"-62.714695\",\"-162.083092\"\n```\n\nhas_country_city.csv: The edge CSV file should have 'id', 'start_id', 'start_vertex_type', 'end_id', 'end_vertex_type', and other columns as properties.\n\n```csv\n\"id\",\"start_id\",\"start_vertex_type\",\"end_id\",\"end_vertex_type\",\"since\"\n\"1\",\"86\",\"Country\",\"3633\",\"City\",\"1975-12-07 04:45:00.790431\"\n\"2\",\"22\",\"Country\",\"6194\",\"City\",\"1984-06-05 13:23:51.858147\"\n```\n\nSee, [data/countries/](https://github.com/rioriost/agefreighter/blob/main/data/countries/).\n\n## Usage of MultiCSVFreighter (2)\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"MultiCSVFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n    await instance.load(\n        graph_name=\"AirRoute\",\n        vertex_csv_paths=[\n            \"data/airroute/airport.csv\",\n        ],\n        vertex_labels=[\"AirPort\"],\n        edge_csv_paths=[\"data/airroute/airroute_airport_airport.csv\"],\n        edge_types=[\"ROUTE\"],\n        edge_props = [\"distance\"],\n        use_copy=True,\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\n### File Format for MultiCSVFreighter (2)\n\nIf the edge connects the same type of vertices, the CSV files should have the following format.\n\nairport.csv: The node CSV file should have 'id' column and other columns as properties.\n\n```csv\n\"id\",\"Name\",\"City\",\"Country\",\"IATA\",\"ICAO\",\"Latitude\",\"Longitude\",\"Altitude\",\"Timezone\",\"DST\",\"Tz\"\n\"1\",\"East Annatown Airport\",\"East Annatown\",\"Eritrea\",\"SHZ\",\"XTIK\",\"-2.783983\",\"-100.199060\",\"823\",\"Africa/Luanda\",\"E\",\"Europe/Skopje\"\n\"2\",\"Port Laura Airport\",\"Port Laura\",\"Montenegro\",\"TQY\",\"WDLC\",\"4.331082\",\"-72.411319\",\"121\",\"Asia/Dhaka\",\"Z\",\"Africa/Kigali\"\n```\n\nairroute_airport_airport.csv: The edge CSV file should have 'id', 'start_id', 'start_vertex_type', 'end_id', 'end_vertex_type', and other columns as properties.\n\n```csv\n\"id\",\"start_id\",\"start_vertex_type\",\"end_id\",\"end_vertex_type\",\"distance\"\n\"1\",\"1388\",\"AirPort\",\"794\",\"AirPort\",\"1373\"\n\"2\",\"2998\",\"AirPort\",\"823\",\"AirPort\",\"11833\"\n```\n\nSee, [data/airroute/](https://github.com/rioriost/agefreighter/blob/main/data/airroute/).\n\n## Usage of AvroFreighter\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"AvroFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n\n    await instance.load(\n        graph_name=\"Transaction\",\n        start_v_label=\"Customer\",\n        start_id=\"CustomerID\",\n        start_props=[\"Name\", \"Address\", \"Email\", \"Phone\"],\n        edge_type=\"BOUGHT\",\n        edge_props=[],\n        end_v_label=\"Product\",\n        end_id=\"ProductID\",\n        end_props=[\"Phrase\", \"SKU\", \"Price\", \"Color\", \"Size\", \"Weight\"],\n        avro_path=\"data/transaction/customer_product_bought.avro\",\n        use_copy=True,\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\n### File Format for AvroFreighter\n\nAvroFreighter class loads data from Avro file. The Avro file should have the following format.\n\n```json\n{\n    \"type\": \"record\",\n    \"name\": \"customer_product_bought\",\n    \"fields\": [\n        {\n            \"name\": \"id\",\n            \"type\": \"int\"\n        },\n        {\n            \"name\": \"CustomerID\",\n            \"type\": \"int\"\n        },\n        {\n            \"name\": \"start_vertex_type\",\n            \"type\": \"string\"\n        },\n        {\n            \"name\": \"Name\",\n            \"type\": \"string\"\n        },\n        {\n            \"name\": \"Address\",\n            \"type\": \"string\"\n        },\n        {\n            \"name\": \"Email\",\n            \"type\": \"string\"\n        },\n        {\n            \"name\": \"Phone\",\n            \"type\": \"string\"\n        },\n        {\n            \"name\": \"ProductID\",\n            \"type\": \"int\"\n        },\n        {\n            \"name\": \"end_vertex_type\",\n            \"type\": \"string\"\n        },\n        {\n            \"name\": \"Phrase\",\n            \"type\": \"string\"\n        },\n        {\n            \"name\": \"SKU\",\n            \"type\": \"string\"\n        },\n        {\n            \"name\": \"Price\",\n            \"type\": \"float\"\n        },\n        {\n            \"name\": \"Color\",\n            \"type\": \"string\"\n        },\n        {\n            \"name\": \"Size\",\n            \"type\": \"string\"\n        },\n        {\n            \"name\": \"Weight\",\n            \"type\": \"int\"\n        }\n    ]\n}\n{\n    \"id\": 1,\n    \"CustomerID\": 1967,\n    \"start_vertex_type\": \"Customer\",\n    \"Name\": \"Jeffrey Joyce\",\n    \"Address\": \"26888 Brett Streets Apt. 325 South Meganberg, CA 80228\",\n    \"Email\": \"madison05@example.com\",\n    \"Phone\": \"881-538-6881x35597\",\n    \"ProductID\": 120,\n    \"end_vertex_type\": \"Product\",\n    \"Phrase\": \"Networked 3rdgeneration data-warehouse\",\n    \"SKU\": \"7246676575258\",\n    \"Price\": 834.3300170898438,\n    \"Color\": \"DarkKhaki\",\n    \"Size\": \"S\",\n    \"Weight\": 586\n}\n{\n    \"id\": 2,\n    \"CustomerID\": 8674,\n    \"start_vertex_type\": \"Customer\",\n    \"Name\": \"Craig Burton\",\n    \"Address\": \"280 Sellers Lock North Scott, AR 15307\",\n    \"Email\": \"andersonalexander@example.com\",\n    \"Phone\": \"+1-677-235-8289\",\n    \"ProductID\": 557,\n    \"end_vertex_type\": \"Product\",\n    \"Phrase\": \"Profit-focused attitude-oriented emulation\",\n    \"SKU\": \"6102707440852\",\n    \"Price\": 953.8900146484375,\n    \"Color\": \"MediumSeaGreen\",\n    \"Size\": \"L\",\n    \"Weight\": 665\n}\n```\n\nSee, [data/transaction/customer_product_bought.avro](https://github.com/rioriost/agefreighter/blob/main/data/transaction/customer_product_bought.avro).\n\n## Usage of ParquetFreighter\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"ParquetFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n\n    await instance.load(\n        graph_name=\"Transaction\",\n        start_v_label=\"Customer\",\n        start_id=\"CustomerID\",\n        start_props=[\"Name\", \"Address\", \"Email\", \"Phone\"],\n        edge_type=\"BOUGHT\",\n        edge_props=[],\n        end_v_label=\"Product\",\n        end_id=\"ProductID\",\n        end_props=[\"Phrase\", \"SKU\", \"Price\", \"Color\", \"Size\", \"Weight\"],\n        parquet_path=\"data/transaction/customer_product_bought.parquet\",\n        use_copy=True,\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\n### File Format for ParquetFreighter\n\nParquetFreighter class loads data from Parquet file. The Parquet file should have the following format.\n\n```\nrequired group field_id=-1 schema {\n  optional int64 field_id=-1 id;\n  optional int64 field_id=-1 CustomerID;\n  optional binary field_id=-1 start_vertex_type (String);\n  optional binary field_id=-1 Name (String);\n  optional binary field_id=-1 Address (String);\n  optional binary field_id=-1 Email (String);\n  optional binary field_id=-1 Phone (String);\n  optional int64 field_id=-1 ProductID;\n  optional binary field_id=-1 end_vertex_type (String);\n  optional binary field_id=-1 Phrase (String);\n  optional int64 field_id=-1 SKU;\n  optional double field_id=-1 Price;\n  optional binary field_id=-1 Color (String);\n  optional binary field_id=-1 Size (String);\n  optional int64 field_id=-1 Weight;\n}\n\n   id  CustomerID start_vertex_type           Name                                            Address  ...            SKU   Price           Color Size Weight\n0   1        1967          Customer  Jeffrey Joyce  26888 Brett Streets Apt. 325 South Meganberg, ...  ...  7246676575258  834.33       DarkKhaki    S    586\n1   2        8674          Customer   Craig Burton             280 Sellers Lock North Scott, AR 15307  ...  6102707440852  953.89  MediumSeaGreen    L    665\n```\n\nSee, [data/transaction/customer_product_bought.parquet](https://github.com/rioriost/agefreighter/blob/main/data/transaction/customer_product_bought.parquet).\n\n## Usage of AzureStorageFreighter\n\n### Prerequisites\n\nInstall the Azure CLI and login with your Azure account.\n\nmacOS\n\n```bash\nbrew update && brew install azure-cli\n```\n\nWindows\n\n```shell\nwinget install -e --id Microsoft.AzureCLI\n```\n\nLinux (RHEL)\n\n```bash\nsudo rpm --import https://packages.microsoft.com/keys/microsoft.asc\n\n# for RHEL 9\nsudo dnf install -y https://packages.microsoft.com/config/rhel/9.0/packages-microsoft-prod.rpm\n# for RHEL 8\nsudo dnf install -y https://packages.microsoft.com/config/rhel/8/packages-microsoft-prod.rpm\n\nsudo dnf install azure-cli\n```\n\nLinux (Ubuntu)\n\n```bash\ncurl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash\n```\n\nAfrer installing the Azure CLI, login with your Azure account.\n\n```bash\naz login\n```\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"AzureStorageFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n\n    await instance.load(\n        graph_name=\"Transaction\",\n        start_v_label=\"Customer\",\n        start_id=\"CustomerID\",\n        start_props=[\"Name\", \"Address\", \"Email\", \"Phone\"],\n        edge_type=\"BOUGHT\",\n        edge_props=[],\n        end_v_label=\"Product\",\n        end_id=\"ProductID\",\n        end_props=[\"Phrase\", \"SKU\", \"Price\", \"Color\", \"Size\", \"Weight\"],\n        csv_path=\"data/transaction/customer_product_bought.csv\",\n        drop_graph=True,\n        create_graph=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\n### File Format for AzureStorageFreighter\n\nAzureStorageFreighter class loads data from Azure Storage and expects the exactly same format as CSVFreighter.\n\n## Usage of MultiAzureStorageFreighter\n\nBanchmark: loading 965 million rows of data from Azure Storage to Azure Database for PostgreSQL with MultiAzureStorageFreighter class.\n\n```bash\n% wc -l data/payment_large/*\n  900001 data/payment_large/bitcoinaddress.csv\n 2700001 data/payment_large/cookie.csv\n 1200001 data/payment_large/creditcard.csv\n 1600001 data/payment_large/cryptoaddress.csv\n  960001 data/payment_large/email.csv\n 2200001 data/payment_large/ip.csv\n 4000001 data/payment_large/partnerenduser.csv\n 7000001 data/payment_large/payment.csv\n 1000000 data/payment_large/performedby_cookie_payment.csv\n 1000000 data/payment_large/performedby_creditcard_payment.csv\n 1000000 data/payment_large/performedby_cryptoaddress_payment.csv\n 1000000 data/payment_large/performedby_email_payment.csv\n 1000000 data/payment_large/performedby_phone_payment.csv\n  960001 data/payment_large/phone.csv\n 8000001 data/payment_large/usedby_cookie_payment.csv\n 8000001 data/payment_large/usedby_creditcard_payment.csv\n 8000001 data/payment_large/usedby_cryptoaddress_payment.csv\n 8000001 data/payment_large/usedby_email_payment.csv\n 8000001 data/payment_large/usedby_phone_payment.csv\n 6000000 data/payment_large/usedin_cookie_payment.csv\n 6000001 data/payment_large/usedin_creditcard_payment.csv\n 6000000 data/payment_large/usedin_cryptoaddress_payment.csv\n 6000000 data/payment_large/usedin_email_payment.csv\n 6000000 data/payment_large/usedin_phone_payment.csv\n 96520015 total\n```\n\nThe result with Azure Database for PostgreSQL, General Purpose, D16ds_v4, 16 vCores, 64 GiB RAM, 512 GiB storage (7,500 IOPS)\n\n```bash\nFinding Subscription ID...\nEnabling extension...\nCreating storage account...\nUploading files...\nCreating temporary tables...\nLoading files to temporary tables...\nCreating a graph...\nCreating a graph: Done!\nAgeFreighter version: 0.8.0\nSummary of all tests are as followings:\nTest for MultiAzureStorageFreighter, chunk_size(96), direct_loading(False), use_copy(False): SUCCEEDED,  2179.69 seconds\n```\n\n### Prerequisites\n\nInstall the Azure CLI and login with your Azure account.\n\nmacOS\n\n```bash\nbrew update && brew install azure-cli\n```\n\nWindows\n\n```shell\nwinget install -e --id Microsoft.AzureCLI\n```\n\nLinux (RHEL)\n\n```bash\nsudo rpm --import https://packages.microsoft.com/keys/microsoft.asc\n\n# for RHEL 9\nsudo dnf install -y https://packages.microsoft.com/config/rhel/9.0/packages-microsoft-prod.rpm\n# for RHEL 8\nsudo dnf install -y https://packages.microsoft.com/config/rhel/8/packages-microsoft-prod.rpm\n\nsudo dnf install azure-cli\n```\n\nLinux (Ubuntu)\n\n```bash\ncurl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash\n```\n\nAfrer installing the Azure CLI, login with your Azure account.\n\n```bash\naz login\n```\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"MultiAzureStorageFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n\n    data_dir = \"data/payment_small/\"\n\n    await instance.load(\n        graph_name=\"AgeTester\",\n        vertex_args=[\n            {\n                \"csv_path\": f\"{data_dir}bitcoinaddress.csv\",\n                \"label\": \"BitcoinAddress\",\n                \"id\": \"id\",\n                \"props\": [\n                    \"available_since\",\n                    \"inserted_at\",\n                    \"address\",\n                    \"schema_version\",\n                ],\n            },\n            {\n                \"csv_path\": f\"{data_dir}cookie.csv\",\n                \"label\": \"Cookie\",\n                \"id\": \"id\",\n                \"props\": [\n                    \"available_since\",\n                    \"inserted_at\",\n                    \"uaid\",\n                    \"schema_version\",\n                ],\n            },\n            {\n                \"csv_path\": f\"{data_dir}ip.csv\",\n                \"label\": \"IP\",\n                \"id\": \"id\",\n                \"props\": [\n                    \"available_since\",\n                    \"inserted_at\",\n                    \"address\",\n                    \"schema_version\",\n                ],\n            },\n            {\n                \"csv_path\": f\"{data_dir}phone.csv\",\n                \"label\": \"Phone\",\n                \"id\": \"id\",\n                \"props\": [\n                    \"available_since\",\n                    \"inserted_at\",\n                    \"address\",\n                    \"schema_version\",\n                ],\n            },\n            {\n                \"csv_path\": f\"{data_dir}email.csv\",\n                \"label\": \"Email\",\n                \"id\": \"id\",\n                \"props\": [\n                    \"available_since\",\n                    \"inserted_at\",\n                    \"email\",\n                    \"domain\",\n                    \"handle\",\n                    \"schema_version\",\n                ],\n            },\n            {\n                \"csv_path\": f\"{data_dir}payment.csv\",\n                \"label\": \"Payment\",\n                \"id\": \"id\",\n                \"props\": [\n                    \"available_since\",\n                    \"inserted_at\",\n                    \"payment_id\",\n                    \"schema_version\",\n                ],\n            },\n            {\n                \"csv_path\": f\"{data_dir}creditcard.csv\",\n                \"label\": \"CreditCard\",\n                \"id\": \"id\",\n                \"props\": [\n                    \"available_since\",\n                    \"inserted_at\",\n                    \"expiry_month\",\n                    \"expiry_year\",\n                    \"masked_number\",\n                    \"creditcard_identifier\",\n                    \"schema_version\",\n                ],\n            },\n            {\n                \"csv_path\": f\"{data_dir}partnerenduser.csv\",\n                \"label\": \"PartnerEndUser\",\n                \"id\": \"id\",\n                \"props\": [\n                    \"available_since\",\n                    \"inserted_at\",\n                    \"partner_end_user_id\",\n                    \"schema_version\",\n                ],\n            },\n            {\n                \"csv_path\": f\"{data_dir}cryptoaddress.csv\",\n                \"label\": \"CryptoAddress\",\n                \"id\": \"id\",\n                \"props\": [\n                    \"available_since\",\n                    \"inserted_at\",\n                    \"address\",\n                    \"currency\",\n                    \"full_address\",\n                    \"schema_version\",\n                    \"tag\",\n                ],\n            },\n        ],\n        edge_args=[\n            {\n                \"csv_paths\": [\n                    f\"{data_dir}usedin_cookie_payment.csv\",\n                    f\"{data_dir}usedin_creditcard_payment.csv\",\n                    f\"{data_dir}usedin_cryptoaddress_payment.csv\",\n                    f\"{data_dir}usedin_email_payment.csv\",\n                    f\"{data_dir}usedin_phone_payment.csv\",\n                ],\n                \"type\": \"UsedIn\",\n            },\n            {\n                \"csv_paths\": [\n                    f\"{data_dir}usedby_cookie_payment.csv\",\n                    f\"{data_dir}usedby_creditcard_payment.csv\",\n                    f\"{data_dir}usedby_cryptoaddress_payment.csv\",\n                    f\"{data_dir}usedby_email_payment.csv\",\n                    f\"{data_dir}usedby_phone_payment.csv\",\n                ],\n                \"type\": \"UsedBy\",\n            },\n            {\n                \"csv_paths\": [\n                    f\"{data_dir}performedby_cookie_payment.csv\",\n                    f\"{data_dir}performedby_creditcard_payment.csv\",\n                    f\"{data_dir}performedby_cryptoaddress_payment.csv\",\n                    f\"{data_dir}performedby_email_payment.csv\",\n                    f\"{data_dir}performedby_phone_payment.csv\",\n                ],\n                \"type\": \"PerformedBy\",\n            },\n        ],\n        drop_graph=True,\n        create_graph=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\n### File Format for MultiAzureStorageFreighter\n\nMultiAzureStorageFreighter class loads data from Azure Storage and expects the exactly same format as MultiCSVFreighter.\n\nSee, [data/payment_small/](https://github.com/rioriost/agefreighter/blob/main/data/payment_small/).\n\n### What AzureStorageFreighter / MultiAzureStorageFreighter do\n\n1. Find the Subscription ID of your Azure account.\n2. Enable Azure Storage Extension in your Azure Database for PostgreSQL instance.\n3. Create a Storage Account in the resource group where your Azure Database for PostgreSQL instance is located.\n4. Create a container in the Storage Account.\n5. Upload CSV files to the container.\n6. Create temporary tables where the CSV files are loaded into.\n7. Load the data from the temporary tables to the graph.\n\n## Usage of NetworkxFreighter\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\nimport networkx as nx\n\n\nasync def main():\n    instance = Factory.create_instance(\"NetworkXFreighter\")\n\n    networkx_graph = nx.DiGraph()\n    networkx_graph.add_node(\n        1,\n        name=\"Jeffrey Joyce\",\n        address=\"26888 Brett Streets Apt. 325 South Meganberg, CA 80228\",\n        email=\"madison05@example.com\",\n        phone=\"881-538-6881x35597\",\n        customerid=1967,\n        label=\"Customer\",\n    )\n    networkx_graph.add_node(\n        2,\n        name=\"Craig Burton\",\n        address=\"280 Sellers Lock North Scott, AR 15307\",\n        email=\"andersonalexander@example.com\",\n        phone=\"+1-677-235-8289\",\n        customerid=8674,\n        label=\"Customer\",\n    )\n    networkx_graph.add_node(\n        3,\n        phrase=\"Networked 3rdgeneration data-warehouse\",\n        sku=\"7246676575258\",\n        price=834.33,\n        color=\"DarkKhaki\",\n        size=\"S\",\n        weight=586,\n        productid=120,\n        label=\"Product\",\n    )\n    networkx_graph.add_node(\n        4,\n        phrase=\"Profit-focused attitude-oriented emulation\",\n        sku=\"6102707440852\",\n        price=953.89,\n        color=\"MediumSeaGreen\",\n        size=\"L\",\n        weight=665,\n        productid=557,\n        label=\"Product\",\n    )\n\n    networkx_graph.add_edge(1, 3, since=\"1975-12-07 04:45:00.790431\", label=\"BOUGHT\")\n    networkx_graph.add_edge(2, 4, since=\"1984-06-05 13:23:51.858147\", label=\"BOUGHT\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n    await instance.load(\n        graph_name=\"Transaction\",\n        networkx_graph=networkx_graph,\n        id_map={\n            \"Customer\": \"CustomerID\",\n            \"Product\": \"ProductID\",\n        },\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\nYou can also load a networkx graph from a pickle file.\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\nimport pickle\n\n\nasync def main():\n    instance = Factory.create_instance(\"NetworkXFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n    await instance.load(\n        graph_name=\"Transaction\",\n        networkx_graph=pickle.load(\n            open(\"data/transaction/customer_product_bought.pickle\", \"rb\")\n        ),\n        id_map={\n            \"Customer\": \"CustomerID\",\n            \"Product\": \"ProductID\",\n        },\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\nSee, [data/transaction/customer_product_bought.pickle](https://github.com/rioriost/agefreighter/blob/main/data/transaction/customer_product_bought.pickle).\n\n## Usage of CosmosNoSQLFreighter\n\nCosmosNoSQLFreighter uses NoSQL API to get data loaded via the Gremlin API for better performance.\n\n```bash\nAgeFreighter version: 0.8.1\nSummary of all tests are as followings:\nTest for CosmosGremlinFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  4.53 seconds\nTest for CosmosNoSQLFreighter, chunk_size(96), direct_loading(False), use_copy(True): SUCCEEDED,  2.25 seconds\n```\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"CosmosNoSQLFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n\n    await instance.load(\n        graph_name=\"Transaction\",\n        cosmos_endpoint=os.environ[\"COSMOS_ENDPOINT\"],\n        cosmos_key=os.environ[\"COSMOS_KEY\"],\n        cosmos_database=\"db1\",\n        cosmos_container=\"transaction\",\n        id_map={\n            \"Customer\": \"CustomerID\",\n            \"Product\": \"ProductID\",\n        },\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\nThe above code suppose that you have a Cosmos DB account with a Gremlin API database named 'db1' with a container named 'transaction' in the Azure Portal.\nYou can find the '.NET SDK URI' starting with 'https://', not 'wss://' for 'cosmos_endpoint' env variable and 'PRIMARY KEY' / 'SECONDARY KEY' for 'cosmos_key' env variable in the 'Keys' blade of the Cosmos DB account.\n\n### Document Format for CosmosNoSQLFreighter\n\nCosmosNoSQLFreighter class loads data from Cosmos DB. The Cosmos DB should have the following format.\n\n```json\n{\n  \"label\": \"Customer\",\n  \"id\": \"272e8291-d1a4-4238-ad27-92ec2101425d\",\n  \"Name\": [\n    {\n      \"id\": \"ee24bc0d-1821-4854-bd00-aafba2419a1f\",\n      \"_value\": \"Alicia Herrera\"\n    }\n  ],\n  \"CustomerID\": [\n    {\n      \"id\": \"f1aafc14-1ecd-4437-b0b4-ce9926ec0924\",\n      \"_value\": \"1828\"\n    }\n  ],\n  \"Address\": [\n    {\n      \"id\": \"c611e8b5-a33a-4669-8fbb-36c0737d7ab8\",\n      \"_value\": \"906 Shannon Views Apt. 370 Ryanbury, CA 73165\"\n    }\n  ],\n  \"Email\": [\n    {\n      \"id\": \"1f951bf3-47bf-4af8-951c-7915ba810500\",\n      \"_value\": \"jillian49@example.com\"\n    }\n  ],\n  \"Phone\": [\n    {\n      \"id\": \"75ca0b65-b5a5-4fdd-8fa5-35db476fede6\",\n      \"_value\": \"+1-351-871-4405x226\"\n    }\n  ],\n  \"pk\": \"1\",\n  \"_rid\": \"tfxpAMsq0qXfkwEAAAAADA==\",\n  \"_self\": \"dbs/tfxpAA==/colls/tfxpAMsq0qU=/docs/tfxpAMsq0qXfkwEAAAAADA==/\",\n  \"_etag\": \"\\\"2f0055e1-0000-2300-0000-67909dd00000\\\"\",\n  \"_attachments\": \"attachments/\",\n  \"_ts\": 1737530832\n}\n\n{\n  \"label\": \"Product\",\n  \"id\": \"b342f7ac-0170-4d10-973c-e01910e79320\",\n  \"Phrase\": [\n    {\n      \"id\": \"3da120da-c934-4c31-95b2-973e5a5c88ee\",\n      \"_value\": \"Reverse-engineered asymmetric leverage\"\n    }\n  ],\n  \"ProductID\": [\n    {\n      \"id\": \"dab94ea0-dfe4-4e30-b660-cc2dd9cfd9b6\",\n      \"_value\": \"113\"\n    }\n  ],\n  \"SKU\": [\n    {\n      \"id\": \"2df5bc68-2764-4d5e-a531-eccbc9049f98\",\n      \"_value\": \"280217698898\"\n    }\n  ],\n  \"Price\": [\n    {\n      \"id\": \"4c10a970-8b00-4f6c-9ae0-244946dffcb7\",\n      \"_value\": \"559.27\"\n    }\n  ],\n  \"Color\": [\n    {\n      \"id\": \"69961024-4807-4ac9-a126-2e71f5f885b7\",\n      \"_value\": \"White\"\n    }\n  ],\n  \"Size\": [\n    {\n      \"id\": \"39fad2fd-f879-49f4-9712-473be9d578f5\",\n      \"_value\": \"L\"\n    }\n  ],\n  \"Weight\": [\n    {\n      \"id\": \"41e10b86-32c9-4599-b985-6b97ae0719be\",\n      \"_value\": \"633\"\n    }\n  ],\n  \"pk\": \"1\",\n  \"_rid\": \"tfxpAMsq0qV0uAEAAAAADA==\",\n  \"_self\": \"dbs/tfxpAA==/colls/tfxpAMsq0qU=/docs/tfxpAMsq0qV0uAEAAAAADA==/\",\n  \"_etag\": \"\\\"30009b06-0000-2300-0000-67909ddd0000\\\"\",\n  \"_attachments\": \"attachments/\",\n  \"_ts\": 1737530845\n}\n\n{\n  \"label\": \"BOUGHT\",\n  \"id\": \"674441a4-457a-4d59-a9a8-5c6e2d57ea16\",\n  \"_sink\": \"510a9b04-d351-4279-b42a-0ea51f7f1a8c\",\n  \"_sinkLabel\": \"Product\",\n  \"_sinkPartition\": \"1\",\n  \"_vertexId\": \"3549c350-59ec-46c7-8e49-589bc0cc6da6\",\n  \"_vertexLabel\": \"Customer\",\n  \"_isEdge\": true,\n  \"pk\": \"1\",\n  \"_rid\": \"tfxpAMsq0qX3ygEAAAAADA==\",\n  \"_self\": \"dbs/tfxpAA==/colls/tfxpAMsq0qU=/docs/tfxpAMsq0qX3ygEAAAAADA==/\",\n  \"_etag\": \"\\\"3000ab19-0000-2300-0000-67909de90000\\\"\",\n  \"_attachments\": \"attachments/\",\n  \"_ts\": 1737530857\n}\n```\n\n## Usage of CosmosGremlinFreighter\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"CosmosGremlinFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n\n    await instance.load(\n        graph_name=\"Transaction\",\n        cosmos_gremlin_endpoint=os.environ[\"COSMOS_GREMLIN_ENDPOINT\"],\n        cosmos_gremlin_key=os.environ[\"COSMOS_GREMLIN_KEY\"],\n        cosmos_username=\"/dbs/db1/colls/transaction\",\n        id_map={\n            \"Customer\": \"CustomerID\",\n            \"Product\": \"ProductID\",\n        },\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\nThe above code suppose that you have a Cosmos DB account with a Gremlin API database named 'db1' with a container named 'transaction' in the Azure Portal.\nYou can find the 'GREMLIN URI' for 'cosmos_gremlin_endpoint' env variable and 'PRIMARY KEY' / 'SECONDARY KEY' for 'cosmos_gremlin_key' env variable in the 'Keys' blade of the Cosmos DB account.\n\n### Document Format for CosmosGremlinFreighter\n\nCosmosGremlinFreighter class loads data from Cosmos DB. The Cosmos DB should have the following format.\n\nnode: g.V().limit(1)\n\n```json\n[\n  {\n    \"id\": \"272e8291-d1a4-4238-ad27-92ec2101425d\",\n    \"label\": \"Customer\",\n    \"type\": \"vertex\",\n    \"properties\": {\n      \"Name\": [\n        {\n          \"id\": \"ee24bc0d-1821-4854-bd00-aafba2419a1f\",\n          \"value\": \"Alicia Herrera\"\n        }\n      ],\n      \"CustomerID\": [\n        {\n          \"id\": \"f1aafc14-1ecd-4437-b0b4-ce9926ec0924\",\n          \"value\": \"1828\"\n        }\n      ],\n      \"Address\": [\n        {\n          \"id\": \"c611e8b5-a33a-4669-8fbb-36c0737d7ab8\",\n          \"value\": \"906 Shannon Views Apt. 370 Ryanbury, CA 73165\"\n        }\n      ],\n      \"Email\": [\n        {\n          \"id\": \"1f951bf3-47bf-4af8-951c-7915ba810500\",\n          \"value\": \"jillian49@example.com\"\n        }\n      ],\n      \"Phone\": [\n        {\n          \"id\": \"75ca0b65-b5a5-4fdd-8fa5-35db476fede6\",\n          \"value\": \"+1-351-871-4405x226\"\n        }\n      ],\n      \"pk\": [\n        {\n          \"id\": \"272e8291-d1a4-4238-ad27-92ec2101425d|pk\",\n          \"value\": \"1\"\n        }\n      ]\n    }\n  }\n]\n```\n\nedge: g.E().limit(1)\n\n```json\n[\n  {\n    \"id\": \"efc34e39-d674-40df-9c6a-f8a24c8b8d77\",\n    \"label\": \"BOUGHT\",\n    \"type\": \"edge\",\n    \"inVLabel\": \"Product\",\n    \"outVLabel\": \"Customer\",\n    \"inV\": \"1430cbf2-7d25-4c38-8ff7-f2b215bfdcad\",\n    \"outV\": \"390565dc-67d7-4179-a241-8cd2f8df82b2\"\n  }\n]\n```\n\n## Usage of Neo4jFreighter\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"Neo4jFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n\n    await instance.load(\n        graph_name=\"Transaction\",\n        neo4j_uri=os.environ[\"NEO4J_URI\"],\n        neo4j_user=os.environ[\"NEO4J_USER\"],\n        neo4j_password=os.environ[\"NEO4J_PASSWORD\"],\n        id_map={\n            \"Customer\": \"CustomerID\",\n            \"Product\": \"ProductID\",\n        },\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\nThe above code suppose that you have a Neo4j or compatible graph DB.\n\n### Data Format for Neo4jFreighter\n\nnode: MATCH (n) RETURN n LIMIT 1\n\n```json\n{\n  \"identity\": 0,\n  \"labels\": [\"Customer\"],\n  \"properties\": {\n    \"Email\": \"madison05@example.com\",\n    \"Address\": \"26888 Brett Streets Apt. 325 South Meganberg, CA 80228\",\n    \"Customer\": \"Customer\",\n    \"Phone\": \"881-538-6881x35597\",\n    \"CustomerID\": 1967,\n    \"Name\": \"Jeffrey Joyce\"\n  },\n  \"elementId\": \"4:148f2025-c6d2-4e47-8661-b2f1a28f24aa:0\"\n}\n```\n\nedge: MATCH ()-[r]->() RETURN r LIMIT 1\n\n```json\n{\n  \"identity\": 20000,\n  \"start\": 0,\n  \"end\": 17272,\n  \"type\": \"BOUGHT\",\n  \"properties\": {\n    \"from\": 1967,\n    \"to\": 120\n  },\n  \"elementId\": \"5:148f2025-c6d2-4e47-8661-b2f1a28f24aa:20000\",\n  \"startNodeElementId\": \"4:148f2025-c6d2-4e47-8661-b2f1a28f24aa:0\",\n  \"endNodeElementId\": \"4:148f2025-c6d2-4e47-8661-b2f1a28f24aa:17272\"\n}\n```\n\n## Usage of PGFreighter\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport asyncio\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    instance = Factory.create_instance(\"PGFreighter\")\n\n    await instance.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n\n    await instance.load(\n        graph_name=\"Transaction\",\n        source_pg_con_string=os.environ[\"SRC_PG_CONNECTION_STRING\"],\n        source_tables={\n            \"start\": \"Customer\",\n            \"end\": \"Product\",\n            \"edges\": \"BOUGHT\",\n        },\n        id_map={\n            \"Customer\": \"CustomerID\",\n            \"Product\": \"ProductID\",\n        },\n        drop_graph=True,\n        create_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\n### Table schemas for PGFreighter\n\nCustomer table schema\n\n```sql\npostgres=# \\d+ \"Customer\"\n                                                                  Table \"public.Customer\"\n     Column     |  Type   | Collation | Nullable |                      Default                       | Storage  | Compression | Stats target | Description\n----------------+---------+-----------+----------+----------------------------------------------------+----------+-------------+--------------+-------------\n CustomerSerial | integer |           | not null | nextval('\"Customer_CustomerSerial_seq\"'::regclass) | plain    |             |              |\n CustomerID     | text    |           |          |                                                    | extended |             |              |\n Name           | text    |           |          |                                                    | extended |             |              |\n Address        | text    |           |          |                                                    | extended |             |              |\n Email          | text    |           |          |                                                    | extended |             |              |\n Phone          | text    |           |          |                                                    | extended |             |              |\nIndexes:\n    \"Customer_CustomerID_idx\" btree (\"CustomerID\")\nAccess method: heap\n```\n\nProduct table schema\n\n```sql\npostgres=# \\d+ \"Product\"\n                                                                 Table \"public.Product\"\n    Column     |  Type   | Collation | Nullable |                     Default                      | Storage  | Compression | Stats target | Description\n---------------+---------+-----------+----------+--------------------------------------------------+----------+-------------+--------------+-------------\n ProductSerial | integer |           | not null | nextval('\"Product_ProductSerial_seq\"'::regclass) | plain    |             |              |\n ProductID     | text    |           |          |                                                  | extended |             |              |\n Phrase        | text    |           |          |                                                  | extended |             |              |\n SKU           | text    |           |          |                                                  | extended |             |              |\n Price         | real    |           |          |                                                  | plain    |             |              |\n Color         | text    |           |          |                                                  | extended |             |              |\n Size          | text    |           |          |                                                  | extended |             |              |\n Weight        | integer |           |          |                                                  | plain    |             |              |\nIndexes:\n    \"Product_ProductID_idx\" btree (\"ProductID\")\nAccess method: heap\n```\n\nBOUGHT table schema\n\n```sql\npostgres=# \\d+ \"BOUGHT\"\n                                                                Table \"public.BOUGHT\"\n    Column    |  Type   | Collation | Nullable |                    Default                     | Storage  | Compression | Stats target | Description\n--------------+---------+-----------+----------+------------------------------------------------+----------+-------------+--------------+-------------\n BoughtSerial | integer |           | not null | nextval('\"BOUGHT_BoughtSerial_seq\"'::regclass) | plain    |             |              |\n CustomerID   | text    |           |          |                                                | extended |             |              |\n ProductID    | text    |           |          |                                                | extended |             |              |\nIndexes:\n    \"BOUGHT_CustomerID_idx\" btree (\"CustomerID\")\n    \"BOUGHT_ProductID_idx\" btree (\"ProductID\")\nAccess method: heap\n```\n\n## How to edit the CSV files to load them to the graph database with PGFreighter\n\nExample: [krlawrence graph](https://github.com/krlawrence/graph/tree/master/sample-data)\n\n1. Download air-routes-latest-edges.csv and air-routes-latest-nodes.csv\n\n2. Edit air-routes-latest-nodes.csv\n\n- original\n\n```csv\n~id,~label,type:string,code:string,icao:string,desc:string,region:string,runways:int,longest:int,elev:int,country:string,city:string,lat:double,lon:double,author:string,date:string\n0,version,version,0.89,,Air Routes Data - Version: 0.89 Generated: 2022-08-29 14:10:18 UTC; Graph created by Kelvin R. Lawrence; Please let me know of any errors you find in the graph or routes that should be added.,,,,,,,,,Kelvin R. Lawrence,2022-08-29 14:10:18 UTC\n```\n\n- edited\n\n```csv\nid,label,type,code,icao,desc,region,runways,longest,elev,country,city,lat,lon,author,date\n1,airport,airport,ATL,KATL,Hartsfield - Jackson Atlanta International Airport,US-GA,5,12390,1026,US,Atlanta,33.6366996765137,-84.4281005859375,,\n```\n\n- remove the second line\n- edit the first line (CSV Header)\n\n3. Edit air-routes-latest-edges.csv\n\n- original\n\n```csv\n~id,~from,~to,~label,dist:int\n3749,1,3,route,809\n```\n\n- edited\n\n```csv\nid,start_id,end_id,label,dist,start_vertex_type,end_vertex_type\n3749,1,3,route,809,airport,airport\n```\n\n- edit the first line (CSV Header)\n- add start_vertex_type and end_vertex_type columns to each lines\n\n4. Install agefreighter\n\n- with python venv\n\n```bash\nmkdir your_project\ncd your_project\npython3 -m venv .venv\nsource .venv/bin/activate\npip install agefreighter\n```\n\n- with uv\n\n```bash\nuv init your_project\ncd your_project\nuv venv\nsource .venv/bin/activate\nuv add agefreighter\n```\n\n5. Make a Python script as below and locate the script in the same directory with the CSV files\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\nimport os\nfrom agefreighter import Factory\n\n\nasync def main():\n    loader = Factory.create_instance(\"MultiCSVFreighter\")\n    await loader.connect(\n        dsn=os.environ[\"PG_CONNECTION_STRING\"],\n        max_connections=64,\n        min_connections=4,\n    )\n    await loader.load(\n        vertex_csv_paths=[\"air-routes-latest-nodes.csv\"],\n        vertex_labels=[\"airport\"],\n        edge_csv_paths=[\"air-routes-latest-edges.csv\"],\n        edge_types=[\"route\"],\n        graph_name=\"air_route\",\n        use_copy=True,\n        drop_graph=True,\n        progress=True,\n    )\n\n\nif __name__ == \"__main__\":\n    import asyncio\n    import sys\n\n    if sys.platform == \"win32\":\n        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())\n\n    asyncio.run(main())\n```\n\n6. Deploy Azure Database for PostgreSQL and enable Apache AGE extension on Azure Portal\n   [Introducing support for Graph data in Azure Database for PostgreSQL (Preview)](https://techcommunity.microsoft.com/blog/adforpostgresql/introducing-support-for-graph-data-in-azure-database-for-postgresql-preview/4275628).\n\n7. Set the PostgreSQL connection string as an environment variable\n\n```shell\nexport PG_CONNECTION_STRING=\"host=xxxxxx.postgres.database.azure.com port=5432 dbname=postgres user=......\"\n```\n\n8. Run the script\n\n```shell\npython3 <script_name>.py\n```\n\n9. Check the graph created in the PostgreSQL database\n\n```sql\n% psql $PG_CONNECTION_STRING\npsql (16.6 (Homebrew), server 16.4)\nSSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)\nType \"help\" for help.\n\npostgres=> SET search_path = ag_catalog, \"$user\", public;\nSET\npostgres=> select * from air_route.airport limit 1;\n       id        |                                                                                                                                                                       properties\n-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n 844424930131969 | {\"id\": \"1\", \"lat\": \"33.6366996765137\", \"lon\": \"-84.4281005859375\", \"city\": \"Atlanta\", \"code\": \"ATL\", \"date\": \"nan\", \"desc\": \"Hartsfield - Jackson Atlanta International Airport\", \"elev\": \"1026.0\", \"icao\": \"KATL\", \"type\": \"airport\", \"label\": \"airport\", \"author\": \"nan\", \"region\": \"US-GA\", \"country\": \"US\", \"longest\": \"12390.0\", \"runways\": \"5.0\"}\n(1 row)\n\npostgres=> select * from air_route.route limit 1;\n        id        |    start_id     |     end_id      |                    properties\n------------------+-----------------+-----------------+---------------------------------------------------\n 1125899906842625 | 844424930131969 | 844424930131971 | {\"id\": \"3749\", \"dist\": \"809.0\", \"label\": \"route\"}\n(1 row)\n```\n\n## Classes\n\n- [AGEFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/agefreighter.txt)\n- [AzureStorageFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/azurestoragefreighter.txt)\n- [AvroFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/avrofreighter.txt)\n- [CosmosGremlinFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/cosmosgremlinfreighter.txt)\n- [CosmosNoSQLFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/cosmosnosqlfreighter.txt)\n- [CSVFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/csvfreighter.txt)\n- [MultiAzureStorageFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/multiazurestoragefreighter.txt)\n- [MultiCSVFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/multicsvfreighter.txt)\n- [Neo4jFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/neo4jfreighter.txt)\n- [NetworkXFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/networkxfreighter.txt)\n- [ParquetFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/parguetfreighter.txt)\n- [PGFreighter](https://github.com/rioriost/agefreighter/blob/main/docs/pgfreighter.txt)\n\n## Method\n\nAll the classes have the same load() method. The method loads data into a graph database.\n\n## Arguments\n\n- Common arguments\n\n  - graph_name (str) : the name of the graph\n  - chunk_size (int) : the number of rows to be loaded at once\n  - direct_loading (bool) : if True, the data is loaded into the graph using the 'INSERT' statement, not Cypher queries\n  - use_copy (bool) : if True, the data is loaded into the graph using the 'COPY' protocol\n  - create_graph (bool) : if True, the graph will be created after the existing graph is dropped\n  - progress (bool) : if True, the progress of the loading is shown\n\n- Common arguments for 'Single Source' classes\n\n  - AvroFreighter\n  - AzureStorageFreighter\n  - CosmosGremlinFreighter\n  - Neo4jFreighter\n  - NetworkXFreighter\n  - ParquetFreighter\n  - PGFreighter\n    - start_v_label (str): Start Vertex Label\n    - start_id (str): Start Vertex ID\n    - start_props (list): Start Vertex Properties\n    - end_v_label (str): End Vertex Label\n    - end_id (str): End Vertex ID\n    - end_props (list): End Vertex Properties\n    - edge_type (str): Edge Type\n    - edge_props (list): Edge Properties\n\n- Class specific arguments\n\n  - AzureStorageFreighter\n\n    - csv_path (str): The path to the CSV file.\n\n  - AvroFreighter\n\n    - avro_path (str): The path to the Avro file.\n\n  - CosmosGremlinFreighter\n\n    - cosmos_gremlin_endpoint (str): The Cosmos Gremlin endpoint.\n    - cosmos_gremlin_key (str): The Cosmos Gremlin key.\n    - cosmos_username (str): The Cosmos username.\n    - id_map (dict): ID Mapping\n\n  - CosmosGremlinFreighter\n\n    - cosmos_endpoint (str): The Cosmos endpoint.\n    - cosmos_key (str): The Cosmos key.\n    - cosmos_database (str): The Cosmos database.\n    - cosmos_container (str): The Cosmos container.\n    - id_map (dict): ID Mapping\n\n  - CSVFreighter\n\n    - csv_path (str): The path to the CSV file.\n\n  - MultiAzureStorageFreighter\n\n    - vertex_args (list): Vertex Arguments.\n    - edge_args (list): Edge Arguments.\n\n  - MultiCSVFreighter\n\n    - vertex_csv_paths (list): The paths to the vertex CSV files.\n    - vertex_labels (list): The labels of the vertices.\n    - edge_csv_paths (list): The paths to the edge CSV files.\n    - edge_types (list): The types of the edges.\n\n  - Neo4jFreighter\n\n    - neo4j_uri (str): The URI of the Neo4j database.\n    - neo4j_user (str): The username of the Neo4j database.\n    - neo4j_password (str): The password of the Neo4j database.\n    - neo4j_database (str): The database of the Neo4j database.\n    - id_map (dict): ID Mapping\n\n  - NetworkXFreighter\n\n    - networkx_graph (nx.Graph): The NetworkX graph.\n    - id_map (dict): ID Mapping\n\n  - ParquetFreighter\n\n    - parquet_path (str): The path to the Parquet file.\n\n  - PGFreighter\n    - source_pg_con_string (str): The connection string of the source PostgreSQL database.\n    - source_schema (str): The source schema.\n    - source_tables (list): The source tables.\n    - id_map (dict): ID Mapping\n\n## Release Notes\n\n### 0.8.1 Release\n\n- Added CosmosNoSQLFreighter class.\n- CosmosGremlinFreighter class will be obsoleted in the future.\n\n### 0.8.0 Release\n\n- Introduced unit tests for the classes to improve the quality of the package. Currently, the tests are only for a few classes.\n- Fixed code to improve the robustness of the package.\n\n### 0.7.5 Release\n\n- Added 'progress' argument to the load() method. It's implemented as an optional argument for all the classes. Thanks to @cjoakim for the suggestion.\n\n### 0.7.4 Release\n\n- Changed the required module from psycopg / psycopg_pool to psycopg[binary,pool]\n\n### 0.7.3 Release\n\n- Added min_connections argument to the connect() method. Added the limitation of UNIX environment to import 'resource' module.\n\n### 0.7.2 Release\n\n- Added Mermaid diagram for the document.\n\n### 0.7.1 Release\n\n- Tuned MultiAzureStorageFreighter.\n\n### 0.7.0 Release\n\n- Added MultiAzureStorageFreighter.\n\n### 0.6.1 Release\n\n- Refactored the documents. Added sample data. Fixed some bugs.\n\n### 0.6.0 Release\n\n- Added edge properties support.\n  - 'edge_props' argument (list) is added to the 'load()' method.\n- 'drop_graph' argument is obsoleted. 'create_graph' argument is added.\n  - 'create_graph' is set to True by default. CAUTION: If the graph already exists, the graph is dropped before loading the data.\n  - If 'create_graph' is False, the data is loaded into the existing graph.\n\n### 0.5.3 Release -AzureStorageFreighter-\n\n- AzureStorageFreighter class is totally refactored for better performance and scalability.\n  - 0.5.2 didn't work well for large files.\n  - Now, it works well for large files.\n    Checked with a 5.4GB CSV file consisting of 10M of start vertices, 10K of end vertices, and 25M edges,\n    it took 512 seconds to load the data into the graph database with PostgreSQL Flex,\n    Standard_D32ds_v4 (32 vcpus, 128 GiB memory) and 512TB / 7500 iops of storage.\n  - Tested data was generated with tests/generate_dummy_data.py.\n  - UDF to load the data to graph is no longer used.\n- However, please note that it is still in the early stages of implementation, so there is room for optimization and potential issues due to insufficient testing.\n\n### 0.5.2 Release -AzureStorageFreighter-\n\n- AzureStorageFreighter class is used to load data from Azure Storage into the graph database. It's totally different from other classes. The class works as follows:\n  - If the argument, 'subscription_id' is not set, the class tries to find the Azure Subscription ID from your local environment using the 'az' command.\n  - Creates an Azure Storage account and a blob container under the resource group where the PostgreSQL server runs in.\n  - Enables the 'azure_storage' extension in the PostgreSQL server, if it's not enabled.\n  - Uploads the CSV file to the blob container.\n  - Creates a UDF (User Defined Function) named 'load_from_azure_storage' in the PostgreSQL server. The UDF loads data from the Azure Storage into the graph database.\n  - Executes the UDF.\n- The above process takes time to prepare for loading data, making it unsuitable for loading small files, but effective for loading large files. For instance, it takes under 3 seconds to load 'actorfilms.csv' after uploading.\n- However, please note that it is still in the early stages of implementation, so there is room for optimization and potential issues due to insufficient testing.\n\n### 0.5.0 Release\n\nRefactored the code to make it more readable and maintainable with the separated classes for factory model.\nPlease note how to use the new version of the package is tottally different from the previous versions.\n\n## For more information about [Apache AGE](https://age.apache.org/)\n\n- Apache AGE : https://age.apache.org/\n- GitHub : https://github.com/apache/age\n- Document : https://age.apache.org/age-manual/master/index.html\n\n## License\n\nMIT License\n",
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2024 Rio Fujita  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
    "summary": "AgeFreighter is a Python package that helps you to create a graph database using Azure Database for PostgreSQL.",
    "version": "0.8.1",
    "project_urls": {
        "Homepage": "https://github.com/rioriost/agefreighter",
        "Issues": "https://github.com/rioriost/agefreighter/issues"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e44af1eeed3b9b5b71fff883bd9e9a3ba8e5c3bc2cd20e6321a5bd5af7a2d02e",
                "md5": "5556fd411aa81567fe9a41df80e26584",
                "sha256": "51354d411c539c4f7bea30a408cbcd497e304c8d86e887badb05b8c4bf0c8ece"
            },
            "downloads": -1,
            "filename": "agefreighter-0.8.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5556fd411aa81567fe9a41df80e26584",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9.21",
            "size": 54022,
            "upload_time": "2025-02-14T13:07:50",
            "upload_time_iso_8601": "2025-02-14T13:07:50.730083Z",
            "url": "https://files.pythonhosted.org/packages/e4/4a/f1eeed3b9b5b71fff883bd9e9a3ba8e5c3bc2cd20e6321a5bd5af7a2d02e/agefreighter-0.8.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a15ac661c231b8501ee70e53271d30eff1be7357d55997bae3c7ed8068e19f2c",
                "md5": "a0aab37fac0825dff81ef6fc5daf675a",
                "sha256": "7111387e2de92800a45c2513be4dbbb7f9516e4b5c97d1bde70b7368604b77d8"
            },
            "downloads": -1,
            "filename": "agefreighter-0.8.1.tar.gz",
            "has_sig": false,
            "md5_digest": "a0aab37fac0825dff81ef6fc5daf675a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9.21",
            "size": 8271205,
            "upload_time": "2025-02-14T13:07:54",
            "upload_time_iso_8601": "2025-02-14T13:07:54.778169Z",
            "url": "https://files.pythonhosted.org/packages/a1/5a/c661c231b8501ee70e53271d30eff1be7357d55997bae3c7ed8068e19f2c/agefreighter-0.8.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-14 13:07:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "rioriost",
    "github_project": "agefreighter",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "asyncio",
            "specs": [
                [
                    ">=",
                    "3.4.3"
                ]
            ]
        },
        {
            "name": "azure-identity",
            "specs": [
                [
                    ">=",
                    "1.19.0"
                ]
            ]
        },
        {
            "name": "azure-mgmt-postgresqlflexibleservers",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "azure-mgmt-storage",
            "specs": [
                [
                    ">=",
                    "22.0.0"
                ]
            ]
        },
        {
            "name": "azure-storage-blob",
            "specs": [
                [
                    ">=",
                    "12.24.1"
                ]
            ]
        },
        {
            "name": "fastavro",
            "specs": [
                [
                    ">=",
                    "1.10.0"
                ]
            ]
        },
        {
            "name": "gremlinpython",
            "specs": [
                [
                    ">=",
                    "3.7.3"
                ]
            ]
        },
        {
            "name": "neo4j",
            "specs": [
                [
                    ">=",
                    "5.27.0"
                ]
            ]
        },
        {
            "name": "nest-asyncio",
            "specs": [
                [
                    ">=",
                    "1.6.0"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    ">=",
                    "3.2.1"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "2.2.3"
                ]
            ]
        },
        {
            "name": "psycopg",
            "specs": [
                [
                    ">=",
                    "3.2.4"
                ]
            ]
        },
        {
            "name": "pyarrow",
            "specs": [
                [
                    ">=",
                    "19.0.0"
                ]
            ]
        }
    ],
    "lcname": "agefreighter"
}
        
Elapsed time: 0.38976s