arangoimport


- Name: arangoimport
- Version: 0.1.11
- Summary: A high-performance tool for importing Neo4j JSONL graph data exports into ArangoDB
- Upload time: 2025-02-04 04:23:45
- Author: Trent Leslie
- Requires Python: >=3.11

# ArangoImport

A high-performance tool for importing Neo4j JSONL graph data exports into ArangoDB.

## Features

- Import Neo4j database exports into ArangoDB
- Efficient parallel processing of large JSONL files
- Support for both local and Docker ArangoDB instances
- Dynamic memory management and batch sizing
- Connection pooling for optimal performance
- Progress tracking and detailed logging
- Available as both CLI tool and Python package

## Installation

```bash
pip install arangoimport
```

## Quick Start

1. Export your Neo4j database to JSONL:
   ```cypher
   CALL apoc.export.json.all("path/to/export.jsonl", {useTypes: true})
   ```

2. Import into ArangoDB using either method:

   ### A. Command Line Interface (CLI)
   After installation, the `arangoimport` command is available in your terminal:
   ```bash
   # Show help and available options
   arangoimport --help
   
   # Import data with default settings (will prompt for password)
   arangoimport import-data /path/to/neo4j_export.jsonl
   
   # Import with custom settings
   arangoimport import-data /path/to/neo4j_export.jsonl \
       --db-name my_graph \
       --host arangodb.example.com \
       --port 8530 \
       --username graph_user
   ```

   ### B. Python API
   ```python
   from arangoimport.connection import ArangoConfig
   from arangoimport.importer import parallel_load_data
   
   # Configure database connection
   db_config = ArangoConfig(
       host="localhost",
       port=8529,
       username="root",
       password="your_password",  # Or use ARANGO_PASSWORD env var
       db_name="db_name"
   )
   
   # Import the data
   nodes, edges = parallel_load_data(
       "path/to/neo4j_export.jsonl",
       dict(db_config),
       num_processes=None  # None means use (CPU count - 1)
   )
   
   print(f"Successfully imported {nodes:,} nodes and {edges:,} edges!")
   ```
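
Before starting a large import, it can help to check what the export file contains. The sketch below counts records by type using only the standard library. It assumes each JSONL line is an object with a `"type"` field such as `"node"` or `"relationship"`, which is how APOC JSON exports are commonly laid out; `count_record_types` is a hypothetical helper and not part of this package.

```python
import json
from collections import Counter
from typing import Iterable


def count_record_types(lines: Iterable[str]) -> Counter:
    """Count JSONL records by their "type" field, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: records carry a "type" key ("node" or "relationship").
        counts[record.get("type", "unknown")] += 1
    return counts


# Illustrative sample data, not taken from a real export.
sample = [
    '{"type": "node", "id": "0", "labels": ["Person"], "properties": {"name": "Ada"}}',
    '{"type": "node", "id": "1", "labels": ["Person"], "properties": {"name": "Alan"}}',
    '{"type": "relationship", "id": "0", "label": "KNOWS", "start": {"id": "0"}, "end": {"id": "1"}}',
]
counts = count_record_types(sample)
print(counts["node"], counts["relationship"])
```

For a real file, pass an open file handle instead of the list, for example `count_record_types(open(path, encoding="utf-8"))`, so the file is streamed line by line rather than loaded into memory.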

## Environment Variables

- `ARANGO_PASSWORD`: Database password (avoid hardcoding in scripts)
- `ARANGO_USER`: Username (default: root)
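
When building the connection settings yourself in Python, the same variables can be read explicitly. This is a minimal sketch: `credentials_from_env` is a hypothetical helper, and the default of `root` mirrors the list above rather than anything confirmed about the package internals.

```python
import os
from typing import Mapping, Optional, Tuple


def credentials_from_env(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (username, password) from ARANGO_USER and ARANGO_PASSWORD."""
    env = os.environ if env is None else env
    username = env.get("ARANGO_USER", "root")  # documented default
    password = env.get("ARANGO_PASSWORD")
    if password is None:
        # Fail early instead of attempting a connection without a password.
        raise RuntimeError("ARANGO_PASSWORD is not set")
    return username, password


# Example with an explicit mapping; omit the argument to read os.environ.
username, password = credentials_from_env({"ARANGO_PASSWORD": "example-only"})
print(username)
```

The returned values can then be passed as `username` and `password` when constructing `ArangoConfig`, which keeps the password out of the script itself.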

## CLI Options

### General Options
- `--file <string>`: The file to import ("-" for stdin)
- `--type <string>`: Input format (auto/csv/json/jsonl/tsv, default: auto)
- `--collection <string>`: Target collection name
- `--create-collection <boolean>`: Create collection if missing (default: false)
- `--create-collection-type <string>`: Collection type if created (document/edge, default: document)
- `--create-database <boolean>`: Create database if missing (default: false)
- `--threads <uint32>`: Number of parallel import threads (default: 32)
- `--batch-size <uint64>`: Data batch size in bytes (default: 8MB)
- `--progress <boolean>`: Show progress (default: true)

### Server Connection
- `--server.database <string>`: Target database (default: "_system")
- `--server.endpoint <string>`: Server endpoint (default: "http+tcp://127.0.0.1:8529")
- `--server.username <string>`: Username (default: "root")
- `--server.password <string>`: Password (prompted if not provided)
- `--server.authentication <boolean>`: Require authentication (default: true)

### Performance Options
- `--auto-rate-limit <boolean>`: Auto-adjust loading rate (default: false)
- `--compress-transfer <boolean>`: Compress data transfer (default: false)
- `--max-errors <uint64>`: Maximum errors before stopping (default: 20)
- `--skip-validation <boolean>`: Skip schema validation (default: false)

For a complete list of options, run:
```bash
arangoimport --help
```

## Docker Support

When using Docker, ensure your ArangoDB container is running:
```bash
docker run -p 8529:8529 -e ARANGO_ROOT_PASSWORD=yourpassword arangodb:latest
```

Then import with either the CLI or the Python API, pointing the host and port at the published port (8529 in the command above).
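
A freshly started container may take a few seconds before it accepts connections. The sketch below polls the TCP port until it opens or a timeout expires. `wait_for_port` is a hypothetical helper that uses only the standard library. An open port shows only that something is listening, not that ArangoDB has finished starting up, so treat it as a coarse check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Connection succeeded: the port is accepting TCP connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet; wait briefly and retry.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for_port("localhost", 8529, timeout=30.0):
        print("Port 8529 is open; the import can be started.")
    else:
        print("Timed out waiting for port 8529.")
```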

## Performance Tuning

The importer automatically optimizes for:
- Available system memory
- CPU cores (uses CPU count - 1 by default)
- Network conditions

You can fine-tune performance with:
- `--threads`: Control parallel threads
- `--batch-size`: Adjust batch size
- `--auto-rate-limit`: Enable automatic rate limiting
- `--compress-transfer`: Enable data compression
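
To make the two main knobs concrete, the sketch below shows one way a "CPU count - 1" default and a byte-based batch size could be computed. It illustrates the ideas only and does not reproduce the importer's actual implementation; `default_worker_count` and `batch_by_bytes` are hypothetical names.

```python
import os
from typing import Iterable, Iterator, List


def default_worker_count() -> int:
    """Use all CPUs but one, and never fewer than one worker."""
    return max(1, (os.cpu_count() or 1) - 1)


def batch_by_bytes(lines: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Group lines into batches whose UTF-8 size stays within max_bytes.

    A single line larger than max_bytes is emitted as its own batch.
    """
    batch: List[str] = []
    size = 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        if batch and size + line_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_size
    if batch:
        yield batch


batches = list(batch_by_bytes(["aaaa", "bbbb", "cc", "dddddddd"], max_bytes=8))
print(default_worker_count(), batches)
```

Larger batches mean fewer round trips to the server but more memory held per worker, so batch size and worker count are usually tuned together.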

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.
            

        