graphfaker


- Name: graphfaker
- Version: 0.3.0
- Summary: an open-source python library for generating, and loading both synthetic and real-world graph datasets
- Upload time: 2025-07-26 07:32:43
- Keywords: faker, graph-data, flights, osmnx, graphs, graphfaker
# graphfaker

graphfaker is a Python library for generating and loading synthetic and real-world datasets tailored for graph-based applications. It supports synthetic social graphs built with `faker`, OpenStreetMap (OSM) road networks, and real airline flight networks. Use it for data science, research, teaching, rapid prototyping, and more!

*Note: The authors and graphgeeks labs do not hold any responsibility for the correctness of this generator.*

[![PyPI version](https://img.shields.io/pypi/v/graphfaker.svg)](https://pypi.python.org/pypi/graphfaker)
[![Docs Status](https://readthedocs.org/projects/graphfaker/badge/?version=latest)](https://graphfaker.readthedocs.io/en/latest/?version=latest)
[![Dependency Status](https://pyup.io/repos/github/denironyx/graphfaker/shield.svg)](https://pyup.io/repos/github/denironyx/graphfaker/)
[![image](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

---

Join our Discord server 👇

[![](https://dcbadge.limes.pink/api/server/https://discord.gg/mQQz9bRRpH)](https://discord.gg/mQQz9bRRpH)


### Problem Statement
Graph data is essential for solving complex problems in various fields, including social network analysis, transportation modeling, recommendation systems, and fraud detection. However, many professionals, researchers, and students face a common challenge: a lack of easily accessible, realistic graph datasets for testing, learning, and benchmarking. Real-world graph data is often restricted due to privacy concerns, complexity, or large size, making experimentation difficult.

### Solution: graphfaker
GraphFaker is an open-source Python library designed to generate, load, and export synthetic graph datasets in a user-friendly and configurable way. It lets users generate graphs tailored to their specific needs, so they can experiment and learn without worrying about where the data comes from or how to fetch it.

## Features
- **Multiple Graph Sources:**
  - `faker`: Synthetic “social-knowledge” graphs powered by Faker (people, places, organizations, events, products with rich attributes and relationships)
  - `osm`: Real-world street networks directly from OpenStreetMap (by place name, address, or bounding box)
  - `flights`: Flight/airline networks from Bureau of Transportation Statistics (airlines ↔ airports ↔ flight legs, complete with cancellation and delay flags)
- **Unstructured Data Source:**
  - `WikiFetcher`: Raw Wikipedia page data (title, summary, content, sections, links, references) ready for custom graph or RAG pipelines
- **Easy CLI & Python Library**

This removes friction around data acquisition, letting you focus on algorithms, teaching or rapid prototyping.

## ✨ Key Features

| Source        | What It Gives You                                                                                                                                                                                     |
| ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Faker**     | Synthetic social-knowledge graphs with configurable sizes, weighted and directional relationships.                                                      |
| **OSM**       | Real road or walking networks via OSMnx under the hood—fetch by place, address, or bounding box; simplify topology; project to UTM.                                                |
| **Flights**   | Airline/airport graph from BTS on-time performance data: nodes for carriers, airports, flights; edges for OPERATED\_BY, DEPARTS\_FROM, ARRIVES\_AT; batch or date-range support; subgraph sampling.   |
| **WikiFetcher** | Raw page dumps (title, summary, content, sections, links, references) as JSON |

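To make the Flights row concrete, here is a minimal plain-Python sketch of the schema it describes: carrier, airport, and flight nodes joined by `OPERATED_BY`, `DEPARTS_FROM`, and `ARRIVES_AT` edges. The node IDs, the attribute names (`cancelled`, `delay_minutes`), and the sample values are illustrative assumptions, not graphfaker's actual output.

```python
# Hypothetical sample data; IDs and attribute names are assumptions for illustration.
nodes = {
    "AA": {"type": "Carrier"},
    "JFK": {"type": "Airport"},
    "LAX": {"type": "Airport"},
    "AA100_2024-01-01": {"type": "Flight", "cancelled": False, "delay_minutes": 12},
}

# Directed, typed edges: (source, relationship, target).
edges = [
    ("AA100_2024-01-01", "OPERATED_BY", "AA"),
    ("AA100_2024-01-01", "DEPARTS_FROM", "JFK"),
    ("AA100_2024-01-01", "ARRIVES_AT", "LAX"),
]

def neighbors(node_id, relationship):
    """Return the targets reachable from node_id over one relationship type."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relationship]

flight = "AA100_2024-01-01"
route = (neighbors(flight, "DEPARTS_FROM")[0], neighbors(flight, "ARRIVES_AT")[0])
print(route)
```

Modeling each flight as its own node, rather than as an airport-to-airport edge, is what lets per-flight attributes such as cancellation and delay flags live on the graph.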

---

*Disclaimer: This is still a work in progress (WIP) and still contains logging and debugging print statements. We are releasing early to gather feedback and iterate.*

## Installation

Install from PyPI:
```sh
uv pip install graphfaker
```

For development:
```sh
git clone https://github.com/graphgeeks-lab/graphfaker.git
cd graphfaker
uv pip install -e .
```

---

## Quick Start

---

### Python Library Usage

```python
from graphfaker import GraphFaker, WikiFetcher

gf = GraphFaker()

# Synthetic social/knowledge graph
g1 = gf.generate_graph(source="faker", total_nodes=200, total_edges=800)

# OSM road network
g2 = gf.generate_graph(source="osm", place="Chinatown, San Francisco, California", network_type="drive")

# Flight network
g3 = gf.generate_graph(source="flights", year=2024, month=1)

# Fetch Wikipedia page data
page = WikiFetcher.fetch_page("Graph theory")
print(page['summary'])
print(page['content'])
WikiFetcher.export_page_json(page, "graph_theory.json")
```
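
Because the fetched page is a plain dictionary, turning it into graph edges can take only a few lines. The sketch below uses a hand-written stand-in for the page (no network access) and assumes `title` and `links` keys, with `links` holding a list of linked page titles. The feature list names those fields but does not show their exact shape, so inspect real output first. `page_to_edges` is a hypothetical helper, not part of graphfaker.

```python
def page_to_edges(page):
    """Build (source_title, "LINKS_TO", target_title) edges from a page-like dict."""
    source = page["title"]
    # Deduplicate while keeping first-seen order; skip self-links.
    targets = dict.fromkeys(t for t in page.get("links", []) if t != source)
    return [(source, "LINKS_TO", target) for target in targets]

# Hand-written stand-in for a fetched page (assumed shape).
sample_page = {
    "title": "Graph theory",
    "summary": "Graph theory is the study of graphs.",
    "links": ["Vertex (graph theory)", "Leonhard Euler", "Graph theory", "Leonhard Euler"],
}

edges = page_to_edges(sample_page)
for edge in edges:
    print(edge)
```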

#### Advanced: Date Range for Flights

Note: this is not recommended yet. Date-range loading is still being tested, and we are working on ways to make it faster.

```python
# Reuses the `gf = GraphFaker()` instance created in the previous example
g = gf.generate_graph(source="flights", date_range=("2024-01-01", "2024-01-15"))
```


### CLI Usage (WIP)

Show help:
```sh
python -m graphfaker.cli --help
```

#### Generate a Synthetic Social Graph
```sh
python -m graphfaker.cli  \
    --fetcher faker \
    --total-nodes 100 \
    --total-edges 500
```

#### Generate a Real-World Road Network (OSM)
```sh
python -m graphfaker.cli  \
    --fetcher osm \
    --place "Berlin, Germany" \
    --network-type drive
```

#### Generate a Flight Network (Airlines/Airports/Flights)
```sh
python -m graphfaker.cli \
    --fetcher flights \
    --country "United States" \
    --year 2024 \
    --month 1
```

You can also use `--date-range` for custom time spans (e.g., `--date-range "2024-01-01,2024-01-15"`).
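
For readers curious what a comma-separated range like this involves, here is a minimal sketch of parsing and validating such a string with the standard library. It is an illustration under assumptions, not graphfaker's own parsing code: `parse_date_range` is a hypothetical helper, and the inclusive-range interpretation is a guess.

```python
from datetime import date

def parse_date_range(value):
    """Parse "YYYY-MM-DD,YYYY-MM-DD" into a (start, end) pair of dates."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2:
        raise ValueError("expected two comma-separated dates, e.g. 2024-01-01,2024-01-15")
    start, end = (date.fromisoformat(p) for p in parts)
    if start > end:
        raise ValueError("start date must not be after end date")
    return start, end

start, end = parse_date_range("2024-01-01,2024-01-15")
days_covered = (end - start).days + 1  # treating both endpoints as included
print(start, end, days_covered)
```

Validating the order of the two dates up front gives a clear error before any data is downloaded.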

---

## Future Plans: Graph Export Formats

- **GraphML**: General graph analysis/visualization (`--export graph.graphml`)
- **JSON/JSON-LD**: Knowledge graphs/web apps (`--export data.json`)
- **CSV**: Tabular analysis/database imports (`--export edges.csv`)
- **RDF**: Semantic web/linked data (`--export graph.ttl`)
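
Until these exporters land, an edge list can be written with the standard library alone. The following is a minimal sketch, not graphfaker's export code: `write_edges_csv` and the column names are hypothetical, and the edges are hand-written `(source, target, relationship)` tuples.

```python
import csv
import io

def write_edges_csv(edges, handle):
    """Write (source, target, relationship) tuples as CSV with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["source", "target", "relationship"])
    writer.writerows(edges)

edges = [
    ("alice", "bob", "KNOWS"),
    ("alice", "acme", "WORKS_AT"),
]

buffer = io.StringIO()  # swap for open("edges.csv", "w", newline="") to write a file
write_edges_csv(edges, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```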

---

## Future Plans: Integration with Graph Tools

GraphFaker generates NetworkX graph objects that can be easily integrated with:
- **Graph databases**: Neo4j, Kuzu, TigerGraph
- **Analysis tools**: NetworkX, SNAP, graph-tool
- **ML frameworks**: PyTorch Geometric, DGL, TensorFlow GNN
- **Visualization**: G.V, Gephi, Cytoscape, D3.js

---

## On the Horizon:

- Handling large graphs -> millions of nodes
- Using NLP/LLM to fetch graph data -> "Fetch flight data for Jan 2024"
- Connecting to any graph database/engine of choice -> "Establish connections to graph database/engine of choice"


---

## Documentation

Full documentation: https://graphfaker.readthedocs.io

---
## ⭐ Star the Repo

If you find this project valuable, star ⭐ this repository to support the work and help others discover it!

---

## License
MIT License

## Credits
Created with Cookiecutter and the `audreyr/cookiecutter-pypackage` project template.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "graphfaker",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": "Dennis Irorere <denironyx@gmail.com>",
    "keywords": "faker, graph-data, flights, osmnx, graphs, graphfaker",
    "author": null,
    "author_email": "Dennis Irorere <denironyx@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/69/ba/f293ba325a323ee99092615b4a4202636f17c8f9fb419f13ebbcecfbc7ec/graphfaker-0.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "an open-source python library for generating, and loading both synthetic and real-world graph datasets",
    "version": "0.3.0",
    "project_urls": {
        "bugs": "https://github.com/graphgeeks-lab/graphfaker/issues",
        "changelog": "https://github.com/graphgeeks-lab/graphfaker/blob/master/changelog.md",
        "homepage": "https://github.com/graphgeeks-lab/graphfaker/"
    },
    "split_keywords": [
        "faker",
        " graph-data",
        " flights",
        " osmnx",
        " graphs",
        " graphfaker"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a1518825218b50faa2d98edcd5244e158d5db7098703bf804e4b9383b6f74dc0",
                "md5": "7ebc410204221bd2b24f6a9d6634c3b4",
                "sha256": "def019371d0201a0ce699206cee37a37bd77532cc3077345f77fd8d3a4fd37f2"
            },
            "downloads": -1,
            "filename": "graphfaker-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7ebc410204221bd2b24f6a9d6634c3b4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 19852,
            "upload_time": "2025-07-26T07:32:42",
            "upload_time_iso_8601": "2025-07-26T07:32:42.071529Z",
            "url": "https://files.pythonhosted.org/packages/a1/51/8825218b50faa2d98edcd5244e158d5db7098703bf804e4b9383b6f74dc0/graphfaker-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "69baf293ba325a323ee99092615b4a4202636f17c8f9fb419f13ebbcecfbc7ec",
                "md5": "058a2069fb757de7a16a320a85ec5da6",
                "sha256": "e9eef05c84a428e4b63c32c0620fa4c4a5a809bdafa3fe674ee12c4cfff0a676"
            },
            "downloads": -1,
            "filename": "graphfaker-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "058a2069fb757de7a16a320a85ec5da6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 170117,
            "upload_time": "2025-07-26T07:32:43",
            "upload_time_iso_8601": "2025-07-26T07:32:43.432686Z",
            "url": "https://files.pythonhosted.org/packages/69/ba/f293ba325a323ee99092615b4a4202636f17c8f9fb419f13ebbcecfbc7ec/graphfaker-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-26 07:32:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "graphgeeks-lab",
    "github_project": "graphfaker",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "graphfaker"
}
        