Name | cpg2py |
Version | 1.0.5 |
home_page | https://github.com/YichaoXu/cpg2py |
Summary | A graph-based data structure designed for querying CSV files in Joern format in Python |
upload_time | 2025-02-21 03:08:20 |
maintainer | None |
docs_url | None |
author | Yichao Xu |
requires_python | >=3.6 |
license | MIT License
Copyright (c) 2025 Yichao Xu
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
|
keywords | joern, cpg, graph, csv |
requirements | No requirements were recorded. |
# **cpg2py: Graph-Based Query Engine for Joern CSV Files**
`cpg2py` is a Python library that provides a lightweight **graph-based query engine** for analyzing **Code Property Graphs (CPG)** extracted from Joern CSV files. The library offers an **abstract base class (ABC) architecture**, allowing users to extend and implement their own custom graph queries.
---
## **🚀 Features**
- **MultiDiGraph Representation**: A directed multi-graph with support for multiple edges between nodes.
- **CSV-Based Graph Construction**: Reads `nodes.csv` and `rels.csv` to construct a graph structure.
- **Extensible Abstract Base Classes (ABC)**:
  - `AbcGraphQuerier` for implementing **custom graph queries**.
  - `AbcNodeQuerier` for interacting with **nodes**.
  - `AbcEdgeQuerier` for interacting with **edges**.
- **Built-in Query Mechanisms**:
  - **Retrieve all edges**.
  - **Get incoming and outgoing edges of a node**.
  - **Find successors (`succ`) and predecessors (`prev`)**.
  - **Traverse AST, Control Flow, and Data Flow Graphs**.
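
To make the multi-graph idea concrete, here is a minimal, library-independent sketch of a directed multigraph in which edges are keyed by `(start, end, type)`, so the same pair of nodes can be linked by several typed edges. The class `TinyMultiDiGraph` and its methods are hypothetical names chosen for illustration; this is not how `cpg2py` is implemented internally, only one plausible way to model the structure described above.

```python
from collections import defaultdict


class TinyMultiDiGraph:
    """Illustrative directed multigraph; NOT the cpg2py implementation."""

    def __init__(self):
        self._out = defaultdict(set)  # start id -> {(end id, edge type), ...}
        self._in = defaultdict(set)   # end id -> {(start id, edge type), ...}

    def add_edge(self, start, end, etype):
        self._out[start].add((end, etype))
        self._in[end].add((start, etype))

    def succ(self, nid):
        """Distinct successor ids of `nid`, sorted for stable output."""
        return sorted({end for end, _ in self._out.get(nid, ())})

    def prev(self, nid):
        """Distinct predecessor ids of `nid`, sorted for stable output."""
        return sorted({start for start, _ in self._in.get(nid, ())})

    def edge_types(self, start, end):
        """All edge types connecting `start` to `end`."""
        return sorted(t for e, t in self._out.get(start, ()) if e == end)


g = TinyMultiDiGraph()
g.add_edge("6", "7", "ENTRY")
g.add_edge("6", "7", "FLOWS_TO")   # second edge between the same pair
g.add_edge("6", "9", "PARENT_OF")
print(g.succ("6"))             # ['7', '9']
print(g.edge_types("6", "7"))  # ['ENTRY', 'FLOWS_TO']
```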
---
## **📚 Installation**
To install the latest version directly from GitHub:
```bash
pip install git+https://github.com/YichaoXu/cpg2py.git
```
Or install the released package from PyPI:
```bash
pip install cpg2py
```
---
## **📂 File Structure**
- **`nodes.csv`** (example; columns are tab-separated):
```csv
id:int	labels:label	type	flags:string_array	lineno:int	code	childnum:int	funcid:int	classname	namespace	endlineno:int	name	doccomment
0	Filesystem	Directory									"input"	
1	Filesystem	File									"example.php"	
2	AST	AST_TOPLEVEL	TOPLEVEL_FILE	1					""	25	"/input/example.php"	
```
- **`rels.csv`** (example; columns are tab-separated):
```csv
start	end	type
2	3	ENTRY
2	4	EXIT
6	7	ENTRY
6	9	PARENT_OF
```
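For readers who want to inspect these files without the library, the sketch below parses tab-separated data shaped like the excerpts above using only Python's standard `csv` module. The helper names `read_rels` and `read_nodes` and the inline sample strings are hypothetical; whether `cpg2py` parses the files this way is an assumption, and real Joern exports may need different quoting options.

```python
import csv
import io

# Tab-separated samples mirroring the nodes.csv and rels.csv excerpts.
RELS_TSV = "start\tend\ttype\n2\t3\tENTRY\n2\t4\tEXIT\n6\t7\tENTRY\n6\t9\tPARENT_OF\n"
NODES_TSV = (
    "id:int\tlabels:label\ttype\tflags:string_array\tlineno:int\tcode\t"
    "childnum:int\tfuncid:int\tclassname\tnamespace\tendlineno:int\tname\tdoccomment\n"
    '2\tAST\tAST_TOPLEVEL\tTOPLEVEL_FILE\t1\t\t\t\t\t""\t25\t"/input/example.php"\t\n'
)


def read_rels(handle):
    """Return a list of (start, end, type) tuples from a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["start"], row["end"], row["type"]) for row in reader]


def read_nodes(handle):
    """Return a dict mapping node id to its row (a dict of column -> string)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["id:int"]: row for row in reader}


edges = read_rels(io.StringIO(RELS_TSV))
nodes = read_nodes(io.StringIO(NODES_TSV))
print(edges[0])            # ('2', '3', 'ENTRY')
print(nodes["2"]["name"])  # /input/example.php
```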
---
## **📚 Usage**
### **1️⃣ Load Graph from Joern CSVs**
```python
from cpg2py import cpg_graph
# Load graph from CSV files
graph = cpg_graph("nodes.csv", "rels.csv")
```
---
### **2️⃣ Query Nodes & Edges**
```python
# Get a specific node
node = graph.node("2")
print(node.name, node.type) # Example output: "/input/example.php" AST_TOPLEVEL
# Get a specific edge
edge = graph.edge("2", "3", "ENTRY")
print(edge.type) # Output: ENTRY
```
---
### **3️⃣ Get Node Connections**
```python
# Get all successor nodes (targets of the node's outgoing edges)
successors = graph.succ(node)
for out_node in successors:
    print(out_node.id, out_node.name)
# Get all predecessor nodes (sources of the node's incoming edges)
predecessors = graph.prev(node)
for in_node in predecessors:
    print(in_node.id, in_node.name)
```
```
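Successor queries compose naturally into traversals. The following sketch shows a breadth-first search that collects every node reachable from a starting node, written against any callable that returns successor ids. The function `reachable` and the `SUCCESSORS` table are hypothetical stand-ins for a loaded graph; adapting it to `graph.succ` would require mapping node objects to ids, which is not shown here.

```python
from collections import deque

# Hypothetical adjacency data standing in for a loaded graph.
SUCCESSORS = {"2": ["3", "4"], "3": ["5"], "4": ["5"], "5": []}


def reachable(start, succ):
    """Breadth-first search: ids reachable from `start`, in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in succ(current):
            if nxt not in seen:  # visit each node once, even with cycles
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(reachable("2", lambda nid: SUCCESSORS.get(nid, [])))  # ['3', '4', '5']
```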
---
### **4️⃣ AST and Flow Queries**
```python
# Get top-level file node for a given node
top_file = graph.topfile_node("5")
print(top_file.name) # Output: "example.php"
# Get child nodes in the AST hierarchy
children = graph.children(node)
print([child.id for child in children])
# Get data flow successors
flow_successors = graph.flow_to(node)
print([succ.id for succ in flow_successors])
```
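Conceptually, AST queries of this kind can be expressed as filters over typed edges. The sketch below assumes, purely for illustration, that the AST hierarchy is encoded by `PARENT_OF` edges (the edge type shown in the `rels.csv` excerpt) and walks those edges downward for children and upward for the root. The edge list and the helpers `children` and `ast_root` are hypothetical and do not reproduce how `graph.children` or `graph.topfile_node` are implemented.

```python
# Hypothetical (start, end, type) edges; only PARENT_OF edges form the AST here.
EDGES = [
    ("2", "6", "PARENT_OF"),
    ("6", "7", "ENTRY"),
    ("6", "9", "PARENT_OF"),
    ("9", "10", "PARENT_OF"),
    ("9", "10", "FLOWS_TO"),
]


def children(nid, edges=EDGES):
    """Direct AST children: targets of PARENT_OF edges leaving `nid`."""
    return [end for start, end, etype in edges
            if start == nid and etype == "PARENT_OF"]


def ast_root(nid, edges=EDGES):
    """Walk PARENT_OF edges upward until a node has no AST parent."""
    parent_of = {end: start for start, end, etype in edges
                 if etype == "PARENT_OF"}
    while nid in parent_of:
        nid = parent_of[nid]
    return nid


print(children("6"))   # ['9']
print(ast_root("10"))  # 2
```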
---
## **🛠 Abstract Base Classes (ABC)**
The following abstract base classes (`ABC`) provide interfaces for extending **node**, **edge**, and **graph** querying behavior.
---
### **🔹 AbcNodeQuerier (Abstract Node Interface)**
This class defines how nodes interact with the graph storage.
```python
from cpg2py.abc import AbcNodeQuerier

class MyNodeQuerier(AbcNodeQuerier):
    def __init__(self, graph, nid):
        super().__init__(graph, nid)

    @property
    def name(self):
        return self.get_property("name")
```
---
### **🔹 AbcEdgeQuerier (Abstract Edge Interface)**
Defines the querying mechanisms for edges in the graph.
```python
from cpg2py.abc import AbcEdgeQuerier

class MyEdgeQuerier(AbcEdgeQuerier):
    def __init__(self, graph, f_nid, t_nid, e_type):
        super().__init__(graph, f_nid, t_nid, e_type)

    @property
    def type(self):
        return self.get_property("type")
```
---
### **🔹 AbcGraphQuerier (Abstract Graph Interface)**
This class provides an interface for implementing custom graph query mechanisms.
```python
from cpg2py.abc import AbcGraphQuerier

class MyGraphQuerier(AbcGraphQuerier):
    def node(self, nid: str):
        return MyNodeQuerier(self.storage, nid)

    def edge(self, fid, tid, eid):
        return MyEdgeQuerier(self.storage, fid, tid, eid)
```
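The snippets in this section rely on the general abstract-base-class pattern: a base class owns the storage access and declares which members subclasses must supply. The self-contained sketch below illustrates that pattern with Python's standard `abc` module and a plain dictionary as storage. `NodeView`, `PhpNodeView`, and the sample data are hypothetical; the real `AbcNodeQuerier` may declare different abstract members, so treat this as an illustration of the idea rather than the library's interface.

```python
from abc import ABC, abstractmethod


class NodeView(ABC):
    """Hypothetical stand-in for an abstract node querier."""

    def __init__(self, storage, nid):
        self._storage = storage
        self._nid = nid

    def get_property(self, key):
        """Look up one property of this node; None if it is missing."""
        return self._storage[self._nid].get(key)

    @property
    @abstractmethod
    def name(self):
        """Subclasses decide how a display name is derived."""


class PhpNodeView(NodeView):
    @property
    def name(self):
        # Fall back to a placeholder when the node carries no name.
        return self.get_property("name") or "<anonymous>"


# Hypothetical storage: node id -> property dict.
storage = {"5": {"name": "main"}, "6": {"type": "AST_STMT_LIST"}}
print(PhpNodeView(storage, "5").name)  # main
print(PhpNodeView(storage, "6").name)  # <anonymous>
```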
---
## **🔍 Querying The Graph**
After implementing the abstract classes, you can perform advanced queries:
```python
# `storage` is assumed to be an already-populated graph storage object
graph = MyGraphQuerier(storage)
# Query node properties
node = graph.node("5")
print(node.name) # Example Output: "main"
# Query edge properties
edge = graph.edge("5", "6", "FLOWS_TO")
print(edge.type) # Output: "FLOWS_TO"
```
---
## **🐝 API Reference**
For more detailed API documentation, see the [API docs](docs/APIs.md).
- **Graph Functions**:
  - `cpg_graph(node_csv, edge_csv)`: Loads graph from CSV files.
  - `graph.node(nid)`: Retrieves a node by ID.
  - `graph.edge(fid, tid, eid)`: Retrieves an edge.
  - `graph.succ(node)`: Gets successor nodes.
  - `graph.prev(node)`: Gets predecessor nodes.
- **Node Properties**:
  - `.name`: Node name.
  - `.type`: Node type.
  - `.line_num`: Source code line number.
- **Edge Properties**:
  - `.start`: Edge start node.
  - `.end`: Edge end node.
  - `.type`: Edge type.
---
## **🌟 License**
This project is licensed under the **MIT License**.
Raw data
{
"_id": null,
"home_page": "https://github.com/YichaoXu/cpg2py",
"name": "cpg2py",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "Joern, CPG, Graph, CSV",
"author": "Yichao Xu",
"author_email": "Yichao Xu <yxu166@jhu.edu>",
"download_url": "https://files.pythonhosted.org/packages/37/ad/cf1098c7c4d4d06365d8390b632753404fe6f26baf20fb6b6916ad266afa/cpg2py-1.0.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License\n \n Copyright (c) 2025 Yichao Xu\n \n Permission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \"Software\"), to deal\n in the Software without restriction, including without limitation the rights\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the Software is\n furnished to do so, subject to the following conditions:\n \n The above copyright notice and this permission notice shall be included in all\n copies or substantial portions of the Software.\n \n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n SOFTWARE.\n ",
"summary": "A graph-based data structure designed for querying CSV files in Joern format in Python",
"version": "1.0.5",
"project_urls": {
"Homepage": "https://github.com/YichaoXu/cpg2py",
"Repository": "https://github.com/YichaoXu/cpg2py"
},
"split_keywords": [
"joern",
" cpg",
" graph",
" csv"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "eafee65b8526bd3795ac4198b7494c6434c52cf6a6e3a5050240e065ee804369",
"md5": "0f7bf8556d512515632756c4eaa966b6",
"sha256": "2a0e5e3928610f1812f47de193cc8dfc4947ff8fd9f19a4d16b0d648f2cb5758"
},
"downloads": -1,
"filename": "cpg2py-1.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0f7bf8556d512515632756c4eaa966b6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 11840,
"upload_time": "2025-02-21T03:08:18",
"upload_time_iso_8601": "2025-02-21T03:08:18.836429Z",
"url": "https://files.pythonhosted.org/packages/ea/fe/e65b8526bd3795ac4198b7494c6434c52cf6a6e3a5050240e065ee804369/cpg2py-1.0.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "37adcf1098c7c4d4d06365d8390b632753404fe6f26baf20fb6b6916ad266afa",
"md5": "fd3330be4ce4a42a6e9754b162e6a6cf",
"sha256": "8561b79730f58aab5f5ac47710426569b32f234c5bba6df1fbc3d8f772ddbe2a"
},
"downloads": -1,
"filename": "cpg2py-1.0.5.tar.gz",
"has_sig": false,
"md5_digest": "fd3330be4ce4a42a6e9754b162e6a6cf",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 11864,
"upload_time": "2025-02-21T03:08:20",
"upload_time_iso_8601": "2025-02-21T03:08:20.587190Z",
"url": "https://files.pythonhosted.org/packages/37/ad/cf1098c7c4d4d06365d8390b632753404fe6f26baf20fb6b6916ad266afa/cpg2py-1.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-21 03:08:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "YichaoXu",
"github_project": "cpg2py",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "cpg2py"
}