# Graphed Excel
[![current release version](https://img.shields.io/github/release/dalager/graphedexcel.svg?style=flat-square)](https://github.com/dalager/graphedexcel/releases)
[![pypi version](https://img.shields.io/pypi/v/graphedexcel.svg?style=flat-square)](https://pypi.python.org/pypi/graphedexcel)
![Python Version](https://img.shields.io/badge/python-3.10%3A3.12-blue?style=flat-square)
[![codecov](https://codecov.io/github/dalager/graphedexcel/branch/main/graph/badge.svg?token=CJM0EAUF9M)](https://codecov.io/github/dalager/graphedexcel)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/9620/badge)](https://www.bestpractices.dev/projects/9620)
<img src="docs/images/Book1.xlsx.png" align="right" width="300" alt="Plot from Example Book1.xlsx file">
Tool to analyze and visualize dependencies between cells in Excel spreadsheets in order to get an understanding of the complexity.
Will generate a graph of the dependencies between cells in an Excel spreadsheet. Data extracted with `openpyxl` (<https://foss.heptapod.net/openpyxl/openpyxl>), the graph is generated with the `networkx` library (<https://networkx.org/>) and is visualized using `matplotlib`.
<br clear="right"/>
## Definitions
Single-cell references in a formula sitting in cell `A3` like `=A1+A2` is considered a dependency between the node `A3` and the nodes `A2` and `A1`.
```mermaid
graph TD
A3 --> A1
A3 --> A2
A3["A3=A1+A2"]
```
A range defined in a formula like `=SUM(B1:B3)` is kept as a single node in the graph, but all the containing cells are expanded as dependencies of the range node.
So when a cell, `C1` contains `=SUM(B1:B3)` the graph will look like this:
```mermaid
graph TD
R -->B1
R -->B2
R -->B3
R["B1:B3"]
C1 --> R
C1["C1=SUM(B1:B3)"]
```
## Installation from pypi package
PyPi project: [graphedexcel](https://pypi.org/project/graphedexcel/)
```bash
pip install graphedexcel
```
## Installation from source
```bash
python -m venv venv
source venv/bin/activate
pip install -e .
```
## Usage
```bash
python -m graphedexcel <path_to_excel_file>
```
### Parameters from `--help`
```
usage: graphedexcel [-h] [--as-directed-graph] [--no-visualize]
[--layout {spring,circular,kamada_kawai,shell,spectral}]
[--config CONFIG] [--output-path OUTPUT_PATH]
[--open-image]
path_to_excel
Process an Excel file to build and visualize dependency graphs.
positional arguments:
path_to_excel Path to the Excel file to process.
options:
-h, --help show this help message and exit
--as-directed-graph, -d
Treat the dependency graph as directed.
--no-visualize, -n Skip the visualization of the dependency
graph.
--layout, -l {spring,circular,kamada_kawai,shell,spectral}
Layout algorithm for graph visualization
(default: spring).
--config, -c CONFIG Path to the configuration file for
visualization. See README for details.
--output-path, -o OUTPUT_PATH
Specify the output path for the generated
graph image.
--open-image Open the generated image after visualization.
--hide-legends Do not show legends in the visualization. (Default: False)
```
## Sample output
The following is the output of running the script on the sample `docs/Book1.xlsx` file.
```bash
=== Dependency Graph Summary ===
Cell/Node count 70
Dependency count 100
=== Most connected nodes ===
Range Madness!A2:A11 22
Range Madness!B2:B11 11
Range Madness!F1 10
Main Sheet!B5 4
Main Sheet!B22 4
Detached !A2:A4 4
Range Madness!B2 4
Range Madness!B3 4
Range Madness!B4 4
Range Madness!B5 4
=== Most used functions ===
SUM 4
POWER 1
Visualizing the graph of dependencies.
This might take a while...
Graph visualization saved to images/.\Book1.xlsx.png
```
## Sample plot
More in `docs/images` folder.
![Sample graph](docs/images/simplified_1.xlsx5.png)
## Customizing Graph Visualization Settings
You can customize the graph visualization settings by passing a path to a JSON configuration file. This allows you to override the default settings with your own preferences.
Look at <https://networkx.org/documentation/stable/reference/generated/networkx.drawing.nx_pylab.draw_networkx.html> for the available settings.
### Default Settings
The default settings for the graph visualization in the various sizes (from `graph_visualizer.py`):
```python
# Default settings for the graph visualization
base_graph_settings = {
"node_size": 50, # the size of the node
"width": 0.2, # the width of the edge between nodes
"edge_color": "black", # the color of the edge between nodes
"linewidths": 0, # the stroke width of the node border
"with_labels": False, # whether to show the node labels
"font_size": 10, # the size of the node labels
"cmap": "tab20b", # the color map to use for coloring nodes
"fig_size": (10, 10), # the size of the figure
}
# Sized-based settings for small, medium, and large graphs
small_graph_settings = {
"with_labels": False,
"alpha": 0.8}
medium_graph_settings = {
"node_size": 30,
"with_labels": False,
"alpha": 0.4,
"fig_size": (20, 20),
}
large_graph_settings = {
"node_size": 20,
"with_labels": False,
"alpha": 0.2,
"fig_size": (25, 25),
}
```
### Custom JSON Configuration
To override these settings, create a JSON file (e.g., graph_settings.json) with the desired settings. Here is an example of a JSON configuration file:
```json
{
"node_size": 40,
"edge_color": "blue",
"with_labels": true,
"font_size": 12,
"alpha": 0.6
}
```
### Using the Custom Configuration
To use the custom configuration, pass the path to the JSON file as an argument to the script:
```bash
python -m graphedexcel myexcel.xlsx --config graph_settings.json
```
This will render the graph using the custom settings defined in the JSON file.
## Tests
Just run pytest in the root folder.
```bash
pytest
```
### Bandit Security Tests
To run the Bandit (<https://github.com/PyCQA/bandit>) security tests, you can use the following command.
It will report on medium and high severity safety issues.
```bash
poetry run bandit -c pyproject.toml -r . -lll
```
## Run with Docker
If you don't want to install the python dependencies on your machine, you can run the script in a Docker container. The following command will build the Docker image and run the script on the sample `docs/Book1.xlsx` file.
With a powershell terminal:
```powershell
docker build -t graphedexcel .
docker run --rm -v ${pwd}/docs:/app/docs graphedexcel docs/Book1.xlsx -o docs/av.png
```
Image will be saved in the docs folder
Raw data
{
"_id": null,
"home_page": "https://github.com/dalager/graphedexcel",
"name": "graphedexcel",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "excel, graph, dependencies, visualization",
"author": "Christian Dalager",
"author_email": "christian@dalager.com",
"download_url": "https://files.pythonhosted.org/packages/ef/8d/2d996d0976367e73ed4e8ceb1f33d50a8b3527d3fda9b4ee814821ffff7b/graphedexcel-1.2.2.tar.gz",
"platform": null,
"description": "# Graphed Excel\n\n[![current release version](https://img.shields.io/github/release/dalager/graphedexcel.svg?style=flat-square)](https://github.com/dalager/graphedexcel/releases)\n[![pypi version](https://img.shields.io/pypi/v/graphedexcel.svg?style=flat-square)](https://pypi.python.org/pypi/graphedexcel)\n![Python Version](https://img.shields.io/badge/python-3.10%3A3.12-blue?style=flat-square)\n[![codecov](https://codecov.io/github/dalager/graphedexcel/branch/main/graph/badge.svg?token=CJM0EAUF9M)](https://codecov.io/github/dalager/graphedexcel)\n[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/9620/badge)](https://www.bestpractices.dev/projects/9620)\n\n<img src=\"docs/images/Book1.xlsx.png\" align=\"right\" width=\"300\" alt=\"Plot from Example Book1.xlsx file\">\n\nTool to analyze and visualize dependencies between cells in Excel spreadsheets in order to get an understanding of the complexity.\n\nWill generate a graph of the dependencies between cells in an Excel spreadsheet. Data extracted with `openpyxl` (<https://foss.heptapod.net/openpyxl/openpyxl>), the graph is generated with the `networkx` library (<https://networkx.org/>) and is visualized using `matplotlib`.\n\n<br clear=\"right\"/>\n\n## Definitions\n\nSingle-cell references in a formula sitting in cell `A3` like `=A1+A2` is considered a dependency between the node `A3` and the nodes `A2` and `A1`.\n\n```mermaid\ngraph TD\n A3 --> A1\n A3 --> A2\n A3[\"A3=A1+A2\"]\n```\n\nA range defined in a formula like `=SUM(B1:B3)` is kept as a single node in the graph, but all the containing cells are expanded as dependencies of the range node.\n\nSo when a cell, `C1` contains `=SUM(B1:B3)` the graph will look like this:\n\n```mermaid\n\ngraph TD\n R -->B1\n R -->B2\n R -->B3\n R[\"B1:B3\"]\n C1 --> R\n\n C1[\"C1=SUM(B1:B3)\"]\n\n```\n\n## Installation from pypi package\n\nPyPi project: [graphedexcel](https://pypi.org/project/graphedexcel/)\n\n```bash\npip install graphedexcel\n```\n\n## Installation from source\n\n```bash\n\npython -m venv venv\nsource venv/bin/activate\npip install -e .\n```\n\n## Usage\n\n```bash\npython -m graphedexcel <path_to_excel_file>\n```\n\n### Parameters from `--help`\n\n```\nusage: graphedexcel [-h] [--as-directed-graph] [--no-visualize]\n [--layout {spring,circular,kamada_kawai,shell,spectral}]\n [--config CONFIG] [--output-path OUTPUT_PATH]\n [--open-image]\n path_to_excel\n\nProcess an Excel file to build and visualize dependency graphs.\n\npositional arguments:\n path_to_excel Path to the Excel file to process.\n\noptions:\n -h, --help show this help message and exit\n --as-directed-graph, -d\n Treat the dependency graph as directed.\n --no-visualize, -n Skip the visualization of the dependency\n graph.\n --layout, -l {spring,circular,kamada_kawai,shell,spectral}\n Layout algorithm for graph visualization\n (default: spring).\n --config, -c CONFIG Path to the configuration file for\n visualization. See README for details.\n --output-path, -o OUTPUT_PATH\n Specify the output path for the generated\n graph image.\n --open-image Open the generated image after visualization.\n --hide-legends Do not show legends in the visualization. (Default: False)\n```\n\n## Sample output\n\nThe following is the output of running the script on the sample `docs/Book1.xlsx` file.\n\n```bash\n=== Dependency Graph Summary ===\nCell/Node count 70\nDependency count 100\n\n\n=== Most connected nodes ===\nRange Madness!A2:A11 22\nRange Madness!B2:B11 11\nRange Madness!F1 10\nMain Sheet!B5 4\nMain Sheet!B22 4\nDetached !A2:A4 4\nRange Madness!B2 4\nRange Madness!B3 4\nRange Madness!B4 4\nRange Madness!B5 4\n\n=== Most used functions ===\nSUM 4\nPOWER 1\n\nVisualizing the graph of dependencies.\nThis might take a while...\n\nGraph visualization saved to images/.\\Book1.xlsx.png\n```\n\n## Sample plot\n\nMore in `docs/images` folder.\n\n![Sample graph](docs/images/simplified_1.xlsx5.png)\n\n## Customizing Graph Visualization Settings\n\nYou can customize the graph visualization settings by passing a path to a JSON configuration file. This allows you to override the default settings with your own preferences.\n\nLook at <https://networkx.org/documentation/stable/reference/generated/networkx.drawing.nx_pylab.draw_networkx.html> for the available settings.\n\n### Default Settings\n\nThe default settings for the graph visualization in the various sizes (from `graph_visualizer.py`):\n\n```python\n# Default settings for the graph visualization\nbase_graph_settings = {\n \"node_size\": 50, # the size of the node\n \"width\": 0.2, # the width of the edge between nodes\n \"edge_color\": \"black\", # the color of the edge between nodes\n \"linewidths\": 0, # the stroke width of the node border\n \"with_labels\": False, # whether to show the node labels\n \"font_size\": 10, # the size of the node labels\n \"cmap\": \"tab20b\", # the color map to use for coloring nodes\n \"fig_size\": (10, 10), # the size of the figure\n}\n\n# Sized-based settings for small, medium, and large graphs\nsmall_graph_settings = {\n \"with_labels\": False,\n \"alpha\": 0.8}\n\nmedium_graph_settings = {\n \"node_size\": 30,\n \"with_labels\": False,\n \"alpha\": 0.4,\n \"fig_size\": (20, 20),\n}\n\nlarge_graph_settings = {\n \"node_size\": 20,\n \"with_labels\": False,\n \"alpha\": 0.2,\n \"fig_size\": (25, 25),\n}\n\n```\n\n### Custom JSON Configuration\n\nTo override these settings, create a JSON file (e.g., graph_settings.json) with the desired settings. Here is an example of a JSON configuration file:\n\n```json\n{\n \"node_size\": 40,\n \"edge_color\": \"blue\",\n \"with_labels\": true,\n \"font_size\": 12,\n \"alpha\": 0.6\n}\n```\n\n### Using the Custom Configuration\n\nTo use the custom configuration, pass the path to the JSON file as an argument to the script:\n\n```bash\npython -m graphedexcel myexcel.xlsx --config graph_settings.json\n```\n\nThis will render the graph using the custom settings defined in the JSON file.\n\n## Tests\n\nJust run pytest in the root folder.\n\n```bash\npytest\n```\n\n### Bandit Security Tests\n\nTo run the Bandit (<https://github.com/PyCQA/bandit>) security tests, you can use the following command.\nIt will report on medium and high severity safety issues.\n\n```bash\npoetry run bandit -c pyproject.toml -r . -lll\n```\n\n## Run with Docker\n\nIf you don't want to install the python dependencies on your machine, you can run the script in a Docker container. The following command will build the Docker image and run the script on the sample `docs/Book1.xlsx` file.\n\nWith a powershell terminal:\n\n```powershell\ndocker build -t graphedexcel .\ndocker run --rm -v ${pwd}/docs:/app/docs graphedexcel docs/Book1.xlsx -o docs/av.png\n```\n\nImage will be saved in the docs folder\n\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Tool to analyze and visualize dependencies between cells in Excel spreadsheets",
"version": "1.2.2",
"project_urls": {
"Documentation": "https://github.com/dalager/graphedexcel",
"Homepage": "https://github.com/dalager/graphedexcel",
"Repository": "https://github.com/dalager/graphedexcel"
},
"split_keywords": [
"excel",
" graph",
" dependencies",
" visualization"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "10720e18f377fda2b5429e9e1202558c05a45ab87f834dff490845c4f5a6505e",
"md5": "3d0c062ef62393cac79f5336b6ef6d84",
"sha256": "0fa835cee7f0a359eef64c521c6ee0599deb07fc60e060f57ad274f42a477121"
},
"downloads": -1,
"filename": "graphedexcel-1.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3d0c062ef62393cac79f5336b6ef6d84",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 13288,
"upload_time": "2024-11-09T21:51:48",
"upload_time_iso_8601": "2024-11-09T21:51:48.732572Z",
"url": "https://files.pythonhosted.org/packages/10/72/0e18f377fda2b5429e9e1202558c05a45ab87f834dff490845c4f5a6505e/graphedexcel-1.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ef8d2d996d0976367e73ed4e8ceb1f33d50a8b3527d3fda9b4ee814821ffff7b",
"md5": "4109af7c0881a0763f5a3643c287325c",
"sha256": "7fbb9d00ed5d64fa9ee8be921fb922659f6afe4f8540cd2796ff075254216480"
},
"downloads": -1,
"filename": "graphedexcel-1.2.2.tar.gz",
"has_sig": false,
"md5_digest": "4109af7c0881a0763f5a3643c287325c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 12709,
"upload_time": "2024-11-09T21:51:50",
"upload_time_iso_8601": "2024-11-09T21:51:50.621804Z",
"url": "https://files.pythonhosted.org/packages/ef/8d/2d996d0976367e73ed4e8ceb1f33d50a8b3527d3fda9b4ee814821ffff7b/graphedexcel-1.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-09 21:51:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dalager",
"github_project": "graphedexcel",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "graphedexcel"
}