graphedexcel


Namegraphedexcel JSON
Version 1.2.2 PyPI version JSON
download
home_pagehttps://github.com/dalager/graphedexcel
SummaryTool to analyze and visualize dependencies between cells in Excel spreadsheets
upload_time2024-11-09 21:51:50
maintainerNone
docs_urlNone
authorChristian Dalager
requires_python>=3.10
licenseMIT
keywords excel graph dependencies visualization
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Graphed Excel

[![current release version](https://img.shields.io/github/release/dalager/graphedexcel.svg?style=flat-square)](https://github.com/dalager/graphedexcel/releases)
[![pypi version](https://img.shields.io/pypi/v/graphedexcel.svg?style=flat-square)](https://pypi.python.org/pypi/graphedexcel)
![Python Version](https://img.shields.io/badge/python-3.10%3A3.12-blue?style=flat-square)
[![codecov](https://codecov.io/github/dalager/graphedexcel/branch/main/graph/badge.svg?token=CJM0EAUF9M)](https://codecov.io/github/dalager/graphedexcel)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/9620/badge)](https://www.bestpractices.dev/projects/9620)

<img src="docs/images/Book1.xlsx.png" align="right" width="300" alt="Plot from Example Book1.xlsx file">

Tool to analyze and visualize dependencies between cells in Excel spreadsheets in order to get an understanding of the complexity.

Will generate a graph of the dependencies between cells in an Excel spreadsheet. Data extracted with `openpyxl` (<https://foss.heptapod.net/openpyxl/openpyxl>), the graph is generated with the `networkx` library (<https://networkx.org/>) and is visualized using `matplotlib`.

<br clear="right"/>

## Definitions

Single-cell references in a formula sitting in cell `A3` like `=A1+A2` is considered a dependency between the node `A3` and the nodes `A2` and `A1`.

```mermaid
graph TD
    A3 --> A1
    A3 --> A2
    A3["A3=A1+A2"]
```

A range defined in a formula like `=SUM(B1:B3)` is kept as a single node in the graph, but all the containing cells are expanded as dependencies of the range node.

So when a cell, `C1` contains `=SUM(B1:B3)` the graph will look like this:

```mermaid

graph TD
    R -->B1
    R -->B2
    R -->B3
    R["B1:B3"]
    C1 --> R

    C1["C1=SUM(B1:B3)"]

```

## Installation from pypi package

PyPi project: [graphedexcel](https://pypi.org/project/graphedexcel/)

```bash
pip install graphedexcel
```

## Installation from source

```bash

python -m venv venv
source venv/bin/activate
pip install -e .
```

## Usage

```bash
python -m graphedexcel <path_to_excel_file>
```

### Parameters from `--help`

```
usage: graphedexcel [-h] [--as-directed-graph] [--no-visualize]
                    [--layout {spring,circular,kamada_kawai,shell,spectral}]
                    [--config CONFIG] [--output-path OUTPUT_PATH]
                    [--open-image]
                    path_to_excel

Process an Excel file to build and visualize dependency graphs.

positional arguments:
  path_to_excel         Path to the Excel file to process.

options:
  -h, --help            show this help message and exit
  --as-directed-graph, -d
                        Treat the dependency graph as directed.
  --no-visualize, -n    Skip the visualization of the dependency
                        graph.
  --layout, -l {spring,circular,kamada_kawai,shell,spectral}
                        Layout algorithm for graph visualization
                        (default: spring).
  --config, -c CONFIG   Path to the configuration file for
                        visualization. See README for details.
  --output-path, -o OUTPUT_PATH
                        Specify the output path for the generated
                        graph image.
  --open-image          Open the generated image after visualization.
  --hide-legends        Do not show legends in the visualization. (Default: False)
```

## Sample output

The following is the output of running the script on the sample `docs/Book1.xlsx` file.

```bash
===  Dependency Graph Summary ===
Cell/Node count                70
Dependency count              100


===  Most connected nodes     ===
Range Madness!A2:A11           22
Range Madness!B2:B11           11
Range Madness!F1               10
Main Sheet!B5                   4
Main Sheet!B22                  4
Detached !A2:A4                 4
Range Madness!B2                4
Range Madness!B3                4
Range Madness!B4                4
Range Madness!B5                4

===  Most used functions      ===
SUM                             4
POWER                           1

Visualizing the graph of dependencies.
This might take a while...

Graph visualization saved to images/.\Book1.xlsx.png
```

## Sample plot

More in `docs/images` folder.

![Sample graph](docs/images/simplified_1.xlsx5.png)

## Customizing Graph Visualization Settings

You can customize the graph visualization settings by passing a path to a JSON configuration file. This allows you to override the default settings with your own preferences.

Look at <https://networkx.org/documentation/stable/reference/generated/networkx.drawing.nx_pylab.draw_networkx.html> for the available settings.

### Default Settings

The default settings for the graph visualization in the various sizes (from `graph_visualizer.py`):

```python
# Default settings for the graph visualization
base_graph_settings = {
    "node_size": 50,        # the size of the node
    "width": 0.2,           # the width of the edge between nodes
    "edge_color": "black",  # the color of the edge between nodes
    "linewidths": 0,        # the stroke width of the node border
    "with_labels": False,   # whether to show the node labels
    "font_size": 10,        # the size of the node labels
    "cmap": "tab20b",       # the color map to use for coloring nodes
    "fig_size": (10, 10),   # the size of the figure
}

# Sized-based settings for small, medium, and large graphs
small_graph_settings = {
    "with_labels": False,
    "alpha": 0.8}

medium_graph_settings = {
    "node_size": 30,
    "with_labels": False,
    "alpha": 0.4,
    "fig_size": (20, 20),
}

large_graph_settings = {
    "node_size": 20,
    "with_labels": False,
    "alpha": 0.2,
    "fig_size": (25, 25),
}

```

### Custom JSON Configuration

To override these settings, create a JSON file (e.g., graph_settings.json) with the desired settings. Here is an example of a JSON configuration file:

```json
{
  "node_size": 40,
  "edge_color": "blue",
  "with_labels": true,
  "font_size": 12,
  "alpha": 0.6
}
```

### Using the Custom Configuration

To use the custom configuration, pass the path to the JSON file as an argument to the script:

```bash
python -m graphedexcel myexcel.xlsx --config graph_settings.json
```

This will render the graph using the custom settings defined in the JSON file.

## Tests

Just run pytest in the root folder.

```bash
pytest
```

### Bandit Security Tests

To run the Bandit (<https://github.com/PyCQA/bandit>) security tests, you can use the following command.
It will report on medium and high severity safety issues.

```bash
poetry run bandit -c pyproject.toml -r . -lll
```

## Run with Docker

If you don't want to install the python dependencies on your machine, you can run the script in a Docker container. The following command will build the Docker image and run the script on the sample `docs/Book1.xlsx` file.

With a powershell terminal:

```powershell
docker build -t graphedexcel .
docker run --rm -v ${pwd}/docs:/app/docs graphedexcel docs/Book1.xlsx -o docs/av.png
```

Image will be saved in the docs folder


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/dalager/graphedexcel",
    "name": "graphedexcel",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "excel, graph, dependencies, visualization",
    "author": "Christian Dalager",
    "author_email": "christian@dalager.com",
    "download_url": "https://files.pythonhosted.org/packages/ef/8d/2d996d0976367e73ed4e8ceb1f33d50a8b3527d3fda9b4ee814821ffff7b/graphedexcel-1.2.2.tar.gz",
    "platform": null,
    "description": "# Graphed Excel\n\n[![current release version](https://img.shields.io/github/release/dalager/graphedexcel.svg?style=flat-square)](https://github.com/dalager/graphedexcel/releases)\n[![pypi version](https://img.shields.io/pypi/v/graphedexcel.svg?style=flat-square)](https://pypi.python.org/pypi/graphedexcel)\n![Python Version](https://img.shields.io/badge/python-3.10%3A3.12-blue?style=flat-square)\n[![codecov](https://codecov.io/github/dalager/graphedexcel/branch/main/graph/badge.svg?token=CJM0EAUF9M)](https://codecov.io/github/dalager/graphedexcel)\n[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/9620/badge)](https://www.bestpractices.dev/projects/9620)\n\n<img src=\"docs/images/Book1.xlsx.png\" align=\"right\" width=\"300\" alt=\"Plot from Example Book1.xlsx file\">\n\nTool to analyze and visualize dependencies between cells in Excel spreadsheets in order to get an understanding of the complexity.\n\nWill generate a graph of the dependencies between cells in an Excel spreadsheet. Data extracted with `openpyxl` (<https://foss.heptapod.net/openpyxl/openpyxl>), the graph is generated with the `networkx` library (<https://networkx.org/>) and is visualized using `matplotlib`.\n\n<br clear=\"right\"/>\n\n## Definitions\n\nSingle-cell references in a formula sitting in cell `A3` like `=A1+A2` is considered a dependency between the node `A3` and the nodes `A2` and `A1`.\n\n```mermaid\ngraph TD\n    A3 --> A1\n    A3 --> A2\n    A3[\"A3=A1+A2\"]\n```\n\nA range defined in a formula like `=SUM(B1:B3)` is kept as a single node in the graph, but all the containing cells are expanded as dependencies of the range node.\n\nSo when a cell, `C1` contains `=SUM(B1:B3)` the graph will look like this:\n\n```mermaid\n\ngraph TD\n    R -->B1\n    R -->B2\n    R -->B3\n    R[\"B1:B3\"]\n    C1 --> R\n\n    C1[\"C1=SUM(B1:B3)\"]\n\n```\n\n## Installation from pypi package\n\nPyPi project: [graphedexcel](https://pypi.org/project/graphedexcel/)\n\n```bash\npip install graphedexcel\n```\n\n## Installation from source\n\n```bash\n\npython -m venv venv\nsource venv/bin/activate\npip install -e .\n```\n\n## Usage\n\n```bash\npython -m graphedexcel <path_to_excel_file>\n```\n\n### Parameters from `--help`\n\n```\nusage: graphedexcel [-h] [--as-directed-graph] [--no-visualize]\n                    [--layout {spring,circular,kamada_kawai,shell,spectral}]\n                    [--config CONFIG] [--output-path OUTPUT_PATH]\n                    [--open-image]\n                    path_to_excel\n\nProcess an Excel file to build and visualize dependency graphs.\n\npositional arguments:\n  path_to_excel         Path to the Excel file to process.\n\noptions:\n  -h, --help            show this help message and exit\n  --as-directed-graph, -d\n                        Treat the dependency graph as directed.\n  --no-visualize, -n    Skip the visualization of the dependency\n                        graph.\n  --layout, -l {spring,circular,kamada_kawai,shell,spectral}\n                        Layout algorithm for graph visualization\n                        (default: spring).\n  --config, -c CONFIG   Path to the configuration file for\n                        visualization. See README for details.\n  --output-path, -o OUTPUT_PATH\n                        Specify the output path for the generated\n                        graph image.\n  --open-image          Open the generated image after visualization.\n  --hide-legends        Do not show legends in the visualization. (Default: False)\n```\n\n## Sample output\n\nThe following is the output of running the script on the sample `docs/Book1.xlsx` file.\n\n```bash\n===  Dependency Graph Summary ===\nCell/Node count                70\nDependency count              100\n\n\n===  Most connected nodes     ===\nRange Madness!A2:A11           22\nRange Madness!B2:B11           11\nRange Madness!F1               10\nMain Sheet!B5                   4\nMain Sheet!B22                  4\nDetached !A2:A4                 4\nRange Madness!B2                4\nRange Madness!B3                4\nRange Madness!B4                4\nRange Madness!B5                4\n\n===  Most used functions      ===\nSUM                             4\nPOWER                           1\n\nVisualizing the graph of dependencies.\nThis might take a while...\n\nGraph visualization saved to images/.\\Book1.xlsx.png\n```\n\n## Sample plot\n\nMore in `docs/images` folder.\n\n![Sample graph](docs/images/simplified_1.xlsx5.png)\n\n## Customizing Graph Visualization Settings\n\nYou can customize the graph visualization settings by passing a path to a JSON configuration file. This allows you to override the default settings with your own preferences.\n\nLook at <https://networkx.org/documentation/stable/reference/generated/networkx.drawing.nx_pylab.draw_networkx.html> for the available settings.\n\n### Default Settings\n\nThe default settings for the graph visualization in the various sizes (from `graph_visualizer.py`):\n\n```python\n# Default settings for the graph visualization\nbase_graph_settings = {\n    \"node_size\": 50,        # the size of the node\n    \"width\": 0.2,           # the width of the edge between nodes\n    \"edge_color\": \"black\",  # the color of the edge between nodes\n    \"linewidths\": 0,        # the stroke width of the node border\n    \"with_labels\": False,   # whether to show the node labels\n    \"font_size\": 10,        # the size of the node labels\n    \"cmap\": \"tab20b\",       # the color map to use for coloring nodes\n    \"fig_size\": (10, 10),   # the size of the figure\n}\n\n# Sized-based settings for small, medium, and large graphs\nsmall_graph_settings = {\n    \"with_labels\": False,\n    \"alpha\": 0.8}\n\nmedium_graph_settings = {\n    \"node_size\": 30,\n    \"with_labels\": False,\n    \"alpha\": 0.4,\n    \"fig_size\": (20, 20),\n}\n\nlarge_graph_settings = {\n    \"node_size\": 20,\n    \"with_labels\": False,\n    \"alpha\": 0.2,\n    \"fig_size\": (25, 25),\n}\n\n```\n\n### Custom JSON Configuration\n\nTo override these settings, create a JSON file (e.g., graph_settings.json) with the desired settings. Here is an example of a JSON configuration file:\n\n```json\n{\n  \"node_size\": 40,\n  \"edge_color\": \"blue\",\n  \"with_labels\": true,\n  \"font_size\": 12,\n  \"alpha\": 0.6\n}\n```\n\n### Using the Custom Configuration\n\nTo use the custom configuration, pass the path to the JSON file as an argument to the script:\n\n```bash\npython -m graphedexcel myexcel.xlsx --config graph_settings.json\n```\n\nThis will render the graph using the custom settings defined in the JSON file.\n\n## Tests\n\nJust run pytest in the root folder.\n\n```bash\npytest\n```\n\n### Bandit Security Tests\n\nTo run the Bandit (<https://github.com/PyCQA/bandit>) security tests, you can use the following command.\nIt will report on medium and high severity safety issues.\n\n```bash\npoetry run bandit -c pyproject.toml -r . -lll\n```\n\n## Run with Docker\n\nIf you don't want to install the python dependencies on your machine, you can run the script in a Docker container. The following command will build the Docker image and run the script on the sample `docs/Book1.xlsx` file.\n\nWith a powershell terminal:\n\n```powershell\ndocker build -t graphedexcel .\ndocker run --rm -v ${pwd}/docs:/app/docs graphedexcel docs/Book1.xlsx -o docs/av.png\n```\n\nImage will be saved in the docs folder\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Tool to analyze and visualize dependencies between cells in Excel spreadsheets",
    "version": "1.2.2",
    "project_urls": {
        "Documentation": "https://github.com/dalager/graphedexcel",
        "Homepage": "https://github.com/dalager/graphedexcel",
        "Repository": "https://github.com/dalager/graphedexcel"
    },
    "split_keywords": [
        "excel",
        " graph",
        " dependencies",
        " visualization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "10720e18f377fda2b5429e9e1202558c05a45ab87f834dff490845c4f5a6505e",
                "md5": "3d0c062ef62393cac79f5336b6ef6d84",
                "sha256": "0fa835cee7f0a359eef64c521c6ee0599deb07fc60e060f57ad274f42a477121"
            },
            "downloads": -1,
            "filename": "graphedexcel-1.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3d0c062ef62393cac79f5336b6ef6d84",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 13288,
            "upload_time": "2024-11-09T21:51:48",
            "upload_time_iso_8601": "2024-11-09T21:51:48.732572Z",
            "url": "https://files.pythonhosted.org/packages/10/72/0e18f377fda2b5429e9e1202558c05a45ab87f834dff490845c4f5a6505e/graphedexcel-1.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ef8d2d996d0976367e73ed4e8ceb1f33d50a8b3527d3fda9b4ee814821ffff7b",
                "md5": "4109af7c0881a0763f5a3643c287325c",
                "sha256": "7fbb9d00ed5d64fa9ee8be921fb922659f6afe4f8540cd2796ff075254216480"
            },
            "downloads": -1,
            "filename": "graphedexcel-1.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "4109af7c0881a0763f5a3643c287325c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 12709,
            "upload_time": "2024-11-09T21:51:50",
            "upload_time_iso_8601": "2024-11-09T21:51:50.621804Z",
            "url": "https://files.pythonhosted.org/packages/ef/8d/2d996d0976367e73ed4e8ceb1f33d50a8b3527d3fda9b4ee814821ffff7b/graphedexcel-1.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-09 21:51:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "dalager",
    "github_project": "graphedexcel",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "graphedexcel"
}
        
Elapsed time: 2.60446s