NEExT


- Name: NEExT
- Version: 0.2.10
- Summary: Network Embedding Experimentation Toolkit - A powerful framework for graph analysis, embedding computation, and machine learning on graph-structured data
- Upload time: 2025-07-11 02:45:36
- Requires Python: <3.13,>=3.9
- License: MIT
- Keywords: embedding, graph, graph-ml, machine-learning, network, network-analysis
# NEExT: Network Embedding Experimentation Toolkit

NEExT is a powerful Python framework for graph analysis, embedding computation, and machine learning on graph-structured data. It provides a unified interface for working with different graph backends (NetworkX and iGraph), computing node features, generating graph embeddings, and training machine learning models.

## 📚 Documentation

Detailed documentation is available in the `docs` directory. Build it locally or visit the online documentation at [NEExT Documentation](https://neext.readthedocs.io/en/latest/).

## 🌟 Features

- **Flexible Graph Handling**
  - Support for both NetworkX and iGraph backends
  - Automatic graph reindexing and largest component filtering
  - Node sampling capabilities for large graphs
  - Rich attribute support for nodes and edges

- **Comprehensive Node Features**
  - PageRank
  - Degree Centrality
  - Closeness Centrality
  - Betweenness Centrality
  - Eigenvector Centrality
  - Clustering Coefficient
  - Local Efficiency
  - LSME (Local Structural Motif Embeddings)

- **Graph Embeddings**
  - Approximate Wasserstein
  - Exact Wasserstein
  - Sinkhorn Vectorizer
  - Customizable embedding dimensions

- **Machine Learning Integration**
  - Classification and regression support
  - Dataset balancing options
  - Cross-validation with customizable splits
  - Feature importance analysis

### Custom Node Feature Functions

NEExT lets you define your own node feature functions and compute them alongside the built-in ones, which is useful when experimenting with new graph metrics.

**Defining a Custom Feature Function:**

Your custom feature function must adhere to the following structure:

1.  **Input**: It must accept a single argument, a `graph` object. This object exposes the graph's structure (nodes, edges) and properties, e.g. `graph.nodes`, `graph.graph_id`, and `graph.G` (the underlying NetworkX or iGraph object).
2.  **Output**: It must return a `pandas.DataFrame` with the following specific columns in order:
    *   `"node_id"`: Identifiers for the nodes for which features are computed.
    *   `"graph_id"`: The identifier of the graph to which these nodes belong.
    *   One or more feature columns holding the computed values. If your feature has multiple components or is expanded over hops, name the columns `your_feature_name_0`, `your_feature_name_1`, and so on; a single column named `your_feature_name` is also acceptable.
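
The column layout described above can be checked before handing a function to NEExT. The following is a minimal sketch using only the standard library; `check_feature_columns` is a hypothetical helper written for this illustration and is not part of the NEExT API. It only inspects column names, so you would pass it `list(df.columns)` from the DataFrame your function returns.

```python
from typing import Sequence


def check_feature_columns(columns: Sequence[str]) -> None:
    """Raise ValueError if the column names break the layout described above.

    Hypothetical helper for illustration; not provided by NEExT.
    """
    cols = list(columns)
    # The first two columns must be node_id and graph_id, in that order.
    if cols[:2] != ["node_id", "graph_id"]:
        raise ValueError(
            f"first two columns must be 'node_id', 'graph_id'; got {cols[:2]}"
        )
    # At least one feature column must follow the two identifier columns.
    if len(cols) < 3:
        raise ValueError("at least one feature column is required")
    # Duplicate names would make the feature columns ambiguous.
    if len(set(cols)) != len(cols):
        raise ValueError("column names must be unique")


# Usage: passes silently for a valid layout.
check_feature_columns(["node_id", "graph_id", "degree_sq_0"])
```

Calling the check at the end of a custom feature function during development surfaces a wrong column order early, rather than deep inside the feature computation.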

**Example:**

Here's how you can define a simple custom feature function and use it:

```python
import networkx as nx  # needed for the isinstance check below
import pandas as pd

# 1. Define your custom feature function
# This function must be defined at the top level of your script/module
# if you plan to use multiprocessing (n_jobs != 1).
def my_node_degree_squared(graph):
    nodes = list(graph.nodes) # or range(graph.G.vcount()) for igraph if nodes are 0-indexed
    graph_id = graph.graph_id
    
    if hasattr(graph.G, 'degree'): # Handles both NetworkX and iGraph
        if isinstance(graph.G, nx.Graph): # NetworkX
            degrees = [graph.G.degree(n) for n in nodes]
        else: # iGraph
            degrees = graph.G.degree(nodes)
    else:
        raise TypeError("Graph object does not have a degree method.")
        
    degree_squared_values = [d**2 for d in degrees]
    
    df = pd.DataFrame({
        'node_id': nodes,
        'graph_id': graph_id,
        'degree_sq_0': degree_squared_values
    })
    # Ensure the correct column order
    return df[['node_id', 'graph_id', 'degree_sq_0']]

# 2. Prepare the list of custom feature methods
my_feature_methods = [
    {"feature_name": "my_degree_squared", "feature_function": my_node_degree_squared}
]

# 3. Pass it to compute_node_features
# Initialize NEExT and load your graph_collection as shown in the Quick Start
# nxt = NEExT()
# graph_collection = nxt.read_from_csv(...)

features = nxt.compute_node_features(
    graph_collection=graph_collection,
    feature_list=["page_rank", "my_degree_squared"], # Include your custom feature name
    feature_vector_length=3, # Applies to built-in features that use it
    my_feature_methods=my_feature_methods
)

print(features.features_df.head())
```

When you include `"my_degree_squared"` in the `feature_list` and provide `my_feature_methods`, NEExT will automatically register and compute your custom function. If `"all"` is in `feature_list`, your custom registered function will also be included in the computation.
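
The example's comment about defining the function at the top level relates to how Python's standard `multiprocessing` passes functions to worker processes: they are pickled by qualified name. The sketch below assumes NEExT relies on that standard mechanism when `n_jobs != 1`, which the README implies but does not spell out. `is_picklable` is a hypothetical helper for illustration only.

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A lambda has no importable qualified name, so pickling it fails.
print(is_picklable(lambda graph: graph))

# A function that lives at module level (here a builtin) pickles fine.
print(is_picklable(len))
```

If a custom feature function fails this check, moving it out of any enclosing function or class body and into the module's top level is usually enough.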

## 📦 Installation

### Basic Installation
```bash
pip install NEExT
```

### Development Installation
```bash
# Clone the repository
git clone https://github.com/ashdehghan/NEExT.git
cd NEExT

# Install with development dependencies
pip install -e ".[dev]"
```

### Additional Components
```bash
# For running tests
pip install -e ".[test]"

# For building documentation
pip install -e ".[docs]"

# For running experiments
pip install -e ".[experiments]"

# Install all components
pip install -e ".[dev,test,docs,experiments]"
```

## 🚀 Quick Start

### Basic Usage

```python
from NEExT import NEExT

# Initialize the framework
nxt = NEExT()
nxt.set_log_level("INFO")

# Load graph data
graph_collection = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    graph_label_path="graph_labels.csv",
    reindex_nodes=True,
    filter_largest_component=True,
    graph_type="igraph"
)

# Compute node features
features = nxt.compute_node_features(
    graph_collection=graph_collection,
    feature_list=["all"],
    feature_vector_length=3
)

# Compute graph embeddings
embeddings = nxt.compute_graph_embeddings(
    graph_collection=graph_collection,
    features=features,
    embedding_algorithm="approx_wasserstein",
    embedding_dimension=3
)

# Train a classifier
model_results = nxt.train_ml_model(
    graph_collection=graph_collection,
    embeddings=embeddings,
    model_type="classifier",
    sample_size=50
)
```
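
To make the `reindex_nodes` and `filter_largest_component` options more concrete, here is a conceptual sketch of what such preprocessing typically does on a single undirected edge list. It is an illustration under stated assumptions and not NEExT's implementation: the function names are hypothetical, and details such as how ties between equally large components are resolved may differ in the library.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def largest_component(edges: List[Edge]) -> Set[int]:
    """Return the node set of the largest connected component (undirected)."""
    adjacency: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen: Set[int] = set()
    best: Set[int] = set()
    for start in list(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def filter_and_reindex(edges: List[Edge]) -> Tuple[List[Edge], Dict[int, int]]:
    """Keep the largest component and relabel its nodes as 0..n-1."""
    keep = largest_component(edges)
    mapping = {old: new for new, old in enumerate(sorted(keep))}
    new_edges = [(mapping[u], mapping[v]) for u, v in edges if u in keep and v in keep]
    return new_edges, mapping


edges = [(10, 20), (20, 30), (40, 50)]
new_edges, mapping = filter_and_reindex(edges)
print(new_edges)  # [(0, 1), (1, 2)]
print(mapping)    # {10: 0, 20: 1, 30: 2}
```

The returned mapping lets you translate reindexed node IDs back to the originals. Nodes that appear in no edge are not represented in this sketch.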

### Working with Large Graphs

NEExT supports node sampling for handling large graphs:

```python
# Load graphs with 70% of nodes
graph_collection = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    node_sample_rate=0.7  # Use 70% of nodes
)
```
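
As a rough illustration of what a sample rate of 0.7 means, the sketch below keeps a random 70% of a graph's nodes and only the edges whose endpoints both survive. This is an assumption about the general idea of node sampling, not a description of NEExT's sampling strategy; `sample_nodes` and its `seed` parameter are hypothetical names introduced for this example.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def sample_nodes(
    nodes: List[int], edges: List[Edge], rate: float, seed: int = 0
) -> Tuple[List[int], List[Edge]]:
    """Keep a random fraction of nodes and the edges between kept nodes."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    if not nodes:
        return [], []
    rng = random.Random(seed)  # local generator keeps the sample reproducible
    k = max(1, round(len(nodes) * rate))
    kept: Set[int] = set(rng.sample(nodes, k))
    # An edge survives only if both of its endpoints were kept.
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return sorted(kept), kept_edges


nodes = list(range(10))
edges = [(i, (i + 1) % 10) for i in range(10)]  # a 10-node ring
kept_nodes, kept_edges = sample_nodes(nodes, edges, rate=0.7, seed=42)
print(len(kept_nodes), len(kept_edges))
```

Because removing nodes also removes their incident edges, the sampled graph can be sparser or more fragmented than the rate alone suggests, which is worth keeping in mind when comparing results across sample rates.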

### Feature Importance Analysis

```python
# Compute feature importance
importance_df = nxt.compute_feature_importance(
    graph_collection=graph_collection,
    features=features,
    feature_importance_algorithm="supervised_fast",
    embedding_algorithm="approx_wasserstein"
)
```

## 📊 Experiments

NEExT includes several pre-built experiments in the `examples/experiments` directory:

### Node Sampling Experiment
Investigates the effect of node sampling on classifier accuracy:
```bash
cd examples/experiments
python node_sampling_experiments.py
```

## 📝 Input File Formats

### edges.csv
```csv
src_node_id,dest_node_id
0,1
1,2
...
```

### node_graph_mapping.csv
```csv
node_id,graph_id
0,1
1,1
2,2
...
```

### graph_labels.csv
```csv
graph_id,graph_label
1,0
2,1
...
```
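
The three files can be produced from in-memory data with the standard `csv` module. The sketch below is one possible way to do it: `write_input_csvs` is a hypothetical helper, and the requirement that each node ID belongs to exactly one graph is inferred from the `node_graph_mapping.csv` sample above rather than stated by the library.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def write_input_csvs(
    out_dir: Path, graphs: Dict[int, List[Edge]], labels: Dict[int, int]
) -> None:
    """Write edges.csv, node_graph_mapping.csv and graph_labels.csv to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    node_to_graph: Dict[int, int] = {}
    edge_rows: List[Edge] = []
    for graph_id, edges in graphs.items():
        for u, v in edges:
            for node in (u, v):
                # Assumption: a node ID may belong to only one graph.
                owner = node_to_graph.setdefault(node, graph_id)
                if owner != graph_id:
                    raise ValueError(f"node {node} is in graphs {owner} and {graph_id}")
            edge_rows.append((u, v))

    def dump(name: str, header: List[str], rows) -> None:
        with open(out_dir / name, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)

    dump("edges.csv", ["src_node_id", "dest_node_id"], edge_rows)
    dump("node_graph_mapping.csv", ["node_id", "graph_id"], sorted(node_to_graph.items()))
    dump("graph_labels.csv", ["graph_id", "graph_label"], sorted(labels.items()))


graphs = {1: [(0, 1), (1, 2)], 2: [(3, 4)]}
labels = {1: 0, 2: 1}
with tempfile.TemporaryDirectory() as tmp:
    write_input_csvs(Path(tmp), graphs, labels)
    edges_lines = (Path(tmp) / "edges.csv").read_text().splitlines()
    mapping_lines = (Path(tmp) / "node_graph_mapping.csv").read_text().splitlines()
    labels_lines = (Path(tmp) / "graph_labels.csv").read_text().splitlines()
print(edges_lines)
```

The resulting paths can then be passed to `read_from_csv` as `edges_path`, `node_graph_mapping_path`, and `graph_label_path`, as in the Quick Start.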

## 🛠️ Development

### Running Tests
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=NEExT

# Run specific test file
pytest tests/test_node_sampling.py
```

### Building Documentation
```bash
cd docs
make html
```

### Code Style
The project uses several tools for code quality:
```bash
# Format code
black .

# Sort imports
isort .

# Check style
flake8 .

# Type checking
mypy .
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 👥 Authors

- Ash Dehghan - [ash.dehghan@gmail.com](mailto:ash.dehghan@gmail.com)

## 🙏 Acknowledgments

- NetworkX team for the graph algorithms
- iGraph team for the efficient graph operations
- Scikit-learn team for machine learning components

## 📧 Contact

For questions and support:
- Email: ash@anomalypoint.com
- GitHub Issues: [NEExT Issues](https://github.com/ashdehghan/NEExT/issues)

## 🔄 Version History

- 0.1.0
  - Initial release
  - Basic graph operations
  - Node feature computation
  - Graph embeddings
  - Machine learning integration

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "NEExT",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.9",
    "maintainer_email": "Ash Dehghan <ash.dehghan@gmail.com>",
    "keywords": "embedding, graph, graph-ml, machine-learning, network, network-analysis",
    "author": null,
    "author_email": "Ash Dehghan <ash.dehghan@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/52/b2/cf37d6a388ae8e447e4da15f13163ec825e2edc514bdadf5727bd93b0bc6/neext-0.2.10.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Network Embedding Experimentation Toolkit - A powerful framework for graph analysis, embedding computation, and machine learning on graph-structured data",
    "version": "0.2.10",
    "project_urls": {
        "Documentation": "https://neext.readthedocs.io",
        "Homepage": "https://github.com/ashdehghan/NEExT",
        "Issues": "https://github.com/ashdehghan/NEExT/issues",
        "Repository": "https://github.com/ashdehghan/NEExT"
    },
    "split_keywords": [
        "embedding",
        " graph",
        " graph-ml",
        " machine-learning",
        " network",
        " network-analysis"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7f0b8dad70732bb1beb4e028d5126b29e60587d15c99b8e8b1d8fd3e547c3de8",
                "md5": "2c4561b1ca24fbc01a6daef3879f71c9",
                "sha256": "1af50636c6303776b06c50f213fbaf08db6286292f73b8613f880febf197b7e5"
            },
            "downloads": -1,
            "filename": "neext-0.2.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2c4561b1ca24fbc01a6daef3879f71c9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.9",
            "size": 52299,
            "upload_time": "2025-07-11T02:45:34",
            "upload_time_iso_8601": "2025-07-11T02:45:34.057087Z",
            "url": "https://files.pythonhosted.org/packages/7f/0b/8dad70732bb1beb4e028d5126b29e60587d15c99b8e8b1d8fd3e547c3de8/neext-0.2.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "52b2cf37d6a388ae8e447e4da15f13163ec825e2edc514bdadf5727bd93b0bc6",
                "md5": "77cb802f724040ab03b8d8e14b385235",
                "sha256": "a61eaa6da61262215d3d914d14c2022f1b3fd762289f2bd0c287c3c4fe0766dc"
            },
            "downloads": -1,
            "filename": "neext-0.2.10.tar.gz",
            "has_sig": false,
            "md5_digest": "77cb802f724040ab03b8d8e14b385235",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.9",
            "size": 1924214,
            "upload_time": "2025-07-11T02:45:36",
            "upload_time_iso_8601": "2025-07-11T02:45:36.134314Z",
            "url": "https://files.pythonhosted.org/packages/52/b2/cf37d6a388ae8e447e4da15f13163ec825e2edc514bdadf5727bd93b0bc6/neext-0.2.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-11 02:45:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ashdehghan",
    "github_project": "NEExT",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "neext"
}
        