nipd-framework

Name: nipd-framework
Version: 1.0.0
Home page: https://github.com/maximusJWL/nipd-framework
Summary: Network Iterated Prisoner's Dilemma Framework for Multi-Agent Learning
Upload time: 2025-08-30 23:39:45
Maintainer: None
Docs URL: None
Author: maximusjwl
Requires Python: >=3.13
License: CC BY-NC 4.0
Keywords: multi-agent, prisoner-dilemma, reinforcement-learning, game-theory, networks, cooperation, defection, social-dilemmas
Requirements: numpy, matplotlib, torch, tqdm, pandas, seaborn, networkx, ipykernel, jupyter, notebook
# NIPD Framework - Network Iterated Prisoner's Dilemma

A comprehensive Python package for simulating and analyzing multi-agent learning in network-structured environments using the Iterated Prisoner's Dilemma.

## Installation

```bash
# Install from PyPI
pip install nipd-framework

# Or install from source
git clone https://github.com/maximusJWL/nipd-framework
cd nipd-framework
pip install -e .
```

## Package Contents

The package includes:
- **Core simulation engine** (`nipd.agent_simulator`, `nipd.network_environment`, etc.)
- **Pretrained models** (one final model per algorithm type)
- **Training scripts** (for users who want to train their own models)
- **Example usage scripts**

**Note**: Training outputs (curves, reports, intermediate models) are excluded from the package to keep it lightweight. Only the final pretrained models are included.

## Quick Start

```python
import nipd

# Create a simple simulation
agent_config = {
    'titfortat': 5,
    'cooperator': 3,
    'defector': 2
}

network_config = {
    'type': 'small_world',
    'k_neighbors': 3,
    'rewire_prob': 0.1
}

simulation_config = {
    'episode_length': 100,
    'num_episodes': 1,
    'reward_matrix': [[3.0, 0.0], [5.0, 1.0]],
    'use_system_rewards': False,
    'noise': {'enabled': True, 'probability': 0.05}
}

# Run simulation
simulator = nipd.AgentSimulator(agent_config, network_config, simulation_config)
simulator.run_simulation()
simulator.create_visualizations()
```

### Command Line Usage

The package includes a command-line interface:

```bash
# Run the default simulation
nipd-simulate

# Run with custom configuration (modify agent_simulator.py first)
python -m nipd.agent_simulator
```

### Examples

See the `examples/` directory for more detailed usage examples:

```bash
# Basic simulation example
python examples/basic_simulation.py

# Advanced simulation with multiple agent types
python examples/advanced_simulation.py
```

## Overview

The NIPD Framework implements a range of reinforcement learning algorithms for studying cooperation and defection strategies in multi-agent environments, focusing on the Network Iterated Prisoner's Dilemma. It provides agents that learn to cooperate or defect in a networked environment; each agent type uses a different learning algorithm and can be configured for online learning (adapting during gameplay) or offline learning (using pre-trained models).

## Package Structure

```
nipd-framework/
├── nipd/                           # Main package directory
│   ├── __init__.py                # Package initialization
│   ├── agent_simulator.py         # Main simulation engine
│   ├── network_environment.py     # Network environment implementation
│   ├── network_tops.py            # Network topology utilities
│   ├── round.py                   # Game round implementation
│   └── models/                    # Agent model implementations
│       ├── standard_mappo/        # Standard MAPPO agents
│       ├── cooperative_mappo/     # Cooperative MAPPO agents
│       ├── decentralized_ppo/     # Decentralized PPO agents
│       ├── lola/                  # LOLA agents
│       ├── q_network/             # Q-Network agents
│       ├── simple_q_learning/     # Simple Q-Learning agents
│       ├── online_mappo/          # Online learning MAPPO agents
│       ├── online_decentralized_ppo/ # Online learning DPPO agents
│       ├── online_lola/           # Online learning LOLA agents
│       ├── online_q_network/      # Online learning Q-Network agents
│       └── online_simple_q_learning/ # Online learning Simple Q-Learning agents
├── examples/                       # Example scripts
│   ├── basic_simulation.py        # Basic usage example
│   └── advanced_simulation.py     # Advanced usage example
├── setup.py                       # Package setup configuration
├── requirements.txt               # Python dependencies
├── LICENSE                        # CC BY-NC 4.0 License
└── README.md                      # This file
```

## Agent Types

### Standard Agents (Pre-trained)
- **Standard MAPPO**: Multi-Agent Proximal Policy Optimization with centralized training
- **Cooperative MAPPO**: MAPPO variant optimized for cooperation
- **Decentralized PPO**: Decentralized Proximal Policy Optimization
- **LOLA**: Learning with Opponent-Learning Awareness
- **Q-Network**: Deep Q-Network implementation
- **Simple Q-Learning**: Tabular Q-Learning with state discretization

### Online Learning Agents
- **Online MAPPO**: MAPPO with online learning capabilities
- **Online DPPO**: Decentralized PPO with online learning
- **Online LOLA**: LOLA with online learning and opponent modeling
- **Online Q-Network**: Q-Network with online learning
- **Online Simple Q-Learning**: Simple Q-Learning with online adaptation

## Key Features

### Online Learning
Online learning agents adapt their strategies during gameplay: they update their networks after each round and can switch between cooperation and defection in response to the opponent behavior they observe.
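
As a self-contained illustration of this idea (not the package's agents or API), the sketch below runs a tabular Q-learner that updates its value table after every round while playing a two-player iterated game against a tit-for-tat opponent, using the same payoff convention as the `reward_matrix` used throughout this README.

```python
# Minimal, self-contained sketch of online adaptation; this is NOT the package's
# implementation, just a tabular Q-learner updating after every round.
import random

PAYOFF = [[3.0, 0.0], [5.0, 1.0]]   # PAYOFF[my_action][opp_action], 0 = cooperate, 1 = defect
alpha, gamma, epsilon = 0.1, 0.95, 0.1

# State = opponent's previous action; Q-values for (state, action) pairs.
q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

my_last, opp_last = 0, 0
for t in range(1000):
    state = opp_last
    # epsilon-greedy action selection
    if random.random() < epsilon:
        action = random.randint(0, 1)
    else:
        action = max((0, 1), key=lambda a: q[(state, a)])
    opp_action = my_last                      # tit-for-tat copies our previous move
    reward = PAYOFF[action][opp_action]
    next_state = opp_action
    # online update after every round
    best_next = max(q[(next_state, 0)], q[(next_state, 1)])
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    my_last, opp_last = action, opp_action

print({k: round(v, 2) for k, v in q.items()})
```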

### Observation Format
All agents use a standardized 4-dimensional observation format:
- `own_prev`: Agent's previous action (0 = cooperate, 1 = defect)
- `neighbor_prev`: Neighbor's previous action
- `neighbor_coop_rate`: Neighbor's cooperation rate
- `neighbors_norm`: Normalized number of neighbors
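
A minimal sketch of how such a 4-dimensional observation could be assembled; the helper below and its arguments are illustrative and do not mirror the package's internal API.

```python
import numpy as np

def build_observation(own_prev, neighbor_prev, neighbor_history, num_neighbors, max_neighbors):
    """Illustrative 4-dimensional observation matching the format described above."""
    # Cooperation rate of the neighbor (actions use 0 = cooperate, 1 = defect);
    # fall back to 0.5 when there is no history yet.
    neighbor_coop_rate = 1.0 - np.mean(neighbor_history) if neighbor_history else 0.5
    neighbors_norm = num_neighbors / max_neighbors
    return np.array([own_prev, neighbor_prev, neighbor_coop_rate, neighbors_norm],
                    dtype=np.float32)

# Agent cooperated last round, neighbor defected, neighbor cooperated in 3 of its
# last 4 moves, and the agent has 4 of a possible 10 neighbors.
obs = build_observation(0, 1, [0, 0, 0, 1], 4, 10)
print(obs)  # approximately [0. 1. 0.75 0.4]
```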

### Reward System
The system supports both private rewards (individual agent rewards) and local rewards (collective performance). Agents can be configured to optimize for either individual gain or collective benefit.
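
The sketch below shows one plausible way those two signals could be computed from the reward matrix on a tiny three-agent network; the exact aggregation (per-neighbor mean, system-wide mean) is an assumption for illustration, not the package's internal formula.

```python
# Illustrative only: private vs. collective rewards on a 3-agent network, using the
# convention reward_matrix[my_action][opponent_action] with 0 = cooperate, 1 = defect.
reward_matrix = [[3.0, 0.0], [5.0, 1.0]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # every agent neighbors the other two
actions = {0: 0, 1: 0, 2: 1}                    # agents 0 and 1 cooperate, agent 2 defects

# Private reward: each agent's mean payoff over its own pairwise games this round.
private = {
    i: sum(reward_matrix[actions[i]][actions[j]] for j in nbrs) / len(nbrs)
    for i, nbrs in neighbors.items()
}

# One possible collective signal: the mean of all private rewards in the system.
system = sum(private.values()) / len(private)

print(private)  # {0: 1.5, 1: 1.5, 2: 5.0}
print(system)   # roughly 2.67
```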

### Noise Implementation
The simulation includes configurable noise that can cause agents to execute the opposite of their intended action, modeling real-world uncertainty.
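
A sketch of that noise model, assuming it simply flips each binary action independently with the configured probability (the package's actual sampling may differ):

```python
import numpy as np

def apply_action_noise(intended_actions, probability, rng=None):
    """Flip each binary action (0 <-> 1) independently with the given probability."""
    rng = rng or np.random.default_rng()
    intended = np.asarray(intended_actions)
    flips = rng.random(intended.shape) < probability
    return np.where(flips, 1 - intended, intended)

intended = [0, 0, 1, 0, 1]
executed = apply_action_noise(intended, probability=0.05)  # most actions unchanged
```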

## Usage

### Basic Simulation

```python
import nipd

# Agent Configuration
agent_config = {
    'titfortat': 5,      # 5 Tit-for-Tat agents
    'cooperator': 3,     # 3 Always Cooperate agents
    'defector': 2,       # 2 Always Defect agents
    'decentralized_ppo': 0,
    'standard_mappo': 0,
    'cooperative_mappo': 0,
    'lola': 0,
    'q_learner': 0,
    'online_simple_q': 0,
    'online_q_network': 0,
    'online_decentralized_ppo': 0,
    'online_lola': 0,
    'online_mappo': 0
}

# Network Configuration
network_config = {
    'type': 'small_world',
    'k_neighbors': 4,
    'rewire_prob': 0.1
}

# Simulation Configuration
simulation_config = {
    'episode_length': 250,
    'num_episodes': 1,
    'reward_matrix': [[3.0, 0.0], [5.0, 1.0]],
    'use_system_rewards': False,
    'noise': {'enabled': True, 'probability': 0.05}
}

# Create and run simulator
simulator = nipd.AgentSimulator(agent_config, network_config, simulation_config)
simulator.run_simulation()
simulator.create_visualizations()
```

A more fully annotated simulation configuration:

```python
# Simulation Configuration
simulation_config = {
    'episode_length': 200,            # Number of timesteps per episode
    'num_episodes': 1,                # Number of episodes to run
    'reward_matrix': [                # Prisoner's Dilemma reward matrix
        [3.0, 0.0],                   # Cooperate vs [Cooperate, Defect]
        [5.0, 1.0]                    # Defect vs [Cooperate, Defect]
    ],
    'use_system_rewards': False,      # True: system-wide rewards, False: private rewards
    'noise': {
        'enabled': True,              # Enable action noise in simulation
        'probability': 0.05,          # Probability of action flip per agent per timestep (0.0-1.0)
        'description': 'Random chance for agents to execute the opposite of the intended action'
    }
}

# Create and run simulation
simulator = nipd.AgentSimulator(agent_config, network_config, simulation_config)
simulator.run_simulation()
```

### Using Different Agent Types

```python
# Configure agent types with counts
agent_config = {
    'decentralized_ppo': 2,        # 2 decentralized PPO agents
    'standard_mappo': 1,           # 1 standard MAPPO agent
    'cooperative_mappo': 1,        # 1 cooperative MAPPO agent
    'lola': 2,                     # 2 LOLA agents
    'q_learner': 1,                # 1 Q-learning agent
    'titfortat': 1,                # 1 tit-for-tat agent
    'cooperator': 1,               # 1 cooperator agent
    'defector': 1,                 # 1 defector agent
    # Online learning agents (initialized from pretrained models)
    'online_simple_q': 0,
    'online_q_network': 0,
    'online_decentralized_ppo': 0,
    'online_lola': 0,
    'online_mappo': 0
}

simulator = nipd.AgentSimulator(agent_config, network_config, simulation_config)
simulator.run_simulation()
```

### System vs Private Optimization

```python
# For system optimization (agents optimize for collective benefit)
simulation_config['use_system_rewards'] = True

# For private optimization (agents optimize for individual benefit)
simulation_config['use_system_rewards'] = False
```

### Adding Noise

```python
# Add 10% noise to actions
simulation_config['noise'] = {
    'enabled': True,
    'probability': 0.1,  # 10% chance of action flip
    'description': 'Random chance for agents to execute opposite action than intended'
}
```

## Configuration Options

### Network Configuration
- `type`: Network topology ('small_world', 'random', 'ring')
- `k_neighbors`: Number of neighbors per agent
- `rewire_prob`: Rewiring probability for small-world networks
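
Since networkx is a declared dependency, a rough sketch of how these options could map onto standard generators is shown below; the specific functions chosen for each topology are assumptions, not the package's actual construction.

```python
# Hedged sketch: plausible mapping from network_config to networkx generators.
import networkx as nx

def build_network(num_agents, network_config):
    kind = network_config['type']
    if kind == 'small_world':
        return nx.watts_strogatz_graph(num_agents,
                                       k=network_config['k_neighbors'],
                                       p=network_config['rewire_prob'])
    if kind == 'random':
        # interpret k_neighbors as the expected degree
        p = network_config['k_neighbors'] / (num_agents - 1)
        return nx.erdos_renyi_graph(num_agents, p)
    if kind == 'ring':
        return nx.cycle_graph(num_agents)   # fixed degree 2
    raise ValueError(f"unknown network type: {kind}")

g = build_network(10, {'type': 'small_world', 'k_neighbors': 4, 'rewire_prob': 0.1})
print(g.number_of_nodes(), g.number_of_edges())
```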

### Simulation Configuration
- `episode_length`: Number of timesteps per episode
- `num_episodes`: Number of episodes to run
- `reward_matrix`: Payoff matrix for the game
- `use_system_rewards`: Whether agents optimize for system or private rewards
- `noise`: Noise configuration dictionary with `enabled`, `probability`, and `description`

### Agent Configuration
The `agent_config` dictionary specifies how many agents of each type to include:
- `decentralized_ppo`: Number of decentralized PPO agents
- `standard_mappo`: Number of standard MAPPO agents
- `cooperative_mappo`: Number of cooperative MAPPO agents
- `lola`: Number of LOLA agents
- `q_learner`: Number of Q-learning agents
- `titfortat`: Number of tit-for-tat agents
- `cooperator`: Number of always-cooperate agents
- `defector`: Number of always-defect agents
- `online_*`: Number of online learning agents (initialized from pretrained models)

## Output and Analysis

The simulation generates several outputs:

1. **CSV Files**: Detailed round-by-round data including actions, rewards, and cooperation rates
2. **Visualizations**: 
   - Move history heatmaps
   - Cooperation rate over time
   - Score per round graphs
   - Network topology visualization
3. **Leaderboard**: Final agent rankings based on total scores
4. **Logs**: Detailed simulation logs with agent behavior and learning updates
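
Once a run has produced its CSV output, the per-round data can be explored with pandas. The file and column names below are placeholders; check the simulator's actual output for the real ones.

```python
import pandas as pd

# Hypothetical file and column names -- adjust to the simulator's actual output.
df = pd.read_csv("simulation_results.csv")
coop_by_round = df.groupby("round")["action"].apply(lambda a: (a == 0).mean())
print(coop_by_round.head())   # cooperation rate per round (0 = cooperate)
```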

## Model Loading

All models are stored in the centralized `nipd/models/` directory. Each agent type has its own subdirectory containing:
- Trained model files (`.pt` for PyTorch models, `.json` for Q-tables)
- Training metrics and logs
- Model configurations
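
A hedged sketch of loading the two artifact formats mentioned above with standard tooling; the file paths are hypothetical examples and the package's own loaders wrap this differently.

```python
import json
import torch

# PyTorch network weights (.pt); path is a hypothetical example.
state_dict = torch.load("nipd/models/standard_mappo/final_model.pt", map_location="cpu")

# Tabular Q-values stored as JSON (.json); path is a hypothetical example.
with open("nipd/models/simple_q_learning/q_table.json") as f:
    q_table = json.load(f)
```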

## Setup and Dependencies

### Virtual Environment Setup
```bash
# Create virtual environment
python -m venv .venv

# Activate virtual environment
# Windows:
.venv\Scripts\activate.bat
# Linux/Mac:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install Jupyter kernel
python -m ipykernel install --user --name nipd_env --display-name "NIPD Environment"
```

### Dependencies
- Python 3.13+
- NumPy
- PyTorch
- Matplotlib
- Seaborn
- Pandas
- NetworkX
- tqdm
- Jupyter
- notebook
- ipykernel

## Running Examples

### Quick Start
```bash
python -m nipd.agent_simulator
```

### Compare Training Approaches
```bash
python compare_training_approaches.py
```

### Custom Simulation
```python
# Create your own simulation script
from nipd import AgentSimulator

# Define your configuration
agent_config = {...}
network_config = {...}
simulation_config = {...}

# Run simulation
simulator = AgentSimulator(agent_config, network_config, simulation_config)
simulator.run_simulation()
```

## Research Applications

This codebase is designed for research in:
- Multi-agent reinforcement learning
- Cooperation and defection strategies
- Network effects on agent behavior
- Online learning and adaptation
- Social dilemma games
- Emergent cooperation

## Notes

- Online learning agents require pretrained models to initialize from
- The system supports universal model loading for different agent counts
- All agents use the same observation format for fair comparison
- Noise implementation helps model real-world uncertainty
- System optimization can lead to different emergent behaviors than private optimization

            
