# AsyncFlow — Event-Loop Aware Simulator for Async Distributed Systems
Created and maintained by @GioeleB00.
[PyPI](https://pypi.org/project/asyncflow-sim/) · [License](LICENSE) · [Ruff](https://github.com/astral-sh/ruff) · [mypy](https://mypy-lang.org/) · [pytest](https://docs.pytest.org/) · [SimPy](https://simpy.readthedocs.io/)
-----
AsyncFlow is a discrete-event simulator, built on SimPy, for modeling and analyzing the performance of asynchronous, distributed backend systems. You describe your system's topology (servers, network links, load balancers) and AsyncFlow simulates the entire lifecycle of each request as it moves through it.
It provides a **digital twin** of your service, modeling not just the high-level architecture but also the low-level behavior of each server's **event loop**, including explicit **CPU work**, **RAM residency**, and **I/O waits**. This allows you to run realistic "what-if" scenarios that behave like production systems rather than toy benchmarks.
### What Problem Does It Solve?
Modern async stacks like FastAPI are incredibly performant, but predicting their behavior under real-world load is difficult. Capacity planning often relies on guesswork, expensive cloud-based load tests, or discovering bottlenecks only after a production failure. AsyncFlow is designed to replace that uncertainty with **data-driven forecasting**, allowing you to understand how your system will perform before you deploy a single line of code.
### How Does It Work? An Example Topology
AsyncFlow models your system as a directed graph of interconnected components: a request generator feeds a client over a network edge, traffic flows (optionally through load balancers) to one or more servers, and responses travel back to the client. The Quick Start below wires up a minimal version of exactly this topology.

### What Questions Can It Answer?
By running simulations on your defined topology, you can get quantitative answers to critical engineering questions, such as:
* How does **p95 latency** change if active users increase from 100 to 200? (A comparison sketch follows this list.)
* What is the impact on the system if the **client-to-server network latency** increases by 3 ms?
* Will a specific API endpoint—with a pipeline of parsing, RAM allocation, and database I/O—hold its **SLA at a load of 40 requests per second**?
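For example, the first question reduces to running the same topology twice and comparing the latency summaries. A minimal sketch, assuming two hypothetical YAML variants (`users_100.yml`, `users_200.yml`) that differ only in `avg_active_users.mean`; the runner and analyzer calls mirror the Quick Start below:

```python
from pathlib import Path

import simpy

from asyncflow.runtime.simulation_runner import SimulationRunner

# Hypothetical scenario files: copies of the Quick Start YAML with
# avg_active_users.mean set to 100 and 200 respectively.
for scenario in ("users_100.yml", "users_200.yml"):
    env = simpy.Environment()  # fresh SimPy environment per run
    runner = SimulationRunner.from_yaml(env=env, yaml_path=Path(scenario))
    results = runner.run()
    print(f"--- {scenario} ---")
    print(results.format_latency_stats())  # concise latency summary
```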
---
## Installation
Install from PyPI: `pip install asyncflow-sim`
## Requirements
* **Python 3.12+** (tested on 3.12, 3.13)
* **OS:** Linux, macOS, or Windows
* **Installed automatically (runtime deps):**
**SimPy** (DES engine), **NumPy**, **Matplotlib**, **Pydantic** + **pydantic-settings**, **PyYAML**.
---
## Quick Start
### 1) Define a realistic YAML
Save as `my_service.yml`.
The full YAML schema is explained in `docs/guides/yaml-input-builder.md` and validated by Pydantic models (see `docs/internals/simulation-input.md`).
```yaml
rqs_input:
  id: generator-1
  avg_active_users: { mean: 100, distribution: poisson }
  avg_request_per_minute_per_user: { mean: 100, distribution: poisson }
  user_sampling_window: 60

topology_graph:
  nodes:
    client: { id: client-1 }

    servers:
      - id: app-1
        server_resources: { cpu_cores: 1, ram_mb: 2048 }
        endpoints:
          - endpoint_name: /api
            # Realistic pipeline on one async server:
            #   - 2 ms CPU parsing (blocks the event loop)
            #   - 120 MB RAM working set (held until the request leaves the server)
            #   - 12 ms DB-like I/O (non-blocking wait)
            steps:
              - kind: initial_parsing
                step_operation: { cpu_time: 0.002 }
              - kind: ram
                step_operation: { necessary_ram: 120 }
              - kind: io_db
                step_operation: { io_waiting_time: 0.012 }

  edges:
    - { id: gen-client, source: generator-1, target: client-1,
        latency: { mean: 0.003, distribution: exponential } }
    - { id: client-app, source: client-1, target: app-1,
        latency: { mean: 0.003, distribution: exponential } }
    - { id: app-client, source: app-1, target: client-1,
        latency: { mean: 0.003, distribution: exponential } }

sim_settings:
  total_simulation_time: 300
  sample_period_s: 0.05
  enabled_sample_metrics:
    - ready_queue_len
    - event_loop_io_sleep
    - ram_in_use
    - edge_concurrent_connection
  enabled_event_metrics:
    - rqs_clock
```
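Before running it, a quick sanity check on the workload this YAML generates: 100 active users each issuing about 100 requests per minute works out to roughly 167 requests per second of offered load. A minimal sketch of the two-stage sampling intuition (NumPy only; this illustrates the model described under "What AsyncFlow Models", not AsyncFlow's generator code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: how many users are active in this sampling window.
active_users = rng.poisson(lam=100)
# Stage 2: each active user draws its own requests-per-minute.
rpm = rng.poisson(lam=100, size=active_users)

print(f"one sampled window: {rpm.sum() / 60:.1f} rps "
      f"(expected: {100 * 100 / 60:.1f} rps)")
```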
Prefer building scenarios in Python? There’s a Python builder with the same semantics (create nodes, edges, endpoints programmatically). See **`docs/guides/python-builder.md`**.
### 2) Run and export charts
Save as `run_my_service.py`.
```python
from __future__ import annotations
from pathlib import Path
import simpy
import matplotlib.pyplot as plt
from asyncflow.runtime.simulation_runner import SimulationRunner
from asyncflow.metrics.analyzer import ResultsAnalyzer
def main() -> None:
    script_dir = Path(__file__).parent
    yaml_path = script_dir / "my_service.yml"
    out_path = script_dir / "my_service_plots.png"

    env = simpy.Environment()
    runner = SimulationRunner.from_yaml(env=env, yaml_path=yaml_path)
    res: ResultsAnalyzer = runner.run()

    # Print a concise latency summary
    print(res.format_latency_stats())

    # 2x2: Latency | Throughput | Ready (first server) | RAM (first server)
    fig, axes = plt.subplots(2, 2, figsize=(12, 8), dpi=160)

    res.plot_latency_distribution(axes[0, 0])
    res.plot_throughput(axes[0, 1])

    sids = res.list_server_ids()
    if sids:
        sid = sids[0]
        res.plot_single_server_ready_queue(axes[1, 0], sid)
        res.plot_single_server_ram(axes[1, 1], sid)
    else:
        for ax in (axes[1, 0], axes[1, 1]):
            ax.text(0.5, 0.5, "No servers", ha="center", va="center")
            ax.axis("off")

    fig.tight_layout()
    fig.savefig(out_path)
    print(f"Plots saved to: {out_path}")


if __name__ == "__main__":
    main()
```
Run the script: `python run_my_service.py`
You’ll get latency stats in the terminal and a PNG with four charts (latency distribution, throughput, server queues, RAM usage).
**Want more?**
For ready-to-run scenarios—including examples using the Pythonic builder and multi-server topologies—check out the `examples/` directory in the repository.
## Development
If you want to contribute or run the full test suite locally, follow these steps.
### Requirements
* **Python 3.12+** (tested on 3.12, 3.13)
* **OS:** Linux, macOS, or Windows
* **Runtime deps installed by the package:** SimPy, NumPy, Matplotlib, Pydantic, PyYAML, pydantic-settings
**Prerequisites:** Git, Python 3.12+ in `PATH`, `curl` (Linux/macOS/WSL), PowerShell 7+ (Windows)
---
## Project setup
```bash
git clone https://github.com/AsyncFlow-Sim/AsyncFlow.git
cd AsyncFlow
```
From the repo root, run the **one-shot post-clone setup**:
**Linux / macOS / WSL**
```bash
bash scripts/dev_setup.sh
```
**Windows (PowerShell)**
```powershell
# If scripts are blocked by policy, run this in the same PowerShell session:
# Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\scripts\dev_setup.ps1
```
**What this does (concise):**
* Ensures **Poetry** is available (installs if missing).
* Uses a **project-local `.venv`**.
* Removes `poetry.lock` for a **clean dependency resolve** (dev policy).
* Installs the project **with dev extras**.
* Runs **ruff**, **mypy**, and **pytest (with coverage)**.
**Quick sanity check after setup:**
```bash
poetry --version
poetry run python -V
```
> **Note (lock policy):** `dev_setup` intentionally removes `poetry.lock` to avoid cross-platform conflicts during development.
**Scripts (for quick access):**
* [`scripts/dev_setup.sh`](scripts/dev_setup.sh) / [`scripts/dev_setup.ps1`](scripts/dev_setup.ps1)
* [`scripts/quality_check.sh`](scripts/quality_check.sh) / [`scripts/quality_check.ps1`](scripts/quality_check.ps1)
* [`scripts/run_tests.sh`](scripts/run_tests.sh) / [`scripts/run_tests.ps1`](scripts/run_tests.ps1)
---
### Handy scripts (after setup)
#### 1) Lint + type check
**Linux / macOS / WSL**
```bash
bash scripts/quality_check.sh
```
**Windows (PowerShell)**
```powershell
.\scripts\quality_check.ps1
```
Runs **ruff** (lint/format check) and **mypy** on `src` and `tests`.
#### 2) Run tests with coverage (unit + integration)
**Linux / macOS / WSL**
```bash
bash scripts/run_tests.sh
```
**Windows (PowerShell)**
```powershell
.\scripts\run_tests.ps1
```
#### 3) Run system tests
**Linux / macOS / WSL**
```bash
bash scripts/run_sys_tests.sh
```
**Windows (PowerShell)**
```powershell
.\scripts\run_sys_tests.ps1
```
Executes **pytest** with a terminal coverage summary (no XML, no slowest list).
## What AsyncFlow Models (v0.1)
AsyncFlow provides a detailed simulation of your backend system. Here is a high-level overview of the core components it models. For a deeper technical dive into the implementation and design rationale, follow the links to the internal documentation.
* **Async Event Loop:** Simulates a single-threaded, non-blocking event loop per server. **CPU steps** block the loop, while **I/O steps** are non-blocking, accurately modeling `asyncio` behavior.
* *(Deep Dive: `docs/internals/runtime-and-resources.md`)*
* **System Resources:** Models finite server resources, including **CPU cores** and **RAM (MB)**. Requests must acquire these resources, creating natural back-pressure and contention when the system is under load.
* *(Deep Dive: `docs/internals/runtime-and-resources.md`)*
* **Endpoints & Request Lifecycles:** Models server endpoints as a linear sequence of **steps**. Each step is a distinct operation, such as `cpu_bound_operation`, `io_wait`, or `ram` allocation.
* *(Schema Definition: `docs/internals/simulation-input.md`)*
* **Network Edges:** Simulates the connections between system components. Each edge has a configurable **latency** (drawn from a probability distribution) and an optional **dropout rate** to model packet loss.
* *(Schema Definition: `docs/internals/simulation-input.md` | Runtime Behavior: `docs/internals/runtime-and-resources.md`)*
* **Stochastic Workload:** Generates user traffic based on a two-stage sampling model, combining the number of active users and their request rate per minute to produce a realistic, fluctuating load (RPS) on the system.
* *(Modeling Details with mathematical explanation and clear assumptions: `docs/internals/requests-generator.md`)*
* **Metrics & Outputs:** Collects two types of data: **time-series metrics** (e.g., `ready_queue_len`, `ram_in_use`) and **event-based data** (`RqsClock`). This raw data is used to calculate final KPIs like **p95/p99 latency** and **throughput**.
* *(Metric Reference: `docs/internals/metrics`)*
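For intuition, the p95/p99 KPIs are plain percentiles over the collected latency samples. A generic sketch with NumPy (already a runtime dependency), not AsyncFlow's internal implementation:

```python
import numpy as np

# Hypothetical raw latency samples in seconds.
latencies_s = np.array([0.018, 0.021, 0.020, 0.055, 0.019, 0.140, 0.022])

# p95/p99 are just high percentiles of the sample distribution.
p95, p99 = np.percentile(latencies_s, [95, 99])
print(f"p95 = {p95 * 1000:.1f} ms, p99 = {p99 * 1000:.1f} ms")
```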
## Current Limitations (v0.1)
* Network realism: base latency + optional drops (no bandwidth/payload/TCP yet).
* Single event loop per server: no multi-process/multi-node servers yet.
* Linear endpoint flows: no branching/fan-out within an endpoint.
* No thread-level concurrency; modeling OS threads and scheduler/context switching is out of scope.
* Stationary workload: no diurnal patterns or feedback/backpressure.
* Sampling cadence: very short spikes can be missed if `sample_period_s` is large.
## Roadmap (order is not indicative of priority)
This roadmap outlines the key development areas to transform AsyncFlow into a comprehensive framework for statistical analysis and resilience modeling of distributed systems.
### 1. Monte Carlo Simulation Engine
**Why:** To overcome the limitations of a single simulation run and obtain statistically robust results. This transforms the simulator from an "intuition" tool into an engineering tool for data-driven decisions with confidence intervals.
* **Independent Replications:** Run the same simulation N times with different random seeds to sample the space of possible outcomes.
* **Warm-up Period Management:** Introduce a "warm-up" period to be discarded from the analysis, ensuring that metrics are calculated only on the steady-state portion of the simulation.
* **Ensemble Aggregation:** Calculate means, standard deviations, and confidence intervals for aggregated metrics (latency, throughput) across all replications; a minimal sketch follows this list.
* **Confidence Bands:** Visualize time-series data (e.g., queue lengths) with confidence bands to show variability over time.
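As a concrete picture of ensemble aggregation, here is a minimal sketch that turns N per-replication p95 values into a mean and a 95% confidence interval, using a normal approximation and made-up numbers:

```python
import numpy as np

# Hypothetical p95 latency (seconds) from five independent replications.
p95_runs = np.array([0.041, 0.046, 0.039, 0.050, 0.044])

mean = p95_runs.mean()
sem = p95_runs.std(ddof=1) / np.sqrt(len(p95_runs))  # standard error of the mean
lo, hi = mean - 1.96 * sem, mean + 1.96 * sem        # normal-approximation 95% CI
print(f"p95 = {mean * 1000:.1f} ms, 95% CI [{lo * 1000:.1f}, {hi * 1000:.1f}] ms")
```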
### 2. Realistic Service Times (Stochastic Service Times)
**Why:** Constant service times underestimate tail latencies (p95/p99), which are almost always driven by "slow" requests. Modeling this variability is crucial for a realistic analysis of bottlenecks.
* **Distributions for Steps:** Allow parameters like `cpu_time` and `io_waiting_time` in an `EndpointStep` to be sampled from statistical distributions (e.g., Lognormal, Gamma, Weibull) instead of being fixed values.
* **Per-Request Sampling:** Each request will sample its own service times independently, simulating the natural variability of a real-world system.
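To see why this matters, compare a constant 12 ms service time (as in the Quick Start `io_db` step) with a lognormal of the same mean. A hedged sketch (NumPy only; distribution-valued steps are a roadmap item, not current YAML syntax):

```python
import numpy as np

rng = np.random.default_rng(7)

mean_s, sigma = 0.012, 0.8              # same 12 ms mean as the io_db step
mu = np.log(mean_s) - sigma**2 / 2      # choose mu so that E[X] = mean_s
samples = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

print(f"mean = {samples.mean() * 1000:.1f} ms, "
      f"p99 = {np.percentile(samples, 99) * 1000:.1f} ms "
      f"(a constant step would have p99 = 12.0 ms)")
```

Same mean, but the p99 lands several times higher than the constant case, which is exactly the tail behavior constant service times hide.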
### 3. Component Library Expansion
**Why:** To increase the variety and realism of the architectures that can be modeled.
* **New System Nodes:**
* `CacheRuntime`: To model caching layers (e.g., Redis) with hit/miss logic, TTL, and warm-up behavior.
* `APIGatewayRuntime`: To simulate API Gateways with features like rate-limiting and authentication caching.
* `DBRuntime`: A more advanced model for databases featuring connection pool contention and row-level locking.
* **New Load Balancer Algorithms:** Add more advanced routing strategies (e.g., Weighted Round Robin, Least Response Time).
### 4. Fault and Event Injection
**Why:** To test the resilience and behavior of the system under non-ideal conditions, a fundamental use case for Site Reliability Engineering (SRE).
* **API for Scheduled Events:** Introduce a system to schedule events at specific simulation times, such as:
* **Node Down/Up:** Turn a server off and on to test the load balancer's failover logic.
* **Degraded Edge:** Drastically increase the latency or drop rate of a network link.
* **Error Bursts:** Simulate a temporary increase in the rate of application errors.
### 5. Advanced Network Modeling
**Why:** To more faithfully model network-related bottlenecks that are not solely dependent on latency.
* **Bandwidth and Payload Size:** Introduce the concepts of link bandwidth and request/response size to simulate delays caused by data transfer; a first-order sketch follows this list.
* **Retries and Timeouts:** Model retry and timeout logic at the client or internal service level.
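For scale, the first-order delay such a model would add is payload size divided by link bandwidth. A tiny illustrative sketch (numbers are hypothetical):

```python
# Hypothetical link and payload; not current AsyncFlow behavior.
payload_bytes = 256 * 1024          # 256 KiB response
bandwidth_bytes_s = 100e6 / 8       # 100 Mbit/s link, in bytes per second

transfer_delay_s = payload_bytes / bandwidth_bytes_s
print(f"transfer delay = {transfer_delay_s * 1000:.1f} ms")  # ~21.0 ms
```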
### 6. Complex Endpoint Flows
**Why:** To model more realistic business logic that does not follow a linear path.
* **Conditional Branching:** Introduce the ability to have conditional steps within an endpoint (e.g., a different path for a cache hit vs. a cache miss).
* **Fan-out / Fan-in:** Model scenarios where a service calls multiple downstream services in parallel and waits for their responses.
### 7. Backpressure and Autoscaling
**Why:** To simulate the behavior of modern, adaptive systems that react to load.
* **Dynamic Rate Limiting:** Introduce backpressure mechanisms where services slow down the acceptance of new requests if their internal queues exceed a certain threshold.
* **Autoscaling Policies:** Model simple Horizontal Pod Autoscaler (HPA) policies where the number of server replicas increases or decreases based on metrics like CPU utilization or queue length.
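For reference, such a policy could mirror the standard Kubernetes HPA scaling rule, sketched below (illustrative only; AsyncFlow does not model autoscaling yet):

```python
import math

def desired_replicas(current: int, current_util: float, target_util: float) -> int:
    """Standard HPA rule: ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current * current_util / target_util)

# 3 replicas running at 90% CPU against a 60% target -> scale to 5.
print(desired_replicas(current=3, current_util=0.9, target_util=0.6))
```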