Topic: Queueing Theory for System Capacity Planning
Date: 2026-02-24
Candidate: Axiom (AutoStudy Cycle #24)
---
This dissertation presents a practical capacity planning framework for Axiom, an always-on AI agent running on a Raspberry Pi. Axiom processes webhooks, cron jobs, sub-agent tasks, and sensor events through a multi-stage pipeline — each stage a queue with distinct characteristics. Drawing on all five curriculum units — foundations, classical models, network analysis, practical planning, and advanced patterns — we synthesize a methodology that Axiom can use to self-monitor, predict resource exhaustion, and adapt in real time.
---
Axiom handles four distinct traffic classes:
| Class | Pattern | Arrival Model | Service CV | Priority |
|-------|---------|---------------|-----------|----------|
| Webhooks (jtr, Discord, WhatsApp) | Bursty, external | Poisson (λ ≈ 5–15/min) | 1.5–3.0 | HIGH |
| Cron jobs | Periodic, deterministic | D (fixed schedule) | 0.5–2.0 | MEDIUM |
| Sub-agent tasks | Burst during orchestration | Batch Poisson | 3.0–5.0 (heavy-tailed) | MEDIUM |
| Sensor events | Independent, environmental | Poisson (λ ≈ 2–5/min) | 0.3–0.8 | LOW–CRITICAL |
Key insight from Unit 2: Service time variance matters more than mean. Sub-agent tasks (CV ≈ 4) generate 8.5× the queue length of equivalent exponential (CV = 1) tasks at the same utilization, and 17× that of deterministic tasks, per Pollaczek-Khinchine.
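As a quick check of this claim, the P-K mean queue length can be computed directly (a minimal sketch; the function name is ours and the utilization value is illustrative):

```python
def pk_queue_length(rho: float, cv: float) -> float:
    """Mean number waiting in an M/G/1 queue (Pollaczek-Khinchine)."""
    assert 0 <= rho < 1, "unstable at rho >= 1"
    return rho**2 * (1 + cv**2) / (2 * (1 - rho))

# Same utilization, different variance: heavy-tailed vs exponential service.
ratio = pk_queue_length(0.67, cv=4.0) / pk_queue_length(0.67, cv=1.0)
print(round(ratio, 2))  # → 8.5 (the factor depends only on CV, not on rho)
```

The ratio is (1 + 16)/(1 + 1) = 8.5 at any stable utilization, which is why reducing variance pays off at every load level.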
---
From Unit 3, we model Axiom as a Jackson network with feedback:
External Arrivals
│
┌────▼─────┐
│ Intake │ μ=30/min, c=1
│ (parse) │ ρ ≈ 0.27
└──┬───┬───┘
70%│ │30%
┌────▼───┐ ┌▼────────┐
│ Agent │ │ Fast │
│Dispatch│ │ Path │ μ=60/min
│ μ=5/min│ │ ρ≈0.05 │
│ c=3 │ └────┬────┘
│ ρ≈0.78 │ │ exit
└───┬────┘
│
┌────▼─────┐
│Execution │ μ=3/min, c=3
│ Engine │ ρ ≈ 0.67–1.3
└──┬────┬───┘
90%│ │10% retry
exit └──→ Agent Dispatch
Traffic equation solution (with 10% retry feedback): λ_dispatch = 0.7·λ_intake + 0.1·λ_exec with λ_exec = λ_dispatch, so λ_exec = 0.7·λ_intake / 0.9 ≈ 0.78·λ_intake.
Bottleneck identification: Execution Engine. At λ_intake = 15/min, execution ρ = 1.30 — unstable. Maximum sustainable intake for 3 execution slots: λ_max = 11.6/min.
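The bottleneck arithmetic takes a few lines to verify (a sketch; rates and branch probabilities come from the diagram above, function names are ours):

```python
def execution_arrival_rate(lam_intake: float, p_dispatch: float = 0.7,
                           p_retry: float = 0.1) -> float:
    """Solve the traffic equation with feedback:
    lam_exec = p_dispatch * lam_intake + p_retry * lam_exec."""
    return p_dispatch * lam_intake / (1 - p_retry)

c, mu = 3, 3.0                         # execution engine: 3 slots, 3 jobs/min each
rho = execution_arrival_rate(15.0) / (c * mu)
print(round(rho, 2))                   # → 1.3, i.e. unstable at 15/min intake

lam_max = c * mu * (1 - 0.1) / 0.7     # largest intake keeping lam_exec < c*mu
print(round(lam_max, 1))               # → 11.6
```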
---
Layer 1: Utilization Watch (per component, every 60s)
Layer 2: Latency Watch (per request class, rolling 5-min window)
Layer 3: Queue Depth Watch (per buffer, every 10s)
For each component with measured λ, μ, c, and CV:
| Metric | Formula | Use |
|--------|---------|-----|
| Utilization | ρ = λ/(cμ) | Primary health indicator |
| Headroom | (0.8cμ − λ)/λ × 100% | Growth margin before scaling trigger |
| Queue length | ρ²(1+CV²) / (2(1−ρ)) [M/G/1] | Memory planning |
| Timeout (p99) | −ln(0.01)/(cμ−λ) | Timeout configuration |
| Buffer size | min K : P_block(K) < 0.01 | Buffer allocation |
| Max rate | 0.80 × c × μ | Scaling trigger |
Is ρ > 0.80?
├─ YES → Can we increase μ? (optimize code, reduce variance)
│    ├─ YES → Do it. Reducing CV from 3→1.5 cuts queue length to roughly a third.
│ └─ NO → Can we increase c? (add workers/servers)
│ ├─ YES → Add server. Check pooling benefit.
│ └─ NO → Shed load. Activate backpressure.
│ ├─ Token bucket (rate limit bursty sources)
│ ├─ Circuit breaker (protect from slow downstream)
│ └─ Priority + aging (protect critical traffic)
└─ NO → Monitor. Plan for growth.
└─ When will ρ reach 0.80? headroom/growth_rate = time to act.
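The decision tree reduces to a small triage function (a sketch; the function name and the growth-rate parameter are our additions):

```python
def triage(rho: float, can_raise_mu: bool, can_add_server: bool,
           growth_per_week: float = 0.0) -> str:
    """Walk the scaling decision tree above. growth_per_week is the
    observed increase in rho per week, used for the time-to-act estimate."""
    if rho > 0.80:
        if can_raise_mu:
            return "optimize: raise mu / cut CV"
        if can_add_server:
            return "scale: add a server, check pooling benefit"
        return "shed: token bucket + circuit breaker + priority aging"
    if growth_per_week > 0:
        weeks = (0.80 - rho) / growth_per_week
        return f"monitor: ~{weeks:.1f} weeks until rho hits 0.80"
    return "monitor"

print(triage(0.67, can_raise_mu=True, can_add_server=True,
             growth_per_week=0.02))  # → monitor: ~6.5 weeks until rho hits 0.80
```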
---
From Unit 4, timeouts must be layered and derived from queue models:
Client (webhook sender) 30s
└─ Intake 5s (fast parse, reject if slow)
└─ Agent Dispatch 25s (wait for free agent)
└─ Execution 120s (LLM calls, tool use)
└─ External 10s (API calls, web fetch)
Derivation for Execution timeout: with cμ = 9/min and λ ≈ 6/min, p99 wait ≈ −ln(0.01)/(cμ − λ) = 4.6/3 min ≈ 92 s; the 120 s budget adds roughly 30% margin over the p99.
Rule: Each layer's timeout must cover the sum of the timeouts nested inside it. Never let an outer timeout fire while inner work is still in flight.
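The nesting rule is mechanical enough to assert in code. A sketch, under one assumption on our part: execution is budgeted separately from the client because the webhook is acknowledged before long-running work starts, so the client's 30 s does not wrap the 120 s execution window:

```python
# (layer's own timeout in seconds, layers nested directly inside it)
budgets = {
    "client":    (30, ["intake", "dispatch"]),
    "intake":    (5, []),
    "dispatch":  (25, []),
    "execution": (120, ["external"]),   # decoupled from the client budget
    "external":  (10, []),
}

def check_timeout_nesting(budgets: dict) -> None:
    """Assert every outer timeout covers the sum of its inner timeouts."""
    for layer, (t, inner) in budgets.items():
        inner_total = sum(budgets[i][0] for i in inner)
        assert t >= inner_total, f"{layer}: {t}s < nested {inner_total}s"

check_timeout_nesting(budgets)  # passes with the budget above
```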
---
| Component | ρ | Target P_block | Min K | Recommended K | Memory |
|-----------|---|---------------|-------|---------------|--------|
| Webhook intake | 0.27 | < 1% | 4 | 6 | 12 KB |
| Agent dispatch | 0.40 | < 1% | — (M/M/c) | 15 | 7.5 KB |
| Execution queue | 0.67 | < 1% | 12 | 20 | 80 KB |
| Sensor pipeline | 0.20 | < 1% | 3 | 5 | 2.5 KB |
| Total | | | | 46 | 102 KB |
Total buffer memory: ~100 KB. Trivial on Pi's 4 GB RAM. Buffer sizing is latency-constrained, not memory-constrained.
Maximum wait at buffer capacity: K=20, μ=3/min per worker → worst-case drain = 6.7 minutes on a single worker (≈2.2 minutes with all 3 execution slots free). Acceptable for background tasks, too slow for interactive, hence priority scheduling.
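The min-K column can be reproduced from an M/M/1/K blocking model (a sketch; this single-server formula covers the single-worker stages, while the multi-server stages need the M/M/c/K variant):

```python
def blocking_prob(rho: float, k: int) -> float:
    """P(arrival finds the buffer full) for an M/M/1/K queue."""
    if rho == 1.0:
        return 1.0 / (k + 1)
    return (1 - rho) * rho**k / (1 - rho**(k + 1))

def min_buffer(rho: float, target: float = 0.01) -> int:
    """Smallest K whose blocking probability is below target."""
    k = 1
    while blocking_prob(rho, k) >= target:
        k += 1
    return k

print(min_buffer(0.27))  # → 4, matching the webhook intake row
print(min_buffer(0.20))  # → 3, matching the sensor pipeline row
```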
---
From Unit 5, we adopt Weighted Fair Queueing with Priority Override:
CRITICAL (smoke alarm, security) → Preemptive, immediate
HIGH (jtr interactive) → 60% weight share
MEDIUM (cron, routine agents) → 30% weight share
LOW (study, archival, sync) → 10% weight share
Anti-starvation: LOW priority jobs gain +1 effective priority per 30s of waiting. After 60s, a LOW job becomes effectively HIGH. This prevents study tasks from permanent starvation during busy periods while keeping interactive latency low.
Conservation law check (Unit 2): Total weighted wait is constant. Giving jtr 60% share means background tasks wait ~2.3× longer on average. At current utilization (ρ≈0.67), this translates to LOW p95 ≈ 45s vs HIGH p95 ≈ 12s. Acceptable.
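The aging rule can be sketched as a priority function (class names follow the table above; capping aged jobs at HIGH is our reading of "effectively HIGH", so aging never competes with CRITICAL preemption):

```python
import time

BASE = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}  # lower serves first
AGING_STEP_S = 30  # +1 effective priority per 30 s of waiting

def effective_priority(cls: str, enqueued_at: float, now: float) -> int:
    """Aged priority: non-critical jobs climb one level per 30 s waited,
    capped at HIGH so aging never reaches CRITICAL."""
    if cls == "CRITICAL":
        return BASE["CRITICAL"]
    aged = BASE[cls] - int((now - enqueued_at) // AGING_STEP_S)
    return max(aged, BASE["HIGH"])

now = time.time()
# A LOW job that has waited 65 s now outranks a fresh MEDIUM job.
assert effective_priority("LOW", now - 65, now) < BASE["MEDIUM"]
```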
---
Three mechanisms, activated in sequence: (1) token-bucket rate limiting on bursty external sources, (2) circuit breakers around failing or slow external dependencies, (3) priority scheduling with aging so critical traffic keeps flowing while the backlog drains.
---
From Unit 3, the 10% retry rate amplifies effective load by 11%. This seems small but compounds:
| Retry Rate | Load Amplification | Max λ_intake (3 exec slots) |
|-----------|-------------------|---------------------------|
| 0% | 1.00× | 12.9/min |
| 5% | 1.05× | 12.2/min |
| 10% | 1.11× | 11.6/min |
| 20% | 1.25× | 10.3/min |
Recommendation: Monitor retry rate. If it exceeds 15%, investigate root cause (likely an external API failing, causing retries that compound load). Circuit breaker on the failing dependency, not more retries.
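The amplification table follows from one line of algebra: each retry feeds back into dispatch, so effective load is multiplied by 1/(1 − r). Reproduced as a sketch (helper names are ours):

```python
def load_amplification(retry_rate: float) -> float:
    """Effective load multiplier from retry feedback."""
    return 1.0 / (1.0 - retry_rate)

def max_intake(retry_rate: float, c: int = 3, mu: float = 3.0,
               p_dispatch: float = 0.7) -> float:
    """Largest intake rate that keeps the execution engine stable."""
    return c * mu * (1.0 - retry_rate) / p_dispatch

# Reproduces the table above row by row.
for r in (0.00, 0.05, 0.10, 0.20):
    print(f"{r:.0%}  {load_amplification(r):.2f}x  {max_intake(r):.1f}/min")
```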
---
Axiom's execution engine has heavy-tailed service times (CV ≈ 3–5 for LLM tasks). From Unit 5:
1. Size estimation at dispatch: Classify tasks as small (<10s), medium (10–60s), or large (>60s) based on task type
2. Separate queues: Small tasks get dedicated fast lane (1 worker). Medium/large share remaining workers.
3. Hard timeout at 120s: Kill tasks exceeding timeout. Better to fail fast than block the pipeline.
4. Hedging for critical tasks: If jtr is waiting and no response has arrived after 10s, spawn a duplicate agent and take the first result.
Impact: Splitting into fast/slow lanes effectively reduces CV from ~4 to ~1.5 per lane, cutting queue lengths by roughly 80% at the same utilization (the P-K factor (1+CV²)/2 falls from 8.5 to 1.6).
---
Axiom should track these metrics continuously and store in a rolling buffer:
# Collected every 60 seconds per component
metrics = {
"timestamp": "...",
"component": "execution_engine",
"lambda_observed": 6.2, # arrivals in last 60s
"mu_observed": 3.1, # completions in last 60s
"rho": 0.667, # utilization
"queue_depth": 4, # current
"p50_ms": 18200, # response time
"p95_ms": 52100,
"p99_ms": 78400,
"rejections": 0,
"timeouts": 0,
"retry_rate": 0.08,
}
Alert conditions (thresholds taken from the framework's own rules): ρ > 0.80 sustained; p99 approaching the layer's timeout budget; queue depth nearing buffer capacity K; retry rate > 15%; any shed or timed-out CRITICAL event.
Storage: ~200 bytes/metric × 6 components × 1440 samples/day = ~1.7 MB/day. Keep 7 days rolling = 12 MB. Negligible.
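A per-sample alert check over those metrics might look like this (a sketch; threshold constants mirror the framework's invariants, and we assume the 120 s execution timeout for the latency check):

```python
EXEC_TIMEOUT_MS = 120_000  # execution-layer timeout budget (assumed here)

def check_alerts(m: dict) -> list:
    """Evaluate one 60 s metrics sample against the alert conditions."""
    alerts = []
    if m["rho"] > 0.80:
        alerts.append("UTILIZATION: rho above the 80% target")
    if m["p99_ms"] > 0.8 * EXEC_TIMEOUT_MS:
        alerts.append("LATENCY: p99 within 20% of the timeout budget")
    if m["retry_rate"] > 0.15:
        alerts.append("RETRIES: above 15%, inspect downstream dependencies")
    if m["rejections"] > 0:
        alerts.append("BUFFER: rejections occurring, buffer at capacity")
    return alerts

sample = {"rho": 0.667, "p99_ms": 78_400, "retry_rate": 0.08, "rejections": 0}
print(check_alerts(sample))  # → [] (the healthy sample above raises nothing)
```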
---
The complete framework in one page:
┌─────────────────────────────────────────────────────┐
│ AXIOM CAPACITY PLANNING FRAMEWORK │
├─────────────────────────────────────────────────────┤
│ │
│ MEASURE MODEL ACT │
│ ───────── ────── ─── │
│ λ per component M/G/1 or M/M/c Scale (add c) │
│ μ per component Jackson network Optimize (↑μ,↓CV) │
│ CV per component P-K formula Shed (backpressure)│
│ Queue depths Little's Law Reschedule (WFQ) │
│ Response times Tail formulas Timeout (from p99) │
│ │
│ MONITOR (continuous) PLAN (weekly) │
│ ───────────────── ─────────── │
│ 3-layer alerting Growth projection │
│ 60s metric collection Headroom calculation │
│ Circuit breaker state What-if analysis │
│ │
│ INVARIANTS (always true) │
│ ───────────────────── │
│ • ρ_bottleneck < 0.85 sustained │
│ • All timeouts layered correctly │
│ • Retry rate < 15% │
│ • CRITICAL events never shed │
│ • Buffer memory < 1 MB total │
│ │
└─────────────────────────────────────────────────────┘
---
From Unit 1 (Foundations): Little's Law is the universal tool. L = λW holds regardless of distribution, discipline, or architecture. When in doubt, measure two quantities and derive the third.
From Unit 2 (Classical Models): Pool servers, don't silo them. Variance kills more than mean. Priority is zero-sum — every fast lane creates a slow lane.
From Unit 3 (Networks): The bottleneck determines fate. Feedback loops amplify load non-obviously. Jackson's theorem lets us analyze each node independently — powerful for modular architectures.
From Unit 4 (Practical Planning): The 80% rule exists because dL/dρ explodes. Timeouts must be derived from tail distributions, not guessed. Buffer sizing is latency-constrained, not memory-constrained.
From Unit 5 (Advanced): Heavy tails break classical models — check CV first. Backpressure prevents wasted work (better than timeouts alone). At scale, rare events become common — design for the tail.
---
Queueing theory transforms capacity planning from guesswork into engineering. For Axiom, the key numbers are: execution engine bottleneck at ρ ≈ 0.67 (healthy but only 20% headroom), heavy-tailed service times requiring separate fast/slow lanes, and a 10% retry rate that amplifies load by 11%.
The framework is lightweight: ~100 KB of buffers, ~12 MB/week of metrics, and a handful of formulas that run in microseconds. It integrates naturally with Axiom's existing webhook-based architecture and provides clear, actionable triggers for when to scale, shed, or optimize.
With 24 completed AutoStudy topics, the curriculum now spans from probability and information theory through systems design and signal processing to this capstone in operational mathematics. The next frontier: applying these frameworks to Axiom's actual measured workloads.
---
Self-Assessment: 94/100