April 14, 2026 · 3 min read · CHAOS Team

Meet the PM Engine

The orchestration runtime at the core of CHAOS — how the PM dispatcher, EventBus, and PostgreSQL task queue coordinate 36 agents in real parallel pipelines.

architecture · orchestration · deep-dive

Not Prompt Chaining

Most "multi-agent" AI systems are prompt chains with extra steps. Agent A's output becomes Agent B's input. Sequential. Slow. No real parallelism. No shared state.

The CHAOS PM engine is different. It is a real runtime: subprocesses, a PostgreSQL task queue, an event bus, and a structured result collection layer. The difference matters in practice: when five agents have no dependencies on one another, the pipeline takes roughly as long as the slowest single agent rather than the sum of all five. When some agents depend on others, wall-clock time follows the longest dependency chain.

The Components

PMDispatcher

The PMDispatcher receives a high-level task (from the CLI, a skill command, or an MCP tool call) and decomposes it into subtasks. Each subtask is written to the PostgreSQL task queue with a priority, dependencies, and an agent assignment.

Decomposition is not hardcoded. The PM agent reasons about the task, selects the appropriate agents, and determines which can run in parallel and which must wait for dependencies to complete.
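To make the idea of "which can run in parallel" concrete, here is a minimal sketch of grouping subtasks into parallel waves from their dependencies. The Subtask fields and the parallel_waves helper are illustrative assumptions, not the actual CHAOS data model; only the agent names come from the pipeline described in this post.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Subtask:
    """One unit of work for a single agent (field names are illustrative)."""
    name: str
    agent: str
    priority: int = 0
    depends_on: list[str] = field(default_factory=list)


def parallel_waves(subtasks: list[Subtask]) -> list[list[str]]:
    """Group subtasks into waves; everything in one wave can run concurrently."""
    sorter = TopologicalSorter({t.name: t.depends_on for t in subtasks})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


plan = [
    Subtask("refactor", agent="refactor-agent"),
    Subtask("test", agent="test-agent", depends_on=["refactor"]),
    Subtask("docs", agent="docs-agent", depends_on=["refactor"]),
    Subtask("review", agent="review-agent", depends_on=["test", "docs"]),
]
print(parallel_waves(plan))  # [['refactor'], ['docs', 'test'], ['review']]
```

In the real engine the dependency graph comes from the PM agent's reasoning rather than a hand-written list; the sketch only shows how a graph like that turns into a schedule.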

PostgreSQL Task Queue

The task queue uses SELECT ... FOR UPDATE SKIP LOCKED, a PostgreSQL pattern for durable, concurrent work distribution. Multiple agent workers can pull tasks simultaneously without stepping on each other, because each worker skips rows that another worker has already locked. If a worker crashes mid-task, its open transaction is rolled back and the row lock is released, so another worker can pick the task up.

This is not a simple array in memory. It is a proper job queue with retry semantics, priority ordering, and crash recovery.
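A rough sketch of the claim-and-process loop this pattern implies is shown below. The tasks table, its columns, the rule that a higher priority number wins, and the run_one_task helper are all assumptions made for illustration; conn stands for any DB-API connection that uses %s placeholders, as psycopg does.

```python
# Hypothetical schema: tasks(id, agent, payload, status, priority).
CLAIM_SQL = """
SELECT id, agent, payload
  FROM tasks
 WHERE status = 'pending'
 ORDER BY priority DESC, id
 LIMIT 1
   FOR UPDATE SKIP LOCKED
"""


def run_one_task(conn, handler) -> bool:
    """Claim and process one task inside a single transaction.

    Returns False when no unlocked pending task is available.
    """
    cur = conn.cursor()
    cur.execute(CLAIM_SQL)  # rows locked by other workers are skipped
    row = cur.fetchone()
    if row is None:
        conn.rollback()
        return False
    task_id, agent, payload = row
    try:
        handler(agent, payload)  # the row stays locked while the work runs
        cur.execute("UPDATE tasks SET status = 'done' WHERE id = %s", (task_id,))
        conn.commit()
    except Exception:
        conn.rollback()  # releases the lock; the task is still 'pending'
        raise
    return True
```

Holding the lock for the duration of the work is what makes a crash self-healing in this sketch: if the worker dies, the transaction never commits and the row becomes claimable again. A queue that commits the claim up front would need a heartbeat or timeout instead.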

EventBus (PostgreSQL LISTEN/NOTIFY)

The EventBus uses PostgreSQL's built-in pub/sub mechanism. When an agent emits a finding, completes a task, or hits a checkpoint requiring input, it publishes to the bus. Other agents (and the PM coordinator) receive the notification in real time.

This is how the PM knows when to unblock dependent tasks. When Agent A signals completion, the PM immediately dispatches Agent B without polling.
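The publish-then-unblock flow might look roughly like the sketch below. The channel name, the event fields, and both helper functions are hypothetical; pg_notify itself is the standard PostgreSQL function form of NOTIFY. The listening side, which would issue LISTEN and read notifications through its database driver, is left out.

```python
import json

CHANNEL = "chaos_events"  # hypothetical channel name


def publish(cur, event_type: str, task: str, agent: str) -> str:
    """Send one event through pg_notify and return the JSON payload."""
    payload = json.dumps({"type": event_type, "task": task, "agent": agent})
    # pg_notify(channel, payload) accepts bound parameters, which the plain
    # NOTIFY statement does not.
    cur.execute("SELECT pg_notify(%s, %s)", (CHANNEL, payload))
    return payload


def on_event(payload: str, done: set, dispatched: set, deps: dict) -> list:
    """Record a completion and return tasks that just became runnable."""
    event = json.loads(payload)
    if event["type"] != "task_completed":
        return []
    done.add(event["task"])
    ready = sorted(
        task for task, needs in deps.items()
        if task not in dispatched and set(needs) <= done
    )
    dispatched.update(ready)  # never hand the same task out twice
    return ready
```

Keeping a separate dispatched set matters here: without it, a task that is already running would look "ready" again every time one of its siblings finishes.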

AgentManager

The AgentManager tracks the lifecycle of all running agents: subprocess PID, current status, last heartbeat, and task assignment. It handles process spawning, health monitoring, and graceful termination. On Windows, it uses psutil for process inspection, because the POSIX idiom os.kill(pid, 0), which probes for a process without signalling it, does not behave that way on Windows.
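A portable liveness check along these lines could be sketched as follows. The is_alive helper is an assumption for illustration, not the AgentManager's actual code. It prefers psutil.pid_exists when psutil is installed and falls back to the signal-0 probe on POSIX only.

```python
import os
import sys

try:
    import psutil  # third-party; optional in this sketch
except ImportError:
    psutil = None


def is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if psutil is not None:
        return psutil.pid_exists(pid)
    if sys.platform == "win32":
        raise RuntimeError("psutil is required for liveness checks on Windows")
    try:
        os.kill(pid, 0)  # POSIX only: signal 0 checks existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


print(is_alive(os.getpid()))  # True
```

A PID that exists is not proof that the agent is healthy, which is presumably why the AgentManager also tracks heartbeats.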

Structured Result Collection

When agents complete, they write structured output to .claude/agent-state/<agent>.output.md — a YAML-frontmattered document with summary, files changed, status, and metadata. The PMDispatcher reads these outputs, merges findings, and surfaces a unified result to the user.
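As a rough illustration of reading such a file, the sketch below splits a frontmattered document into metadata and body. The field names in the sample are assumed, and the parser handles only flat key: value lines; a real implementation would use a proper YAML library to support lists such as the changed files.

```python
def parse_output(text: str) -> tuple[dict, str]:
    """Split an agent output document into (frontmatter, body).

    Only flat `key: value` lines are handled; nested YAML needs a real parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole text as body


# Field names below are assumptions, not the documented CHAOS format.
sample = """---
agent: test-agent
status: success
summary: Added tests for the payments module
---
Full report text goes here.
"""
meta, body = parse_output(sample)
print(meta["status"], "|", body)
```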

A Pipeline in Practice

Here is what happens when you run cw run pm "feature complete pipeline for src/payments/":

1. PM agent receives the task and decomposes it into: refactor → test → docs → review
2. PMDispatcher writes four tasks to the PostgreSQL queue
3. AgentManager spawns refactor-agent as a subprocess
4. refactor-agent reads project context, makes changes, signals completion via EventBus
5. PMDispatcher receives signal, unblocks test-agent and docs-agent (they run in parallel)
6. Both complete, signal the bus
7. review-agent starts with findings from test and docs already injected as context
8. review-agent completes, PM collects all four outputs and presents a merged report

Wall-clock time: roughly as long as refactor + max(test, docs) + review.
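The wall-clock formula is the longest path through the dependency graph. A small worked example, with durations invented purely for illustration, compares it against running the same four agents one after another:

```python
from functools import lru_cache

# Durations in seconds are invented for illustration, not measurements.
DURATION = {"refactor": 90, "test": 120, "docs": 45, "review": 60}
DEPENDS_ON = {
    "refactor": [],
    "test": ["refactor"],
    "docs": ["refactor"],
    "review": ["test", "docs"],
}


@lru_cache(maxsize=None)
def finish_time(task: str) -> int:
    """Earliest finish: own duration plus the slowest dependency's finish."""
    start = max((finish_time(d) for d in DEPENDS_ON[task]), default=0)
    return start + DURATION[task]


parallel = max(finish_time(t) for t in DURATION)
sequential = sum(DURATION.values())
print(parallel, sequential)  # 270 315
```

With these numbers the parallel run takes 90 + max(120, 45) + 60 = 270 seconds against 315 seconds in sequence. The saving equals the duration of the shorter of the two parallel agents, so the benefit grows as more agents share a wave.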

42 FastAPI Endpoints

The PM engine exposes a full REST API — 42 FastAPI endpoints — covering task dispatch, agent status, session history, context management, and event streaming. The web dashboard consumes this API. So does the MCP server. So can your own automation scripts.

The Design Goal

The PM engine exists to answer one question: given a developer's intent, what is the fastest path to a verified, high-quality result — and how do you get there with the fewest tokens spent?

Every component — the queue, the bus, the context injection, the parallel dispatch — is in service of that single goal.