Multi-Agent Systems

Safety properties of systems where multiple AI agents interact — emergent dynamics, collusion, and coordination failure modes.

Technical Safety·Exploring·Last reviewed May 1, 2026

This page is a stub. I’ve marked the territory but haven’t written my views here yet. The headings below are placeholders — the actual beliefs, uncertainties, and evidence are still in my notes. If you want my current take on this topic before it lands here, get in touch.

Where I currently stand

<Headline view on multi-agent systems as a safety domain: which failure modes are amplified versus dampened by multi-agent setups, and whether MAS-specific safety research is a real category or a re-skin of single-agent control.>

Current beliefs

  • The genuinely novel multi-agent failure modes are collusion and coordination failure, not amplified single-agent failures. ~XX%<why>.
  • Realistic MAS evals barely exist; current "agent" benchmarks are mostly single-agent with tools. ~XX%<why>.
  • <Claim about whether MAS deserves its own safety agenda or sits inside control.> ~XX%<why>.

Uncertainties

  • Do collusion behaviours emerge from training pressure or from deployment-time interaction? Why it matters: determines whether the main lever is training-time or runtime.
  • How does monitoring scale across N interacting agents? Why it matters: directly relevant to control protocols in MAS settings.
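As a back-of-envelope illustration of the scaling question above: if a monitor must cover every pairwise channel between N agents, coverage grows quadratically, whereas routing all inter-agent traffic through a single monitored hub grows linearly. The sketch below just counts channels under these two (assumed, simplified) topologies; it is not a model of any particular control protocol.

```python
# Back-of-envelope: how many channels must a monitor cover for N agents?
# Two simplified topologies are assumed here:
#   - pairwise: every undirected agent pair can talk directly -> N*(N-1)/2
#   - hub: all messages pass through one monitored router -> N links

def pairwise_channels(n: int) -> int:
    """Undirected channels when any two of n agents can interact directly."""
    return n * (n - 1) // 2

def hub_channels(n: int) -> int:
    """One monitored link per agent when all traffic transits a hub."""
    return n

for n in (2, 10, 100, 1000):
    print(f"N={n:>4}: pairwise={pairwise_channels(n):>6}  hub={hub_channels(n):>4}")
```

The gap (45 vs 10 channels at N=10, ~500k vs 1000 at N=1000) is the crude version of why monitoring cost depends heavily on interaction topology, before any question of what the monitor actually detects.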

What would update me

  • A clean demonstration of emergent collusion in a realistic MAS deployment would strengthen the case for MAS-specific monitoring.
  • Successful repurposing of single-agent control protocols for MAS without new mechanisms would weaken the case for treating MAS as a distinct agenda.

Recent reading

  • <date><title><takeaway>.

Related writing

No essays tagged with this topic yet.

Related regions