AI Control

Containing models we don't trust to be aligned, using protocols that keep deployments safe even if alignment fails.

Technical Safety·Developing·Last reviewed May 1, 2026

This page is a stub. I’ve marked the territory but haven’t written my views here yet. The headings below are placeholders — the actual beliefs, uncertainties, and evidence are still in my notes. If you want my current take on this topic before it lands here, get in touch.

Where I currently stand

<Headline view on AI control as a research agenda: what makes it different from alignment, what it can and can't do, and how it should compose with monitoring. The Greenblatt et al. framing is the natural reference point.>
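For concreteness, the reference-point protocol in Greenblatt et al. is trusted monitoring with a limited audit budget: an untrusted model does the work, a weaker trusted model scores each output for suspicion, and scarce human auditing is spent on the most suspicious outputs. Below is a minimal Python sketch of that shape; `untrusted_model`, `trusted_monitor`, and `human_audit` are hypothetical stand-ins, not names from the paper, and the 2% budget is illustrative.

```python
import random
from dataclasses import dataclass

# Hypothetical interfaces: a capable untrusted model, a weaker trusted
# model used as a monitor, and an expensive human audit step. All three
# are stubs standing in for real systems.

@dataclass
class Task:
    prompt: str

def untrusted_model(task: Task) -> str:
    """Capable but possibly scheming model (stub)."""
    return f"# solution to: {task.prompt}"

def trusted_monitor(task: Task, output: str) -> float:
    """Weaker trusted model scoring suspicion in [0, 1] (stub)."""
    return random.random()

def human_audit(task: Task, output: str) -> bool:
    """Ground-truth check; returns True if the output is safe (stub)."""
    return True

def run_protocol(tasks: list[Task], audit_budget: float = 0.02) -> list[str]:
    """Trusted monitoring: audit the most suspicious fraction of outputs;
    shut down the deployment if an audit catches an attack."""
    scored = []
    for task in tasks:
        output = untrusted_model(task)
        scored.append((trusted_monitor(task, output), task, output))
    # Spend the audit budget on the highest suspicion scores only.
    scored.sort(key=lambda t: t[0], reverse=True)
    n_audits = max(1, int(audit_budget * len(scored)))
    accepted = []
    for i, (score, task, output) in enumerate(scored):
        if i < n_audits and not human_audit(task, output):
            raise RuntimeError("Attack caught in audit: shut down")
        accepted.append(output)
    return accepted
```

The paper's other protocols (defer-to-trusted, untrusted monitoring, trusted editing) are variations on the same routing decision: what to do with an output the monitor flags but the budget can't audit.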

Current beliefs

  • Control is the right framing for the next 1–3 years of frontier deployment. ~XX%<why>.
  • Control protocols are only as strong as their evaluation; the bottleneck is realistic red-teaming, not protocol design (see the evaluation sketch after this list). ~XX%<why>.
  • <Claim about untrusted-monitor / trusted-edit / etc.> ~XX%<why>.
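On the evaluation point above: a control evaluation replaces the untrusted model with a red-team attack policy and estimates safety as the probability that no attack succeeds over a whole deployment. Below is a toy Monte Carlo harness for that estimate; the attack rate, suspicion distributions, and audit threshold are made up, and only the safe/unsafe bookkeeping mirrors the methodology.

```python
import random

# Toy control evaluation: the red team supplies an attack policy for
# the untrusted model; safety is the probability that a deployment
# ends with no successful attack. All numbers are illustrative.

def attack_policy() -> dict:
    """Red-team stand-in: attacks rarely and tries to look innocuous."""
    attacking = random.random() < 0.01
    suspicion = (random.uniform(0.6, 1.0) if attacking
                 else random.uniform(0.0, 0.7))
    return {"attacking": attacking, "suspicion": suspicion}

def protocol_episode(n_tasks: int, audit_threshold: float = 0.8) -> str:
    """One simulated deployment: audit anything above the threshold.
    Returns 'safe' if every attack was caught, 'unsafe' otherwise."""
    for _ in range(n_tasks):
        action = attack_policy()
        audited = action["suspicion"] >= audit_threshold
        if action["attacking"] and not audited:
            return "unsafe"  # an attack slipped past the monitor
    return "safe"

def estimate_safety(n_episodes: int = 200, n_tasks: int = 500) -> float:
    """Monte Carlo estimate of P(no successful attack in a deployment)."""
    safe = sum(protocol_episode(n_tasks) == "safe"
               for _ in range(n_episodes))
    return safe / n_episodes

print(f"estimated safety: {estimate_safety():.2f}")
```

The estimate is only as informative as the attack policy: a weak red team inflates measured safety, which is exactly the bottleneck claim above.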

Uncertainties

  • How does control degrade as models become better at recognising they are in a control protocol? Why it matters: load-bearing for whether control buys us years or months at the frontier.
  • Are control protocols composable, or does stacking protocols expose more attack surface than it covers? Why it matters: architectural question for any deployment.

What would update me

  • A red team breaking a control protocol in a realistic evaluation would push me to take protocol fragility more seriously.
  • Demonstration that simple control protocols robustly survive stress-testing at frontier scale would meaningfully strengthen the deployment story.

Recent reading

  • <date> · <title> · <takeaway>.

Related writing

No essays tagged with this topic yet.

Related regions