Technical Safety · Developing · Last reviewed May 1, 2026
This page is a stub. I’ve marked the territory but haven’t written my views here yet. The headings below are placeholders — the actual beliefs, uncertainties, and evidence are still in my notes. If you want my current take on this topic before it lands here, get in touch.
Where I currently stand
<Headline view on AI control as a research agenda: what makes it different from alignment, what it can and can't do, and how it should compose with monitoring. The Greenblatt et al. framing is the natural reference point.>
Current beliefs
- Control is the right framing for the next 1–3 years of frontier deployment. ~XX% — <why>.
- Control protocols are only as strong as their evaluation; the bottleneck is realistic red-teaming, not protocol design. ~XX% — <why>.
- <Claim about untrusted-monitor / trusted-edit / etc.> ~XX% — <why>.
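For concreteness, the trusted-monitoring idea from the Greenblatt et al. framing can be sketched in a few lines: an untrusted model does the work, a weaker trusted monitor scores each output for suspiciousness, and the most suspicious outputs are sent to expensive human audit under a fixed budget. This is only an illustrative sketch, not anyone's production protocol; the names (`untrusted_model`, `trusted_monitor`) and the threshold and budget values are hypothetical.

```python
def trusted_monitoring(tasks, untrusted_model, trusted_monitor,
                       audit_threshold=0.95, audit_budget=0.02):
    """Illustrative sketch of a trusted-monitoring control protocol.

    The untrusted model produces each output; a trusted monitor
    scores it for suspiciousness in [0, 1]. Outputs at or above the
    audit threshold go to human audit while the audit budget lasts;
    everything else ships. All parameter values are placeholders.
    """
    audits_left = int(audit_budget * len(tasks))  # fixed audit budget
    results = []
    for task in tasks:
        output = untrusted_model(task)
        suspicion = trusted_monitor(task, output)
        if suspicion >= audit_threshold and audits_left > 0:
            audits_left -= 1
            results.append((task, output, "audited"))
        else:
            results.append((task, output, "shipped"))
    return results
```

Even this toy version makes the evaluation bottleneck visible: the protocol's safety hinges entirely on whether `trusted_monitor` actually separates attacks from benign outputs, which is exactly what realistic red-teaming has to establish.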
Uncertainties
- How does control degrade as models become better at recognising they are in a control protocol? Why it matters: load-bearing for whether control buys us years or months at the frontier.
- Are control protocols composable, or does each added protocol expose more attack surface than it covers? Why it matters: architectural question for any deployment.
What would update me
- A successful control-protocol breakdown in a realistic red-team setup would push me to take protocol fragility more seriously.
- Demonstration that simple control protocols robustly survive stress-testing at frontier scale would meaningfully strengthen the deployment story.
Recent reading
- <date> — <title> — <takeaway>.
Related writing
No essays tagged with this topic yet.