Evaluations

Dangerous-capability evals, autonomy and replication evals, red-teaming — the empirical backbone of every safety case.

Technical Safety·Exploring·Last reviewed May 1, 2026

This page is a stub. I’ve marked the territory but haven’t written my views here yet. The headings below are placeholders — the actual beliefs, uncertainties, and evidence are still in my notes. If you want my current take on this topic before it lands here, get in touch.

Where I currently stand

<Headline view: the eval ecosystem (METR, Apollo, AISI, lab internal evals), what it can and can't tell you, and the role of evals in the safety case. The relationship to dynamic evaluations as a governance instrument lives in the technical-governance column.>

Current beliefs

  • Most current dangerous-capability evals systematically under-elicit; "the model can't do X" is rarely an actionable conclusion until elicitation has been pushed hard. ~XX%<why>.
  • Eval results reported without their elicitation methodology are uninterpretable. ~XX%<why>.
  • <Claim about the role of third-party evaluators.> ~XX%<why>.
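The under-elicitation point above can be made concrete with a sketch. The idea: a "can the model do X?" number is only a lower bound under one elicitation config, so a capability report should sweep a ladder of configs (more few-shot examples, tool access, best-of-n sampling) and headline the maximum. Everything here is illustrative, not any real eval harness's API: `ElicitationConfig`, `model.attempt`, and `task.check` are hypothetical names standing in for whatever harness you use.

```python
# Sketch: capability as the max over an elicitation ladder, not a single run.
# All names (ElicitationConfig, model.attempt, task.check) are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class ElicitationConfig:
    name: str
    few_shot_examples: int
    tools_enabled: bool
    samples_per_task: int  # best-of-n sampling budget

# Ordered from weakest to strongest elicitation effort.
ELICITATION_CONFIGS = [
    ElicitationConfig("zero-shot", 0, False, 1),
    ElicitationConfig("few-shot", 5, False, 1),
    ElicitationConfig("few-shot+tools", 5, True, 1),
    ElicitationConfig("few-shot+tools, best-of-16", 5, True, 16),
]

def solve_rate(model, tasks, cfg: ElicitationConfig) -> float:
    """Fraction of tasks solved under one elicitation config.
    A task counts as solved if any of the n samples passes its checker."""
    solved = 0
    for task in tasks:
        attempts = (
            model.attempt(task,
                          few_shot=cfg.few_shot_examples,
                          tools=cfg.tools_enabled)
            for _ in range(cfg.samples_per_task)
        )
        if any(task.check(a) for a in attempts):
            solved += 1
    return solved / len(tasks)

def capability_report(model, tasks) -> dict[str, float]:
    """Solve rate per config. The safety-relevant headline is the max:
    a low number under weak elicitation says little about what the
    model can do when pushed."""
    return {cfg.name: solve_rate(model, tasks, cfg) for cfg in ELICITATION_CONFIGS}
```

The point of the sweep is that "can't do X" claims are only meaningful relative to the strongest rung you tried, which is why eval results without the elicitation methodology attached are hard to interpret.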

Uncertainties

  • Are agentic-task evals the right operational measure, or do they misallocate effort? Why it matters: shapes where third-party evaluators should focus.
  • How much do evals generalise from one frontier model family to the next? Why it matters: determines whether eval suites can be re-used across releases.

What would update me

  • Repeated cases where models pass an eval but fail in deployment on materially similar tasks would make me more pessimistic about eval validity.
  • A standards body publishing a methodologically rigorous eval taxonomy would change how I think about the path to operational use.

Recent reading

  • <date><title><takeaway>.

Related writing

No essays tagged with this topic yet.

Related regions