Dynamic Evaluations

Eval regimes that update as capabilities shift — moving from snapshot tests to continuous, capability-tracking evaluation.

Technical Governance · Exploring · Last reviewed May 1, 2026

This page is a stub. I’ve marked the territory but haven’t written my views here yet. The headings below are placeholders — the actual beliefs, uncertainties, and evidence are still in my notes. If you want my current take on this topic before it lands here, get in touch.

Where I currently stand

<Headline view: static evals decay quickly because the environment they probe shifts faster than the eval suites can be updated; "dynamic evaluations" is the right framing for what actually useful regulatory evals need to look like, but the operational design is wide open. This is a bridge category between Evals (technical) and Technical Standards (governance).>
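
To make the contrast with snapshot testing concrete, here is a minimal sketch of what a capability-tracking loop might look like. Everything in it is hypothetical: the names (`DynamicEvalRegime`, `generate_tasks`, the 0.2 threshold) are illustrative and do not refer to any real eval harness or regulatory proposal.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EvalResult:
    eval_version: str
    model_id: str
    score: float  # fraction of tasks on which the capability was elicited

@dataclass
class DynamicEvalRegime:
    """Regenerates and re-runs evals as models change, rather than freezing a suite."""
    generate_tasks: Callable[[], list]       # fresh task set each run (e.g. mutated prompts)
    run_suite: Callable[[str, list], float]  # (model_id, tasks) -> elicitation score
    threshold: float = 0.2                   # hypothetical regulatory trigger level
    history: list = field(default_factory=list)

    def evaluate(self, model_id: str, eval_version: str) -> EvalResult:
        # Regenerating tasks on every run is what makes the eval "dynamic":
        # a fixed snapshot suite can be trained against; a regenerated one decays slower.
        tasks = self.generate_tasks()
        result = EvalResult(eval_version, model_id, self.run_suite(model_id, tasks))
        self.history.append(result)
        return result

    def trigger_fired(self, result: EvalResult) -> bool:
        # Compare against the absolute threshold and the trend across runs,
        # so a capability creeping upward is caught before any single release
        # crosses the line.
        prior = [r.score for r in self.history if r is not result]
        rising = bool(prior) and result.score > max(prior)
        return result.score >= self.threshold or rising
```

The two design choices doing the work here are task regeneration (resisting memorization and gaming) and trend tracking across releases (catching capability creep); both map directly onto the open operational questions below.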

Current beliefs

  • Static dangerous-capability evals decay faster than the standards bodies that adopt them can revise them. ~XX%<why>.
  • Dynamic evaluations require a standing third-party evaluator operating continuously, not just a regulator with checklist authority. ~XX%<why>.
  • <Claim about adversarial / co-evolving eval design.> ~XX%<why>.

Uncertainties

  • What is the right cadence for re-evaluation: per release, per capability cluster, or continuously? Why it matters: determines the operational shape of the regime. (The three options are sketched after this list.)
  • Can dynamic-eval results be used as binding regulatory triggers without becoming gameable? Why it matters: determines whether this is a genuine governance tool or a perpetual research project.
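
The cadence question in the first bullet can be stated as a trigger policy. A minimal sketch, assuming a measurable notion of capability-cluster drift, which is itself an open problem; `reeval_due` and the 0.1 drift threshold are hypothetical:

```python
from enum import Enum

class Cadence(Enum):
    PER_RELEASE = "per_release"              # re-run the suite on every new model version
    PER_CAPABILITY_CLUSTER = "per_cluster"   # re-run when a tracked capability cluster shifts
    CONTINUOUS = "continuous"                # standing evaluation of deployed endpoints

def reeval_due(cadence: Cadence, *, new_release: bool,
               cluster_drift: float, drift_threshold: float = 0.1) -> bool:
    """Decide whether a re-evaluation is due under each cadence option.

    `cluster_drift` is a stand-in for some measured shift in a capability
    cluster (benchmark deltas, usage telemetry); the threshold is arbitrary
    and illustrative.
    """
    if cadence is Cadence.PER_RELEASE:
        return new_release
    if cadence is Cadence.PER_CAPABILITY_CLUSTER:
        return cluster_drift >= drift_threshold
    return True  # CONTINUOUS: always due; the real question becomes sampling rate
```

Note how the second uncertainty bites here: whichever branch becomes a binding regulatory trigger is the one labs have the strongest incentive to game.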

What would update me

  • A standards body adopting a meaningfully dynamic eval regime (rather than a one-shot eval list) would prove the design space is real.
  • Public evidence that a frontier model passed the relevant eval and then exhibited the underlying capability anyway in deployment.

Recent reading

  • <date><title><takeaway>.

Related writing

No essays tagged with this topic yet.
