Information Security and Model Weights

Whether model weights can be defended at all, and what changes if they can't.

Governance · Exploring · Last reviewed May 1, 2026

This page is a stub. I’ve marked the territory but haven’t written my views here yet. The headings below are placeholders; the actual beliefs, uncertainties, and evidence are still in my notes. If you want my current take on this topic before it lands here, get in touch.

Where I currently stand

<Headline view, with reference to the RAND security-levels frame (SL1–SL5): the gap between current frontier-lab security practice and the level needed to defend against state actors is several orders of magnitude, and closing it is hard for structural reasons rather than technical ones. This conditions a lot of the rest of the governance picture.>

Current beliefs

  • Frontier labs are not currently at SL4-equivalent security and are unlikely to reach it without major external pressure. ~XX% <why>.
  • Open-weights policy and weight-security policy are the same conversation; you can't separate them coherently. ~XX% <why>.
  • <Claim about voluntary versus mandated security uplift.> ~XX% <why>.

Uncertainties

  • Is SL5-equivalent security technically achievable without crippling research velocity? Why it matters: determines whether the high end of the RAND ladder is a real target or only a notional goalpost.
  • Does defence-in-depth on weights matter if elicitation can extract dangerous capabilities at scale from leaked weights anyway? Why it matters: shapes the relative priority of weight protection versus capability mitigations.

What would update me

  • Public confirmation of a credible state-actor weight exfiltration would reset the policy conversation.
  • A demonstrated security architecture that achieves high RAND levels without unacceptable research friction would meaningfully shift the picture.

Recent reading

  • <date><title><takeaway>.

Related writing

No essays tagged with this topic yet.

Related regions