AI Safety Strategy · Exploring · Last reviewed May 1, 2026
This page is a stub. I’ve marked the territory but haven’t written my views here yet. The headings below are placeholders — the actual beliefs, uncertainties, and evidence are still in my notes. If you want my current take on this topic before it lands here, get in touch.
Where I currently stand
<Headline view: how I see the field's internal norms — what gets shared, what's held back, how disagreement is handled, and where the obvious failure modes are. 3–4 sentences.>
Current beliefs
- <e.g. The field's tolerance for working at frontier labs while publicly criticising them is healthier than outsiders assume, and is load-bearing for safety progress.> ~XX% — <one-line why>.
- <Claim about whether info-sharing norms (what gets published, what's held for safety reasons) are well-calibrated or systematically too loose / too tight.> ~XX% — <why>.
- <Claim about whether intra-lab safety teams can credibly red-team their own labs without capture.> ~XX% — <why>.
Uncertainties
- Does the field have real mechanisms for sanctioning bad actors, or only social ones that don't bind labs? Why it matters: the answer changes how much policy effort is needed to backstop norms.
- Are race dynamics inside the field (between labs, between safety orgs) net helpful or net harmful for safety outcomes? Why it matters: it changes how to weigh competition against coordination as intervention targets.
What would update me
- A documented case of safety-relevant information being withheld at material cost (not just a rhetorical commitment to withhold) would push me toward higher confidence that the field's info-sharing norms are real.
- Repeated cases of safety teams losing internal disputes about deployment timing would push me toward thinking intra-lab safety is structurally too weak.
Recent reading
- <date> — <title> — <takeaway>.
Related writing
No essays tagged with this topic yet.