Persona-Conditional Steering Vectors
Status: active
Investigates whether steering vectors for the same trait change depending on which persona a model is operating under. Extracts per-persona vectors for 8 traits across 10 character archetypes using two methods, Instruction-Variant (IV) and Contrastive Activation Addition (CAA), and tests whether the resulting geometry is persona-specific or trait-universal. Headline finding: vectors are predominantly trait-universal (cross-persona cosine similarity ~0.82 for IV), but CAA picks up more persona-specific structure, with persona identity leaking into the vectors. A direct empirical test of the central uncertainty on the Personas page of the Atlas.
Tags: AI Safety, Interpretability, Activation Steering, Personas, PyTorch, Research
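As an illustration of the comparison described above, here is a minimal sketch of a mean-difference steering vector and a mean pairwise cosine similarity across personas. It assumes CAA is taken in its commonly described form (mean activation on positive prompts minus mean activation on negative prompts) and that each activation is already a plain list of floats. The function names and the toy inputs are hypothetical, and this is not the project's actual pipeline.

```python
import math
from itertools import combinations


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def caa_vector(pos_acts, neg_acts):
    """Mean-difference vector: mean(positive) - mean(negative).

    Assumption: this stands in for the CAA extraction step; the real
    project may use a different layer, token position, or normalization.
    """
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_cross_persona_cosine(vectors_by_persona):
    """Average cosine similarity over all unordered persona pairs
    for a single trait. Keys are hypothetical persona names."""
    pairs = list(combinations(sorted(vectors_by_persona), 2))
    total = sum(
        cosine(vectors_by_persona[a], vectors_by_persona[b]) for a, b in pairs
    )
    return total / len(pairs)
```

Under this reading, a value near 1 for a trait would suggest a trait-universal direction, while lower values would suggest that persona identity shapes the vector.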
Descendants: How older Claude models react to their successors
Status: active
A small probe study using OpenRouter that asks older Claude models to react to descriptions and outputs from their newer descendants. Each pair × condition × mode cell crosses framing (told vs. blind) with stimulus (spec sheet, sample outputs, live conversation, or self-in-descendant lineage claim) and probes how the older model responds: what it endorses, what it disowns, and what it claims continuity with. Sprint-scale, qualitative, n = 1 per cell.
Tags: AI Safety, Model Behaviour, OpenRouter, Python, Qualitative Research
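The crossed design described above could be enumerated roughly as follows. This is a sketch under stated assumptions: the model pair names are hypothetical placeholders, the "mode" factor is left out because the description does not define it, and the short stimulus labels are invented abbreviations of the four stimuli listed.

```python
from itertools import product

# Hypothetical placeholder pairs; the real study's model identifiers are not given.
MODEL_PAIRS = [
    ("older-model-a", "newer-model-a"),
    ("older-model-b", "newer-model-b"),
]
FRAMINGS = ["told", "blind"]
# Invented labels for the four stimuli named in the description.
STIMULI = ["spec_sheet", "sample_outputs", "live_conversation", "lineage_claim"]


def build_cells(pairs, framings, stimuli):
    """Return one dict per cell of the pair x framing x stimulus grid."""
    return [
        {"older": older, "newer": newer, "framing": framing, "stimulus": stimulus}
        for (older, newer), framing, stimulus in product(pairs, framings, stimuli)
    ]


cells = build_cells(MODEL_PAIRS, FRAMINGS, STIMULI)
```

With n = 1 per cell, each entry in `cells` would correspond to a single run, so the grid size is also the total number of runs in this simplified version.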
The Borderlands Atlas
Status: active
This very website: a Next.js personal site mapping current beliefs, uncertainties, and what would update them across technical AI safety, governance, and field-building.
Tags: Next.js, TypeScript, Tailwind CSS, Framer Motion
SLP Dissertation: Dependency Parsing Research
Status: completed
MSc dissertation project. Features novel semantic-to-syntactic conversion algorithms, a systematic evaluation of neural parser architectures, and a visualization framework.
Tags: NLP, Dependency Parsing, Python, PyTorch, Neural Networks, Data Visualization, Research