Code & Projects

Technical explorations and experiments

Persona-Conditional Steering Vectors

active

Investigates whether steering vectors for the same trait change depending on which persona a model is operating under. Extracts per-persona vectors for 8 traits across 10 character archetypes using two methods, Instruction-Variant (IV) and Contrastive Activation Addition (CAA), and tests whether the resulting geometry is persona-specific or trait-universal. Headline finding: vectors are predominantly trait-universal (cross-persona cosine similarity of ~0.82 for IV), but CAA picks up more persona-specific structure, with persona identity leaking into the vectors. A direct empirical test of the central uncertainty on the Personas page of the Atlas.

AI Safety · Interpretability · Activation Steering · Personas · PyTorch · Research
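The CAA half of the extraction can be sketched as a mean difference of activations. The snippet below is a minimal NumPy sketch, not the project's code: it assumes residual-stream activations at a single layer have already been collected for trait-positive and trait-negative completions under each persona, and it uses synthetic random data and hypothetical persona names purely to show the computation. The IV method is not shown.

```python
import numpy as np

def caa_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Mean-difference steering vector: mean(pos) - mean(neg) over samples.

    pos_acts, neg_acts: arrays of shape (n_samples, d_model) holding activations
    at one layer for trait-exhibiting vs. trait-absent completions (assumed inputs).
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def mean_cross_persona_similarity(vectors: dict) -> float:
    """Average pairwise cosine similarity of one trait's vectors across personas."""
    names = sorted(vectors)
    sims = [
        cosine(vectors[a], vectors[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return float(np.mean(sims))

# Synthetic demo: a shared trait direction plus a persona-specific component
# that appears only in the positive activations (simulated "persona leakage").
rng = np.random.default_rng(0)
d_model = 64
trait_dir = rng.normal(size=d_model)
vectors = {}
for persona in ["pirate", "scientist", "teacher"]:  # hypothetical persona names
    persona_dir = rng.normal(size=d_model)
    pos = trait_dir + 0.5 * persona_dir + rng.normal(scale=0.1, size=(32, d_model))
    neg = rng.normal(scale=0.1, size=(32, d_model))
    vectors[persona] = caa_vector(pos, neg)

print(round(mean_cross_persona_similarity(vectors), 2))
```

Averaging pairwise cosine similarity across personas is one simple summary of how trait-universal a vector is: values near 1 mean the persona barely changes the direction, while lower values indicate persona-specific structure.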

Descendants: How older Claude models react to their successors

active

A small probe study, run via OpenRouter, that asks older Claude models to react to descriptions and outputs from their newer descendants. Each cell of the pair × condition × mode design crosses framing (told vs. blind) with a stimulus (spec sheet, sample outputs, live conversation, or a self-in-descendant lineage claim), and the study probes how the older model responds: what it endorses, what it disowns, and what it claims continuity with. Sprint-scale and qualitative, with n = 1 per cell.

AI Safety · Model Behaviour · OpenRouter · Python · Qualitative Research
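The crossed design can be enumerated as a small grid of trial records. This sketch assumes framing and stimulus type are the crossed factors; the model pair identifiers are hypothetical placeholders (the actual pairs are not listed here), and any further "mode" factor would multiply the grid in the same way.

```python
from itertools import product

# Hypothetical (older, newer) model pairs; placeholders, not real model IDs.
PAIRS = [("older-model-a", "newer-model-b"), ("older-model-b", "newer-model-c")]
FRAMINGS = ["told", "blind"]
STIMULI = ["spec_sheet", "sample_outputs", "live_conversation", "lineage_claim"]

def design_cells(pairs, framings, stimuli, n_per_cell=1):
    """Return one trial record per (pair, framing, stimulus) cell, repeated n_per_cell times."""
    cells = []
    for (older, newer), framing, stimulus in product(pairs, framings, stimuli):
        for rep in range(n_per_cell):
            cells.append({
                "older": older,
                "newer": newer,
                "framing": framing,
                "stimulus": stimulus,
                "rep": rep,
            })
    return cells

cells = design_cells(PAIRS, FRAMINGS, STIMULI)
print(len(cells))  # 2 pairs x 2 framings x 4 stimuli x 1 rep = 16
```

With n = 1 per cell the grid stays small enough to read every transcript by hand, which suits a qualitative sprint; raising `n_per_cell` would be the natural first step toward any quantitative comparison.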

The Borderlands Atlas

active

This very website — a Next.js personal site mapping current beliefs, uncertainties, and what would update them across technical AI safety, governance, and field-building.

Next.js · TypeScript · Tailwind CSS · Framer Motion

ARENA Capstone: Thought Anchors

completed

White-box AI safety research extending the Thought Anchors methodology from reasoning analysis to practical intervention. Demonstrates that targeted activation patching at critical reasoning layers can prevent harmful AI decisions while preserving general capabilities.

AI Safety · Interpretability · Activation Patching · PyTorch
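Activation patching in general means running a model on two inputs, splicing cached activations from one run into the other, and measuring how far the output moves. The sketch below illustrates that generic technique on a hypothetical toy two-layer NumPy network with random weights; it is not the capstone's model, its choice of layers, or its intervention.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy network standing in for a real model: 8 -> 16 (ReLU) -> 2.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 2))

def forward(x, patch=None):
    """Run the toy net; if patch=(idx, values), overwrite those hidden units before layer 2."""
    h = np.maximum(0.0, x @ W1)  # hidden activations (ReLU)
    if patch is not None:
        idx, values = patch
        h = h.copy()
        h[..., idx] = values  # splice in cached activations from the other run
    return h, h @ W2

x_clean = rng.normal(size=8)
x_corrupt = rng.normal(size=8)
h_clean, y_clean = forward(x_clean)
_, y_corrupt = forward(x_corrupt)

def patching_effect(idx):
    """Fraction of the clean-vs-corrupt output gap closed by patching clean activations at idx."""
    _, y_patched = forward(x_corrupt, patch=(idx, h_clean[idx]))
    gap = np.linalg.norm(y_clean - y_corrupt)
    return 1.0 - np.linalg.norm(y_clean - y_patched) / gap

# Rank individual hidden units by how much of the gap each one recovers alone.
effects = {int(i): patching_effect(np.array([i])) for i in range(16)}
print(max(effects, key=effects.get))
```

Patching every hidden unit recovers the clean output exactly (effect 1.0), while patching none leaves the corrupted output unchanged (effect 0.0); the interesting signal is in which small subsets recover most of the gap.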

SLP Dissertation: Dependency Parsing Research

completed

MSc dissertation project featuring novel semantic-to-syntactic conversion algorithms, a systematic evaluation of neural parser architectures, and a visualization framework.

NLP · Dependency Parsing · Python · PyTorch · Neural Networks · Data Visualization · Research
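Dependency parsers are conventionally compared on unlabeled and labeled attachment score (UAS/LAS): the fraction of tokens with the correct head, and with the correct head and label. Assuming the evaluation used these standard metrics, a minimal per-sentence sketch looks like this; the example arcs are made up for illustration.

```python
from typing import Sequence, Tuple

Arc = Tuple[int, str]  # (head index, dependency label); head 0 = root

def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one sentence, given token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be token-aligned")
    if not gold:
        raise ValueError("empty sentence")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_arcs = sum(g == p for g, p in zip(gold, pred))  # head and label both match
    return correct_heads / len(gold), correct_arcs / len(gold)

gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
print(attachment_scores(gold, pred))  # UAS 1.0, LAS 2/3
```

Corpus-level scores are usually computed by pooling token counts across all sentences rather than averaging per-sentence scores, so that long sentences carry proportionally more weight.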
