AI Control for agents, synthetic document finetuning, limits of scalable oversight, evaluating stealth, deception & self-replication, model diffing, and AI safety agendas
Paper Highlights, April '25