Safety cases for AI scheming, AI R&D capability, low-stakes control, the two-hop curse, targeted manipulation, limitations of steering, and rapid response defense.
Keep up the great work, Johannes!