Alignment auditing, LM circuits, research sandbagging, CoT faithfulness, CoT monitoring, GDM's safety approach, and AI capability trends
Paper Highlights, March '25