Constitutional Classifiers, human-AI teams, myopic agents, models learn from descriptions and report learned behavior, great models think alike, mechanistic interpretability, and many open problems
Thanks, as always!
Thanks, as always!