Empirically testing debate, LLM hackers, mechanistic verification, partial observability in RLHF, improved RLAIF, HarmBench
Paper Collection, February '24