Empirically testing debate, LLM hackers, mechanistic verification, partial observability in RLHF, improved RLAIF, HarmBench
Paper Collection, February '24