Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Plus TL;DRs of 31 papers ranging from high-level analysis to LLM alignment and red teaming.
Paper Collection, October '23