Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, plus TL;DRs of 31 papers ranging from high-level analysis to LLM alignment and red teaming.
Paper Collection, October '23