Several recent research papers propose that freezing or selectively tuning a small fraction of neurons inside large language models can, in reported benchmark evaluations, reduce unsafe outputs ...
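The core mechanic these papers share, selective tuning, can be illustrated in miniature. The sketch below is a hypothetical toy, not the method of any specific paper: it treats the model as a flat list of parameters, marks a small fraction as trainable, and applies a gradient step only to those, leaving the rest frozen. All names (`make_freeze_mask`, `sgd_step`, `tune_fraction`) are illustrative.

```python
# Toy sketch of selective parameter tuning (illustrative, not any paper's
# actual method): freeze most "weights" by masking gradient updates and
# tune only a chosen fraction of them.
import random

def make_freeze_mask(n_params: int, tune_fraction: float, seed: int = 0) -> list[bool]:
    """Return a mask where True marks a parameter that stays trainable."""
    rng = random.Random(seed)
    n_tuned = max(1, round(n_params * tune_fraction))
    tuned = set(rng.sample(range(n_params), n_tuned))
    return [i in tuned for i in range(n_params)]

def sgd_step(weights: list[float], grads: list[float],
             mask: list[bool], lr: float = 0.1) -> list[float]:
    """Apply a gradient step only to unfrozen (mask=True) parameters."""
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]

weights = [1.0] * 10
grads = [0.5] * 10
mask = make_freeze_mask(len(weights), tune_fraction=0.2)  # tune 2 of 10

updated = sgd_step(weights, grads, mask)
n_changed = sum(1 for w0, w1 in zip(weights, updated) if w0 != w1)
print(n_changed)  # → 2: only the unfrozen 20% of parameters move
```

In a real model the mask would be chosen by some attribution or probing criterion rather than at random, and freezing would typically be expressed through the framework's autograd machinery (e.g. disabling gradient tracking per tensor) rather than a manual mask.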