News

One promising approach is the sparse autoencoder (SAE), a deep learning ... a new architecture that improves the performance and interpretability of SAEs for LLMs. JumpReLU makes it easier to ...
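The JumpReLU activation mentioned above (introduced by DeepMind for SAEs) zeroes out a latent unit unless its pre-activation clears a threshold, and passes it through unchanged when it does. A minimal sketch, assuming a fixed illustrative threshold (in practice the threshold is learned per latent unit):

```python
import numpy as np

def jump_relu(z, theta=0.5):
    """JumpReLU: output z where z > theta, else 0.
    theta=0.5 is an illustrative value; real SAEs learn it per unit."""
    return np.where(z > theta, z, 0.0)

# Small activations are suppressed; large ones are kept intact
# (unlike ReLU, values just above the threshold are not shrunk).
z = np.array([-1.0, 0.2, 0.5, 0.8, 2.0])
print(jump_relu(z))
```

The jump (rather than a ReLU-style ramp) is what lets the unit stay exactly zero for weak signals while preserving the magnitude of strong ones, which encourages sparse, interpretable latent codes.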
This article explores how Gemma Scope enhances the interpretability of language models, with a particular focus on its innovative use of sparse autoencoder technology. Imagine having a tool that ...
Mechanistic interpretability ... DeepMind ran a tool known as a “sparse autoencoder” on each of its layers. You can think of a sparse autoencoder as a microscope that zooms in on those ...
After the data preprocessing is completed, the next step is to input the processed data into the stacked sparse autoencoder model. The stacked sparse autoencoder is a powerful deep learning ...
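The stacking described above chains encoders so each stage compresses the previous stage's code. A minimal forward-pass sketch with assumed sizes (64 → 32 → 16) and random weights; in a real pipeline each stage is trained, often layer by layer, with a reconstruction loss plus a sparsity penalty:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W, b):
    # One autoencoder's encoder: affine map followed by ReLU.
    return np.maximum(x @ W + b, 0.0)

# Illustrative dimensions and random weights (untrained, for shape only).
x = rng.normal(size=(8, 64))                  # batch of preprocessed samples
W1, b1 = 0.1 * rng.normal(size=(64, 32)), np.zeros(32)
W2, b2 = 0.1 * rng.normal(size=(32, 16)), np.zeros(16)

h1 = encode(x, W1, b1)    # first-stage code
h2 = encode(h1, W2, b2)   # second-stage code, fed to the downstream model
print(h2.shape)           # (8, 16)
```

Feeding `h2` (rather than raw inputs) to a downstream classifier is the usual role of the stacked encoder once training is done.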
“One hope for interpretability is that it can be ... passing GPT-4’s activations through the sparse autoencoder results in a performance equivalent to a model trained with roughly 10x less ...
Cofounder Tom McGrath helped create the interpretability team at DeepMind. Lee Sharkey pioneered the use of sparse autoencoders in language models. Nick Cammarata started the interpretability team ...
Being able to understand a model’s inner workings in bottom-up, forensic detail is called “mechanistic interpretability” ... in 2023 with a so-called “sparse autoencoder”.