We constructed controlled datasets with many input features and trained deep learning models to compute functions of those features (e.g., linear functions such as reading out a single feature, or nonlinear functions such as XOR). We then analyzed the patterns of representational activity the models learned.
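A minimal sketch of this kind of setup (an illustration, not the exact datasets or architectures used here): binary input features, an "easy" linear target that reads out one feature, a "hard" nonlinear target that is the XOR of two features, and a small MLP trained on both.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Controlled dataset: independent binary input features.
n_samples, n_features = 4096, 8
X = torch.randint(0, 2, (n_samples, n_features)).float()

# Easy (linear) target: the value of feature 0.
# Hard (nonlinear) target: XOR of features 1 and 2.
y_easy = X[:, 0]
y_hard = (X[:, 1] != X[:, 2]).float()
Y = torch.stack([y_easy, y_hard], dim=1)

model = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),      # penultimate layer = the "representation"
    nn.Linear(64, 2), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), Y)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")  # both targets are reliably computed
```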
Representations were systematically biased towards certain kinds of features. For example, in a model that reliably computes both an easy (linear) and a hard (nonlinear) feature, the easy feature explains 55% of the representational variance while the hard feature explains only 5%, with similar biases in the top PCs and in individual units.
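One simple way to quantify such a bias (a sketch under my own assumptions, not necessarily the exact metric used here) is to regress the penultimate-layer activations on each target and report the fraction of total representational variance each one explains.

```python
import numpy as np

def variance_explained(reps: np.ndarray, target: np.ndarray) -> float:
    """Fraction of total variance in `reps` explained by a linear fit on `target`."""
    design = np.stack([np.ones_like(target), target], axis=1)   # intercept + feature
    beta, *_ = np.linalg.lstsq(design, reps, rcond=None)        # least-squares fit
    resid = reps - design @ beta
    total = ((reps - reps.mean(axis=0)) ** 2).sum()
    return 1.0 - (resid ** 2).sum() / total

# With the sketch above, the representation would be the penultimate activations:
# with torch.no_grad():
#     reps = model[:-2](X).numpy()
# variance_explained(reps, y_easy.numpy())   # easy feature
# variance_explained(reps, y_hard.numpy())   # hard feature
```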
These biases can have dramatic downstream effects that lead to unexpected conclusions from standard analyses. For example, RSA may rate two models computing the same complex task as far less representationally similar to each other than either is to a model computing a much simpler task (right panel)!
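For reference, a minimal RSA sketch assuming the standard recipe: build each model's representational dissimilarity matrix (RDM) over a shared stimulus set, then correlate the RDMs.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(reps: np.ndarray) -> np.ndarray:
    """Condensed pairwise-dissimilarity vector (the RDM's upper triangle)."""
    return pdist(reps, metric="correlation")

def rsa_similarity(reps_a: np.ndarray, reps_b: np.ndarray) -> float:
    """Spearman correlation between two models' RDMs over the same stimuli."""
    rho, _ = spearmanr(rdm(reps_a), rdm(reps_b))
    return rho

# If the easy feature dominates both representations, rsa_similarity can be
# driven by that shared bias rather than by what the models actually compute.
```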