What shapes the topography of high-level visual cortex?
Excited to share a new pre-print addressing this question with connectivity-constrained interactive topographic networks, titled "Retinotopic scaffolding of high-level vision", w/ Marlene Behrmann & David Plaut.
🧵 ↓ 1/n
Recent spatially-constrained deep neural networks have shown how task-optimized learning under local constraints in high-level vision gives rise to smooth organization of representations and functionally relevant clusters of category-selectivity. Ours:
2/n
Jun 16, 2025 15:11
In all these models, when we change the random initialization, selectivity moves around.
In contrast, the consistency of the topographic layout in humans has been argued to suggest innate pre-specification.
What could explain the consistent global organization?
3/n
Here, we build on the Eccentricity Bias theory, which states that the retinotopic organization of early visual cortex constrains the organization of higher-level visual cortex, since stimuli like faces and words are foveated, while scenes take up the full periphery.
4/n
We implement this retinotopic constraint as a connectivity cost on V4 feature map inputs into our topographic "ventral temporal cortex" (VTC) layers, and train w/ viewing biases: faces and objects are viewed at smaller sizes than scenes, with overlapping distributions.
5/n
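For readers who want something concrete, here is a toy sketch of what a retinotopic connectivity cost could look like. It is not the paper's implementation: the function name `retinotopic_wiring_cost`, the 1-D axis running from a "foveal" end to a "peripheral" end, and the |weight| × distance form are all assumptions made for illustration. The idea is only that a connection costs more when the input's eccentricity is far from the target unit's position.

```python
import numpy as np

def retinotopic_wiring_cost(weights, input_ecc, unit_pos):
    """Toy cost: sum of |weight| * distance between each input's
    normalized eccentricity and each target unit's position on a
    1-D axis in [0, 1]. (Hypothetical form, for illustration only.)"""
    # Pairwise distances, shape (n_units, n_inputs).
    dist = np.abs(unit_pos[:, None] - input_ecc[None, :])
    return float(np.sum(np.abs(weights) * dist))

# Toy setup: 4 target units and 3 inputs (foveal, mid, peripheral).
unit_pos = np.linspace(0.0, 1.0, 4)    # 0 = "foveal" end, 1 = "peripheral" end
input_ecc = np.array([0.0, 0.5, 1.0])  # normalized eccentricity per input

# "Aligned" weights connect each unit mostly to nearby eccentricities.
aligned = np.exp(-np.abs(unit_pos[:, None] - input_ecc[None, :]) / 0.2)
# "Shuffled" weights reverse the unit order, breaking the alignment.
shuffled = aligned[::-1]

cost_aligned = retinotopic_wiring_cost(aligned, input_ecc, unit_pos)
cost_shuffled = retinotopic_wiring_cost(shuffled, input_ecc, unit_pos)
print(cost_aligned, cost_shuffled)
```

In this toy, the aligned weights incur the lower cost, so adding a penalty of this kind to a task loss would favor layouts that respect the input eccentricity map.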
This produces a systematic organization of domains in the medial-lateral axis, putting face representations closer to foveal inputs, and scene representations closer to peripheral inputs, as in human VTC. The organization is functionally relevant, as confirmed by lesions.
6/n
Critically, this systematic organization is consistent across model runs (B), as in the human brain (A). When we remove the retinotopic connectivity constraint (C), we see topographic selectivity, but without any group-level consistency, as expected.
7/n
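One simple way to put a number on "consistent across model runs" is the mean pairwise correlation of selectivity maps between runs. The sketch below is a toy under stated assumptions, not the analysis from the paper: the maps are synthetic, `mean_pairwise_correlation` is a hypothetical helper, and the "unconstrained" maps are plain white noise (real unconstrained models are still spatially smooth, just not aligned across runs).

```python
import numpy as np

def mean_pairwise_correlation(maps):
    """maps: array of shape (n_runs, n_units), one selectivity map per run.
    Returns the mean Pearson correlation over all distinct pairs of runs."""
    r = np.corrcoef(maps)                          # (n_runs, n_runs)
    upper = np.triu_indices(maps.shape[0], k=1)    # distinct pairs only
    return float(r[upper].mean())

rng = np.random.default_rng(0)
n_runs, n_units = 5, 200

# Synthetic "constrained" runs: a shared gradient plus run-specific noise.
gradient = np.linspace(-1.0, 1.0, n_units)
constrained = gradient[None, :] + 0.5 * rng.standard_normal((n_runs, n_units))

# Synthetic "unconstrained" runs: independent maps with no shared layout.
unconstrained = rng.standard_normal((n_runs, n_units))

consistency_constrained = mean_pairwise_correlation(constrained)
consistency_unconstrained = mean_pairwise_correlation(unconstrained)
print(consistency_constrained, consistency_unconstrained)
```

A shared layout shows up as a clearly positive mean correlation, while independent maps hover around zero.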
Domain-level responses were highly invariant to the input size, suggesting the model wasn't merely recapitulating the retinotopic responses of the input areas, but had learned to efficiently organize its representations given the viewing biases and retinotopic connectivity.
8/n
However, when we trained on less broad distributions of viewing size, the topographic responses became less invariant to retinotopic variation. At small sizes, scenes responded more like objects, and at large sizes, objects responded more like scenes.
9/n
In summary, our work highlights how the same principle of connectivity-constrained task optimization that has explained the presence of topographic organization can also explain the global topographic organization of the brain when external connectivity is considered.
10/n
Note: we are not suggesting retinotopic connectivity is the only constraint on high-level visual organization. Future models must wrestle with the role of other long-range connectivity. This will require more sophisticated interfaces between vision and cognition.
11/n
Moreover, a move towards active foveated vision will allow for much more realistic modeling of viewing biases. I've been working on a new model for foveated vision with my postdoc advisors @talia_konkle and @grez72, and we hope to release a pre-print and code later this summer.
Lots more discussion of these and related ideas in the paper. If you're interested, check it out here:
osf.io/preprints/p...
12/n

Retinotopic scaffolding of high-level vision
Functional specialization within high-level vision is reflected in the topographic organization of the ventral temporal cortex (VTC). The presence and consistent locations of small areas responding selectively to particular visual categories – such as faces and scenes – has led to proposals of innate domain-specific modules. However, such proposals do not easily explain other aspects of an apparently multi-scale topographic organization of high-level visual features in VTC. Computational models have recently accounted for the presence of domain-selective areas and other facets of topographic organization from a basic optimization process with local topographic pressures, such as locally constrained connectivity, but fail to account for the consistent location of category-selectivity. In the current work, we extend a recent computational model to demonstrate how this consistency may emerge from wiring constraints external to VTC, focusing on the role of retinotopically organized early v
If you like this sort of work -- be sure to check out the fantastic upcoming CCN workshop on biophysical modeling of the human brain.
neuroailab.github.io/modeling-th...
13/n
Thanks to my Ph.D. advisors Dave Plaut and Marlene Behrmann for their amazing advising on this work! And thanks to my current postdoc home in the Harvard Vision Lab for really productive discussions on presentations of this work leading to the writeup. Thanks for reading!
14/14