A tad late (announcements coming), but I'm very happy to share the latest developments in my previous preprint!
Previously, we showed that neural representations for the control of movement are largely distinct following supervised versus reinforcement learning, with the latter most closely matching NHP recordings.
This similarity to NHP neural recordings held not only for geometric similarity metrics (CCA) but also for dynamical similarity. Importantly, it was only evident when our models were trained to control biomechanically realistic effectors.
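For anyone curious what that geometric comparison looks like in code, here is a minimal sketch of a CCA-based similarity score between two hypothetical activity matrices (toy random data and scikit-learn, not our actual analysis pipeline):

```python
# Toy CCA-based similarity between two sets of "neural" trajectories.
# X and Y are hypothetical (time x units) matrices, not real recordings.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))  # e.g., model hidden units over time
Y = rng.standard_normal((1000, 40))  # e.g., recorded neurons, same time base

n_comp = 10
cca = CCA(n_components=n_comp, max_iter=1000).fit(X, Y)
Xc, Yc = cca.transform(X, Y)

# Mean canonical correlation as a single geometric-similarity score.
corrs = [np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(n_comp)]
print("mean CCA similarity:", np.mean(corrs))
```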
But alignment metrics can overlook the question of what gives rise to the differences they capture. We approached this using a now-established framework in systems neuroscience: dynamical systems theory.
Usually, one determines where neural activity naturally settles under a steady-state input regime to find “fixed-point” neural states. The local dynamics around these points provide valuable information about how neural networks process information: what they compute, and how.
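As a rough illustration (a sketch with a toy tanh RNN and hypothetical parameters, not our trained models), fixed points can be found by minimizing the update “speed” under a frozen input, and the local dynamics read off the Jacobian there:

```python
# Fixed-point finding for a toy discrete-time RNN under a frozen input,
# plus the local linearization (Jacobian) around the recovered point.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N = 64
W = rng.standard_normal((N, N)) / np.sqrt(N)  # toy recurrent weights
U = rng.standard_normal((N, 3))               # toy input weights
x_star = np.array([0.5, -0.2, 0.1])           # frozen ("steady-state") input

def step(h):
    """One RNN update: h_next = tanh(W h + U x*)."""
    return np.tanh(W @ h + U @ x_star)

def speed(h):
    """Squared update magnitude; exactly zero at a fixed point."""
    dh = step(h) - h
    return 0.5 * dh @ dh

h_fp = minimize(speed, rng.standard_normal(N) * 0.1, method="L-BFGS-B").x

# Local dynamics: Jacobian of the update, evaluated at the fixed point.
a = W @ h_fp + U @ x_star
J = (1 - np.tanh(a) ** 2)[:, None] * W
print("max |eigenvalue|:", np.abs(np.linalg.eigvals(J)).max())  # <1 => locally stable
```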
But a biological brain receives an ever-changing stream of inputs, rarely if ever settling into a steady state. Our models reflect that: their inputs are time-varying.
So we took a slightly different approach and asked how fixed points evolve over time and across perturbed neural states.
A dynamical system could recover perfectly from a state perturbation, or it could expand following that perturbation. It turns out supervised learning (SL) models do the former, while reinforcement learning (RL) models do something in between: they act as isometric systems, neither contracting nor expanding.
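Here is a toy version of that perturbation probe, assuming a generic tanh RNN rather than our trained models: perturb the hidden state, roll both copies forward, and track the gap.

```python
# Perturb a toy RNN's state and measure whether the gap between the
# perturbed and unperturbed trajectories shrinks, holds, or grows.
import numpy as np

rng = np.random.default_rng(1)
N = 64
W = rng.standard_normal((N, N)) / np.sqrt(N)  # toy recurrent weights

def step(h):
    return np.tanh(W @ h)

h = rng.standard_normal(N) * 0.1
h_pert = h + 1e-3 * rng.standard_normal(N)  # small state perturbation
gap0 = np.linalg.norm(h - h_pert)

for _ in range(200):
    h, h_pert = step(h), step(h_pert)

ratio = np.linalg.norm(h - h_pert) / gap0
# <<1: recovery (SL-like); ~1: isometric (RL-like); >>1: expansion
print("gap ratio (final/initial):", ratio)
```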
We then looked at the local dynamics around fixed points over time. SL models’ fixed points are indeed very stable, with nearly all eigenvalue magnitudes of the local Jacobian below 1. RL models instead showed many more self-sustaining modes with magnitudes near 1, again demonstrating isometric dynamics.
Does this mean SL models are very orderly, while RL models lie at the interface between order and chaos? To confirm this formally, we looked at Lyapunov exponents, which tell us how fast nearby states diverge. Unlike Jacobians, these capture long-horizon, not just local, dynamics.
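For the curious, here is a minimal sketch of how such exponents can be estimated for a discrete-time RNN with the standard QR (Benettin-style) method; the weights here are toy values, not our trained networks.

```python
# Estimate the leading Lyapunov exponents of a toy RNN: propagate a set of
# tangent vectors through the per-step Jacobians, re-orthonormalizing with
# QR and accumulating the log stretch factors.
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 64, 2000, 5  # state size, horizon, number of exponents
W = rng.standard_normal((N, N)) / np.sqrt(N)  # toy recurrent weights

h = rng.standard_normal(N) * 0.1
Q = np.linalg.qr(rng.standard_normal((N, k)))[0]  # orthonormal tangent basis
log_stretch = np.zeros(k)

for _ in range(T):
    h = np.tanh(W @ h)                 # RNN update
    J = (1 - h ** 2)[:, None] * W      # Jacobian of tanh(W h) at this step
    Q, R = np.linalg.qr(J @ Q)         # re-orthonormalize tangent vectors
    log_stretch += np.log(np.abs(np.diag(R)))

lyap = log_stretch / T  # per-step Lyapunov exponents (natural log)
# largest ~0 => edge of chaos; clearly negative => contractive, orderly
print("largest exponent:", lyap[0])
```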
Indeed, the Lyapunov exponents of RL models largely stay near 0, showing that these networks’ dynamics lie at the edge of chaos. SL models’ dynamics, in contrast, are contractive and orderly, keeping very little information in memory for long and showing stereotyped expressivity.
“Edge of chaos” dynamics have long been recognized as a computationally potent regime: one that avoids vanishing gradients during learning and grants a system greater memory and expressivity. This stark difference surprised us, and we think it can help explain our results on neural adaptation.
Alongside the above, we add discussion points that I hope will clarify our stance on RL in neuroscience and acknowledge important past work that we believe our study complements. We also add several important controls (particularly Figs. S8, S14). Feel free to check it all out!
We’re pleased to see RL’s role in neural plasticity coming increasingly into focus in the motor control community (check out @adrianhaith.bsky.social’s latest piece!).
I strongly believe motor learning sits at the interface of many plasticity mechanisms, and RL is an important piece of this puzzle.
As always, a huge thank you to my colleagues and supervisors @glajoie.bsky.social, @mattperich.bsky.social, and @nandahkrishna.bsky.social for helping make this work what it is, and for making the journey so fun and interesting.