A theory of temporal self-supervised learning in neocortical layers

The neocortex constructs an internal representation of the world, but the underlying circuitry and computational principles remain unclear. Inspired by self-supervised learning algorithms, we introduce a computational model wherein layer 2/3 (L2/3) learns to predict incoming sensory stimuli by comparing previous sensory inputs, relayed via layer 4, with current thalamic inputs arriving at layer 5 (L5). We demonstrate that our model accurately predicts sensory information in a contextual temporal task, and that its predictions are robust to noisy or partial sensory input. Additionally, our model generates layer-specific sparsity and latent representations, consistent with experimental observations. Next, using a sensorimotor task, we show that the model's L2/3 and L5 prediction errors mirror mismatch responses observed in awake, behaving mice. Finally, through manipulations, we offer testable predictions to unveil the computational roles of various cortical features. In summary, our findings suggest that the multi-layered neocortex empowers the brain with self-supervised learning capabilities.