Causal adaptation to visual input dynamics governs the development of complex cells in V1

Visual perception relies on cortical representations of visual objects that remain relatively stable despite the variation in object appearance typically encountered during natural vision (e.g., because of position changes). Such stability, known as transformation tolerance, is built incrementally along the ventral stream (the cortical hierarchy devoted to shape processing), but early evidence of position tolerance is already found in the complex cells of primary visual cortex (V1). To date, the mechanisms that drive the development of this class of neurons, and the emergence of tolerance across the ventral stream, remain unknown. Leading theories suggest that tolerance is learned, in an unsupervised manner, either from the temporal continuity of natural visual experience or from the spatial statistics of natural scenes. However, neither learning principle has been empirically shown to operate in the developing postnatal cortex. Here we show that passive exposure to temporally continuous visual inputs during early postnatal life is essential for the normal development of complex cells in rat V1. We demonstrated this causally by rearing newborn rats with frame-scrambled versions of natural movies, which produced visual input that was temporally unstructured but retained unaltered, natural spatial statistics. This manipulation strongly reduced the fraction of complex cells, which also displayed abnormally fast response dynamics and a reduced ability to support stable decoding of stimulus orientation over time. By contrast, the manipulation did not prevent the development of simple cells, which showed orientation tuning and multi-lobed, Gabor-like receptive fields as sharp as those found in rats reared with temporally continuous natural movies.
Overall, these findings causally implicate unsupervised temporal learning in the postnatal development of transformation tolerance but not of shape tuning, in agreement with theories that place the latter under the control of unsupervised adaptation to spatial, rather than temporal, image statistics.
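The logic of the frame-scrambling manipulation (destroy temporal structure, keep spatial statistics) can be illustrated with a minimal sketch. It assumes a movie stored as a NumPy array of shape (frames, height, width). The helper names `frame_scramble` and `mean_adjacent_frame_correlation`, the toy drifting-texture movie, and the use of adjacent-frame correlation as a continuity index are all hypothetical illustrations, not the stimulus-generation or analysis procedure used in the study.

```python
import numpy as np


def frame_scramble(movie: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `movie` (shape: frames x height x width) with its
    frames in random order. Each frame is left untouched, so pixel-level
    spatial statistics are preserved; only the temporal order changes."""
    order = rng.permutation(movie.shape[0])
    return movie[order]  # fancy indexing returns a new array


def mean_adjacent_frame_correlation(movie: np.ndarray) -> float:
    """Average Pearson correlation between consecutive frames, used here
    as a crude (hypothetical) index of temporal continuity."""
    flat = movie.reshape(movie.shape[0], -1)
    corrs = [np.corrcoef(flat[t], flat[t + 1])[0, 1] for t in range(len(flat) - 1)]
    return float(np.mean(corrs))


# Hypothetical toy "natural movie": a spatially smooth random texture that
# drifts one pixel per frame, so consecutive frames are highly similar.
rng = np.random.default_rng(0)
noise = rng.standard_normal((32, 256))
base = sum(np.roll(noise, k, axis=1) for k in range(9))  # circular horizontal box blur
movie = np.stack([np.roll(base, t, axis=1) for t in range(200)])

scrambled = frame_scramble(movie, rng)

original_corr = mean_adjacent_frame_correlation(movie)
scrambled_corr = mean_adjacent_frame_correlation(scrambled)
# Same multiset of pixel values: the scrambled movie reuses the original frames.
same_pixels = np.array_equal(np.sort(movie, axis=None), np.sort(scrambled, axis=None))

print(f"adjacent-frame correlation, original:  {original_corr:.2f}")
print(f"adjacent-frame correlation, scrambled: {scrambled_corr:.2f}")
print(f"identical pixel-value distribution: {same_pixels}")
```

In this toy setting, consecutive frames of the original movie are strongly correlated, whereas after scrambling they are nearly uncorrelated, even though every individual frame, and hence the pooled spatial statistics, is unchanged. The box blur is built from `np.roll` only to keep the sketch free of dependencies beyond NumPy.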