Want to become a better tennis player? If you repeatedly practice serving to the same spot, you’ll master serving to that exact location, if conditions remain similar. Practicing your serve to a variety of locations will take much longer to master, but in the end you’ll be a better tennis player, and much more capable of facing a fierce opponent.
The reason comes down to variability: the more variation we’re exposed to, the better our neural networks can generalize and work out which information is important to the task and which is not. This also helps us learn and make decisions in new contexts.
From fox to hounds
This generalization principle can be applied to many things, including learning languages or recognizing dog breeds. For example, an infant will have difficulty learning what a ‘dog’ is if they are only exposed to chihuahuas instead of many dog breeds (chihuahuas, beagles, bulldogs etc.), which show the real variation of Canis lupus familiaris. Including information about what is not in the dog category – for example foxes – also helps us build generalizations, which helps us to eliminate irrelevant information.
“Learning from less variable input is often fast, but may fail to generalize to new stimuli,” says Dr Limor Raviv, the senior investigator from the Max Planck Institute (Germany). “But these important insights have not been unified into a single theoretical framework, which has obscured the bigger picture.”
To better understand the patterns behind this generalization framework, and how variability affects learning in both humans and computers, Raviv’s research team explored over 150 studies on variability and generalization across the fields of computer science, linguistics, motor learning, visual perception and formal education.
Wax on, wax off
The researchers found that there are at least four kinds of variability, including:
- Numerosity (set size), which is the number of different examples; such as the number of locations on the tennis court where a served ball could land
- Heterogeneity (differences between examples); serving to the same spot versus serving to different spots
- Situational (context) diversity; facing the same opponent on the same court or a different opponent on a different court
- Scheduling (interleaving, spacing); how frequently you practice, and in what order you practice components of a task
“These four kinds of variability have never been directly compared—which means that we currently don’t know which is most effective for learning,” says Raviv.
According to the ‘Mr Miyagi principle’, inspired by the 1984 movie The Karate Kid, practicing unrelated skills – such as waxing cars or painting fences – might actually benefit the learning of other skills: in the movie’s case, martial arts.
Lemon or lime?
So why does including variability in training slow things down? One theory is that there are always exceptions to the rules, which makes learning and generalizing harder.
For example, while color is important for distinguishing lemons from limes, it wouldn’t be helpful for telling cars and trucks apart. Then there are atypical examples – such as a chihuahua that doesn’t look like a dog, and a fox that does, but isn’t.
So as well as learning a rule to make neural shortcuts, we also have to learn exceptions to these rules, which makes learning slower and more complicated. This means that when training is variable, learners have to actively reconstruct memories, which takes more effort.
Putting a face to a name
So how do we train ourselves and computers to recognize faces? The illustration below is an example of variations of a fox for machine learning. Providing several variations – including image rotation, color and partial masking – improves the machine’s ability to generalize (in this case, to identify a fox). This data augmentation technique is an effective way of expanding the amount of available data by providing variations of the same data point, but it slows learning down.
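To make the idea concrete, here is a minimal sketch of data augmentation in Python. It is an illustration only, not the method used in any of the studies reviewed: the image is a tiny grid of grayscale values (so "color" is simplified to brightness), and the function names rotate90, shift_color, mask_patch and augment are hypothetical helpers invented for this example.

```python
import random

def rotate90(image):
    """Rotate a 2D grid of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def shift_color(image, delta):
    """Add delta to every pixel value, clamped to the 0..255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in image]

def mask_patch(image, top, left, size):
    """Return a copy with a size x size square set to 0, starting at (top, left)."""
    out = [list(row) for row in image]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = 0
    return out

def augment(image, rng):
    """Return the original image plus three varied copies of it."""
    height, width = len(image), len(image[0])
    return [
        image,
        rotate90(image),
        shift_color(image, rng.randint(-40, 40)),
        mask_patch(image, rng.randrange(height), rng.randrange(width), 2),
    ]

# A toy 3x3 "image" standing in for a photo of a fox (assumed data).
toy_image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
variants = augment(toy_image, random.Random(0))
print(len(variants))  # 4: one original plus three variations
```

One data point has become four. Each extra variant is one more example the learner has to process, which is the trade-off described above: more work during training in exchange for broader coverage.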
Humans are the same: the more variables we’re presented with, the harder it is for us to learn – but eventually it pays off in a greater ability to generalize knowledge in new contexts.
“Understanding the impact of variability is important for literally every aspect of our daily life. Beyond affecting the way we learn language, motor skills, and categories, it even has an impact on our social lives,” explains Raviv. “For example, face recognition is affected by whether people grew up in a small community (fewer than 1000 people) or in a larger community (over 30,000 people). Exposure to fewer faces during childhood is associated with diminished face memory.”
The learning message for both humans and AI is clear: variation is key. Switch up your tennis serve, play with lots of different dogs, and practice language with a variety of speakers. Your brain (or algorithm) will thank you for it… eventually.