Visual localization through learning: An information theoretic approach

Funding Round: 1 (2013-2015)

Research Question: What kinds of inference and learning support visual perception?

Interdisciplinary Approach: This study combined behavioral methods from perceptual psychology (also called ‘psychophysics’), mathematical and computational analysis, and a novel implementation of experiments on tablet computers in order to test children.

Potential Implications of Research: The research has multifaceted implications. It can guide computer scientists in designing algorithms for computer-assisted visual search (for example, in radiology or airport security). It suggests that algorithm selection and learning may be a way to understand and target interventions in a variety of cognitive impairments. It also suggests specific targets for visuospatial intervention in deficits that emerge in early childhood.

Project Description: Part of the illusion of human visual perception is that we look through our eyes at the world directly, as if the stimulation hitting our retina actually is what we see. In fact, what our eyes receive from the richly structured three-dimensional world is nothing more than a compressed, messy, two-dimensional projection. But perhaps even more shocking is the illusion that seeing is effortless. We don’t try to see; we merely open our eyes. Yet without our knowing it, our brains work very hard so that we can see. In fact, everything we see is an inference: a series of conclusions arrived at by a mind whose goal is to learn the structure of the visual environment.

If seeing requires learning how to make the right inferences, how can we characterize this learning? This question was at the center of this research. We tested the idea that some basic types of visual perception amount to a kind of Socratic learning: the successive interrogation of inputs with increasingly targeted questions. This is a very different perspective on visual perception than the one that has prevailed for the last century or so.

Consider an analogy: the game of ‘20 Questions.’ Suppose that you need to guess a secret number between 1 and 100,000. In this game you can ask any 20 questions you like, as long as the answer to each is either “yes” or “no.” Which questions should you ask? It turns out that the best solution is to just divide the set in half with each question. “Is it between 1 and 50,000?” If the answer is yes, then ask, “Is it between 1 and 25,000?” And so on. Within the broader field of “information theory,” mathematicians and computer scientists have figured this out because it turns out to be useful in a variety of real-world applications, from identifying the source of a contaminant in a natural resource to tracking roads with satellites.
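The halving strategy can be sketched in a few lines of Python. This is only an illustration of the analogy, not code from the project; the function name guess_number and its parameters are hypothetical. It asks "Is it between low and mid?" and keeps whichever half contains the secret number.

```python
import math

def guess_number(secret, low=1, high=100_000):
    """Find `secret` in [low, high] with yes/no questions that halve the range.

    Returns the number found and how many questions were asked.
    """
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        # Question: "Is it between `low` and `mid`?"
        if secret <= mid:
            high = mid       # answer was "yes": keep the lower half
        else:
            low = mid + 1    # answer was "no": keep the upper half
    return low, questions

# Upper bound on the questions needed for 100,000 candidates.
max_questions = math.ceil(math.log2(100_000))
```

For a range of 1 to 100,000 the first two questions match the ones in the text (50,000, then 25,000), and the number is always found well within the 20 questions allowed, because each answer rules out half of the remaining candidates.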

What does this have to do with visual perception? Imagine you are looking for something; you are trying to localize it. How do you acquire knowledge (how do you learn) about its position? In this project we demonstrated that the visual system knows which questions to ask: it divides the world into progressively smaller pieces, zeroing in on a target and acquiring as much information as possible at each step. We did this by asking human observers to localize objects shown for varying, but always very brief, amounts of time. Put simply, an object flashed on a computer screen, and our participants used a mouse to click as close as possible to where they thought the object had appeared. We then used mathematical techniques to analyze their performance, asking what kinds of algorithms best describe their abilities and whether they behave like an ideal Socratic interrogator. Moreover, we investigated how people learn to ask the right (visual) questions over time by testing children as young as 4 and as old as 9. It seems that children search more randomly at first, until they learn to use an entropy pursuit algorithm at age 5 or 6. Moreover, whether they search efficiently appears to be related to working memory capacity, even in adults (see Figure 1). This means that how we search visual space is related to other mental faculties, setting the path for our future research.
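To make the idea of acquiring as much information as possible at each step concrete, here is a minimal sketch of an entropy-reducing search. It is not the model used in the study. It assumes a one-dimensional display divided into a small number of cells, a uniform starting belief, and noiseless yes/no answers, and all names (entropy, best_split, update, localize) are hypothetical. At each step it asks the question whose "yes" probability is closest to one half, which is the most informative yes/no question under these assumptions.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete belief over cells."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def best_split(p):
    """Pick k so that 'is the target in cells [0, k)?' has probability nearest 0.5."""
    best_k, best_gap, mass = 1, float("inf"), 0.0
    for k in range(1, len(p)):
        mass += p[k - 1]
        gap = abs(mass - 0.5)
        if gap < best_gap:
            best_k, best_gap = k, gap
    return best_k

def update(p, k, answer_yes):
    """Zero out the cells ruled out by the answer, then renormalize."""
    kept = [q if (i < k) == answer_yes else 0.0 for i, q in enumerate(p)]
    total = sum(kept)
    return [q / total for q in kept]

def localize(target, n_cells=16):
    """Question the display until the belief collapses onto one cell."""
    p = [1.0 / n_cells] * n_cells      # assumed uniform starting belief
    history = [entropy(p)]
    while entropy(p) > 1e-9:
        k = best_split(p)
        p = update(p, k, target < k)   # assumed truthful, noiseless answer
        history.append(entropy(p))
    return p.index(max(p)), history
```

With 16 cells the uncertainty starts at 4 bits and falls by one bit per question, so the target is pinned down in four steps. A searcher who asked questions at random would, on average, gain less than one bit per question and need more steps.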