In studying the workings of the visual system, we are often faced with complicated and seemingly messy empirical results from behavioral experiments. For instance, it is easier (against a neutral gray background) to search for a red target among pink distractors than vice versa. Is this because the brain has detectors that respond strongly to red and not pink, but no detectors that respond more strongly to pink than red? One of our key tasks as researchers is to ask whether the data necessitate an explanation like this, in which special status is given to certain features, operations, and so on. Or is there a simpler explanation? Is the observed behavior in some sense ideal? Questions of this sort often lead us to examine statistical models of human behavior.
Much of the work in our lab has been aimed at testing the hypothesis that the visual system, in many instances, benefits from operating as a statistician. In other words, the visual system samples (noisy) feature estimates from the visual input, in many cases computes summary statistics such as the mean and variance of those features, and makes intelligent decisions based upon those statistics.
Models of this form have proven quite powerful in predicting behavior in visual tasks, particularly when processing needs to be fast or “pre-attentive,” or when the stimulus is viewed in the periphery. For example, many of the existing results in visual search can be qualitatively predicted by a model that extracts the mean and covariance of basic features like motion, color, and orientation.
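As a rough illustration of the kind of computation involved, the sketch below summarizes a set of distractors by the mean and covariance of a two-dimensional feature and scores a target by how far it lies from that distribution. The use of the Mahalanobis distance, the two-dimensional feature space, and the function names are assumptions made for this example; it is not a statement of the lab's actual model.

```python
import math


def mean_and_covariance(points):
    """Return the mean vector and 2x2 sample covariance of 2-D feature points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis(target, mean, cov):
    """Distance of the target from the distractor distribution (assumed score).

    Larger values mean the target is more of an outlier relative to the
    distractor statistics, which this sketch treats as easier search.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx = target[0] - mean[0]
    dy = target[1] - mean[1]
    # Quadratic form dx^T * inverse(cov) * dx, using the closed-form 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


# Hypothetical distractor features (e.g. two color or motion coordinates).
distractors = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = mean_and_covariance(distractors)
score = mahalanobis((3.0, 1.0), mean, cov)
```

Under these assumptions, a target sitting at the distractor mean scores zero, and the score grows as the target moves away from the distractors along directions in which they vary little.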
Pre-attentive texture segmentation, in turn, is well modeled by a process that takes a sample of features from each side of a hypothesized texture boundary and does the equivalent of a t-test and an F-test to see whether the textures differ significantly in their statistics. If so, the observer is likely to perceive a boundary.
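The boundary test described above can be sketched as follows, assuming a single scalar feature (say, orientation in degrees) sampled on each side of the candidate boundary. The sketch uses Welch's form of the t statistic and a simple variance ratio for the F statistic; the fixed cutoffs `t_crit` and `f_crit`, like the function names, are hypothetical placeholders rather than values taken from the model.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for a difference in means (assumes nonzero variance)."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def f_ratio(a, b):
    """Ratio of the larger sample variance to the smaller (assumes nonzero variance)."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def boundary_likely(left, right, t_crit=2.0, f_crit=3.0):
    """Report a boundary if the means or the variances differ enough.

    t_crit and f_crit are illustrative cutoffs, not fitted parameters.
    """
    return abs(welch_t(left, right)) > t_crit or f_ratio(left, right) > f_crit


# Hypothetical orientation samples (degrees) from each side of a candidate boundary.
left_side = [10, 12, 11, 9, 13]
right_side = [30, 31, 29, 32, 28]
perceived = boundary_likely(left_side, right_side)
```

In this sketch, two sides can differ either in mean (caught by the t statistic) or in spread (caught by the F statistic), so a boundary between textures with the same average orientation but very different variability would still be flagged.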
More recent work in our lab suggests that one can predict the difficulty of doing a task in peripheral vision (i.e., under conditions of crowding) with a model that represents peripheral stimuli not through the exact configuration of their parts, but rather through the joint statistics of responses of V1-like receptive fields. This model shows promise in predicting a wide variety of visual phenomena, from optical illusions to search reaction times for arbitrary search displays.
There are several benefits to this approach to studying vision. First and foremost, the resulting models often work quite well. They also tend to have fewer parameters than neurally inspired models (which, when analyzed, frequently turn out to be implicitly performing similar calculations). Finally, statistical models are often surprisingly easy to implement in biologically inspired hardware, and similarly easy to implement in computer vision algorithms, which can then make predictions for arbitrarily complex natural images.