Adelson ICCV 95

Adelson, E. H., Layered Representations for Vision and Video. Proceedings of IEEE Workshop on Representation of Visual Scenes, in conjunction with ICCV '95, pp. 3-9, Cambridge, MA; June (1995).

Abstract

Human vision, machine vision, and image coding each demand representations that are useful and efficient. The best-established techniques today are based on low-level processing. Future systems for image analysis and image coding will increasingly use image representations that involve such concepts as surfaces, lighting, transparency, etc. These representations fall in the domain of "mid-level" vision, and there is accumulating evidence of their importance in human vision. By representing images with these more sophisticated vocabularies we can increase the flexibility and efficiency of our vision and image coding systems. We are developing systems that decompose image sequences into overlapping layers, rather like the "cels" used by a traditional animator. These layers are ordered in depth, sliding over one another and being combined according to the rules of transparency and occlusion. Using the layered representation we can achieve greatly improved motion analysis and image segmentation. By applying layers to image coding we can achieve data compression far better than MPEG, and achieve frame-rate independence as a side benefit. Moreover, the image sequence is decomposed in a meaningful way, which allows flexible image editing and access.
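
Illustrative sketch (not taken from the paper): the idea of depth-ordered layers that slide over one another and combine by transparency and occlusion can be shown with ordinary back-to-front alpha compositing. The Layer class, the dx shift parameter, and the render_frame function below are hypothetical names chosen for this example, and the purely horizontal integer motion is a simplifying assumption; the abstract does not specify the motion model or the compositing formula actually used.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]


@dataclass
class Layer:
    intensity: Image  # per-pixel brightness
    alpha: Image      # per-pixel opacity in [0, 1]; 1 means fully occluding
    dx: int = 0       # assumed horizontal shift per frame, in pixels


def shifted(img: Image, shift: int, fill: float) -> Image:
    """Translate each row right by `shift` pixels, padding with `fill`."""
    width = len(img[0])
    out = []
    for row in img:
        new_row = [fill] * width
        for x, value in enumerate(row):
            nx = x + shift
            if 0 <= nx < width:
                new_row[nx] = value
        out.append(new_row)
    return out


def render_frame(layers: List[Layer], t: int) -> Image:
    """Composite layers back to front at frame index t.

    layers[0] is the farthest layer; each later layer is drawn over the
    running result using frame = alpha * layer + (1 - alpha) * frame.
    """
    height = len(layers[0].intensity)
    width = len(layers[0].intensity[0])
    frame = [[0.0] * width for _ in range(height)]
    for layer in layers:
        inten = shifted(layer.intensity, layer.dx * t, 0.0)
        alpha = shifted(layer.alpha, layer.dx * t, 0.0)
        for y in range(height):
            for x in range(width):
                a = alpha[y][x]
                frame[y][x] = a * inten[y][x] + (1.0 - a) * frame[y][x]
    return frame


# A static opaque background and a half-transparent one-pixel foreground
# that moves one pixel to the right per frame.
background = Layer(intensity=[[0.2, 0.2, 0.2, 0.2]],
                   alpha=[[1.0, 1.0, 1.0, 1.0]])
foreground = Layer(intensity=[[1.0, 0.0, 0.0, 0.0]],
                   alpha=[[0.5, 0.0, 0.0, 0.0]], dx=1)

frame0 = render_frame([background, foreground], 0)
frame2 = render_frame([background, foreground], 2)
```

In this toy setup each layer is stored once, and any frame is regenerated from the layer contents plus a motion parameter, which hints at why a layered description can be compact and easy to edit layer by layer.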