![[Layered Image Representation]](images/small2/stack-layers.jpg) 

Keywords: Image coding, motion analysis, image segmentation, image representation, robust estimation.

A general block diagram of our algorithm is shown below. The algorithm consists of motion estimation, motion segmentation, and temporal integration.
![[Block Diagram of layer decomposition]](figs/layer-proc.gif)
![[Block Diagram of segmentation]](figs/block-diagram.gif)
Thirty frames of the MPEG Flower Garden sequence (1 sec) are processed by an
optic flow estimator to obtain a motion vector for each pixel. Affine motion
parameters are estimated from the optic flow data by the model estimator for
each subregion. We initialize these subregions with an array of 20x20 pixel
blocks. Similar models are merged by the model merger to improve stability.
Thus the model estimator and the model merger cooperatively determine the set
of likely affine motions present in the optic flow maps.
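As a concrete illustration (not the exact implementation), the block-wise fit can be posed as two independent linear least-squares problems, one per flow component. The sketch below assumes a dense flow field stored as an (H, W, 2) array; the function names are hypothetical.

```python
import numpy as np

def fit_affine_motion(flow, ys, xs):
    """Least-squares fit of a 6-parameter affine motion model to the
    optic-flow vectors at pixel coordinates (ys, xs).

    flow : (H, W, 2) array of per-pixel motion vectors (vx, vy)
    ys, xs : 1-D arrays of the row/column coordinates of the subregion's pixels
    Returns (ax0, axx, axy, ay0, ayx, ayy).
    """
    # Design matrix for v = a0 + ax*x + ay*y, shared by both flow components.
    A = np.stack([np.ones_like(xs, dtype=float), xs, ys], axis=1)
    px, *_ = np.linalg.lstsq(A, flow[ys, xs, 0], rcond=None)
    py, *_ = np.linalg.lstsq(A, flow[ys, xs, 1], rcond=None)
    return np.concatenate([px, py])

def estimate_block_models(flow, block=20):
    """Fit one affine hypothesis per block x block tile (the initial subregions).
    Merging of similar models would follow to stabilize the hypothesis set."""
    H, W, _ = flow.shape
    models = []
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            ys, xs = np.mgrid[r:r + block, c:c + block]
            models.append(fit_affine_motion(flow, ys.ravel(), xs.ravel()))
    return np.array(models)  # (num_blocks, 6)
```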
Affine motion segmentation results from applying these motion models in a
classification framework on the motion map. Additional constraints on physical
connectivity and region size are enforced by the region splitter and the region
filter. The segmentation results are iteratively refined.
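A minimal sketch of the classification step, assuming the same flow and model layout as above: each pixel is assigned to the hypothesis whose predicted flow is closest to its measured flow. The connectivity and size constraints (region splitter and region filter) would follow, e.g. via connected-component analysis, and are omitted here.

```python
import numpy as np

def affine_flow(model, H, W):
    """Synthesize the dense flow field predicted by one affine hypothesis."""
    ys, xs = np.mgrid[0:H, 0:W]
    ax0, axx, axy, ay0, ayx, ayy = model
    vx = ax0 + axx * xs + axy * ys
    vy = ay0 + ayx * xs + ayy * ys
    return np.stack([vx, vy], axis=-1)

def classify_pixels(flow, models):
    """Assign each pixel to the affine model that best predicts its flow.

    flow   : (H, W, 2) estimated optic flow
    models : (K, 6) affine motion hypotheses
    Returns an (H, W) label map of model indices.
    """
    H, W, _ = flow.shape
    # Squared prediction error of every hypothesis at every pixel.
    errors = np.stack(
        [np.sum((flow - affine_flow(m, H, W)) ** 2, axis=-1) for m in models]
    )                                   # (K, H, W)
    return np.argmin(errors, axis=0)    # minimum-distortion assignment
```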
Once the affine motions and the corresponding regions are identified, data are
collected from all the frames in the sequence and the layer components are
obtained. For example, when the estimated affine models accurately describe the
motions of the coherent regions, these regions can be "tracked" by motion
compensation. The stability of the various regions after motion compensation
verifies that the affine motion parameters have been correctly estimated for
these regions.
The motion compensated sequences help us determine the intensity and color
textures of the "tracked" regions. Assisted by the segmentation maps,
stabilized regions are processed with a temporal median filter to recover the
image intensity maps of the "tracked" regions. Because these layer intensity
maps are obtained by processing data from all the frames, regions occluded in
parts of the sequence can be recovered and composited into these layer maps.
Likewise, the accumulation of data results in image mosaics. Furthermore, this
sort of motion compensated temporal processing can produce image maps that are
higher in resolution and lower in noise than the images of any one frame.
Finally, the depth ordering of these image maps is determined by a verification stage. The resulting representation is shown below.
![[Layered Image Representation]](images/small2/stack-layers.jpg) 

The layered image representation provides a compact representation of the image
sequence. The layered decomposition captures the spatial coherence of object
motion and the temporal coherence of object shape and texture in a few layers.
Because of the efficiency with which the layers encode the sequence, we can
obtain a 300 to 1 data reduction with minimal artifacts. A layered description
in which each layer represents a coherently moving object provides a more
semantic representation of sequences and results in a richer mid-level visual
language for sequences. The layered visual language supports coherent moving
objects, surfaces, object opacity, occlusions, ordinal depths, image mosaics,
and object tracking. These properties make the layered representation
attractive for video databases and applications involving retrieval by content
from compressed video data. Furthermore, sequences can be easily synthesized
from the layers with standard computer graphics techniques.
Layers facilitate video editing and video manipulation because they are similar
to the elements used in computer graphics representations. For special effects,
objects can be easily modified, and the modifications propagate to the entire
sequence. Graphical elements can be added. For video editing, sequences can be
synthesized from a subset of the layers to remove unwanted objects from the
sequence, as sketched below.
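As an illustration of such synthesis (a minimal sketch, not the system's actual renderer), each retained layer's intensity map can be warped by its per-frame affine motion and composited back-to-front with a painter's algorithm; leaving a layer out of the subset removes that object from the output. The data layout and the helper name are assumptions for this example.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def synthesize_frame(layers, frame_models, keep=None):
    """Re-synthesize one frame from the layered representation.

    layers       : list of (intensity_map, support_mask) in back-to-front
                   depth order; both arrays are (H, W)
    frame_models : one 6-parameter affine motion per layer, mapping frame
                   coordinates into that layer's coordinate system
    keep         : optional set of layer indices; omitted layers are not
                   composited, which removes those objects from the frame
    """
    H, W = layers[0][0].shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    out = np.zeros((H, W))
    for i, ((tex, mask), m) in enumerate(zip(layers, frame_models)):
        if keep is not None and i not in keep:
            continue
        ax0, axx, axy, ay0, ayx, ayy = m
        # Positions in the layer map that each output pixel samples from.
        sx = xs + ax0 + axx * xs + axy * ys
        sy = ys + ay0 + ayx * xs + ayy * ys
        tex_w = map_coordinates(tex, [sy, sx], order=1, mode='constant')
        vis = map_coordinates(mask.astype(float), [sy, sx], order=0) > 0.5
        out[vis] = tex_w[vis]  # nearer layers overwrite farther ones
    return out
```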



 
