COST 211ter Sim Subgroup November 1, 1996

FOCUS DOCUMENT v2.0

The work of the COST211ter Simulation Subgroup focuses on the definition and specification of tools and algorithms for image sequence analysis, targeted at emerging multimedia services in the context of MPEG-4: to enable MPEG-4 content-based functionalities and to open the way for new coding methods.

The application profiles envisioned include:

I.) Real-Time Communications
II.) Database Access

with application-specific functionalities (on-line or off-line) such as

image segmentation to identify and describe physical objects within the scene

definition and extraction of parameters associated with each object, such as shape, motion, texture, ...

These in turn would enable further functionalities (see the "Definition of AM Output Parameters (Results)" section for more details).

It is foreseen that the MPEG-4 Video Verification Model (VM) will provide the coding platform in this context.
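As an illustration of the two functionalities listed above, the following sketch (with hypothetical names and deliberately crude descriptors; none of this is part of the AM specification) shows one way a segmentation mask and per-object shape/motion/texture parameters could be represented:

```python
import numpy as np

def extract_object_parameters(label_mask, frame, prev_frame):
    """For each labelled region, gather illustrative shape, motion and
    texture descriptors. Names and descriptors are assumptions, not the
    AM specification."""
    objects = {}
    for label in np.unique(label_mask):
        region = label_mask == label
        ys, xs = np.nonzero(region)
        objects[int(label)] = {
            # shape: bounding box (x0, y0, x1, y1) and area of the region
            "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
            "area": int(region.sum()),
            # motion: mean absolute frame difference inside the region (crude proxy)
            "motion": float(np.abs(frame[region] - prev_frame[region]).mean()),
            # texture: intensity variance inside the region
            "texture": float(frame[region].var()),
        }
    return objects

# toy example: two vertical regions in a 4x4 frame
mask = np.array([[0, 0, 1, 1]] * 4)
frame = np.arange(16, dtype=float).reshape(4, 4)
prev = np.zeros((4, 4))
params = extract_object_parameters(mask, frame, prev)
```

A real analysis kernel would of course derive far richer descriptors; the point here is only the pairing of a pixel-level segmentation with a per-object parameter set.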

Collaborative Work

The work in the COST211ter Group progresses in a collaborative manner. To this end, a "Test Model" approach has been adopted to investigate and optimise algorithms for image analysis. A first version of an Analysis Model (AM) was established at the October 1996 COST211ter Sim meeting in Ankara, Turkey. The AM consists of a full description of tools and algorithms for image analysis purposes. These algorithms and tools will be refined and improved at further meetings in a collaborative effort: by software implementation of the AM, by exchange of software, and by performing joint experiments based on the AM. The software structure of the COST211ter AM is envisioned to be compatible with the ACTS-MOMUSYS MPEG-4 Video VM software platform.

The figure below depicts the basic building blocks of the Analysis Model, the input and output of the image analysis, as well as provisions for user interaction and for coding of the image sequences. In the following, possible input and output parameters of the COST211ter AM, as well as basic definitions of the envisioned AM building blocks, are described in more detail.


Definition of AM Input Parameters

Audio-Visual (AV) Input

All the audio-visual information that is available before starting the analysis process.

Examples:

Control input

All the additional (non-AV) information that was available before starting the analysis process.

Examples:

User's interaction

All the information that is introduced into the analysis process once it has started and that is generated based on the results of the analysis process (that is, it was not available beforehand). The AM will provide an interface to help the user interpret the feedback information and use it to refine or update the image analysis process.

Examples:

Definition of AM Output Parameters (Results)

The analysis results can be related to a complete scene (sc) or to a particular object of the scene (ob). They are categorised into two distinct classes:

Classification of content for

When performed on a complete scene (sc), the classification just has to establish a relative ranking between the different parts of the scene. When performed on a particular object (ob), the ranking probably has to be absolute.
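The distinction can be illustrated with a minimal sketch (names, scores, and threshold are assumptions, not AM output): relative ranking orders the parts of a scene against each other, while absolute classification compares a single object against a fixed criterion.

```python
def rank_scene_parts(scores):
    """Relative ranking: order the parts of a scene by descending score."""
    return sorted(scores, key=scores.get, reverse=True)

def classify_object(score, threshold=0.5):
    """Absolute classification: compare one object against a fixed threshold."""
    return "relevant" if score >= threshold else "background"

# relative ranking only says which part matters more than which
parts = {"sky": 0.2, "speaker": 0.9, "desk": 0.4}
order = rank_scene_parts(parts)

# absolute classification must stand on its own, without other parts to compare to
label = classify_object(parts["speaker"])
```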

Content-based analysis

Definition of the AM Building Blocks

Kernel of Analysis for New multimedia Technologies (KANT)

The overall AM

Automatic Analysis Kernel

The part that provides the various analysis features; it applies the control parameters to achieve the required feature(s).

User interaction

Interface (probably graphical) that will allow the user to modify the feedback parameters, to suppress a feature, or to activate another one during the process. Even if the user does not act, an automatic regulation is performed. Although already taken into account in the Automatic Analysis Kernel, special attention must be devoted to the temporal integration of the data: half a second appears to be the normal reaction time of a user, from the onset of motion or the moment a new moving object enters the scene until an adaptation of the control parameters can be performed.

N.B.: a prior interpretation of the coder statistics and/or of the analysis results appears to be necessary in order to present these data to the user in a form he/she can use easily.
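The temporal integration mentioned above could be sketched as follows (an assumed design, not the AM specification): a per-frame coder statistic is averaged over a sliding window of roughly half a second before being presented at the user interface.

```python
from collections import deque

class TemporalIntegrator:
    """Averages a per-frame statistic over roughly half a second before
    it is shown to the user (assumed design, not the AM specification)."""

    def __init__(self, frame_rate=25.0, window_seconds=0.5):
        # at 25 Hz, a half-second window holds about 12 frames
        self.window = deque(maxlen=max(1, round(frame_rate * window_seconds)))

    def update(self, statistic):
        self.window.append(statistic)
        # smoothed value to be presented at the user interface
        return sum(self.window) / len(self.window)

# a statistic jumps when a new moving object enters the scene;
# the value shown to the user follows only gradually
integ = TemporalIntegrator()
smoothed = [integ.update(v) for v in [10.0] * 6 + [20.0] * 6]
```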

MPEG-4 coder

That is, the MPEG-4 Video Verification Model.

Concrete form of the AM results

Masks:

An image with up to 256 different labels (cf. the MPEG-4 alpha channel). It helps to define the different regions of a scene, to establish a ranking of these regions, and to ensure correspondence between successive images, ...

Properties and identification labels:

Numbers that give characteristics of an object/image.
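A minimal sketch of these two concrete result forms (field names and values are assumptions for illustration): an 8-bit label mask in the style of the MPEG-4 alpha channel, with a property record attached to each identification label.

```python
import numpy as np

# 8-bit mask: value 0 = background, other values identify regions
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 7          # a region carrying identification label 7

# property record attached to the same identification label
properties = {
    7: {
        "area": int((mask == 7).sum()),  # number of pixels in region 7
        "rank": 1,                       # example ranking of the region
    }
}
```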

