COST 211ter Sim Subgroup November 1, 1996

FOCUS DOCUMENT v2.0

The work of the COST211ter Simulation Subgroup focuses on the definition and specification of tools and algorithms for image sequence analysis, targeted at emerging multimedia services in the context of MPEG-4: to enable MPEG-4 content-based functionalities and to open the way for new coding methods.

The application profiles envisioned include:

I.) Real-Time Communications
II.) Database Access

with application-specific functionalities (on-line or off-line) such as

image segmentation to identify and describe physical objects within the scene

definition and extraction of parameters associated with each object, such as shape, motion, texture, ...

These in turn would enable further functionalities (see the "Definition of AM Output Parameters (Results)" section for more details).

It is foreseen that the MPEG-4 Video Verification Model (VM) will provide the coding platform in this context.
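As an illustration of the two functionalities listed above, the following sketch (with hypothetical names and deliberately crude descriptors; none of this is part of the AM specification) shows one way a segmentation mask and per-object shape/motion/texture parameters could be represented:

```python
import numpy as np

def extract_object_parameters(label_mask, frame, prev_frame):
    """For each labelled region, gather illustrative shape, motion and
    texture descriptors. Names and descriptors are assumptions, not the
    AM specification."""
    objects = {}
    for label in np.unique(label_mask):
        region = label_mask == label
        ys, xs = np.nonzero(region)
        objects[int(label)] = {
            # shape: bounding box (x0, y0, x1, y1) and area of the region
            "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
            "area": int(region.sum()),
            # motion: mean absolute frame difference inside the region (crude proxy)
            "motion": float(np.abs(frame[region] - prev_frame[region]).mean()),
            # texture: intensity variance inside the region
            "texture": float(frame[region].var()),
        }
    return objects

# toy example: two vertical regions in a 4x4 frame
mask = np.array([[0, 0, 1, 1]] * 4)
frame = np.arange(16, dtype=float).reshape(4, 4)
prev = np.zeros((4, 4))
params = extract_object_parameters(mask, frame, prev)
```

A real analysis kernel would of course derive far richer descriptors; the point here is only the pairing of a pixel-level segmentation with a per-object parameter set.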

Collaborative Work

The work in the COST211ter Group progresses in a collaborative manner. To this end, a "Test Model" approach has been adopted to investigate and optimise algorithms for image analysis. A first version of an Analysis Model (AM) was established at the October 1996 COST211ter Sim meeting in Ankara, Turkey. The AM consists of a full description of tools and algorithms for image analysis purposes. These algorithms and tools will be refined and improved at further meetings in a collaborative effort: by software implementation of the AM, by exchange of software, and by performing joint experiments based on the AM. The software structure of the COST211ter AM is envisioned to be compatible with the ACTS-MOMUSYS MPEG-4 Video VM software platform.

The figure below depicts the basic building blocks of the Analysis Model, the input and output of the image analysis, as well as provisions for user interaction and for coding of the image sequences. In the following, possible input and output parameters of the COST211ter AM, as well as basic definitions of the envisioned AM building blocks, are described in more detail.


Definition of AM Input Parameters

Audio-Visual (AV) Input

All the audio-visual information that is available before starting the analysis process.

Examples:

Control input

All the additional (non-AV) information that was available before starting the analysis process.

Examples:

User's interaction

All the information that is introduced into the analysis process once it has started and that is generated based on the results of the analysis process (that is, it was not available beforehand). The AM will provide an interface to help the user interpret the feedback information and use it to refine or update the image analysis process.

Examples:

Definition of AM Output Parameters (Results)

The analysis results can be related to a complete scene (sc) or to a particular object of the scene (ob). They are categorised into two distinct classes:

Classification of content for

When performed on a complete scene (sc), the classification just has to establish a relative ranking between the different parts of the scene. When performed on a particular object (ob), the ranking probably has to be absolute.
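The distinction can be illustrated with a minimal sketch (names, scores, and threshold are assumptions, not AM output): relative ranking orders the parts of a scene against each other, while absolute classification compares a single object against a fixed criterion.

```python
def rank_scene_parts(scores):
    """Relative ranking: order the parts of a scene by descending score."""
    return sorted(scores, key=scores.get, reverse=True)

def classify_object(score, threshold=0.5):
    """Absolute classification: compare one object against a fixed threshold."""
    return "relevant" if score >= threshold else "background"

# relative ranking only says which part matters more than which
parts = {"sky": 0.2, "speaker": 0.9, "desk": 0.4}
order = rank_scene_parts(parts)

# absolute classification must stand on its own, without other parts to compare to
label = classify_object(parts["speaker"])
```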

Content-based analysis

Definition of the AM Building Blocks

Kernel of Analysis for New multimedia Technologies (KANT)

The overall AM

Automatic Analysis Kernel

The part that provides the various analysis features; it applies the control parameters to achieve the required feature(s).

User interaction

Interface (probably graphical) that will allow the user to modify the feedback parameters, to suppress a feature, or to activate another one during the process. Even if the user does not act, an automatic regulation is performed. Although already taken into account in the Automatic Analysis Kernel, special attention must be devoted to the temporal integration of the data: half a second appears to be the normal reaction time of a user, from the onset of motion or the moment a new moving object enters the scene until an adaptation of the control parameters can be performed.

N.B.: a prior interpretation of the coder statistics and/or of the analysis results appears to be necessary in order to present these data to the user in a form he/she can use easily.
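The temporal integration mentioned above could be sketched as follows (an assumed design, not the AM specification): a per-frame coder statistic is averaged over a sliding window of roughly half a second before being presented at the user interface.

```python
from collections import deque

class TemporalIntegrator:
    """Averages a per-frame statistic over roughly half a second before
    it is shown to the user (assumed design, not the AM specification)."""

    def __init__(self, frame_rate=25.0, window_seconds=0.5):
        # at 25 Hz, a half-second window holds about 12 frames
        self.window = deque(maxlen=max(1, round(frame_rate * window_seconds)))

    def update(self, statistic):
        self.window.append(statistic)
        # smoothed value to be presented at the user interface
        return sum(self.window) / len(self.window)

# a statistic jumps when a new moving object enters the scene;
# the value shown to the user follows only gradually
integ = TemporalIntegrator()
smoothed = [integ.update(v) for v in [10.0] * 6 + [20.0] * 6]
```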

MPEG-4 coder

That is, the MPEG-4 Video Verification Model.

Concrete form of the AM results

Masks:

An image with up to 256 different labels (cf. the MPEG-4 alpha channel). It helps to define the different regions of a scene, to establish a ranking of these regions, and to ensure correspondence between successive images, ...

Properties and identification labels:

Numbers that give characteristics of an object/image.
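A minimal sketch of these two concrete result forms (field names and values are assumptions for illustration): an 8-bit label mask in the style of the MPEG-4 alpha channel, with a property record attached to each identification label.

```python
import numpy as np

# 8-bit mask: value 0 = background, other values identify regions
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 7          # a region carrying identification label 7

# property record attached to the same identification label
properties = {
    7: {
        "area": int((mask == 7).sum()),  # number of pixels in region 7
        "rank": 1,                       # example ranking of the region
    }
}
```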

