CVMT colloquia 2007

Moderator: Claus B. Madsen


Approximately every other week the CVMT group meets for
a technical colloquium, where people from the group take turns
presenting their own recent research or relevant research by other groups, or
rehearsing an upcoming conference presentation.

This page contains the abstracts for these colloquia in reverse
chronological order, i.e., the latest is listed at the top of the page.


SCHEDULED EVENTS


This is the plan for future events. To find the next upcoming colloquium and the history of past ones, please go further down to Next And Past Events.






NEXT AND PAST EVENTS


December 5, 2007: Anne-Marie S. Hansen

Title: Adaptive Mixture of Gaussians for Robot to Person Encounters

Two possible PhD projects will be presented in order to discuss the relevance of each project to the general research at Medialogy, to digital media art, design and architecture, and to industry. I hope that your critique and comments will help me select which project is the most interesting.

Today, commercial interactive interfaces that address physical movement and playfulness are starting to see the light of day. The two possible PhD projects address these two aspects with freedom of movement as a fundamental premise for the design of both the physical interface [wireless sensor devices] and the software [visuals and sound].

With experiments inside "the aesthetics of behaviors", I intend to create an ambient medium - a visual and sonic kind of "synthetic nature"  that reacts intelligently to one or more people's movements and gestures in a space. In this way, I hope to design software programs that can be downloaded into a game engine that is connected to the sensor devices. When people interact with these software programs, they generate aesthetic experiences that encourage more accurate physical movements, playfulness, improvisation and collaboration.

The first project I will present is the concept of the "Moodmodules". These are interconnected sensor devices that can be placed in an arbitrary space, where they measure children's movements, gestures and social activity and send out projected visuals and sound in the space.

The second project that I will present is the concept of the "Body Agency" (working title). This concept includes modified physical exercise instruments that sense movements. These are visual and sonic instruments that give realtime feedback on the training situation in the shape of aesthetic experiences.




November 21, 2007: Dennis Mølholm Hansen

Title: Multi-View Stereo Surveillance of Humans and Vehicles in an Unconstrained Environment

This talk presents an automatic visual surveillance system for tracking humans and vehicles using multiple cameras in an unconstrained traffic environment. The purpose of the system is to detect abnormal events through situation awareness. The system combines a principal axis and footage region approach for multi-view correspondence of humans and vehicles. Novel methods for locating humans in groups and solving ambiguity when matching vehicles across views are presented. Foreground segmentation for each view is performed using the codebook method and HSV shadow suppression. The tracking of objects is performed in each view, and occlusion situations are resolved by probabilistic appearance models. The object representation produced by the system includes position, size and velocity for behavior analysis. The applicability of the system is demonstrated by prediction of collisions and other abnormal events in unconstrained sequences. The system is tested on several hours of video and on three different datasets. A live version of the system is running near real-time with a direct video feed.


Collision detection in a multi-view sequence. Red arrows depict the extended velocity vectors of the tracked objects, and the yellow crosses show their intersections as an indication of a possible collision.
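
The intersection test described in the caption can be sketched as follows. This is only an illustration under stated assumptions, not the system's actual implementation: objects are assumed to be points on a common ground plane, the function name crossing_point and the horizon parameter (how far each velocity vector is extended) are hypothetical, and the test only checks whether the two extended paths cross, not whether both objects reach the crossing at the same time.

```python
def crossing_point(p, v, q, w, horizon=1.0):
    """Return the point where two extended velocity vectors cross, else None.

    p, q are 2D positions; v, w are 2D velocities. Each vector is extended
    to position + velocity * horizon (horizon is an assumed look-ahead time).
    """
    rx, ry = v[0] * horizon, v[1] * horizon   # extended vector of object 1
    sx, sy = w[0] * horizon, w[1] * horizon   # extended vector of object 2
    denom = rx * sy - ry * sx                 # 2D cross product of the two vectors
    if abs(denom) < 1e-12:
        return None                           # parallel paths never cross
    dx, dy = q[0] - p[0], q[1] - p[1]
    s = (dx * sy - dy * sx) / denom           # fraction along object 1's vector
    t = (dx * ry - dy * rx) / denom           # fraction along object 2's vector
    if 0.0 <= s <= 1.0 and 0.0 <= t <= 1.0:
        return (p[0] + s * rx, p[1] + s * ry)
    return None


if __name__ == "__main__":
    # Two objects heading towards the same spot: a crossing is reported.
    print(crossing_point((0, 0), (2, 2), (0, 2), (2, -2)))
```

A returned point corresponds to a yellow cross in the figure; None means the extended vectors do not intersect within the chosen horizon.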


October 18 (NOTE! This is a Thursday), 2007: Preben Fihl


Title: Human Gait Analysis in HERMES

This colloquium will present an overview of our work on human gait analysis. As a part of the HERMES project we have worked with different approaches to the problem of classifying gait types, i.e. walking, jogging, and running. The goal of our work has been to find a description of human gait that will allow us to classify gait types in an unconstrained surveillance environment. Two different approaches will be presented. The first is a method for classifying gait as three distinct classes (walking, jogging, and running). This approach is limited by the weak (or missing) definition of the three gait types and we therefore explored a second approach where we describe gait as a continuum instead of three classes.





October 3, 2007: Claus B. Madsen

Title: Real-Time Image-Based Rendering of View-Dependent Surface Appearance

A technique for real-time visualization of glossy surfaces is presented. The technique is aimed at recreating the view-dependent appearance of glossy surfaces under some fixed illumination conditions. The visualized surfaces can be actual real world surfaces or they can be surfaces for which the appearance is precomputed with a global illumination renderer. The approach taken is to image the surface from a large number of viewpoints distributed over the viewsphere. From these images the reflected radiance in different directions is sampled and a parameterized model is fitted to these radiance samples. Two different models are explored: a very low parameter model inspired by the Phong reflection model, and a general Spherical Harmonics model. It is concluded that the Phong-based model is best suited for this type of application.
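
To make the idea of fitting a low-parameter model to radiance samples concrete, here is a minimal sketch. The exact model used in the talk is not given in the abstract, so everything below is an assumption: radiance at a surface point is modelled as kd + ks * max(0, cos)^n, where cos is the cosine between the viewing direction and an assumed lobe direction, and the names fit_phong_like_lobe, kd, ks and n are hypothetical. The exponent is chosen from a small grid and the two linear coefficients are solved in closed form by least squares.

```python
def fit_phong_like_lobe(cosines, radiances, exponents=(1, 2, 4, 8, 16, 32, 64)):
    """Fit radiance ~ kd + ks * max(0, cos)**n and return (kd, ks, n).

    cosines[i] is the cosine between view direction i and the assumed lobe
    direction; radiances[i] is the radiance sampled from that viewpoint.
    """
    m = len(cosines)
    best = None
    for n in exponents:
        basis = [max(0.0, c) ** n for c in cosines]
        sb = sum(basis)
        sbb = sum(b * b for b in basis)
        sy = sum(radiances)
        sby = sum(b * y for b, y in zip(basis, radiances))
        det = m * sbb - sb * sb               # determinant of the 2x2 normal equations
        if abs(det) < 1e-12:
            continue                          # degenerate basis for this exponent
        kd = (sy * sbb - sb * sby) / det
        ks = (m * sby - sb * sy) / det
        err = sum((kd + ks * b - y) ** 2 for b, y in zip(basis, radiances))
        if best is None or err < best[0]:
            best = (err, kd, ks, n)
    if best is None:
        raise ValueError("not enough distinct samples to fit the model")
    return best[1], best[2], best[3]


if __name__ == "__main__":
    cos = [i / 10 for i in range(11)]
    rad = [0.1 + 0.8 * c ** 8 for c in cos]   # synthetic glossy samples
    print(fit_phong_like_lobe(cos, rad))
```

At render time only the three fitted numbers per surface point would need to be stored and evaluated for the current viewpoint, which is what makes a low-parameter model attractive for real-time use.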




September 19, 2007: Michael B. Holte

Title: View Invariant Gesture Recognition using the CSEM SwissRanger SR-2 Camera

We introduce the use of range information acquired by a CSEM SwissRanger SR-2 camera for view invariant recognition of one- and two-arm gestures. The range data enables motion detection and 3D representation of gestures. Motion is detected by double difference range images and filtered by a hysteresis bandpass filter. Gestures are represented by concatenating harmonic shape contexts over time. This representation allows for a view invariant matching of the gestures. The system is trained on gestures from one viewpoint and evaluated on gestures from other viewpoints. The results show a recognition rate of 93.75%.
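
The double difference step can be sketched as below. This is a simplified illustration, not the system described in the talk: range images are assumed to be nested lists of depth values, a single fixed threshold stands in for the hysteresis bandpass filter, and the function name double_difference is hypothetical.

```python
def double_difference(prev, curr, nxt, threshold):
    """Mark a pixel as moving only if it changed both from the previous
    frame to the current one and from the current frame to the next one."""
    rows, cols = len(curr), len(curr[0])
    return [
        [
            abs(curr[r][c] - prev[r][c]) > threshold
            and abs(nxt[r][c] - curr[r][c]) > threshold
            for c in range(cols)
        ]
        for r in range(rows)
    ]


if __name__ == "__main__":
    prev = [[1.0, 1.0, 1.0]]
    curr = [[1.0, 2.0, 2.0]]
    nxt = [[1.0, 1.0, 2.0]]
    # Only the middle pixel changes in both frame pairs.
    print(double_difference(prev, curr, nxt, threshold=0.5))
```

Requiring a change in both frame pairs suppresses the "ghost" left behind at the old position of a moving arm, which a single frame difference would also flag.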

 


July 10, 2007: Ignasi Rius (CVC at Universitat Autònoma de Barcelona)

Title: (Ignasi is visiting from a HERMES project partner)

This work is aimed at performing full-body 3D human tracking from a monocular sequence of images. The common particle filtering approach is used for estimating the probability of the parameters of a human body model over time according to the measurements up to each moment.
However, two main issues need to be addressed within this model-based tracking approach. First, the high-dimensionality of the model to be tracked requires a very high number of particles to properly populate the space of solutions, thus making the problem computationally very expensive.
Then, a proper posture evaluation function (PEF) needs to be defined. The PEF should assign high likelihood to postures which match with the observations (images), and low likelihood to "badly predicted" particles. Additionally, the performance of the likelihood computation needs to be improved since it usually comprises most of the computation time of the particle filter.
To overcome these issues, we first present a lower dimensional representation space suitable for human postures, and learn an action-specific model of human actions. This model is used, on the one hand, as a priori knowledge of human motion within the prediction step of the particle filter and, on the other hand, to design a heuristic for the likelihood computation that assigns a null likelihood value to postures which are not feasible given a particular action, regardless of the measurements obtained from the current image, thus improving the performance of this step.
Finally, a posture evaluation function which exploits spatio-temporal dependencies between 2D human poses and their silhouettes is presented, and used to compute the likelihood of a predicted posture accepted by the action model.
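
The heuristic of giving infeasible postures a null likelihood can be sketched as a weighting step in a particle filter. This is a hedged illustration only: weight_particles, is_feasible and likelihood are hypothetical names, the feasibility test stands in for the learned action model, and the likelihood function stands in for the posture evaluation function.

```python
def weight_particles(particles, is_feasible, likelihood):
    """Return normalised weights, skipping the costly likelihood
    evaluation for particles rejected by the feasibility test."""
    raw = [likelihood(p) if is_feasible(p) else 0.0 for p in particles]
    total = sum(raw)
    if total == 0.0:
        # No particle explains the image: fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in raw]


if __name__ == "__main__":
    calls = []

    def toy_likelihood(p):
        calls.append(p)                 # count the expensive evaluations
        return 1.0 if p == 0.0 else 3.0

    weights = weight_particles([0.0, 1.0, 5.0], lambda p: p < 3.0, toy_likelihood)
    print(weights, len(calls))
```

In the toy run the third particle is rejected by the feasibility test, so the likelihood is evaluated only twice; this is where the saving in computation time comes from.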


June 6, 2007: Michael Nielsen

In continuation of my overlay-model-based shadow segmentation and augmentation algorithm, I have worked on automatic initialisation of the overlay color. I will discuss methods involving entropy and color class segmentation using Gaussian mixture analysis with model selection.


June 6, 2007: Thomas B. Moeslund

Title: Automatic Annotation of Humans in Surveillance Video

I'll talk about a paper I presented last week at the Workshop on Video Processing and Recognition, May 28-30, 2007, Montreal, Canada.

In this paper we present a system for automatic annotation of humans passing a surveillance camera. Each human has 4 associated annotations: the primary color of the clothing, the height, and focus of attention. The annotation occurs after robust background subtraction based on a Codebook representation. The primary colors of the clothing are estimated by grouping similar pixels according to a body model. The height is estimated based on a 3D mapping using the head and feet. Lastly, the focus of attention is defined as the overall direction of the head, which is estimated using changes in intensity at four different positions. Results show successful detection and hence successful annotation for most test sequences.



May 16, 2007: Preben Fihl

    Title: Motion Primitives for Action Recognition


    At Gesture Workshop 2007 (May 23-25) I will present our work on gesture recognition based on motion primitives and this colloquium will feature the presentation. Parts of this work have previously been presented at a colloquium (June 2006).

    The number of potential applications has made automatic recognition of human actions a very active research area. Our approach is based on two cornerstones. Firstly, we represent the actions as a sequence of temporally isolated instances, denoted primitives. Secondly, we base the training of the system on semi-synthetic data, i.e. 3D-tracker data from real executions of the actions combined with a computer graphics model of a human.

    The primitives are extracted from motion images and are recognized in each frame based on a trained classifier. This results in a sequence of primitives. From this sequence we recognize different temporal actions using a probabilistic Edit Distance method.
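
Recognizing an action from a sequence of primitives can be illustrated with the classic (non-probabilistic) edit distance. This is a simplification of the probabilistic Edit Distance method mentioned above, with unit costs for insertion, deletion and substitution; the names edit_distance and recognize, the primitive labels and the action templates are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of primitive labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


def recognize(observed, templates):
    """Return the name of the action template closest to the observed sequence."""
    return min(templates, key=lambda name: edit_distance(observed, templates[name]))


if __name__ == "__main__":
    templates = {"wave": ["A", "B", "C"], "point": ["D", "E"]}
    # A repeated primitive costs one edit against "wave" and four against "point".
    print(recognize(["A", "B", "B", "C"], templates))
```

A probabilistic variant would replace the unit costs with costs derived from the classifier's confidence in each recognized primitive.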





May 1, 2007: Giang Phuong Nguyen

    Title: Similarity Based Visualization of Image Collections

    With the tremendous growth of image archives and collections, there is a definite need to build systems that are capable of managing them efficiently. In this talk, I will present my PhD work on building a system that can process a large image collection. As images are visual objects, a visualization step should be present in multimedia systems in general, and for the purpose of searching in particular. In our system, we analyze the requirements for a general visualization system for image collections. With the emphasis on processing large image collections, we then discuss how to satisfy those requirements.




March 13, 2007: Michael Nielsen

    Title: Segmentation of Soft Shadows based on a Daylight- and Penumbra Model

    I will introduce a new concept within shadow segmentation for usage in shadow removal and augmentation through construction of an alpha overlay shadow model. Previously, an image was considered to consist of shadow and non-shadow regions. I construct a model that accounts for sunlit, umbra and penumbra regions. The model is based on theories about color constancy, daylight, and the geometry that causes penumbra. The behavior of the model is analyzed and a graph cut energy minimization is applied to estimate the alpha parameter. The approach is demonstrated on natural complex image situations. The results are convincing, but the alpha gradient in penumbra must be improved.



January 17, 2007: Morten F. Christensen

    Title: K means clustering

    Classification is an involved task that might involve several processes; it often requires deeper insight into the objects that are supposed to be classified and is limited to a specific setup. Multi-Image Matching using Multi-Scale Oriented Patches (MOPS) has been chosen as a natural landmark detector (explained May 17th, 2006). In general it is a fast recognition method, used e.g. for image stitching, and is useful for rigid objects such as buildings. MOPS introduces index-based matching, but still requires images to be matched pair-wise. With many images, this is a slow process. Therefore all features extracted from the images of the ground truth are used to form a "vocabulary tree". The vocabulary tree is a hierarchical quantization of the data that splits it into clusters recursively on the basis of k-means. Classification can now, even with many images, be accomplished within a second or less.
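
The recursive k-means splitting behind a vocabulary tree can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the talk: features are plain tuples of floats rather than MOPS descriptors, k-means is initialised naively with the first k points, and the names kmeans, build_tree and visual_word are hypothetical.

```python
def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))


def assign(points, centers):
    groups = [[] for _ in centers]
    for p in points:
        groups[nearest(p, centers)].append(p)
    return groups


def kmeans(points, k, iters=10):
    centers = list(points[:k])                      # naive initialisation (assumption)
    for _ in range(iters):
        groups = assign(points, centers)
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else old
                   for g, old in zip(groups, centers)]
    return centers, assign(points, centers)


def build_tree(points, k, depth):
    """Split the features into k clusters, then split each cluster again."""
    if depth == 0 or len(points) <= k:
        return {"points": points}                   # leaf node
    centers, groups = kmeans(points, k)
    return {"centers": centers,
            "children": [build_tree(g, k, depth - 1) for g in groups]}


def visual_word(tree, p):
    """Descend the tree; the path of chosen branches identifies the leaf."""
    path = []
    while "centers" in tree:
        i = nearest(p, tree["centers"])
        path.append(i)
        tree = tree["children"][i]
    return tuple(path)


if __name__ == "__main__":
    feats = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0),
             (10.1, 10.0), (0.0, 0.1), (10.0, 10.1)]
    tree = build_tree(feats, k=2, depth=1)
    print(visual_word(tree, (0.05, 0.05)), visual_word(tree, (9.0, 9.0)))
```

Looking up a feature costs only k distance computations per level instead of a comparison against every stored feature, which is why classification stays fast as the number of images grows.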