Moderator:
Claus B. Madsen
Approximately every other week the CVMT group meets for a technical
colloquium, where members of the group take turns presenting their own
recent research or relevant research by other groups, or rehearsing an
upcoming conference presentation.
This page contains the abstracts for these colloquia in reverse
chronological order, i.e., the latest is listed at the top of the page.
SCHEDULED EVENTS
This is the plan for the future ... please go further down (to Next
And Past Events) to find the next upcoming colloquium and the history
of past ones.
NEXT AND PAST EVENTS
December 5, 2007: Anne-Marie S. Hansen
Title: Adaptive Mixture of Gaussians for Robot to Person Encounters
Two possible PhD projects will be presented in order to discuss each
project according to its relevance in relation to the general research
at Medialogy, digital media art, design and architecture and the
industry. I hope that your critique and comments will help me select
the most interesting project.
Today, commercial interactive interfaces that address physical
movement and playfulness are starting to see the light of day. The two
possible PhD projects address these two aspects with the freedom of
movement as a fundamental premise for the design of both physical
interface [wireless sensor devices] and software [visuals and sound].
With experiments inside "the aesthetics of behaviors", I intend to
create an ambient medium - a visual and sonic kind of "synthetic
nature" that reacts intelligently to one or more people's movements
and gestures in a space. In this way, I hope to design software
programs that can be downloaded into a game engine that is connected to
the sensor devices. When people interact with these software programs,
they generate aesthetic experiences that encourage more accurate
physical movements, playfulness, improvisation and collaboration.
The first project I will present is the concept of the
"Moodmodules". These are interconnected sensor devices that can be
placed in an arbitrary space, where they measure children's movements,
gestures and social activity and send out projected visuals and sound
in the space.
The second project that I will present is the concept of the "Body
Agency" (working title). This concept includes modified physical
exercise instruments that sense movements. These are visual and sonic
instruments that give realtime feedback on the training situation in
the shape of aesthetic experiences.
November 21, 2007: Dennis Mølholm Hansen
Title: Multi-View Stereo Surveillance of Humans and Vehicles in an Unconstrained Environment
This talk presents an automatic visual surveillance system for tracking
humans and vehicles using multiple cameras in an unconstrained traffic
environment. The purpose of the system is to detect abnormal events
through situation awareness. The system combines a principal axis and
footage region approach for multi-view correspondence of humans and
vehicles. Novel methods for locating humans in groups and solving
ambiguity when matching vehicles across views are presented. Foreground
segmentation for each view is performed using the codebook method and
HSV shadow suppression. The tracking of objects is performed in each
view, and occlusion situations are resolved by probabilistic appearance
models. The object representation produced by the system includes
position, size and velocity for behavior analysis. The applicability of
the system is demonstrated by prediction of collisions and other
abnormal events in unconstrained sequences. The system is tested on
several hours of video and on three different datasets. A live version
of the system is running near real-time with a direct video feed.

Figure: Collision detection in a multi-view sequence. Red arrows depict
the extended velocity vectors of the tracked objects, and the yellow
crosses show intersections as an indication of a possible collision.
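The collision cue described in the caption can be sketched as a 2D segment intersection test. This is only an illustrative sketch, not the system's implementation: it assumes ground-plane positions and velocities are already available, and the function names and the `horizon` parameter (how far each velocity vector is extended) are hypothetical.

```python
def segment_intersection(p1, p2, q1, q2):
    """Return the crossing point of segments p1-p2 and q1-q2, or None."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx
    if denom == 0:
        return None  # parallel or collinear: treated here as no crossing
    qpx, qpy = q1[0] - p1[0], q1[1] - p1[1]
    t = (qpx * sy - qpy * sx) / denom  # position along p1-p2
    u = (qpx * ry - qpy * rx) / denom  # position along q1-q2
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None


def possible_collision(pos_a, vel_a, pos_b, vel_b, horizon=2.0):
    """Extend both velocity vectors by `horizon` (assumed value) and intersect them."""
    end_a = (pos_a[0] + vel_a[0] * horizon, pos_a[1] + vel_a[1] * horizon)
    end_b = (pos_b[0] + vel_b[0] * horizon, pos_b[1] + vel_b[1] * horizon)
    return segment_intersection(pos_a, end_a, pos_b, end_b)


# Two objects on crossing paths, and two objects moving in parallel.
crossing = possible_collision((0.0, 0.0), (1.0, 0.0), (1.0, -1.0), (0.0, 1.0))
parallel = possible_collision((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 0.0))
```

A crossing point only signals that the extended paths meet; it does not check that both objects reach that point at the same time, so it is best read as an indication rather than a prediction.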
October 18 (NOTE! This is a Thursday), 2007: Preben Fihl
Title: Human Gait Analysis in HERMES

This colloquium will present an overview of our work on human gait
analysis. As a part of
the HERMES project we have worked with different approaches to the
problem of classifying gait types, i.e. walking, jogging, and running.
The goal of our work has been to find a description of human gait that
will allow us to classify gait types in an unconstrained surveillance
environment. Two different approaches will be presented. The first is a
method for classifying gait as three distinct classes (walking,
jogging, and running). This approach is limited by the weak (or
missing) definition of the three gait types and we therefore explored a
second approach where we describe gait as a continuum instead of three
classes.
October 3, 2007: Claus B. Madsen
Title: Real-Time Image-Based Rendering of View-Dependent Surface Appearance

A technique for real-time visualization of glossy surfaces is
presented. The technique
is aimed at recreating the view-dependent appearance of glossy surfaces
under some fixed illumination conditions. The visualized surfaces can
be actual real world surfaces or they can be surfaces for which the
appearance is precomputed with a global illumination renderer. The
approach taken is to image the surface from a large number of viewpoints
distributed over the viewsphere. From these images the reflected
radiance in different directions is sampled and a parameterized model is
fitted to these radiance samples. Two different models are explored: a
very low parameter model inspired by the Phong reflection model, and a
general Spherical Harmonics model. It is concluded that the Phong-based
model is best suited for this type of application.
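As a rough illustration of fitting a low-parameter model to radiance samples, the sketch below fits a generic Phong-style lobe by linear least squares. It is not the model from the talk: the form `diffuse + specular * max(0, cos)^exponent`, the assumption that the lobe direction and exponent are already known, and all names are assumptions made for this example.

```python
import math


def phong_basis(view, lobe, exponent):
    """Clamped cosine between two unit vectors, raised to a power."""
    cos_angle = sum(v * l for v, l in zip(view, lobe))
    return max(0.0, cos_angle) ** exponent


def fit_phong(views, radiances, lobe, exponent):
    """Least-squares fit of radiance ~ diffuse + specular * basis."""
    basis = [phong_basis(v, lobe, exponent) for v in views]
    n = len(basis)
    sb = sum(basis)
    sbb = sum(b * b for b in basis)
    sl = sum(radiances)
    sbl = sum(b * r for b, r in zip(basis, radiances))
    det = n * sbb - sb * sb  # determinant of the 2x2 normal equations
    if abs(det) < 1e-12:
        raise ValueError("views do not constrain both parameters")
    diffuse = (sl * sbb - sb * sbl) / det
    specular = (n * sbl - sb * sl) / det
    return diffuse, specular


# Synthetic radiance samples from four view directions (hypothetical data).
lobe = (0.0, 0.0, 1.0)
views = [(math.sin(a), 0.0, math.cos(a)) for a in (0.0, 0.2, 0.5, 1.0)]
samples = [0.2 + 0.8 * phong_basis(v, lobe, 10) for v in views]
diffuse, specular = fit_phong(views, samples, lobe, 10)
```

Fixing the lobe direction and exponent keeps the fit linear; estimating them as well would require a nonlinear optimizer.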
September 19, 2007: Michael B. Holte
Title: View Invariant Gesture Recognition using the CSEM SwissRanger SR-2 Camera
We introduce the use of range information acquired by a CSEM
SwissRanger SR-2 camera for view invariant recognition of one- and
two-arm gestures. The range data
enables motion detection and 3D representation of gestures. Motion
is detected by double difference range images and filtered by a
hysteresis bandpass filter. Gestures are represented by concatenating
harmonic shape contexts over time. This representation allows for a
view invariant matching of the
gestures. The system is trained on gestures from one viewpoint and
evaluated on gestures from other viewpoints. The results show a
recognition rate of 93.75%.
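The double-difference idea mentioned above can be sketched on three consecutive range images: a pixel counts as moving only if its range changed both before and after the middle frame. This is a minimal sketch under stated assumptions: images are plain nested lists, a single hypothetical `threshold` replaces the hysteresis band-pass filtering used in the talk, and the function name is illustrative.

```python
def double_difference(prev, curr, nxt, threshold):
    """Mark pixels whose range differs from both the previous and next frame."""
    mask = []
    for row_p, row_c, row_n in zip(prev, curr, nxt):
        mask.append([
            abs(c - p) > threshold and abs(n - c) > threshold
            for p, c, n in zip(row_p, row_c, row_n)
        ])
    return mask


# One-row example: pixel 0 is static, pixel 1 moves in and out,
# pixel 2 changes once and then stays (so it is not marked).
prev_frame = [[1.0, 1.0, 1.0]]
curr_frame = [[1.0, 2.0, 2.0]]
next_frame = [[1.0, 1.0, 2.0]]
motion = double_difference(prev_frame, curr_frame, next_frame, threshold=0.5)
```

Requiring both differences suppresses the "ghost" that a single frame difference leaves at the position an object has just left.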
July 10, 2007: Ignasi Rius (CVC at Universitat Autònoma de Barcelona)
Title:
(Ignasi is visiting from a HERMES project partner)
This work is aimed at performing full-body 3D human tracking from a
monocular sequence of images. The common particle filtering approach is
used for estimating the probability of the parameters of a human body
model over time according to the measurements up to each moment.
However, two main issues need to be addressed within this
model-based tracking approach. First, the high dimensionality of the
model to be tracked requires a very high number of particles to
properly populate the space of solutions, thus making the problem
computationally very expensive.
Second, a proper posture evaluation function (PEF) needs to be
defined. The PEF should assign high likelihood to postures which match
with the observations (images), and low likelihood to "badly predicted"
particles. Additionally, the performance of the likelihood computation
needs to be improved since it usually comprises most of the computation
time of the particle filter.
To overcome these issues, we first present a lower-dimensional
representation space suitable for human postures, and learn an
action-specific model of human actions. On the one hand, this model
serves as a priori knowledge of human motion within the prediction
step of the particle filter. On the other hand, it is used to design a
heuristic for the likelihood computation that assigns a null
likelihood to postures that are not feasible for a particular action,
regardless of the measurements obtained from the current image, thus
improving the performance of this step.
Finally, a posture evaluation function which exploits
spatio-temporal dependencies between 2D human poses and their
silhouettes is presented, and used to compute the likelihood of a
predicted posture accepted by the action model.
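The structure described above (predict, reject infeasible postures with a null likelihood, evaluate the rest, resample) can be sketched as one particle filter step. This is a toy sketch, not the presented tracker: the posture is a single number, the `feasible` callback stands in for the learned action model, and the Gaussian likelihood stands in for the posture evaluation function. All names and noise values are assumptions.

```python
import math
import random


def particle_filter_step(particles, observation, feasible, rng,
                         motion_noise=0.1, obs_noise=0.2):
    """One predict / weight / resample cycle on 1D toy postures."""
    # Prediction: diffuse each particle (a learned motion prior would go here).
    predicted = [p + rng.gauss(0.0, motion_noise) for p in particles]

    # Likelihood: infeasible postures get zero weight without being evaluated.
    weights = []
    for p in predicted:
        if not feasible(p):
            weights.append(0.0)
        else:
            weights.append(math.exp(-0.5 * ((p - observation) / obs_noise) ** 2))

    total = sum(weights)
    if total == 0.0:
        # Every particle was rejected: fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Resampling: draw a new particle set in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return predicted, weights, resampled


rng = random.Random(0)
start = [i / 10 for i in range(-20, 21)]


def in_action_range(p):
    """Hypothetical action model: only postures in [0, 1] are feasible."""
    return 0.0 <= p <= 1.0


predicted, weights, resampled = particle_filter_step(
    start, observation=0.5, feasible=in_action_range, rng=rng)
```

Because rejected particles never reach the likelihood evaluation, the expensive image comparison is only paid for postures the action model accepts.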
June 6, 2007: Michael Nielsen
In continuation of my overlay-model-based shadow segmentation and
augmentation algorithm, I have worked on automatic initialisation of the
overlay color. I will discuss methods involving entropy and color class
segmentation using Gaussian mixture analysis with model selection.
June 6, 2007: Thomas B. Moeslund
Title: Automatic Annotation of Humans in Surveillance Video
I'll talk about a paper I presented last week at the Workshop on Video
Processing and Recognition, May 28-30, 2007, Montreal, Canada.
In this paper we present a system for automatic annotation of humans
passing a surveillance camera. Each human has 4 associated annotations:
the primary color of the clothing, the height, and focus of attention.
The annotation occurs after robust background subtraction based on a
Codebook representation. The primary colors of the clothing are
estimated by grouping similar pixels according to a body model. The
height is estimated based on a 3D mapping using the head and feet.
Lastly, the focus of attention is defined as the overall direction of
the head, which is estimated using changes in intensity at four
different positions. Results show successful detection and hence
successful annotation for most test sequences.

May 16, 2007: Preben Fihl
Title: Motion Primitives for Action Recognition
At Gesture Workshop 2007 (May 23-25) I will present our work on gesture
recognition based on motion primitives and this colloquium will feature
the presentation. Parts of this work have previously been presented at
a colloquium (June 2006).
The number of potential applications has made automatic recognition of
human actions a very active research area. Our approach is based on two
cornerstones. Firstly, we represent the actions as a sequence of
temporally isolated instances, denoted primitives. Secondly, we base the
training of the system on semi-synthetic data, i.e. 3D-tracker data
from real executions of the actions combined with a computer graphics
model of a human.
The primitives are extracted from motion images and are recognized in
each frame based on a trained classifier. This results in a sequence of
primitives. From this sequence we recognize different temporal actions
using a probabilistic Edit Distance method.
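To illustrate recognizing an action from a primitive sequence, the sketch below uses the standard (deterministic) edit distance as a stand-in for the probabilistic Edit Distance mentioned above, which is not reproduced here. The primitive labels, the action names, and the `classify` helper are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of primitive labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def classify(sequence, templates):
    """Return the action whose template sequence is closest to `sequence`."""
    return min(templates, key=lambda name: edit_distance(sequence, templates[name]))


# Hypothetical primitive templates for two actions.
templates = {
    "point_right": ["A", "B", "C"],
    "raise_arm": ["D", "E", "F"],
}
observed = ["A", "B", "B", "C"]  # one primitive detected twice
label = classify(observed, templates)
```

The distance tolerates primitives that are repeated, missed, or misrecognized in individual frames, which is why a sequence-level comparison is useful on top of a per-frame classifier.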

May 1, 2007: Giang Phuong Nguyen
Title: Similarity Based Visualization of Image Collections
With the tremendous growth of image archives and collections, there is
a definite need to build systems that are capable of managing them
efficiently. In this talk, I will present my PhD work on building a
system that can process a large image collection. As images are visual
objects, multimedia systems in general, and search systems in
particular, should include a visualization step. In our system, we
analyze the requirements for a general visualization system for image
collections. With the emphasis on processing large image collections,
we then discuss how to satisfy those requirements.
March 13, 2007: Michael Nielsen
Title: Segmentation of Soft Shadows based on a Daylight- and Penumbra Model
I will introduce a new concept within shadow segmentation for use in
shadow removal and augmentation through construction
of an alpha overlay shadow model. Previously, an image was
considered to consist of shadow and non-shadow regions. I construct a
model that accounts for sunlit, umbra and penumbra regions. The model
is based on theories about color constancy, daylight, and the geometry
that causes penumbra. The behavior of the model is analyzed and a
graph cut energy minimization is applied to estimate the alpha
parameter.
The approach is demonstrated on natural complex image situations.
The results are convincing, but the alpha gradient in penumbra must be
improved.
January 17, 2007: Morten F. Christensen
Title: K means clustering
Classification is an involved task that might involve several
processes. It often requires deeper insight into the objects that are
supposed to be classified, and is limited to a specific setup.
Multi-Image Matching using Multi-Scale Oriented Patches (MOPS) has been
chosen as a natural landmark detector (explained May 17th, 2006). In
general it is a fast recognition method, e.g. used for image stitching,
and is useful for rigid objects such as buildings. MOPS introduces
index-based matching, but still requires images to be matched
pair-wise. With many images, this is a slow process. Therefore all
features, extracted from the images of the ground truth, are used to
form a "vocabulary tree". The vocabulary tree is a hierarchical
quantization of data that splits data into clusters recursively on the
basis of k-means. Classification can now, even with many images, be
accomplished within a second or less.
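The recursive k-means quantization described above can be sketched as follows. This is a small illustrative sketch, not the presented system: features are short tuples rather than MOPS descriptors, the tree is a plain dictionary, and the names `build_tree` and `quantize` as well as the branching factor and depth are assumptions.

```python
import random


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def _assign(points, centers):
    """Group points by their nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: _dist2(p, centers[i]))
        groups[nearest].append(p)
    return groups


def kmeans(points, k, rng, iterations=10):
    """Plain k-means with randomly sampled initial centers."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        groups = _assign(points, centers)
        # An empty cluster keeps its previous center.
        centers = [_mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, _assign(points, centers)


def build_tree(points, k, depth, rng):
    """Recursively split the features into k clusters per level."""
    node = {"center": _mean(points), "children": []}
    if depth == 0 or len(points) < k:
        return node
    _, groups = kmeans(points, k, rng)
    for g in groups:
        if g and len(g) < len(points):  # skip empty or degenerate splits
            node["children"].append(build_tree(g, k, depth - 1, rng))
    return node


def quantize(tree, feature):
    """Descend to a leaf; the path of child indices acts as the visual word."""
    path, node = [], tree
    while node["children"]:
        children = node["children"]
        idx = min(range(len(children)),
                  key=lambda i: _dist2(feature, children[i]["center"]))
        path.append(idx)
        node = children[idx]
    return tuple(path)


features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
tree = build_tree(features, k=2, depth=1, rng=random.Random(0))
word_a = quantize(tree, (0.05, 0.05))
word_b = quantize(tree, (5.05, 5.05))
```

A query feature is compared with only k centers per level instead of with every stored feature, which is what makes lookup fast even with many images.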