Research Report 2003

 

COMPUTER VISION and Media Technology (CVMT)

 

 

COMPUTER VISION

 

Automatic Acquisition of Visual Landmarks / Automatisk indlæring af visuelle landemærker

A mobile robot can navigate autonomously through, for example, a production facility by using a video camera to monitor known visual landmarks in the environment. Landmarks can be light switches, posters, door signs, etc. An especially flexible system is achieved if the robot can automatically acquire (learn) such landmarks for future navigation use. This project develops techniques for automatic learning of visual landmarks by acquiring images of the environment and extracting the positions and appearances of landmark candidates. These candidates are then stored in the robot's memory for subsequent use in controlling robot movements. CVMT focuses in particular on analysing how accurately this navigation method can perform, in order to optimise the use of the available landmarks. In 2003 this effort resulted in a Ph.D. degree being awarded to Salvatore Livatino.
Project under VIRGO (EU, TMR).
(Claus B. Madsen; Salvatore Livatino, Italy)


Segmentation of skin colour under different illuminations and estimation of the colour of illuminations / Segmentering af hudfarve under forskellige lyskilder og estimering af lyskilders farve

Automatically detecting and tracking the faces and hands of humans in motion is important in applications such as interfaces for human computer interaction (HCI), indoor and outdoor surveillance, and automatic camera operators. An often-used feature in face and human motion tracking is skin colour, because it is fast to compute and invariant to size and orientation. However, the colour of skin changes as the illumination changes. This project aims to develop adaptive methods for segmenting human skin under changing illumination conditions. A physics-based reflection model of human skin has been developed and successfully applied, e.g., to estimate the illumination colour and to adapt statistical models of skin colour. Furthermore, colour information has been combined with near infrared information in order to improve the robustness of skin detection.
Partly funded by ARTHUR.
(Moritz Störring, Erik Granum, Hans J. Andersen)
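
The illumination dependence described above can be illustrated with a small numerical sketch. It is not the physics-based reflection model developed in the project; it assumes normalised rg chromaticities as the colour feature and a simple diagonal (per-channel gain) model of an illumination change. The function names and the pixel values are hypothetical and chosen for illustration only.

```python
def chromaticity(rgb):
    """Map an (R, G, B) triple to normalised (r, g) chromaticities."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (0.0, 0.0)  # black pixel: chromaticity undefined, zeros by convention
    return (r / total, g / total)


def change_illumination(rgb, gains):
    """Apply a diagonal illumination model: one gain per colour channel."""
    return tuple(c * k for c, k in zip(rgb, gains))


# Hypothetical skin-like pixel, for illustration only.
pixel = (180.0, 120.0, 100.0)

# Halving the intensity leaves the chromaticity unchanged ...
dimmer = change_illumination(pixel, (0.5, 0.5, 0.5))
# ... whereas a redder illuminant shifts it, so a fixed skin model can fail.
redder = change_illumination(pixel, (1.2, 1.0, 0.8))

print(chromaticity(pixel))
print(chromaticity(dimmer))
print(chromaticity(redder))
```

The sketch shows why intensity normalisation alone is not sufficient: a change in illumination colour moves the skin chromaticity, which is what motivates adapting the skin colour model to the estimated illumination.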

 

 

 

Assessment of crops by outdoor computer vision / Afgrødevurdering og udendørs computer vision

Visual information about crops has been used for centuries to assess their growth condition. Here, the spatial distribution of reflection from the canopy is used to detect specific patterns related to lack of nutrients or infestation by diseases. For example, lack of manganese shows as light brown spots on the leaves, whereas lack of calcium shows as discolouration of the leaves from the leaf edge inwards.
So far, development of sensors for detecting plant nutrient deficiency has been oriented towards nitrogen, using sensor types that do not take the spatial pattern of the reflection into account.
The objective of this project is to develop computer vision methods that enable a more detailed analysis of the reflection from canopies by 1) identifying areas without specular reflection and 2) correcting images for uneven illumination conditions.
In this way a reliable quantitative analysis of the canopy may be obtained for optimal growth control.
The project was concluded with a report in early 2003 and provided the following major contributions:
- A pioneering investigation of computer vision analysis of outdoor scenes with high dynamic range of intensities
- 3D reconstruction and description of plants for detection of reflection patterns
- Modelling of the "Red Edge Inflection Point" using the Weibull function for accurate estimation of the chlorophyll content and its distribution across the leaf segments
The first two studies have given promising results and international publications. All results will be followed up in the ACROSS project.
Funded by The Danish Agricultural and Veterinary Research Council, February 2002-March 2003.
(Hans J. Andersen)

 

 

ACROSS: Autonomous spatial‑temporal crop and soil surveying / Autonom spatio temporal afgrøde‑ og jordmonitorering

The general vision is effective precision farming that utilises resources optimally and in harmony with the environment. This requires continuous, selective and adaptive control of growth, weeds, diseases and pests. Such control in turn depends on corresponding continuous monitoring in the field, using appropriate methods for measuring the current conditions of and for plant growth.
The objective of this project is to develop methods for measuring and managing such information to support the above vision, in a way that also invites new, innovative approaches to precision farming by providing the necessary information on demand and in time for planning and decision making.
More concretely, the project covers: 1) Computer vision and laser range methods for on-site, real-time monitoring of crop growth information (nutrients, diseases, etc.). The methods will allow diagnostics of crop condition based on reflection patterns (e.g. discoloured areas) down to single leaf scale. 2) Implementation and integration of the above methods on an autonomous platform with a suite of existing crop and soil measuring facilities for on-site operation. 3) Repeated test and evaluation for development and proof of concept: "Autonomous Crop and Soil Surveillance".
Hence, through new research and practical development, the project will contribute to a new scope for precision agriculture, which until now has not been seen in its full perspective due to the lack of precise and timely information.

The project is carried out by a consortium comprising CVMT and 3 departments of the Danish Institute of Agricultural Science: Agricultural Engineering (Bygholm), Agricultural Systems / Crop Physiology and Soil Science (Foulum), and Plant Biology (Flakkebjerg).
Funded by: The Danish Technical Research and Agricultural and Veterinary Research Councils and the Danish Ministry of Food, Agriculture and Fisheries.
(Erik Granum, Hans J. Andersen, Michael Nielsen, Kristian Kirk)

 

 

Outdoor computer vision for analysis of vegetation / Udendørs computer vision til analyse af vegetation

This project deals with problems such as segmentation of vegetation from a single image and 3-D reconstruction from two or more images (stereopsis). A major challenge in outdoor colour-based segmentation is to make it robust to changing illumination conditions. It will be examined how existing methods for illumination-invariant analysis perform when their assumptions are violated in real-world situations (e.g., mixed illuminants, non-diffuse objects). At the end of 2003, an experiment was started in collaboration with Anton Thomsen, Foulum, with the purpose of comparing laser range measurements with computer vision for estimation of gap fraction (and leaf area index).

(Kristian Kirk, Erik Granum, Hans Jørgen Andersen)
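
The gap fraction and leaf area index mentioned above can be related by a standard light-extinction model. The sketch below is not the procedure used in the experiment; it assumes a binary vegetation mask as input and the commonly used Beer-Lambert inversion LAI = -ln(P) / k, where P is the gap fraction and the extinction coefficient k = 0.5 is an assumed value (spherical leaf angle distribution, vertical viewing direction). All names and the example mask are hypothetical.

```python
import math


def gap_fraction(mask):
    """Fraction of pixels that are NOT vegetation in a binary mask.

    mask is a list of rows; 1 marks a vegetation pixel, 0 marks a gap.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    gaps = sum(row.count(0) for row in mask)
    return gaps / total


def leaf_area_index(p_gap, k=0.5):
    """Invert the Beer-Lambert model P = exp(-k * LAI) for LAI."""
    if not 0.0 < p_gap <= 1.0:
        raise ValueError("gap fraction must be in (0, 1]")
    return -math.log(p_gap) / k


# Tiny hypothetical mask: 4 gap pixels out of 16.
mask = [
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
]
p = gap_fraction(mask)
print(p, leaf_area_index(p))
```

A laser range instrument would deliver the gap fraction by a different route (the share of beams that reach the ground), so the same inversion could in principle be applied to both measurement types when comparing them.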

 

 

Computer vision analysis of biological semi transparent colour objects / Computer vision analyse af biologiske semi transparente farveobjekter

This Ph.D. project is mainly concerned with the reconstruction of the 3D structure of plants, i.e. of all elements of the canopy with stems and leaves and their interconnections. Identification and analysis of the individual 3D elements of a canopy will be the basis for development of descriptive methods for the overall structure of the canopy. Such descriptions will provide a means for the analysis of the position and function of these elements in the plant structure as well as the characteristics (and health) of the overall structure itself. Currently, the co-operation also extends outside the ACROSS consortium. Together with KVL, experiments are conducted to detect lead leaf area index and lead leaf tips and bases on barley, for automation of the sampling of hyper-spectral data.

Funded through the ACROSS project

(Michael Nielsen, Hans Jørgen Andersen, Erik Granum)

 

 

Computer vision‑based human motion capture using kinematics constraints / Computer vision registrering af humane bevægelser vha. kinematiske begrænsninger

In man-machine interfaces there is a great need for interaction methods that are more natural to humans, e.g. via speech and body language. The latter requires a computer to capture the movements of the individual body parts and recognize their meaning. In this project computer vision is used to investigate the capturing problem. The key approach is to have a geometric model of the articulated body parts. The model is used to predict possible “next” configurations given the past configurations. The predicted configurations are compared with the image measurements, and the true configuration of the human body is captured. To optimize this process, detailed kinematic constraints related to the articulated body parts are introduced to limit the search space of possible configurations. The limited search space is, however, still too large for a brute force search, and therefore a probabilistic approach, in the form of a sequential Monte Carlo method, is adopted to identify the most likely configuration. In 2003 the project was concluded with a PhD degree being awarded to Thomas B. Moeslund.
(Thomas B. Moeslund, Erik Granum)
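
The prediction, constraint and comparison steps described above can be sketched for a single joint. This is a minimal illustration of a sequential Monte Carlo (particle) filter, not the project's implementation: the one-dimensional elbow angle, the joint limits, the noise levels and all function names are assumptions made for the example.

```python
import math
import random

# Hypothetical kinematic constraint: allowed elbow angle in degrees.
ANGLE_MIN, ANGLE_MAX = 0.0, 150.0


def predict(particles, motion_std, rng):
    """Propose a 'next' angle per particle, rejecting proposals outside the limits."""
    proposed = []
    for angle in particles:
        while True:
            candidate = angle + rng.gauss(0.0, motion_std)
            if ANGLE_MIN <= candidate <= ANGLE_MAX:
                proposed.append(candidate)
                break
    return proposed


def weigh(particles, measurement, meas_std):
    """Normalised Gaussian likelihood of the measurement for each particle."""
    raw = [math.exp(-0.5 * ((a - measurement) / meas_std) ** 2) for a in particles]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in raw]


def track(measurements, n_particles=500, motion_std=5.0, meas_std=4.0, seed=0):
    """Return one angle estimate (weighted mean) per measurement."""
    rng = random.Random(seed)
    particles = [rng.uniform(ANGLE_MIN, ANGLE_MAX) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        particles = predict(particles, motion_std, rng)
        weights = weigh(particles, z, meas_std)
        estimates.append(sum(a * w for a, w in zip(particles, weights)))
        # Resample: likely configurations are duplicated, unlikely ones die out.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


print(track([40.0] * 10)[-1])
```

In this sketch the constraint is applied in the prediction step, so no particle is ever spent on an impossible configuration; a full body model would do the same over many coupled joint angles.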

 

 

Computer vision based interface to virtual reality / Computer vision baseret interface til virtual reality

The current interfaces in the VR-Media Lab's CAVE are based on physical devices with long wires. These wires often disturb the user and thus reduce the immersiveness. In this effort it has been investigated how computer vision can be applied to create interfaces providing the same performance as the wired devices. The approach is based on four infrared cameras tracking markers attached to the user's VR glasses and to the interface device, in order to estimate the user's viewing direction and the position of the interface device. This form of interaction is wireless and therefore less disruptive to immersion. A tracking method has been developed and implemented that performs robustly with low latency and update rates of up to 200 Hz.
Partly funded by ARTHUR and VR‑Media Lab.
(Niels Tjørnly Rasmussen, Moritz Störring, Thomas D. Nielsen, Thomas B. Moeslund, Erik Granum)
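
How a marker position can be recovered from several calibrated cameras may be illustrated with a two-camera sketch. It is not the tracking method developed in the project: it assumes that each camera already provides a viewing ray (camera centre plus direction) towards the marker, and it returns the midpoint of the shortest segment between the two rays. Camera positions, marker position and function names are hypothetical.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _point_on_ray(origin, direction, s):
    return tuple(o + s * d for o, d in zip(origin, direction))


def triangulate(p1, d1, p2, d2):
    """Midpoint of the closest points on rays p1 + s*d1 and p2 + t*d2."""
    w0 = _sub(p1, p2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = _point_on_ray(p1, d1, s)
    q2 = _point_on_ray(p2, d2, t)
    return tuple((x + y) / 2.0 for x, y in zip(q1, q2))


# Hypothetical set-up: two cameras 1 m apart observing one marker.
cam1, cam2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
marker = (0.5, 0.5, 2.0)
ray1, ray2 = _sub(marker, cam1), _sub(marker, cam2)
print(triangulate(cam1, ray1, cam2, ray2))
```

With four cameras the same idea would typically be extended to a least-squares intersection of all available rays, which also tolerates a marker being hidden from one or two cameras.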

 

 

FG‑NET ‑ Face and Gesture Recognition Working Group / Ansigts‑ og gestusgenkendelsesarbejdsgruppe

FG‑NET is a 6 partner EU‑IST Concerted Action/Thematic Network (IST‑2000‑26434) that started in 2001 and will run until 2004. The aim of this project is to encourage technology development in the area of face and gesture recognition. The precise goals are: (1) to act as a focus for the workers developing face and gesture recognition technology; (2) to create a set of foresight reports defining development roadmaps and future use scenarios for the technology in the medium (5‑7 years) and long (10‑20 years) term; (3) to specify, develop, and supply resources (e.g. image data sets) supporting these scenarios, and (4) to use these resources to encourage technology development. The use of shared resources and data sets to encourage the development of complex processes and recognition systems has been very successful in the speech analysis and recognition field, and also in the image analysis field in the few specific cases where it has been applied. The basis of this project is that, when properly defined and collected, such resources would also be of benefit in the development of solutions to wider problems in face and gesture recognition. Currently a large data set containing pointing gestures is being recorded and annotated.

(Thomas B. Moeslund, Moritz Störring, Lars Reng, Erik Granum)


Reconstruction of 3D surface models using video cameras / Rekonstruktion af 3D overflademodeller vha. almindelige videokameraer

The aim of this project is to develop an image processing based system for constructing metrically correct 3D models from monochrome video camera images. The method is based on calibrated 2D images captured under controlled illumination conditions, as changes in the illumination play an active role in the processing of the images. From the 2D images, a set of corresponding 2.5D surface patches can be calculated, and a surface-based alignment method is being developed for joining these surface patches into a common 3D model of the surface. The generated 3D models are expected to have an accuracy comparable to that of other methods, such as MRI scanning, but with a significantly lower use of resources.
(Jørgen Bjørnstrup, Erik Granum)

 

 

VIRTUAL REALITY

 

VR MediaLab, Virtual Reality MediaLab

VR MediaLab is the Aalborg University Centre for Virtual Reality and Interactive Media. It was inaugurated in August 1999 with unique computing and visualisation facilities in a dedicated building complex of NOVI, the Science Park of Aalborg University. The VR facilities comprise an SGI supercomputer (16 CPUs, 6 graphics pipes) and three visualisation arenas: a 6-sided CAVE; a Panorama screen of 160 degrees, 7.1 m diameter; and a 3D Power Wall, 8 m wide. The Centre hosts research groups from various departments of the university that want to operate in the well-supported interdisciplinary environment with both research and teaching activities. CVMT played a major role in the establishment of the VR Centre and is now located within it, contributing to and benefiting from the interdisciplinary environment and the facilities. An upgrade of the computer facilities to a PC cluster has been initiated.
Supported by Det Obelske Familiefond, Spar Nord Fonden, and EU Funds for Regional Development.
(Erik Kjems)


Virtual Reality Platform for Interactive, Inhabited Virtual Worlds / VR platform til interaktive, beboede verdener

To construct interactive virtual worlds, a software platform is needed to 1) simulate and maintain a dynamic 3D model of a scenario, 2) present this simulated world to users using computer graphics and audio, and 3) provide one or more users with interaction facilities. In conjunction with past projects on interactive, inhabited virtual worlds (the STAGING and PUPPET projects), CVMT has designed and developed such a Virtual Reality platform. The platform supports arbitrary scenarios where computer-controlled characters and humans interact. Such computer-controlled characters are called Autonomous Agents, or simply agents. The user can freely navigate within this virtual world, which the software platform visualises in real-time on a computer screen (or in any of the VR MediaLab's VR arenas). The platform also enables the autonomous agents to move around (supported by path planning), play animations, utter recorded sounds, and change facial expressions. In this way agents can express themselves to, and interact with, other agents or the user(s). The platform also enables agents to continuously sense the virtual world, i.e., the agents have simulated vision, audio and tactile senses. A special aspect of CVMT's VR platform is that it supports the user in being represented by an avatar (an agent controlled by the user) in the virtual world. This allows the user to interact with the virtual world and its inhabitants on equal terms with the agents. The platform features unique sound-related interaction possibilities for the user. For example, the user can communicate with the autonomous agents using sound (the agents can 'hear' the user through a microphone), and the user can record the sounds he/she wants the agents to use in particular situations.
(Claus B. Madsen, Erik Granum)

 

 

Interaction with virtual worlds and their inhabitants / Interaktion med virtuelle verdener og deres beboere

This project focuses on interaction in virtual reality by exploring and further developing the facilities at the VR-Centre. The effort is carried out in collaboration with the VR Centre. Despite a substantial history of interaction with computers, interaction with VR applications is often rather primitive. A range of basic hardware and software problems regarding interaction in the VR installation was uncovered and solved. Interaction processes were analysed in the light of application contexts and the relevant combinations of input devices and display types. A series of interaction and navigation techniques for VR was developed, implemented, and tested.
(Henrik R. Nagel, Søren Bovbjerg, Erik Granum)

 

 

 

 

3D Visual Data Mining (3DVDM)

Both private companies and public institutions regularly collect large databases, but much of the information content in those databases is difficult to extract. With VR technology it is possible to create virtual visual worlds based on the characteristics of the data, so that visual data explorers can be immersed in these worlds, navigate around, and observe the data from within. The project develops and investigates the applicability of temporal visualisation methods for the purpose of detecting previously unknown structures and relationships in data. A new and flexible VR visualisation system has been developed and implemented, which allows visualisation of arbitrary temporal developments of data and furthermore makes it possible to study new forms of interaction. The VR visualisation system is being used by the members of the 3D Visual Data Mining project, as well as by other research groups at Aalborg University. It has allowed the participants of the 3D Visual Data Mining project to develop new methods for exploring data in Virtual Reality based on arbitrary temporal data visualisation. The system has also allowed for the addition and integration of software facilities for 3D surround sound generation in VR. Different methods for using sound in the 3D Visual Data Mining system were investigated. It was found to be possible to use dynamic interactive 3D soundscapes to support the visual representation of data, and a series of methods were developed and tested. The result was two major tools with which a soundscape was created based on either user navigation/data windowing or direct querying of the data. The project is interdisciplinary, with participation of computer scientists, statisticians, and psychologists, all from Aalborg University.
Supported by the Danish Research Councils, 1999‑2004.
(Henrik R. Nagel, Erik Granum; M. Böhlen, Department of Computer Science; Peer Mylov, Department of Communication)

 

 

Development of 3D surround sound software facilities for Virtual Reality / Udvikling af 3D surround softwarefaciliteter til Virtual Reality

This project was started as part of the 3D Visual Data Mining project with the purpose of investigating possibilities for using sound as an additional tool for data mining. Generated sound can relate to additional statistical properties of the visualised statistical observations (objects) or to densities of objects with specific properties, and can generally support navigation in the virtual world. For this purpose a 3D sound engine was developed that runs on the different hardware configurations that exist at the VR-Centre, as well as on desktop PCs. The sound engine was designed to simulate several important psychoacoustic properties, such as position (using panning algorithms), motion (using the Doppler effect) and environment (reverb), making it possible to create immersive soundscapes and investigate user response in such environments. The sound engine was also equipped with an advanced musical synthesizer, primarily for data mining purposes, and the design allowed it to function as a stand-alone application as well as an integrated part of the VR++ software system. It has been used extensively in the data mining project and as a soundscape generator in the Benogo project.
Supported by the Danish Research Councils (2002-2004) as part of the 3DVDM project.
(Søren Bovbjerg, Henrik R. Nagel, Erik Granum)
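
Two of the properties named above, position by panning and motion by the Doppler effect, can be illustrated with textbook formulas. The sketch below is not taken from the sound engine itself; it assumes a two-channel constant-power pan law and a source moving directly towards or away from a stationary listener. Function names and the speed-of-sound constant are choices made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def constant_power_pan(pan):
    """Return (left, right) gains for pan in [-1, 1]; -1 is left, +1 is right.

    The gains satisfy left**2 + right**2 == 1, so perceived loudness stays
    roughly constant while the source moves between the loudspeakers.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1, 1]")
    angle = (pan + 1.0) * math.pi / 4.0
    return (math.cos(angle), math.sin(angle))


def doppler_frequency(f_source, v_towards_listener):
    """Frequency heard by a stationary listener.

    v_towards_listener is the source speed in m/s, positive when approaching.
    """
    if abs(v_towards_listener) >= SPEED_OF_SOUND:
        raise ValueError("source speed must be below the speed of sound")
    return f_source * SPEED_OF_SOUND / (SPEED_OF_SOUND - v_towards_listener)


print(constant_power_pan(0.0))
print(doppler_frequency(440.0, 20.0))
```

A multi-loudspeaker installation would generalise the pan law to more than two channels, but the principle of distributing a fixed amount of power according to source direction is the same.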

 

 

ARTHUR: Augmented Round Table for Architecture and Urban Planning / Augmenteret "rundbords‑designværktøj" for arkitekter og byplanlæggere

ARTHUR is a 6-partner EU-IST-RTD project (IST-2000-28559), Key Action 4, Mixed Realities, that started in 2001 and will continue until 2004. ARTHUR bridges the gap between real and virtual worlds by enhancing the users' current working environment with virtual 3D objects. The project focuses on providing an intuitive environment supporting natural interaction with virtual objects while sustaining existing communication and interaction mechanisms. Real world objects are used as tangible interfaces to make 3D environments attractive even to non-experts. ARTHUR developed new types of user-friendly head mounted see-through displays (HMD), non-intrusive object tracking mechanisms, and intuitive user interface mechanisms within a location independent multi-user real-time augmented reality environment. CVMT develops object and head tracking mechanisms based on computer vision using cameras mounted on the HMD. A colour-based tracking method has been developed and implemented that is robust to illumination changes and runs in real time. Multi-view position and orientation estimation was developed and will be combined with other tracking methods. Furthermore, a computer vision based gesture interface has been developed that recognizes gestures without encumbering the user's natural behaviour with cumbersome, wired hand tracking devices.

Funded by EU FP-5, IST-2000-28559, 2001-2004.
(Thomas B. Moeslund, Moritz Störring, Claus B. Madsen, Yong Liu, Erik Granum)


BENOGO: Being There – Without Going / At få oplevelsen uden at tage turen

BENOGO is a 6-partner EU-FET project (IST-2001-39184), which started in 2002 and will continue until 2005. The project is coordinated by the Computer Vision and Media Technology Group. BENOGO develops and investigates novel computer graphics rendering techniques (the so-called Image Based Rendering approach) with the purpose of optimizing users’ sense of being present at a location without actually being there. The BENOGO system visualizes existing, physical locations in stereo on a Head Mounted Display or in any of VR MediaLab’s 3D arenas. The project’s rendering technique is based on forming new images in real-time, in response to user movements, using only data from previously acquired real images of an existing place. With this technology the project can circumvent the 3D modelling problems traditionally associated with standard Virtual Reality and at the same time achieve a very high level of visual realism (photo-realism). The project consortium includes experts in the fields of psycho-physics, human perception, and presence research, who continuously evaluate the project’s rendering technology in order to optimize its performance and to develop a theoretical understanding of how the sense of presence is best provided. In addition to visualizing a world to the user, the project also investigates the use of 3D sound, and the visualization is augmented with virtual objects.

Funded by EU FP5, IST/FET-2001-39184, 2002-2005
(Erik Granum, Claus B. Madsen, Mads Sørensen, Michael Vittrup, Moritz Störring, Henrik Nagel)

 

 

Real and Virtual Shadows in Augmented Reality / Virkelige og kunstige skygger i Augmented Reality

In Augmented Reality virtual objects are visually combined with real objects to create the illusion that the virtual objects are in fact just a part of the real scenario. While this presents many interesting challenges, one area has so far received very little attention, namely the issue of shadows, or more precisely the issue of ensuring that the virtual objects are lit and cast shadows in the same way as the real scene. CVMT is developing techniques for estimating the positions of real scene light sources, and for estimating the spectral properties of real shadows, in order to apply this information in the rendering of the virtual objects. The result is that the virtual objects mix with the real scenario in a much more realistic manner.
In the reporting period, activities have primarily focused on estimating the parameters of virtual lighting conditions so that they match the real scene lighting conditions as closely as possible. This allows us to realistically recreate very complicated lighting conditions in real-time.
Project under BENOGO (IST/FET), and ARTHUR (EU IST), 2002-2005
(Claus B. Madsen, Mads Sørensen, Rune Laursen)

 

 

Software Platform for Real‑Time Image Based Rendering / Software system til real‑tids billedbaseret visualisering

Image Based Rendering (IBR) is a visualization technique offering some clear advantages over traditional 3D model-based computer graphics. Primarily, IBR provides a much higher level of visual realism. The disadvantage of IBR is that it cannot (yet) be supported by fast purpose-designed graphics hardware. The Computer Vision and Media Technology group has developed a software platform specifically for the IBR requirements of the BENOGO project. This platform integrates IBR software, traditional model-based computer graphics rendering, and 3D sound rendering with user tracker technology. The platform enables a user to visually explore (move around in) a world being presented through visualization and sound. The primary challenge is to develop sufficient support for the massive data exchange and transformation required to do Image Based Rendering in real-time. For this purpose the platform allows processing and input data to be distributed across an arbitrary number of standard computers, in order to achieve sufficient computing power and available memory.
Project under BENOGO (IST/FET), 2002-2005
(Claus B. Madsen, Michael Vittrup, Henrik Nagel, Erik Granum)

 

 

Realistic Real-Time Visualization for Augmented Reality

In support of real-time interactive Augmented Reality (AR) applications, a flexible software system for handling the visualization processes associated with AR is being developed. The system is aimed at supporting photo-realistic AR, and therefore handles occlusions between virtual and real geometry, as well as consistent lighting between virtual and real scene elements. For example, the system employs the techniques described above under “Real and Virtual Shadows in Augmented Reality”.

A particular feature of the system is that it operates with separate representations of the real and virtual scene elements, so as to ensure correct occlusions and to be able to generate the shadows of the virtual objects (virtual shadows are also cast on real objects). The system can operate in video-see-through mode, where the real world is recorded with a camera and the image displayed to the user, or in optical-see-through mode, where the user sees the real world through see-through Head Mounted Displays.

The effort aims at developing a stand-alone, general-purpose AR system based on a standard computer, a flat panel screen and a video camera. The system will be used to demonstrate state-of-the-art realistic AR, e.g., for edutainment applications.

(Claus B. Madsen, Rune Laursen, Erik Granum)
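
The occlusion handling described above can be illustrated with a per-pixel depth comparison in video-see-through mode. This is only a sketch under stated assumptions, not the system's implementation: it assumes that a depth map of the real scene is available (for instance rendered from a geometric model of the real scene elements), that the virtual objects are rendered to their own colour and depth images, and that images are plain nested lists. All names are hypothetical.

```python
def composite(real_rgb, real_depth, virtual_rgb, virtual_depth):
    """Combine a camera image with rendered virtual objects.

    A virtual pixel is shown only where a virtual object exists
    (depth is not None) and lies closer to the camera than the real surface.
    """
    result = []
    for real_row, rdepth_row, virt_row, vdepth_row in zip(
        real_rgb, real_depth, virtual_rgb, virtual_depth
    ):
        row = []
        for real_px, r_d, virt_px, v_d in zip(
            real_row, rdepth_row, virt_row, vdepth_row
        ):
            if v_d is not None and v_d < r_d:
                row.append(virt_px)  # virtual object occludes the real scene
            else:
                row.append(real_px)  # real scene visible (or occludes virtual)
        result.append(row)
    return result


# One-row example: a virtual object at 2 m covers the last two pixels,
# but a real object at 1 m stands in front of it at the last pixel.
real_rgb = [["wall", "wall", "chair"]]
real_depth = [[3.0, 3.0, 1.0]]
virtual_rgb = [[None, "teapot", "teapot"]]
virtual_depth = [[None, 2.0, 2.0]]
print(composite(real_rgb, real_depth, virtual_rgb, virtual_depth))
```

Keeping the real and virtual representations separate is what makes this test possible; the same separation would allow a shadow pass to darken real pixels that lie in the shadow of a virtual object.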