|
Research |
| MoPrim has two major research areas: |
| |
| Action recognition | |||
We recognize actions in four steps. First we detect motion by double
difference images and enhance the detected motion by morphological filters.
The motion is represented by four low level features. These features are
used in the primitive recognition which is based on a Mahalanobis classifier.
The classified primitives constitute a string, which is pruned before the
action recognition classifies the string by use of a probabilistic edit
distance.
We are working with both real and semi-synthetic video. The semi-synthetic video is based on motion data from a magnetic tracking system, which is visualized with commercial software. In this way we get real movement and, at the same time, we are able to control camera positions, lighting, the model's clothing, etc.
We calculate double difference images to detect motion in the images. Double difference
images are largely insensitive to illumination changes and to clothing types and styles.
Furthermore, no background model or person model is required. Using morphological filtering, we obtain a
"motion-cloud" from the motion pixels in the double difference image.
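The motion detection step can be illustrated with a minimal sketch. It assumes the common definition of a double difference image (a pixel counts as moving only if it changed both between the previous and current frame and between the current and next frame) and uses a 3x3 morphological closing as a stand-in for the morphological filters, whose exact form is not given here. The function names, the threshold value, and the list-of-lists frame representation are all illustrative assumptions.

```python
def abs_diff_mask(a, b, threshold):
    """Binary mask of pixels whose gray-level difference exceeds the threshold."""
    return [[abs(pa - pb) > threshold for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def double_difference(prev, curr, nxt, threshold=20):
    """Mark a pixel as motion only if it changed in both consecutive frame pairs.

    The threshold of 20 gray levels is an assumed value, not taken from the source.
    """
    d1 = abs_diff_mask(curr, prev, threshold)
    d2 = abs_diff_mask(nxt, curr, threshold)
    return [[p and q for p, q in zip(r1, r2)] for r1, r2 in zip(d1, d2)]


def dilate(mask):
    """3x3 binary dilation: a pixel is set if any in-bounds neighbour is set."""
    h, w = len(mask), len(mask[0])
    return [[any(mask[ny][nx]
                 for ny in range(max(0, y - 1), min(h, y + 2))
                 for nx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]


def erode(mask):
    """3x3 binary erosion; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    return [[0 < y < h - 1 and 0 < x < w - 1 and
             all(mask[ny][nx] for ny in (y - 1, y, y + 1) for nx in (x - 1, x, x + 1))
             for x in range(w)] for y in range(h)]


def motion_cloud(mask):
    """Morphological closing (dilate, then erode) to merge nearby motion pixels."""
    return erode(dilate(mask))
```

Because only frame-to-frame differences are used, a static bright region produces no motion pixels, which is why no background model is needed.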
| We use four features to represent this motion-cloud. In order to make the features independent of image size and the person's position in the image they are represented as ratios. Furthermore, they are defined with respect to a reference point currently defined as the center of gravity of the person. | |||
Primitive recognition is done by classifying the features from a double difference image
as one of ten primitives. This is done by calculating the Mahalanobis distance to each
primitive and choosing the primitive with the smallest distance. Applying this process to a video
sequence, with each primitive represented by a letter, results in a text string
representing the action performed in the sequence.
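The classification step can be sketched as follows. This is a minimal illustration, assuming that a mean vector and an inverse covariance matrix have already been estimated for each primitive during training. The `models` dictionary, the function names, and the two-feature toy data in the usage example are hypothetical; the system described here uses four features and ten primitives.

```python
import math


def mahalanobis(x, mean, inv_cov):
    """Mahalanobis distance sqrt((x - mean)^T * inv_cov * (x - mean))."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    n = len(d)
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)


def classify_primitive(x, models):
    """Return the label of the primitive with the smallest Mahalanobis distance.

    models maps a letter to a (mean, inverse covariance) pair (assumed layout).
    """
    return min(models, key=lambda label: mahalanobis(x, *models[label]))


def sequence_to_string(feature_vectors, models):
    """Classify the feature vector of every frame, giving one letter per frame."""
    return [classify_primitive(x, models) for x in feature_vectors]


# Toy usage with two features and two primitives (illustrative values only).
models = {
    "A": ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),   # unit variance in both features
    "B": ([4.0, 0.0], [[0.01, 0.0], [0.0, 1.0]]),  # variance 100 in the first feature
}
print(sequence_to_string([[0.0, 0.0], [1.5, 0.0], [1.5, 0.0]], models))
```

In the toy data, the point `[1.5, 0.0]` is closer to the mean of `A` in the Euclidean sense but is assigned to `B`, because `B` varies far more along the first feature. Accounting for the spread of each class in this way is what separates the Mahalanobis distance from a plain Euclidean one.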
|
During a training phase a string representation of each action to be
recognized is learned. The task is now to compare each of the learned actions
(strings) with the detected string. Since the learned strings and the detected
strings (possibly including errors!) will in general not have the same
length, we apply the Edit Distance method for the string comparison.
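For reference, the standard (unweighted) edit distance between two strings of different lengths can be computed with dynamic programming, as in the sketch below. It uses unit costs for insertion, deletion, and substitution, which is the textbook Levenshtein formulation rather than necessarily the exact variant used in this work. The `recognize` helper and the action names in the usage example are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b with unit costs."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting the first i letters of a
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def recognize(detected, learned):
    """Return the name of the learned action whose string is closest to the detected one."""
    return min(learned, key=lambda name: edit_distance(detected, learned[name]))


# Hypothetical learned strings; one letter is missing relative to the detected string.
learned = {"action_1": "BAFG", "action_2": "CDE"}
print(recognize("BAFDG", learned))
```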
The string from the primitive recognition is pruned by first removing Ø's and isolated instances,
and then collapsing each run of repeated letters into a single letter.

String from primitive recognition = { Ø, Ø, B, B, B, B, B, E, A, A, F, F, F, F, Ø, D, D, G, G, G, G, Ø }
Pruned string = { B, A, F, D, G }

Furthermore, we apply a probabilistic version of the Edit Distance by generating a weight for each letter that reflects the number of times it was repeated, and then using these weights as costs in the Edit Distance algorithm.

Weights = { 5, 2, 4, 2, 4 }

Finally, the action performed in the video sequence corresponds to the learned action with the smallest edit distance.
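The pruning and weight generation can be reproduced with a short sketch. It assumes that the steps are applied in the stated order (Ø's, then isolated instances, then repeats) and that the weight of a letter is simply the length of its run. The function name `prune` is illustrative, and how the weights enter the Edit Distance costs is not specified here, so that part is left out.

```python
from itertools import groupby


def prune(primitives, null="Ø"):
    """Return (pruned string, weights) for a list of per-frame primitive letters."""
    # Step 1: drop the null primitive Ø.
    letters = [p for p in primitives if p != null]
    # Step 2: drop isolated instances, i.e. runs of length one.
    runs = [(letter, len(list(group))) for letter, group in groupby(letters)]
    kept = [letter for letter, n in runs if n > 1 for _ in range(n)]
    # Step 3: collapse each run to one letter; the run length becomes the weight.
    final = [(letter, len(list(group))) for letter, group in groupby(kept)]
    return [letter for letter, _ in final], [n for _, n in final]


detected = ["Ø", "Ø", "B", "B", "B", "B", "B", "E", "A", "A", "F", "F", "F", "F",
            "Ø", "D", "D", "G", "G", "G", "G", "Ø"]
pruned, weights = prune(detected)
print(pruned)   # ['B', 'A', 'F', 'D', 'G']
print(weights)  # [5, 2, 4, 2, 4]
```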