Research
MoPrim has two major research areas:

Action recognition
We recognize actions in four steps. First, we detect motion using double difference images and enhance the detected motion with morphological filters. The motion is then represented by four low-level features. These features are used in the primitive recognition, which is based on a Mahalanobis classifier. The classified primitives constitute a string, which is pruned before the action recognition classifies it using a probabilistic edit distance.



We are working with both real and semi-synthetic video. The semi-synthetic video is based on motion data from a magnetic tracking system, which is visualized with commercial software. In this way we get real movement and, at the same time, are able to control camera positions, lighting, the model's clothing, etc.

Download video
We calculate double difference images to detect motion in the images. The double difference images are largely independent of illumination changes and of clothing types and styles. Furthermore, no background model or person model is required. Using morphological filtering, we obtain a "motion-cloud" from the motion pixels in the double difference image.
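The detection step described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes three consecutive grey-level frames stored as nested lists, a hypothetical `threshold` of 20 grey levels, and a single 3x3 dilation standing in for the unspecified morphological filters.

```python
def double_difference(prev, curr, nxt, threshold=20):
    """Return a binary motion mask for the middle frame `curr`.

    Frames are equally sized 2-D lists of grey levels. A pixel is marked as
    motion only if it differs from BOTH the previous and the next frame.
    The threshold value is an assumption, not taken from the text.
    """
    rows, cols = len(curr), len(curr[0])
    mask = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            changed_before = abs(curr[y][x] - prev[y][x]) > threshold
            changed_after = abs(nxt[y][x] - curr[y][x]) > threshold
            mask[y][x] = int(changed_before and changed_after)
    return mask


def dilate(mask):
    """Binary dilation with a 3x3 square structuring element.

    One possible morphological filter for merging scattered motion pixels
    into a connected motion-cloud (an assumption for illustration).
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            out[y][x] = int(any(
                mask[j][i]
                for j in range(max(0, y - 1), min(rows, y + 2))
                for i in range(max(0, x - 1), min(cols, x + 2))
            ))
    return out
```

Because a pixel must change with respect to both neighbouring frames, static background and objects that appear and then stay put produce no response, which is consistent with the statement that no background model is needed.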


Download video
We use four features to represent this motion-cloud. To make the features independent of image size and of the person's position in the image, they are represented as ratios. Furthermore, they are defined with respect to a reference point, currently defined as the center of gravity of the person.
  1. The eccentricity of the motion cloud defined as the ratio between the minor and major axes of the ellipse.
  2. The orientation Φ of the ellipse.
  3. The minimum ratio r between the length of the major axis and the distance d from the reference point to the center of the ellipse.
  4. The angle Θ between the reference point and the center of the ellipse.
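
One way to compute the four features is sketched below. It assumes the ellipse is the one with the same second-order moments as the motion pixels (axis length taken as four times the square root of a covariance eigenvalue), and it reads the "minimum ratio" r as the smaller of major/d and d/major. Both choices, and the function name `motion_cloud_features`, are assumptions made for illustration.

```python
import math


def motion_cloud_features(pixels, reference):
    """Return (eccentricity, phi, r, theta) for a motion cloud.

    pixels:    list of (x, y) coordinates of motion pixels
    reference: (x, y) reference point, e.g. the person's center of gravity
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments of the cloud.
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix give the squared axis scales.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major = 4 * math.sqrt(half_trace + spread)
    minor = 4 * math.sqrt(max(half_trace - spread, 0.0))
    if major == 0:
        raise ValueError("need at least two distinct motion pixels")
    eccentricity = minor / major                      # feature 1
    phi = 0.5 * math.atan2(2 * sxy, sxx - syy)        # feature 2
    d = math.hypot(cx - reference[0], cy - reference[1])
    # Assumed reading of "minimum ratio": keeps r within [0, 1].
    r = 0.0 if d == 0 else min(major / d, d / major)  # feature 3
    theta = math.atan2(cy - reference[1], cx - reference[0])  # feature 4
    return eccentricity, phi, r, theta
```

Angles are returned in radians and measured in image coordinates, where the y axis usually points downward.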


Primitive recognition is done by classifying the features from a double difference image as one of ten primitives. This is done by calculating the Mahalanobis distance to each primitive and choosing the primitive with the smallest distance. Applying this process to a video sequence, with each primitive represented by a letter, results in a text string representing the action performed in the sequence.
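
The classification rule can be illustrated as follows. This sketch assumes each primitive is modelled by a mean feature vector and a covariance matrix estimated beforehand; the `models` dictionary layout and the function names are hypothetical. The squared distance is computed as (x - mean)^T cov^-1 (x - mean) by solving a small linear system rather than inverting the matrix.

```python
def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    y = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(a[row][k] * y[k] for k in range(row + 1, n))
        y[row] = (a[row][n] - tail) / a[row][row]
    return y


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance from x to a class (mean, cov)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))


def classify(x, models):
    """Return the letter of the closest primitive.

    models maps a primitive letter to a (mean, covariance) pair.
    """
    return min(models, key=lambda letter: mahalanobis_sq(x, *models[letter]))
```

For example, with `models = {"A": ([0, 0], [[1, 0], [0, 1]]), "B": ([4, 0], [[16, 0], [0, 1]])}`, the point `[1.5, 0]` is assigned to "B" even though it is closer to the mean of "A" in Euclidean terms, because "B" has a much larger variance along the first feature.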


Download video
During a training phase, a string representation of each action to be recognized is learned. The task is now to compare each of the learned actions (strings) with the detected string. Since the learned strings and the detected strings (possibly including errors!) will in general not have the same length, we apply the Edit Distance method for the string comparison. Before the comparison, the string from the primitive recognition is pruned by first removing Ø's, then isolated instances, and finally all repeated letters.

String from primitive recognition = { Ø, Ø, B, B, B, B, B, E, A, A, F, F, F, F, Ø, D, D, G, G, G, G, Ø}
Pruned string = { B, A, F, D, G }

Furthermore, we apply a probabilistic version of the Edit Distance by generating a weight for each letter that reflects its number of repetitions and then using these weights as costs in the Edit Distance algorithm.

Weights = { 5, 2, 4, 2, 4 }
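
The pruning and weighting steps can be reproduced with a short sketch. It assumes that Ø marks frames where no primitive was recognized, that an "isolated instance" is a run of length one, and that the weight of a letter is simply the length of its run, as the numbers in the example suggest. The function name `prune` is illustrative.

```python
from itertools import groupby

NO_PRIMITIVE = "Ø"  # assumed meaning: no primitive recognized in this frame


def prune(primitives):
    """Return (letters, weights) for a per-frame sequence of primitives."""
    # 1. Remove the Ø entries.
    kept = [p for p in primitives if p != NO_PRIMITIVE]
    # 2. Remove isolated instances, i.e. runs of length one.
    runs = [(letter, len(list(group))) for letter, group in groupby(kept)]
    kept = [letter for letter, count in runs if count > 1 for _ in range(count)]
    # 3. Collapse repeated letters, remembering how often each was repeated.
    merged = [(letter, len(list(group))) for letter, group in groupby(kept)]
    letters = [letter for letter, _ in merged]
    weights = [count for _, count in merged]
    return letters, weights


detected = ["Ø", "Ø", "B", "B", "B", "B", "B", "E", "A", "A", "F", "F",
            "F", "F", "Ø", "D", "D", "G", "G", "G", "G", "Ø"]
letters, weights = prune(detected)
# letters == ["B", "A", "F", "D", "G"], weights == [5, 2, 4, 2, 4]
```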

Finally, the action performed in the video sequence is taken to be the learned action with the smallest edit distance.
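
The text does not specify how the weights enter the Edit Distance, so the following sketch makes one plausible assumption: deleting or substituting a detected letter costs its weight divided by the largest weight, so that short and therefore less reliable runs are cheap to edit, while inserting a missing learned letter costs 1. The function names and the action names in the usage note are hypothetical.

```python
def weighted_edit_distance(detected, weights, learned):
    """Cost of turning the detected string into a learned string.

    Assumed cost model (not from the text): editing detected[i] costs
    weights[i] / max(weights); inserting a learned letter costs 1.
    """
    top = max(weights, default=1)
    cost = [w / top for w in weights]
    n, m = len(detected), len(learned)
    # dist[i][j]: cheapest way to turn detected[:i] into learned[:j].
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + cost[i - 1]
    for j in range(1, m + 1):
        dist[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = detected[i - 1] == learned[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + cost[i - 1],  # delete a detected letter
                dist[i][j - 1] + 1.0,          # insert a learned letter
                dist[i - 1][j - 1] + (0.0 if same else cost[i - 1]),  # match or substitute
            )
    return dist[n][m]


def recognize(detected, weights, learned_actions):
    """Return the name of the learned action with the smallest distance.

    learned_actions maps an action name to its learned primitive string.
    """
    return min(
        learned_actions,
        key=lambda name: weighted_edit_distance(
            detected, weights, learned_actions[name]),
    )
```

With the pruned string B, A, F, D, G and the weights 5, 2, 4, 2, 4, a learned string "BFDG" is reached by deleting the low-weight "A" alone, so `recognize("BAFDG", [5, 2, 4, 2, 4], {"action_1": "BFDG", "action_2": "CAFDE"})` selects "action_1".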