The data on this page supports work on some of the challenges that arise in pose estimation of interacting people in video sequences.

| Sequence | Number of frames | Image resolution | Ground truth frames | Download |
|---|---|---|---|---|
| CVRR Hug | 148 | 696x520 px | 50 | .zip (~12 MB) |
| HERMES Crossing | 161 | 360x240 px | 75 | .zip (~133 MB) |

Ground truth annotations are available as Matlab .mat files. For each visible body part, the ground truth gives the image coordinates of the four corners of its enclosing rectangle. A body part is additionally marked as occluded when no more than half of it is visible (see example visualization below).

Ten body parts are annotated: head, torso, left/right upper arm, left/right lower arm, left/right upper leg, and left/right lower leg (occluded body parts are shown in white).
The .mat files contain a struct array of size num_images x num_persons. Each entry in this array holds the annotations for one person in one image (as another struct). For each person, a 10x4x2 matrix contains the four (x,y) corner coordinates of each body part rectangle, and a 10x1 matrix contains a binary flag indicating whether or not each body part is occluded.
Some body parts are not visible at all in an image; in such cases the coordinates of the body part rectangle are set to (0,0).
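The annotation layout described above can be interpreted along the following lines. This is a minimal Python sketch, not code that ships with the dataset. It assumes that the per-person data has already been loaded into a 10x4x2 nested sequence and a list of 10 occlusion flags. The row order in `BODY_PARTS`, the index convention `[part][corner][x or y]`, and the names `part_status`, `bounding_box`, and `summarize_person` are all assumptions made for illustration. The actual row order and the field names inside the .mat structs should be checked against the files themselves.

```python
# Assumed row order of the 10 body parts. The page lists the parts but does
# not state the order in which they are stored, so verify this against the data.
BODY_PARTS = [
    "head", "torso",
    "left upper arm", "right upper arm",
    "left lower arm", "right lower arm",
    "left upper leg", "right upper leg",
    "left lower leg", "right lower leg",
]


def _flag(value):
    """Accept a scalar flag or a 1-element row (as in a 10x1 matrix)."""
    try:
        return bool(value[0])
    except (TypeError, IndexError):
        return bool(value)


def part_status(corners, occluded):
    """Classify one body part from its four (x, y) corners and occlusion flag."""
    # Parts that are not visible at all have every corner set to (0, 0).
    if all(x == 0 and y == 0 for x, y in corners):
        return "not visible"
    return "occluded" if _flag(occluded) else "visible"


def bounding_box(corners):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the four corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def summarize_person(rects, occlusion):
    """Map each body part name to (status, box) for one person in one image.

    rects: 10x4x2 nested sequence of corner coordinates.
    occlusion: 10 binary flags, either scalars or 1-element rows.
    """
    if len(rects) != len(BODY_PARTS) or len(occlusion) != len(BODY_PARTS):
        raise ValueError("expected annotations for 10 body parts")
    summary = {}
    for name, corners, flag in zip(BODY_PARTS, rects, occlusion):
        status = part_status(corners, flag)
        box = None if status == "not visible" else bounding_box(corners)
        summary[name] = (status, box)
    return summary
```

Since the annotated rectangles may be rotated, `bounding_box` only gives a coarse axis-aligned extent. In Python, .mat files can usually be read with `scipy.io.loadmat`, provided they were not saved in the v7.3 format. Inspect the loaded object to find the actual field names before passing data to a function like `summarize_person`.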
Contact Preben Fihl for further information about the dataset.