
Description
The Hand-held Object Dataset (HOD) was collected for hand-held object recognition, whose goal is to recognize specified objects held in a user's hand.
Following the trend of recognition during human-object interaction, HOD provides not only point clouds but also the three-dimensional coordinates of the human skeleton as context.
The dataset has 16 categories, each with four object instances, for 12800 images in total. Each image consists of a colored point cloud and human skeletal data.
Data Collection
With the Kinect mounted in a fixed position at chest height, two people each collected 100 frames while holding each object instance, respectively in two different scenes. We use OpenNI to derive the colored point cloud, and NITE, a middleware of OpenNI, to derive the skeletal coordinates. Frames are collected at 1 fps. The resolution of the Kinect output is set to 640×480. We have applied simple recovery algorithms to missing depth values.
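The stated total of 12800 images can be checked against the collection setup. The sketch below assumes one reading of the protocol that matches that total: each object instance is recorded twice (once by each person, each in their own scene), with 100 frames per recording. The variable names are illustrative only.

```python
# Counts taken from the description above.
categories = 16
instances_per_category = 4
frames_per_recording = 100

# Assumption: one recording per person per instance, each person in their
# own scene, giving two recordings per object instance.
recordings_per_instance = 2

total_frames = (categories * instances_per_category
                * recordings_per_instance * frames_per_recording)
print(total_frames)  # 12800
```

If both people had recorded every instance in both scenes, the total would be 25600 instead, which does not match the stated dataset size.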
Data Format
For each frame, we store the point cloud and the skeletal data in two separate files. The point cloud is stored in the standard PCD (Point Cloud Data) file format.
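A PCD file begins with a plain-text header (keywords such as VERSION, FIELDS, WIDTH, HEIGHT, POINTS) that ends with a DATA line, after which the point data follows in ASCII or binary form. As a minimal sketch, the header can be inspected without any point cloud library. The function name read_pcd_header is hypothetical, and nothing here assumes which fields the HOD files actually contain.

```python
def read_pcd_header(path):
    """Return the PCD header as a dict mapping each keyword to its values.

    Reads only up to the DATA line, so it works for both ASCII and
    binary PCD files. Values are kept as lists of strings.
    """
    header = {}
    # Open in binary mode: the payload after the header may not be text.
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii", errors="replace").strip()
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            key, *values = line.split()
            header[key] = values
            # DATA is the last header entry; stop before the payload.
            if key == "DATA":
                break
    return header
```

For example, header["WIDTH"] and header["HEIGHT"] can be compared with the Kinect output resolution, and header["FIELDS"] shows which per-point fields a file stores.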
For more details about the PCD format, see here. The skeletal data contains 15 points, including
JOINT_HEAD, JOINT_NECK, JOINT_LEFT_SHOULDER, JOINT_RIGHT_SHOULDER, JOINT_LEFT_ELBOW, JOINT_RIGHT_ELBOW,
JOINT_LEFT_HAND, JOINT_RIGHT_HAND, JOINT_TORSO, JOINT_LEFT_HIP, JOINT_RIGHT_HIP, JOINT_LEFT_KNEE,
JOINT_RIGHT_KNEE.
The point cloud file of each frame is located at <basedir>/<category_name>/<instance_id>/<scene_id>/<person_id>/<file>.pcd,
and the associated skeleton file is under the same directory with the file extension '.ske'.
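The directory layout makes it possible to recover a frame's labels from its path and to locate the matching skeleton file. The following is a minimal sketch under that layout. FrameId, parse_frame_path and skeleton_path are hypothetical helper names, and the example path values are made up for illustration, not taken from the dataset.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class FrameId(NamedTuple):
    category_name: str
    instance_id: str
    scene_id: str
    person_id: str
    file: str


def parse_frame_path(pcd_path, basedir):
    """Split <basedir>/<category>/<instance>/<scene>/<person>/<file>.pcd."""
    rel = PurePosixPath(pcd_path).relative_to(basedir)
    if rel.suffix != ".pcd" or len(rel.parts) != 5:
        raise ValueError(f"unexpected path layout: {pcd_path}")
    category, instance, scene, person, _ = rel.parts
    # rel.stem is the file name without the '.pcd' extension.
    return FrameId(category, instance, scene, person, rel.stem)


def skeleton_path(pcd_path):
    """Return the '.ske' file next to the given '.pcd' file."""
    return PurePosixPath(pcd_path).with_suffix(".ske")


# Hypothetical example path; real category names and IDs may differ.
example = "HOD/cup/1/1/1/0001.pcd"
frame = parse_frame_path(example, "HOD")
ske = skeleton_path(example)
```

The category_name component can serve directly as the recognition label, while instance_id, scene_id and person_id are useful for building train/test splits that hold out instances, scenes or people.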