Besides instance-level object retrieval and recognition, INSTRE can serve many other computer vision tasks, such as
object detection, invariant feature evaluation and feature matching. The whole dataset is split into three disjoint subsets: INSTRE-S1 (single object case 1),
INSTRE-S2 (single object case 2) and INSTRE-M (multiple object case).
INSTRE-S1 and INSTRE-S2 are collected for the single-object case, and each has 100 object classes. INSTRE-S1 contains 11011
images and INSTRE-S2 contains 12059 images. We group the
100 objects from INSTRE-S1 into 50 two-tuples. INSTRE-M contains 5473 images distributed over these 50 two-tuple
classes. Note that INSTRE-S1 and INSTRE-S2 allow multiple appearances of the same object in one
image, while each image in INSTRE-M strictly displays two different objects.
Objects are roughly categorized into three basic classes: architecture (buildings and sculptures), planar
objects (designs, paintings and planar surfaces) and daily stereoscopic objects (toys and irregularly shaped products).
Duplicate, low-resolution and blurred images were removed. Only color images of 200×200 pixels or larger are kept. All images are in JPEG format and have been resized, preserving aspect ratio, so that the larger of height and width equals 1000 pixels.
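The size rules above can be sketched in a few lines of Python. This is only an illustration, not the script used to build INSTRE: the function names `keep_image` and `resized_dimensions` are hypothetical, "200×200 or larger" is read as both sides being at least 200 pixels, and the rounding of the shorter side is an assumption.

```python
def keep_image(width, height, min_side=200):
    """Return True if both sides meet the minimum size (assumed reading of '200x200 or larger')."""
    return width >= min_side and height >= min_side


def resized_dimensions(width, height, max_side=1000):
    """Scale (width, height) so the larger side equals max_side, preserving aspect ratio.

    Rounding to the nearest pixel is an assumption; the source does not state it.
    """
    scale = max_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, under these assumptions a 2000×1500 photo would be stored as 1000×750, while a 199×500 image would be discarded before resizing.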
INSTRE has the following major properties: (1) balanced data scale (over 100 images for each class), (2) more diverse variations for each object, (3) cluttered and natural backgrounds, which are less correlated with the objects, (4) object localization annotation through bounding boxes for each image, and (5) carefully composed images for measuring the multiple-object (within one image) case.
Data Collection
We collected images from multiple sources. Internet sources include various image search engines (e.g. Baidu, Bing, Google, Picsearch, Altavista), social networks (e.g. Weibo, Facebook) and photo sharing communities (e.g. Flickr, nipic). Images from the Internet are distributed into 100 object classes in INSTRE-S2. To build INSTRE-S1 and INSTRE-M, we recorded 100 objects that were mainly found in personal photos and were also available at hand. The 100 objects were organized semi-randomly into 50 groups (2-tuples), with 2 objects per group. For each group, over 100 photos were taken of each object individually, and another 100 photos were taken of the two objects together. In this way, the photos for one group give two classes in INSTRE-S1 and one corresponding 2-tuple class in INSTRE-M. Thirty people took part in this photographing task. To obtain adequately distinct backgrounds, we selected 25 named sites in Beijing, including outdoor sites (parks, zoos, wild fields, universities, etc.) and indoor sites (museums, malls, exhibitions, etc.). Many other unnamed, common sites were also used, including streets, communities, homes and offices.
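The grouping step can be illustrated with a short Python sketch. The source only says the objects were grouped "semi-randomly", so the seeded shuffle below is an assumption, and the names `make_groups` and `classes_from_groups` are hypothetical.

```python
import random


def make_groups(object_ids, seed=0):
    """Pair objects into 2-tuples by shuffling (a plain random pairing, assumed here)."""
    ids = list(object_ids)
    if len(ids) % 2 != 0:
        raise ValueError("an even number of objects is required")
    random.Random(seed).shuffle(ids)
    return [(ids[i], ids[i + 1]) for i in range(0, len(ids), 2)]


def classes_from_groups(groups):
    """Each group yields two single-object classes and one 2-tuple class."""
    single_object_classes = [obj for group in groups for obj in group]
    two_object_classes = list(groups)
    return single_object_classes, two_object_classes


groups = make_groups(range(100))
s1_classes, m_classes = classes_from_groups(groups)
```

With 100 objects this gives 50 groups, hence 100 single-object classes and 50 two-object classes, matching the class counts stated for INSTRE-S1 and INSTRE-M.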
The three subsets are disjoint and contain 28543 images in total.
|Subset|Description|Classes|Images|
|---|---|---|---|
|INSTRE-S1|images downloaded from multiple Internet sources|100|11011|
|INSTRE-S2|photos recording objects at hand (one object per photo)|100|12059|
|INSTRE-M|photos recording objects at hand (two objects per photo)|50|5473|
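As a quick consistency check, the per-subset image counts from the table can be summed in Python. The dictionary name `SUBSET_IMAGES` is only illustrative.

```python
# Image counts per subset, copied from the table above.
SUBSET_IMAGES = {
    "INSTRE-S1": 11011,
    "INSTRE-S2": 12059,
    "INSTRE-M": 5473,
}

# Because the subsets are disjoint, the dataset size is the plain sum.
total_images = sum(SUBSET_IMAGES.values())
print(total_images)  # 28543
```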
Example images in INSTRE
Object Location Annotations Examples
Average image examples for some classes
Exemplars to show the intra-class variations
Analysis of Backgrounds and Objects