INSTRE: for INSTance-level object REtrieval and REcognition

Shuang Wang     Shuqiang Jiang

Institute of Computing Technology, CAS


Besides instance-level object retrieval and recognition, INSTRE also serves many other computer vision tasks, such as detection, invariant features and feature matching. The whole dataset is split into three disjoint subsets: INSTRE-S1 (single object, case 1), INSTRE-S2 (single object, case 2) and INSTRE-M (multiple objects). INSTRE-S1 and INSTRE-S2 are collected for measuring the single-object case, and each has 100 object classes. INSTRE-S1 contains 11011 images and INSTRE-S2 contains 12059 images. We group the 100 objects from INSTRE-S1 into 50 two-tuples. INSTRE-M contains 5473 images distributed over these 50 two-tuple classes. Note that INSTRE-S1 and INSTRE-S2 allow multiple appearances of the same object in one image, while each image in INSTRE-M displays exactly two different objects. Objects are roughly categorized into three basic classes: architecture (buildings and sculptures), planar objects (designs, paintings and planar surfaces) and everyday three-dimensional objects (toys and irregularly shaped products).

Data Process

Duplicate, low-resolution and blurred images are removed. Only color images of 200×200 pixels or larger are kept. All images are in JPEG format and have been resized so that the larger of height and width equals 1000 pixels, preserving the aspect ratio.
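The size filter and the resizing rule above can be sketched as follows. This is a minimal illustration, not the authors' processing script: the function names are hypothetical, and it assumes the longer side is set to exactly 1000 pixels (whether smaller images were upscaled is not stated here).

```python
def passes_size_filter(width, height, min_side=200):
    """Keep only images that are at least min_side x min_side pixels."""
    return width >= min_side and height >= min_side


def resized_dimensions(width, height, max_side=1000):
    """Scale (width, height) so the longer side equals max_side.

    The aspect ratio is preserved up to rounding to whole pixels.
    Assumption: images are scaled to max_side regardless of original size.
    """
    scale = max_side / max(width, height)
    # Round to the nearest pixel and never return a zero-length side.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(passes_size_filter(640, 480))
    print(resized_dimensions(2000, 1500))
```

The returned dimensions can be passed to any image library's resize call; the sketch deliberately avoids depending on one.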


INSTRE has the following major properties: (1) balanced data scale (over 100 images for each class), (2) more diverse variations for each object, (3) cluttered and natural backgrounds, which are less correlated with the objects, (4) object localization annotation through bounding boxes for each image, and (5) well-manipulated images for measuring the multiple-object case (several objects within one image).

Data Collection

We collected images from multiple sources. Internet sources include various image search engines (e.g. Baidu, Bing, Google, Picsearch, Altavista), social networks (e.g. weibo, facebook) and photo sharing communities (e.g. Flickr, nipic). Images from the Internet are distributed into 100 object classes in INSTRE-S2. To build INSTRE-S1 and INSTRE-M, we photographed 100 objects that were mainly found in personal photos and were also available at hand. The 100 objects were organized semi-randomly into 50 groups (2-tuples), with 2 objects per group. For each group, photographers were required to take over 100 photos of each object on its own and another 100 photos of the two objects together. In this way, the photos for one group give two classes in INSTRE-S1 and one corresponding 2-tuple class in INSTRE-M. Thirty people took part in this photographing task. To obtain sufficiently distinct backgrounds, we selected 25 named sites in Beijing, including outdoor sites (parks, zoos, wild fields, universities etc.) and indoor sites (museums, malls, exhibitions etc.). Many other common, unnamed sites were also used, including streets, communities, homes and offices.

Subset      Description                                                #Classes   #Images
INSTRE-S1   photos recording objects at hand (one object per photo)    100        11011
INSTRE-S2   images downloaded from multiple Internet sources           100        12059
INSTRE-M    photos recording objects at hand (two objects per photo)   50         5473
Total       the three subsets are disjoint                             -          28543

Example images in INSTRE

Object Location Annotations Examples

Object Location Annotations

Every image in this dataset is annotated with an upright rectangular bounding box for each object present, and the coordinates are recorded in a .txt file.

Each image in <basedir>/<subset>/<class>/<file>.jpg has its annotation file in <basedir>/<subset>/<class>/<file>.txt.

In each annotation file, each line records the coordinates of one bounding box in format: <x of top-left point><space><y of top-left point><space><width><space><height>. All numbers are integers.
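The file layout and line format described above could be read with a few lines of Python. This is a minimal sketch under the stated format; the function names and the example path are hypothetical placeholders, not part of the dataset distribution.

```python
from pathlib import Path


def annotation_path(image_path):
    """Return the .txt annotation path that sits next to a .jpg image."""
    return Path(image_path).with_suffix(".txt")


def parse_annotation(text):
    """Parse annotation text into a list of (x, y, width, height) integer tuples."""
    boxes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines, e.g. after a trailing newline
        if len(fields) != 4:
            raise ValueError(f"expected 4 integers, got: {line!r}")
        x, y, w, h = (int(f) for f in fields)
        boxes.append((x, y, w, h))
    return boxes


def to_corners(box):
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return x, y, x + w, y + h


if __name__ == "__main__":
    # Placeholder path following the <basedir>/<subset>/<class>/<file>.jpg layout.
    print(annotation_path("basedir/subset/class/file.jpg"))
    print(parse_annotation("10 20 300 400\n"))
```

Converting to corner coordinates is optional; it is included because many detection toolkits expect that form rather than top-left point plus width and height.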

Specifically, for each tuple class in INSTRE-M, there are two corresponding object classes in INSTRE-S1. In each annotation file for an INSTRE-M image, the first line records the object labeled as [a] in INSTRE-S1 and the second line records the object labeled as [b] in INSTRE-S1.
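The line-order convention for INSTRE-M annotations can be made explicit when loading a file. The sketch below assumes each INSTRE-M annotation contains exactly two box lines, as described above; the function name and the dictionary keys are illustrative choices.

```python
def parse_instre_m_annotation(text):
    """Map the two lines of an INSTRE-M annotation to objects 'a' and 'b'.

    Line 1 is the object labeled [a] in INSTRE-S1, line 2 the object labeled [b].
    Each box is returned as an (x, y, width, height) tuple of integers.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected 2 bounding boxes, got {len(lines)}")
    box_a, box_b = (tuple(int(v) for v in ln.split()) for ln in lines)
    return {"a": box_a, "b": box_b}


if __name__ == "__main__":
    print(parse_instre_m_annotation("1 2 3 4\n5 6 7 8\n"))
```

Raising an error on any other line count is a design choice that surfaces malformed files early instead of silently mislabeling objects.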

Average image examples for some classes

Exemplars to show the intra-class variations

Analysis of Backgrounds and Objects

(a) The left figure displays nearly random confusion across backgrounds of all 200 object classes in background classification. Features inside object bounding boxes are ignored. (b) The right figure shows the patterns of all bounding boxes in INSTRE-S1 and INSTRE-S2. Classes are distinguished through different marker types and colors.
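Ignoring features inside object bounding boxes, as in (a), amounts to keeping only feature locations that fall outside every annotated box. The sketch below illustrates that filtering step only. It assumes features are represented by (x, y) pixel locations and boxes by (x, y, width, height); the feature type and the treatment of box borders are assumptions, since they are not specified here.

```python
def outside_all_boxes(point, boxes):
    """Return True if the (x, y) point lies outside every bounding box."""
    px, py = point
    for x, y, w, h in boxes:
        # Assumption: a box covers columns x..x+w-1 and rows y..y+h-1.
        if x <= px < x + w and y <= py < y + h:
            return False
    return True


def background_points(points, boxes):
    """Keep only the feature locations that fall on the background."""
    return [p for p in points if outside_all_boxes(p, boxes)]


if __name__ == "__main__":
    boxes = [(10, 10, 20, 20)]
    print(background_points([(5, 5), (15, 15), (50, 50)], boxes))
```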


Download the dataset here (images and annotations). The file size is 2.34 GB.

Related Paper

Shuang Wang and Shuqiang Jiang.
INSTRE: A New Benchmark for Instance-Level Object Retrieval and Recognition.
ACM Trans. Multimedia Comput. Commun. Appl. 11, 3, Article 37 (February 2015), 21 pages. DOI=10.1145/2700292


  • This dataset (including images and location annotations) was made publicly available on 03/08/2014.

Earlier Related Dataset


If you have any questions, corrections or other issues, please contact Shuqiang Jiang (sqjiang[at]