ION (Instance-level Object Navigation)

Overview

Visual object navigation is a fundamental task in Embodied AI. Previous works focus on category-level navigation, in which navigating to any instance of the target object category counts as a success. Such methods may be effective for finding generic objects. In real life, however, it is often more practical to navigate to a specific instance, since a particular requirement is usually satisfied by one specific instance rather than by every instance of a category. Navigating to a specific instance has rarely been studied and remains challenging for current methods.
Therefore, we introduce a new task, Instance Object Navigation (ION), in which instance-level descriptions of targets are provided and instance-level navigation is required. In particular, the instance-level descriptions involve multiple types of attributes, such as colors, materials and object references. To evaluate the task, we build the ION dataset for instance-level object navigation on the AI2-Thor simulator, where 27,735 object instance descriptions and the navigation ground truth are obtained automatically through interaction with the simulator.

ION Dataset Collection

We collect our data with the interactive simulator AI2-Thor, which has four types of near-realistic scenes (kitchen, living room, bedroom and bathroom), each containing 30 different rooms. Statistics over all environments show that, except for structural objects such as cabinets and drawers, most objects have only one or a few instances in each room, which may be too simple for evaluating instance-level navigation. Note that when a room contains only one instance of the target object, instance-level navigation degenerates to category-level navigation. Although the AI2-Thor simulator supports duplicating objects of certain types and placing them at random locations, we argue that simply copying and randomly placing objects does not fit practical real-world settings. In reality, it is common to see various instances of the same category: a kitchen usually contains many plates, pans, pots and bowls, and most of them differ in color or material. Therefore, we modify the scenes of AI2-Thor with the Unity Editor and add a new operation that instantiates objects into multiple instances with specific colors, materials and placement receptacles. A quadruple ⟨𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦, 𝑐𝑜𝑙𝑜𝑟, 𝑚𝑎𝑡𝑒𝑟𝑖𝑎𝑙, 𝑟𝑒𝑓𝑒𝑟𝑒𝑛𝑐𝑒⟩ is used to describe the target instance, where 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦, 𝑐𝑜𝑙𝑜𝑟 and 𝑚𝑎𝑡𝑒𝑟𝑖𝑎𝑙 are instance attributes and 𝑟𝑒𝑓𝑒𝑟𝑒𝑛𝑐𝑒 represents the spatial relationship between the target instance 𝐺 and a related instance 𝐺' (e.g., on the table). Furthermore, an offline sampling and automatic annotation system is built to collect the ION dataset.
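As a minimal sketch of how such a quadruple might be represented in code, the following Python dataclasses store the four fields and render a readable phrase. The names `Reference`, `InstanceDescription` and `phrase` are hypothetical and are not part of the released dataset or of the simulator API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Spatial relationship between the target G and a related instance G'."""
    relation: str          # e.g. "on", "in", "near"
    related_category: str  # category of G', e.g. "Table"


@dataclass(frozen=True)
class InstanceDescription:
    """Hypothetical container for <category, color, material, reference>."""
    category: str
    color: str
    material: str
    reference: Reference

    def phrase(self) -> str:
        # Attributes first, then the spatial reference to the related instance.
        attributes = f"{self.color} {self.material} {self.category}"
        relation = f"{self.reference.relation} the {self.reference.related_category}"
        return f"{attributes} {relation}".lower()


target = InstanceDescription("Bowl", "Blue", "Ceramic", Reference("on", "Table"))
print(target.phrase())  # blue ceramic bowl on the table
```

Freezing the dataclasses makes descriptions hashable, which is convenient if they are later deduplicated or used as dictionary keys.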

Target Selection

Our target objects cover 29 categories, selected according to the following rules:

(1) The object instances are directly visible if they exist in the scene, which excludes cases such as an apple in the fridge or a fork in the drawer;
(2) The instance is not a single salient object that can be easily observed, such as a bed or toilet;
(3) In practical scenes, some categories usually have only one instance while others have multiple instances. Hence, according to the scene type, we divide the 29 selected categories into single-instance categories and multi-instance categories. Each multi-instance category is associated with different instance templates (different templates often correspond to different materials), specified colors and placement receptacles.

Object Instantiation

For the target instance 𝐺 described by ⟨𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦, 𝑐𝑜𝑙𝑜𝑟, 𝑚𝑎𝑡𝑒𝑟𝑖𝑎𝑙, 𝑟𝑒𝑓𝑒𝑟𝑒𝑛𝑐𝑒⟩:

(1) The 𝑐𝑜𝑙𝑜𝑟 involves 10 kinds of primary colors: Black, Gray, White, Red, Orange, Yellow, Green, Cyan, Blue, Purple;
(2) The 𝑚𝑎𝑡𝑒𝑟𝑖𝑎𝑙 involves 15 kinds of materials: Metal, Wood, Plastic, Glass, Ceramic, Stone, Fabric, Rubber, Food, Paper, Wax, Soap, Sponge, Organic, Leather;
(3) The 𝑟𝑒𝑓𝑒𝑟𝑒𝑛𝑐𝑒 covers 4 spatial relationships between the target instance 𝐺 and the referring instance 𝐺': 𝐺 ℎ𝑜𝑙𝑑 𝐺', 𝐺 𝑜𝑛 𝐺', 𝐺 𝑖𝑛 𝐺' and 𝐺 𝑛𝑒𝑎𝑟 𝐺'. Note that the 𝑛𝑒𝑎𝑟 relationship means a distance within 0.5 m. The target instances and the referring instances cover 44 object categories in total.
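
The 𝑛𝑒𝑎𝑟 relationship can be illustrated with a short distance check. This is a sketch only: it assumes that positions are given as (x, y, z) coordinates in meters, that the distance is the Euclidean distance between object centers, and that the 0.5 m bound is inclusive. None of these details is specified above, and the function name `is_near` is hypothetical.

```python
import math

NEAR_THRESHOLD_M = 0.5  # distance bound for the "near" relationship, from the text


def is_near(pos_g, pos_g_prime, threshold=NEAR_THRESHOLD_M):
    """Return True if G and G' are within `threshold` meters of each other.

    Assumption: positions are (x, y, z) tuples in meters and the distance
    is the Euclidean distance between object centers.
    """
    return math.dist(pos_g, pos_g_prime) <= threshold


# Illustrative coordinates, not taken from the dataset.
mug = (1.0, 0.9, 2.0)
laptop = (1.3, 0.9, 2.2)
sofa = (3.0, 0.4, 2.0)

print(is_near(mug, laptop))  # True  (about 0.36 m apart)
print(is_near(mug, sofa))    # False (about 2.06 m apart)
```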

For each scene, we first select some related multi-instance categories with a certain probability, and then instantiate the objects to generate several random instances with specific colors, materials and placements. Note that each object category has its own range of colors and placement receptacle categories. To keep the generated instances in a scene from being too dense or too sparse, we perform 5 separate rounds of scene placement for each room. In each round, we choose at most 7 multi-instance categories and create 2 to 5 specific instances for each of them.
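
The sampling procedure can be sketched as follows. The numbers of rounds, categories and instances come from the description above. Everything else (the contents of `CATALOG`, the selection probability `p_select`, and the use of uniform sampling) is a hypothetical stand-in, since the actual per-category ranges and probabilities are not given here.

```python
import random

ROUNDS_PER_ROOM = 5                  # placement rounds per room
MAX_CATEGORIES = 7                   # multi-instance categories per round, at most
MIN_INSTANCES, MAX_INSTANCES = 2, 5  # instances created per chosen category

# Hypothetical per-category ranges of colors, materials and receptacles.
CATALOG = {
    "Bowl": {"colors": ["White", "Blue", "Red"],
             "materials": ["Ceramic", "Glass"],
             "receptacles": ["CounterTop", "DiningTable"]},
    "Mug": {"colors": ["Black", "Green"],
            "materials": ["Ceramic", "Plastic"],
            "receptacles": ["CounterTop", "Shelf"]},
    "Plate": {"colors": ["White", "Yellow"],
              "materials": ["Ceramic", "Wood"],
              "receptacles": ["DiningTable", "Sink"]},
}


def sample_round(catalog, rng, p_select=0.5):
    """Sample one placement round: which categories to instantiate and how."""
    # Select each category with probability p_select, then cap the number.
    candidates = [c for c in catalog if rng.random() < p_select]
    chosen = rng.sample(candidates, min(len(candidates), MAX_CATEGORIES))
    placements = []
    for category in chosen:
        spec = catalog[category]
        # randint is inclusive on both ends, giving 2 to 5 instances.
        for _ in range(rng.randint(MIN_INSTANCES, MAX_INSTANCES)):
            placements.append({
                "category": category,
                "color": rng.choice(spec["colors"]),
                "material": rng.choice(spec["materials"]),
                "receptacle": rng.choice(spec["receptacles"]),
            })
    return placements


def sample_room(catalog, seed=0):
    """Run all placement rounds for one room with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_round(catalog, rng) for _ in range(ROUNDS_PER_ROOM)]


rounds = sample_room(CATALOG)
print(len(rounds))  # 5
```

Passing an explicit `random.Random` instance keeps each room's placements reproducible, which matters when sampling is done offline.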

Automatic Annotation

Once the above instantiation rules and scene placement schemes are fixed, scene sampling and instance annotation can be processed automatically. We perform offline sampling on each new environment after inserting the multiple instances. Meanwhile, the modified simulator provides instance information on colors, materials and surrounding references, including receptacle, containment and neighbor relationships. It is thus convenient to annotate the instance descriptions automatically and build the ION dataset.

ION Dataset Download

Our ION dataset contains 600 environments with different instance layouts, which is 5 times the number of original AI2-Thor environments, while still covering the same 4 types of scenes: kitchen, living room, bedroom and bathroom. For a specific instance in an environment, different quadruple descriptions can be generated according to its different references, such as its receptacle and neighbors. As a result, the ION dataset contains 27,735 instance descriptions in total.
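
To illustrate why one instance can yield several descriptions, the sketch below expands an instance into one quadruple per available reference. The dictionary layout and the function name `expand_descriptions` are hypothetical and do not reflect the released file format.

```python
def expand_descriptions(instance):
    """Return one (category, color, material, reference) tuple per reference."""
    return [
        (instance["category"], instance["color"], instance["material"], ref)
        for ref in instance["references"]
    ]


# Hypothetical instance record with one receptacle and two neighbors.
bowl = {
    "category": "Bowl",
    "color": "Blue",
    "material": "Ceramic",
    "references": [("on", "DiningTable"), ("near", "Mug"), ("near", "Plate")],
}

descriptions = expand_descriptions(bowl)
for quad in descriptions:
    print(quad)
print(len(descriptions))  # 3
```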

Data Splits

For each type of scene, we choose 100 environments for training, 25 for validation and 25 for testing. Note that the test environments are unseen in the training and validation sets.
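
The split sizes can be checked with a few lines of arithmetic. All counts come from the text (30 rooms per scene type, 5 placement rounds per room, and 100/25/25 environments per scene type); the variable names are illustrative only.

```python
SCENE_TYPES = ["kitchen", "living room", "bedroom", "bathroom"]
ROOMS_PER_TYPE = 30
ROUNDS_PER_ROOM = 5
PER_TYPE_SPLIT = {"train": 100, "val": 25, "test": 25}

# Environments per scene type: 30 rooms x 5 placement rounds = 150.
envs_per_type = ROOMS_PER_TYPE * ROUNDS_PER_ROOM
assert envs_per_type == sum(PER_TYPE_SPLIT.values())

totals = {split: n * len(SCENE_TYPES) for split, n in PER_TYPE_SPLIT.items()}
print(totals)                # {'train': 400, 'val': 100, 'test': 100}
print(sum(totals.values()))  # 600
```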

Download

ION_dataset.zip (~63 GB, ~140 GB after decompression)

Paper

ION: Instance-level Object Navigation

Weijie Li, Xinhang Song, Yubing Bai, Sixian Zhang, Shuqiang Jiang. 29th ACM International Conference on Multimedia (ACM Multimedia 2021), Chengdu, China, October 20-24, 2021.

[PDF] [Code]

Video

Contact Us

For any questions, please feel free to contact us :)

weijie.li@vipl.ict.ac.cn; sqjiang@ict.ac.cn
