Abstract
Mobile manipulation is a fundamental element of Embodied AI, crucial for agents that must interact with and adapt to their surroundings. The visual room rearrangement challenge tests an agent’s ability to rearrange the items in a space to match a specified layout, using visual input alone. We analyze the AI2-THOR platform to identify the principal factors that influence task performance, and we focus on the challenges that most hinder effective rearrangement. In response, we propose a method tailored to the AI2-THOR simulation environment. Its main contributions are openness detection models built for individual object categories and detailed voxel-based semantic maps that identify and categorize the objects requiring rearrangement. Experimental results on a dataset compiled specifically for this study show that our method significantly outperforms the previous year’s competition winner, a notable advance for Embodied AI.





