Abstract

Mobile manipulation is a fundamental element of Embodied AI, crucial for agents that must interact with and adapt to their surroundings. The visual room rearrangement challenge tests an agent’s ability to rearrange the items in a space to match a specified layout, relying solely on visual cues. This research examines the AI2-THOR platform in detail to identify the principal factors that influence task performance, focusing on the challenges that most hinder effective rearrangement. In response, we propose a method designed for the AI2-THOR simulation environment. It contributes openness detection models for various object categories and detailed voxel-based semantic maps that identify and categorize the objects requiring rearrangement. Experimental results on a dataset compiled specifically for this study show that our method significantly outperforms the previous year’s competition winner, a notable advance in Embodied AI.

Details

Title
Deep Learning for Image Analysis and Semantic Mapping in Visual Room Rearrangement on the AI2-THOR Platform
Author
Tang, Xinran
Publication year
2024
Publisher
ProQuest Dissertations & Theses
ISBN
9798382721064
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
3058327534
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.