Deep Learning for Image Analysis and Semantic Mapping in Visual Room Rearrangement on the AI2-THOR Platform

Abstract

Mobile manipulation is a fundamental element of Embodied AI, crucial for agents that must act interactively and adaptively. The visual room rearrangement challenge tests an agent’s ability to reconfigure the objects in a space to match a specified layout, relying solely on visual cues. Our research examines the workings of the AI2-THOR platform to identify the principal factors that influence task performance, focusing on the challenges that most hinder effective rearrangement. In response, we propose a method tailored to the AI2-THOR simulation environment. The method introduces openness detection models for individual object categories and builds detailed voxel-based semantic maps to identify and categorize the objects that require rearrangement. Experimental results on a dataset compiled specifically for this study show that our method significantly outperforms the previous year’s competition winner, a notable advance for Embodied AI.

Details

Title

Deep Learning for Image Analysis and Semantic Mapping in Visual Room Rearrangement on the AI2-THOR Platform

Author

Tang, Xinran

Publication year

2024

Publisher

ProQuest Dissertations & Theses

ISBN

9798382721064

Source type

Dissertation or Thesis

Language of publication

English

ProQuest document ID

3058327534

Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
