In this paper, we present a visual learning framework to retrieve a 3D model and estimate its pose from a single image. To increase the quantity and quality of training data, we define our simulation space in the near-infrared (NIR) band and use a quasi-Monte Carlo (MC) method for scalable photorealistic rendering of manufactured components. Two types of convolutional neural network (CNN) architectures are trained on these synthetic data and a relatively small amount of real data. The first CNN model seeks the most discriminative information to classify industrial components with fine-grained shape attributes. Once a 3D model is identified, one of the category-specific CNNs is used for pose regression in the second phase. The mixed data used for learning object categories is useful for domain adaptation and for the attention mechanism in our system. We validate our data-driven method with 88 component models, including one practical product, and demonstrate the experimental results qualitatively. In addition, CNNs trained under various mixed-data conditions are quantitatively analyzed to discuss this approach.
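The two-phase inference described above (classify the component first, then dispatch to a category-specific pose regressor) could be sketched roughly as follows. This is only an illustrative sketch, not the implementation from the paper: the function `retrieve_and_estimate`, the toy stand-in models, and the three-angle pose tuple are all hypothetical names and assumptions introduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = Sequence[float]            # placeholder for real image data (assumption)
Pose = Tuple[float, float, float]  # hypothetical pose parametrization (three angles)


def retrieve_and_estimate(
    image: Image,
    classifier: Callable[[Image], List[float]],
    pose_regressors: Dict[int, Callable[[Image], Pose]],
) -> Tuple[int, Pose]:
    """Phase 1: pick the best-scoring 3D model. Phase 2: regress its pose."""
    scores = classifier(image)
    # Index of the highest class score identifies the retrieved 3D model.
    model_id = max(range(len(scores)), key=scores.__getitem__)
    # Only the regressor trained for that category is evaluated.
    pose = pose_regressors[model_id](image)
    return model_id, pose


# Toy stand-ins for the trained CNNs (hypothetical, for illustration only).
def toy_classifier(image: Image) -> List[float]:
    mean = sum(image) / len(image)
    return [1.0 - mean, mean]  # scores for two made-up component classes


toy_regressors: Dict[int, Callable[[Image], Pose]] = {
    0: lambda image: (0.0, 0.0, 0.0),
    1: lambda image: (90.0, 0.0, 0.0),
}

model_id, pose = retrieve_and_estimate([0.9, 0.8, 0.7], toy_classifier, toy_regressors)
```

Keeping one regressor per category means each pose model only has to cover the appearance variation of a single component, at the cost of storing one regressor per 3D model.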
Seong-Heum Kim, Gyeongmin Choe, Byungtae Ahn, In So Kweon
IEEE International Conference on Robotics and Automation (ICRA)
This research was supported by the Ministry of Trade, Industry & Energy and the Korea Evaluation Institute of Industrial Technology (KEIT) under program number "10060110". The first author sincerely thanks Prof. Sung-eui Yoon at KAIST for valuable discussions.