Robotics and Computer Vision Lab

Research Area


Per-Clip Video Object Segmentation [CVPR 2022]

 

- The paper introduces a per-clip inference scheme for semi-supervised video object segmentation: instead of the traditional per-frame processing, it optimizes clip-wise and processes the frames within a clip in parallel.

 

- It employs an intra-clip refinement module and a progressive matching mechanism, which improve feature refinement and the efficiency of information passing within each clip.

 

- Practically, the approach sets new state-of-the-art results on the YouTube-VOS and DAVIS benchmarks with superior accuracy and efficiency, and offers a flexible speed-accuracy trade-off, making it well suited to real-world applications.
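The per-clip idea above can be sketched in a few lines. This is a hypothetical simplification, not the paper's implementation: the `segment_clip` and `update_memory` callables stand in for the actual segmentation network and memory bank. The point it illustrates is structural — frames are grouped into clips, each clip is segmented jointly, and the memory is updated once per clip rather than once per frame.

```python
def chunk_into_clips(frames, clip_len):
    """Split a frame sequence into consecutive clips of up to clip_len frames."""
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]


def per_clip_inference(frames, clip_len, segment_clip, update_memory):
    """Run clip-wise VOS inference.

    segment_clip(memory, clip)          -> list of masks, one per frame in the clip
    update_memory(memory, clip, masks)  -> updated memory
    """
    memory = None
    all_masks = []
    for clip in chunk_into_clips(frames, clip_len):
        # Frames within a clip are handled jointly (parallelizable),
        # and the memory is touched only once per clip.
        masks = segment_clip(memory, clip)
        memory = update_memory(memory, clip, masks)
        all_masks.extend(masks)
    return all_masks
```

The clip length is the knob behind the speed-accuracy trade-off mentioned above: longer clips mean fewer memory updates but coarser temporal granularity.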

 

 

 

 

Large-vocabulary Video Object Detection [ECCV 2022]

 

- The paper introduces a simple and effective learning framework that unifies detection and tracking learning for large-vocabulary video object detection, addressing the sparse supervision in the LVIS and TAO datasets by hallucinating the missing supervision.

 

- Methodologically, it proposes spatial jittering methods (strong zoom-in/out and mosaicing augmentation) to simulate tracking supervision, together with a teacher-student framework to prevent catastrophic forgetting, handling the challenges of large-vocabulary tracking.

 

- Practically, the framework sets new state-of-the-art results on the TAO benchmark, making it a strong baseline for future research in large-vocabulary video object detection and tracking.
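As a rough illustration of how spatial jittering can hallucinate tracking supervision from a still image, the sketch below (hypothetical; `jitter_box` and its parameters are not from the paper) shifts and rescales a ground-truth box so that the original view and the jittered view of the same object form a pseudo frame pair with a known identity correspondence.

```python
import random


def jitter_box(box, img_w, img_h, max_shift=0.1, max_scale=0.2, rng=None):
    """Spatially jitter a box (x, y, w, h) to simulate the same object
    in an adjacent 'frame'.  max_shift controls translation relative to
    box size; max_scale controls the zoom-in/out factor."""
    rng = rng or random.Random()
    x, y, w, h = box
    dx = rng.uniform(-max_shift, max_shift) * w
    dy = rng.uniform(-max_shift, max_shift) * h
    s = 1.0 + rng.uniform(-max_scale, max_scale)  # zoom-in/out
    nw, nh = w * s, h * s
    # Clamp so the jittered box stays inside the image.
    nx = min(max(x + dx, 0.0), img_w - nw)
    ny = min(max(y + dy, 0.0), img_h - nh)
    return (nx, ny, nw, nh)
```

Pairing each image with a jittered copy of itself yields dense identity labels for free, which is what lets a tracker be trained on detection-only datasets.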

 

 

 

Tracking by Associating Clips [ECCV 2022]

 

- The paper introduces a clip-wise matching methodology for multi-object tracking: rather than tracking frame by frame, it treats a video as a series of short clips, which improves robustness to interruptions and makes better use of temporal information.

 

- Methodologically, it addresses the error accumulation and lack of temporal context in frame-wise tracking by proposing intra-clip and inter-clip tracking, improving long-range track association by aggregating information across multiple frames.

 

- Practically, clip-wise matching achieves superior tracking performance on the challenging TAO and MOT17 benchmarks, demonstrating its effectiveness and generality across scenarios.
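A minimal sketch of inter-clip association, assuming mean-pooled clip-level embeddings and greedy cosine matching (the paper's actual aggregation and matching may differ; all names here are illustrative). Averaging an object's embedding over the frames of a clip is one simple way to suppress single-frame noise before long-range association.

```python
def clip_embedding(frame_embeddings):
    """Aggregate per-frame object embeddings into one clip-level embedding
    by mean pooling over the frames of the clip."""
    dim = len(frame_embeddings[0])
    n = len(frame_embeddings)
    return [sum(e[d] for e in frame_embeddings) / n for d in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def associate_clips(prev_tracks, curr_objects, threshold=0.5):
    """Greedy inter-clip matching: link each current clip-level object to
    the most similar previous track, taking the best-scoring pairs first.
    Returns {current object index -> previous track index}."""
    pairs = sorted(
        ((cosine(t, o), ti, oi)
         for ti, t in enumerate(prev_tracks)
         for oi, o in enumerate(curr_objects)),
        reverse=True)
    used_t, used_o, matches = set(), set(), {}
    for sim, ti, oi in pairs:
        if sim < threshold or ti in used_t or oi in used_o:
            continue
        used_t.add(ti)
        used_o.add(oi)
        matches[oi] = ti
    return matches
```

A production tracker would typically replace the greedy loop with Hungarian assignment, but the clip-level structure is the same.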

 

 

 

In-the-wild Matting [CVPR 2023]

 

- The paper introduces a learning framework for mask-guided matting that extracts detailed foreground objects from complex backgrounds using easily obtainable coarse masks, incorporating instance-wise learning and leveraging weakly supervised datasets.

 

- Through extensive experiments and a new evaluation benchmark, Composition-Wild, the method is shown to outperform existing approaches on a wide range of object categories in real-world scenarios.

 

- The proposed approach enables new applications such as video matting and panoptic matting, reducing manual effort in a variety of image and video editing tasks.
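The editing applications above rest on the standard matting equation, I = αF + (1 − α)B: once a method predicts a soft alpha matte α for the foreground F, compositing it onto any background B is a per-pixel blend. A minimal sketch (illustrative only; single-channel images are represented as nested lists of floats):

```python
def composite(fg, bg, alpha):
    """Composite a foreground over a background with a per-pixel alpha
    matte, following the matting equation I = alpha*F + (1-alpha)*B."""
    return [[a * f + (1.0 - a) * b for f, b, a in zip(fr, br, ar)]
            for fr, br, ar in zip(fg, bg, alpha)]
```

The benefit of matting over a hard coarse mask is precisely that α is fractional near hair and other fine boundaries, so the blend stays smooth.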

 

 

 

Related publications

 

1. Per-Clip Video Object Segmentation

Kwanyong Park, Sanghyun Woo, Seoung Wug Oh, In So Kweon, Joon-Young Lee

CVPR 2022

 

2. A Unified Learning Framework for Large Vocabulary Video Object Detection

Sanghyun Woo, Kwanyong Park, Seoung Wug Oh, In So Kweon, Joon-Young Lee

ECCV 2022

 

3. Tracking by Associating Clips

Sanghyun Woo, Kwanyong Park, Seoung Wug Oh, In So Kweon, Joon-Young Lee

ECCV 2022

 

4. Mask-guided Matting in the Wild

Kwanyong Park, Sanghyun Woo, Seoung Wug Oh, In So Kweon, Joon-Young Lee

CVPR 2023

 

