We propose a novel video object segmentation algorithm
based on pixel-level matching using Convolutional Neural
Networks (CNN). Our network aims to measure pixel-wise
similarity between two objects and discriminate the area
of a target object from the similarity. The proposed network
represents a target object using the features from different
depth layers at the same time to take advantage
of both spatial details and category-level semantic information.
Furthermore, we propose a feature compression
technique that drastically reduces the memory requirement
while maintaining the feature representation capability. A
two-stage training (pre-training and fine-tuning) allows our
network to handle any target object regardless of its category
(even if the object’s type does not belong to the pre-training
data) or its appearance variations through a video
sequence. Experiments on large datasets demonstrate the
effectiveness of our model compared to related methods
in terms of accuracy, speed, and stability. Finally, we
show that our network transfers to different
domains such as infrared data.
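
As a rough illustration of the pixel-level matching idea summarized above, the following minimal sketch compares every pixel of a query feature map against the foreground pixels of a target feature map using cosine similarity, then thresholds the best match to obtain a mask. It is not the proposed network: the function names (`combine_layers`, `pixelwise_similarity`, `match_mask`), the use of cosine similarity, the nearest-neighbor upsampling, and the threshold of 0.5 are all assumptions made for illustration, and the feature maps are assumed to come from some CNN that is not shown here.

```python
import numpy as np


def combine_layers(shallow, deep):
    """Stack features from two depths (hypothetical helper).

    shallow: (C1, H, W) fine spatial detail
    deep:    (C2, H // s, W // s) coarser, more semantic features
    Assumes H and W are integer multiples of the deep map's size.
    """
    s = shallow.shape[1] // deep.shape[1]
    # Nearest-neighbor upsampling of the deep map to the shallow resolution.
    up = deep.repeat(s, axis=1).repeat(s, axis=2)
    return np.concatenate([shallow, up], axis=0)


def pixelwise_similarity(target_feat, query_feat, eps=1e-8):
    """Cosine similarity between every query pixel and every target pixel.

    target_feat: (C, Ht, Wt), query_feat: (C, Hq, Wq)
    Returns an array of shape (Hq, Wq, Ht * Wt).
    """
    c = target_feat.shape[0]
    t = target_feat.reshape(c, -1)
    q = query_feat.reshape(c, -1)
    # Normalize each pixel's feature vector to unit length.
    t = t / (np.linalg.norm(t, axis=0, keepdims=True) + eps)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    sim = q.T @ t  # (Hq * Wq, Ht * Wt)
    return sim.reshape(query_feat.shape[1], query_feat.shape[2], -1)


def match_mask(target_feat, target_mask, query_feat, threshold=0.5):
    """Label a query pixel as target if it resembles any target foreground pixel."""
    fg = np.asarray(target_mask).reshape(-1).astype(bool)
    if not fg.any():
        raise ValueError("target_mask contains no foreground pixels")
    sim = pixelwise_similarity(target_feat, query_feat)
    # Best similarity to any foreground pixel of the target.
    score = sim[..., fg].max(axis=-1)
    return score > threshold
```

In this sketch the similarity tensor grows with the product of the two spatial sizes, which hints at why reducing feature memory matters for pixel-level matching; the compression technique itself is not reproduced here.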