Authors: Liang Chen, Wang Wenguan, Yang Yi; et al.
Source: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Volume: 45 Issue: 8 Pages: 10055-10069 Published: AUG 2023
In this article, a local-global context-aware Transformer (Locater) is devised to capture both short- and long-term context and to encourage visual-linguistic alignment in language-guided video segmentation. By incorporating an extra memory into the Transformer architecture, Locater persistently preserves global video content while dynamically gathering local temporal context and segmentation history. Locater won 1st place in the Referring Video Object Segmentation Track of the 3rd Large-scale Video Object Segmentation Challenge at CVPR 2021 and achieved state-of-the-art performance on three public datasets.
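The memory idea summarized above can be illustrated with a toy sketch. This is not the authors' implementation: the class name ToyContextMemory, the fixed-size local window, and the single-query attention read are all assumptions made for illustration, and the language conditioning that Locater uses is omitted entirely. It only shows one plausible way a fixed set of global slots and a rolling set of local slots could be read together by attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention: one query vector reads a list of memory vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, slot)) / scale for slot in memory]
    weights = softmax(scores)
    # Weighted sum of memory slots, one output value per feature dimension.
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(query))]


class ToyContextMemory:
    """Hypothetical memory: global slots stay fixed, local slots form a rolling window."""

    def __init__(self, global_slots, local_size=3):
        self.global_slots = list(global_slots)  # stands in for whole-video content
        self.local_slots = []                   # stands in for recent-frame context
        self.local_size = local_size            # assumed window length

    def write_local(self, frame_feature):
        """Append the newest frame feature and drop the oldest beyond the window."""
        self.local_slots.append(frame_feature)
        self.local_slots = self.local_slots[-self.local_size:]

    def read(self, query):
        """Attend jointly over global and local slots."""
        return attend(query, self.global_slots + self.local_slots)
```

In this sketch, a per-frame query would call read() before segmenting a frame and write_local() afterwards, so that later frames can see recent context while the global slots remain unchanged.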