Local-Global Context Aware Transformer for Language-Guided Video Segmentation

2024-06-13

Authors: Liang Chen, Wang Wenguan, Yang Yi, et al.

Source: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, Volume: 45, Issue: 8, Pages: 10055-10069, Published: AUG 2023

In this article, a local-global context aware Transformer (Locater) is devised to capture both short- and long-term context and to encourage visual-linguistic alignment in language-guided video segmentation. By incorporating an extra memory into the Transformer architecture, Locater persistently preserves global video content while dynamically gathering local temporal context and segmentation history. Locater won 1st place in the Referring Video Object Segmentation Track of the 3rd Large-scale Video Object Segmentation Challenge at CVPR 2021 and achieved state-of-the-art performance on three public datasets.
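
The idea of pairing a persistent global memory with a dynamically updated local context can be illustrated with a small sketch. This is not the authors' implementation: the class name `ContextMemory`, the moving-average write rule, the sliding-window size, and the plain dot-product attention are all assumptions chosen for illustration, and the features are toy lists rather than real video or language embeddings.

```python
import math
from collections import deque


def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over key/value lists."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class ContextMemory:
    """Hypothetical memory: fixed-size global slots plus a sliding local window."""

    def __init__(self, global_slots, local_window):
        # Global part: constant number of slots, kept for the whole video.
        self.global_slots = [list(slot) for slot in global_slots]
        # Local part: only the most recent frames (assumed sliding window).
        self.local = deque(maxlen=local_window)

    def write_global(self, frame_feat, rate=0.1):
        # Assumed update rule: blend the frame into its most similar slot.
        sims = [sum(a * b for a, b in zip(slot, frame_feat))
                for slot in self.global_slots]
        idx = sims.index(max(sims))
        slot = self.global_slots[idx]
        self.global_slots[idx] = [(1 - rate) * s + rate * f
                                  for s, f in zip(slot, frame_feat)]

    def write_local(self, frame_feat):
        # Old frames fall out automatically once the window is full.
        self.local.append(list(frame_feat))

    def read(self, query):
        # The query (e.g. a language-conditioned feature) attends over both
        # the global slots and the local window at once.
        context = self.global_slots + list(self.local)
        return attend(query, context, context)


memory = ContextMemory(global_slots=[[1.0, 0.0], [0.0, 1.0]], local_window=2)
for frame in ([1.0, 0.2], [0.9, 0.1], [0.1, 0.8]):
    memory.write_global(frame)
    memory.write_local(frame)
context_vector = memory.read([1.0, 0.0])
```

In this sketch the memory size stays constant regardless of video length: the global slots are never added to or removed, and the local window holds at most `local_window` frames.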