2021-04-12 ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (Content Modality / Multimodal)