News

We analyze the pros and cons of each module based on the unique characteristics of the data. To the best of our knowledge, we present the first exploration of combining Swin Transformer and ...
Each MEDA layer contains an Encoder module that models the self-attention of questions, as well as a Decoder module that models the question-guided attention and self-attention of images. Experimental ...
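The encoder/decoder split described in the snippet above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`attend`, `encoder`, `decoder`, `meda_layer`) are hypothetical, the order of the two attention steps inside the decoder is an assumption, and the learned projections, multiple heads, normalization, and feed-forward sublayers a real model would likely use are omitted to keep the example short.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return out


def add(xs, ys):
    """Element-wise residual sum of two lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def encoder(questions):
    # Self-attention over question tokens, plus a residual connection.
    return add(questions, attend(questions, questions, questions))


def decoder(images, encoded_questions):
    # Assumed order: image self-attention first, then question-guided attention.
    images = add(images, attend(images, images, images))
    guided = attend(images, encoded_questions, encoded_questions)
    return add(images, guided)


def meda_layer(questions, images):
    # Hypothetical wrapper: one encoder pass feeding one decoder pass.
    encoded = encoder(questions)
    return encoded, decoder(images, encoded)


if __name__ == "__main__":
    q_tokens = [[1.0, 0.0], [0.0, 1.0]]              # two question tokens
    regions = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]   # three image regions
    enc, dec = meda_layer(q_tokens, regions)
    print(len(enc), len(dec))
```

In this sketch the question features are refined on their own, while the image features are updated twice: once against each other and once against the encoded question, which is what lets the question steer which image regions matter.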