News

In this paper, a novel architecture, Multimodal Encoder-Decoder Attention Networks (MEDAN), is proposed. MEDAN consists of Multimodal Encoder-Decoder Attention (MEDA) layers cascaded in depth and can capture ...
To the best of our knowledge, we present the first exploration of combining Swin Transformer and convolution in both the encoder and decoder stages. Through comprehensive comparative analysis, we ...