English title of the article:
Visual question answering model based on graph neural network and contextual attention
Persian translation of the article title:
مدل پاسخگویی به سوالات بصری بر اساس شبکه عصبی گراف و توجه متنی
ScienceDirect - Elsevier - Image and Vision Computing, 110 (2021) 104165. doi:10.1016/j.imavis.2021.104165
Visual Question Answering (VQA) has recently emerged as a hot research area at the intersection of computer vision and natural language processing. A VQA model extracts both image and question features and fuses them to predict an answer to a given natural-language question about an image. However, most attention-based VQA approaches concentrate mainly on extracting visual information from regions of interest for answer prediction, ignoring both the relations between those regions and the reasoning over them. In addition, such approaches disregard the regions that were previously attended during answer generation, even though previously attended regions can guide the selection of subsequent regions of attention. In this paper, a novel VQA model is presented and formulated that exploits the relationships between regions and employs visual-context-based attention that takes the previously attended visual content into account. Experimental results demonstrate that the proposed VQA model boosts answer-prediction accuracy on the publicly available VQA 1.0 and VQA 2.0 datasets.
© 2021 Elsevier B.V. All rights reserved.
Keywords: Visual question answering | Computer vision | Natural language processing | Attention
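The abstract names two mechanisms: message passing over a graph of image regions (so that related regions inform one another) and a contextual attention that accumulates previously attended visual content to guide later attention steps. The paper's actual architecture is not given in this excerpt; the pure-Python sketch below only illustrates these two ideas under assumed simplifications (mean-aggregation message passing, dot-product scoring, additive context accumulation). All function names, feature shapes, and the two-hop setting are hypothetical, not the authors' implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gnn_layer(feats, adj):
    """One message-passing step: each region's feature becomes the mean
    of its own feature and its neighbours' features (adj[i][j] = 1 if
    regions i and j are connected in the region graph)."""
    out = []
    for i, f in enumerate(feats):
        nbrs = [feats[j] for j in range(len(feats)) if adj[i][j]] + [f]
        out.append([sum(c) / len(nbrs) for c in zip(*nbrs)])
    return out

def contextual_attention(feats, question, context, hops=2):
    """Attention over region features guided by the question vector AND a
    context vector that accumulates previously attended visual content,
    so earlier attention influences where the model looks next."""
    weights = [1.0 / len(feats)] * len(feats)
    for _ in range(hops):
        scores = [dot(f, question) + dot(f, context) for f in feats]
        weights = softmax(scores)
        attended = [sum(w * f[d] for w, f in zip(weights, feats))
                    for d in range(len(feats[0]))]
        # fold the newly attended content into the running visual context
        context = [c + a for c, a in zip(context, attended)]
    return weights, context

# Toy usage: three 2-d region features on a small chain graph.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = gnn_layer(feats, adj)
w, ctx = contextual_attention(h, question=[1.0, 0.0], context=[0.0, 0.0])
```

In a full model the attended feature and question embedding would be fused (e.g. by a learned projection) and fed to an answer classifier; here the sketch stops at the attention weights and the accumulated context.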