Weight-based multi-stream model for Multi-Modal Video Question Answering
DOI:
https://doi.org/10.32473/flairs.36.133306

Keywords:
Video Question Answering, Attention Mechanism, Computer Vision, Natural Language Processing, Neural Networks, Pretrained Models, Transfer Learning, Weight-based Multi-stream Model, TVQA Dataset, CLIP, Vision Transformers, DeBERTa, Multimedia, Multi-modal

Abstract
The individual domains of Computer Vision, Natural Language Processing, and Knowledge Representation have each seen tremendous success. Videos are a rich source of information, blending the multi-modal data forms of images, audio, and, optionally, subtitles. Current research combines these individual domains, giving rise to topics such as image captioning, visual question answering, and video question answering. Video Question Answering is a task that brings together object detection and recognition, temporal information processing, visual attention, and natural language processing.
In this paper, we propose a model with an attention mechanism for Video Question Answering that assigns varying weights to the many pieces of information a video encompasses. The model combines the question with three streams, namely the video's frames, its subtitles, and the objects detected in it, to select the most probable answer. Since it is trained and tested on the TVQA dataset, the model also receives a set of answer candidates as input and predicts one of them as the most probable answer.
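The weight-based multi-stream idea described above can be sketched in miniature: each stream is attended to by the question, the resulting stream summaries are fused with per-stream weights, and answer candidates are scored against the fused representation. This is a minimal NumPy illustration, not the paper's actual architecture; the vector dimensions, the fixed stream weights, and the dot-product attention are all simplifying assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_stream_answer_probs(question, streams, stream_weights, answers):
    """Fuse question-attended stream summaries with per-stream weights,
    then score each answer candidate against the fused vector.
    All inputs are plain embedding vectors (a simplifying assumption)."""
    fused = np.zeros_like(question)
    for w, stream in zip(stream_weights, streams):
        attn = softmax(stream @ question)   # attention of the question over stream items
        fused += w * (attn @ stream)        # weighted, attention-pooled stream summary
    scores = np.array([ans @ fused for ans in answers])
    return softmax(scores)                  # probability over answer candidates

# Toy example: 3 streams (frames, subtitles, objects), 5 TVQA-style candidates.
rng = np.random.default_rng(0)
d = 8
question = rng.normal(size=d)
streams = [rng.normal(size=(5, d)) for _ in range(3)]
answers = [rng.normal(size=d) for _ in range(5)]
weights = np.array([0.5, 0.3, 0.2])         # hypothetical per-stream weights

probs = multi_stream_answer_probs(question, streams, weights, answers)
```

In the full model, the stream weights and attention would be learned jointly rather than fixed, and the embeddings would come from pretrained encoders such as CLIP and DeBERTa.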
License
(c) All rights reserved Mohith Rajesh, Sanjiv Sridhar, Chinmay Kulkarni, Aaditya Shah, Natarajan S 2023
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.