George Tom
High-level understanding of videos; VQA over images with text — DocVQA, InfographicVQA, etc.
Undergrad and MS students whom I co-mentored with Prof. C.V. Jawahar.
High-level understanding of videos; VQA over images with text — DocVQA, InfographicVQA, etc.
Video VQA — text-based video question answering and understanding video scenes through text.
Medical VQA
Road Text — recognition of text on roads and its application to navigation / autonomous driving.
Handwritten text recognition
Unconstrained scene text recognition in a seq2seq framework
Scene text detection and recognition
RNN + CTC on GPU
DAISY Audio Book Library and Playback — web and desktop apps