Question Answer Design with Text Box HTML

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Abstract: We introduce EgoTextVQA, a novel and rigorously constructed benchmark for egocentric QA assistance involving scene text. EgoTextVQA contains 1.5K ego-view videos and 7K scene-text aware ...

IEEE

Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering

Abstract: Text-based Visual Question Answering (TextVQA) aims to produce correct answers for given questions about the images with multiple scene texts. In most cases, the texts naturally attach to ...


