Abstract: Understanding human interactions is crucial in various applications, including robotics, automated systems, human-computer interaction, and video surveillance. Many studies have ...
Abstract: Text-based Visual Question Answering (TextVQA) focuses on answering questions about the scene text in images. Most works in this field use transformer-based models to model the ...