Abstract: Scene graph generation aims to detect objects and their relations in images, providing structured representations for scene understanding. Currently, mainstream approaches first detect the ...
Abstract: Question answering that requires numerical reasoning, which generally involves symbolic operations such as sorting, counting, and addition, is a challenging task. To address this problem, ...