Scene Graph Question Answering (SGQA)
Reasoning over scene graphs to answer a given question
Formally, given a question and scene graphs \( G = (V, E) \), the model must predict an answer that corresponds to an element of \( V \). As shown in the figure, questions in our benchmark require logically or temporally connecting a sequence of actions or object state changes, and they can be answered by hopping across multiple triplets.
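To make the notion of hopping across triplets concrete, the following is a minimal sketch and not the benchmark's actual pipeline. It assumes a scene graph is stored as a list of (subject, relation, object) triplets; the toy triplets, the relation names, and the helpers `build_index` and `multi_hop` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy scene graph as (subject, relation, object) triplets.
# The node and relation names are illustrative, not taken from the benchmark.
TRIPLETS = [
    ("person", "picks_up", "knife"),
    ("knife", "cuts", "carrot"),
    ("carrot", "placed_in", "bowl"),
]


def build_index(triplets):
    """Map (subject, relation) to the list of objects reachable in one hop."""
    index = defaultdict(list)
    for subj, rel, obj in triplets:
        index[(subj, rel)].append(obj)
    return index


def multi_hop(index, start, relations):
    """Follow `relations` in order from `start` and return the nodes reached."""
    frontier = {start}
    for rel in relations:
        # Expand every node in the frontier along the current relation.
        frontier = {
            obj
            for node in frontier
            for obj in index.get((node, rel), [])
        }
    return frontier


index = build_index(TRIPLETS)

# Toy question: "Where did the thing end up that was cut by the object
# the person picked up?" It needs three hops across three triplets.
answer = multi_hop(index, "person", ["picks_up", "cuts", "placed_in"])
print(answer)
```

In this sketch the returned set contains only nodes of the graph, which mirrors the requirement that the answer be an element of \( V \). A chain that matches no triplet yields an empty set.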