Instructions #
Use the dropdown menu on the left to navigate to a particular puzzle.
Generation Process #
To ablate the effect of the neural layer on the performance of the pipeline, we solve these puzzles using FO Scene Models extracted from the ground-truth scene graphs provided in the GQA dataset.
Each puzzle is created by choosing a question from the GQA dataset that has only ‘yes’ or ‘no’ answers, selecting the example images and the intended candidate from the ‘yes’ set, and the remaining candidates from the ‘no’ set. A random subset of 1000 puzzles is presented here.
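The construction above can be summarized in a minimal sketch. This is only an illustration of the selection step, not our actual generation code; the data structures (`yes_images`, `no_images`) and the counts of examples and candidates are assumptions made for the example.

```python
import random

def make_puzzle(question, yes_images, no_images,
                num_examples=3, num_candidates=5, rng=random):
    """Build one puzzle from a GQA yes/no question.

    `yes_images` are images where the question's answer is 'yes',
    `no_images` are images where it is 'no'. The concrete counts of
    examples and candidates here are illustrative assumptions.
    """
    # Example images and the intended (correct) candidate come from the
    # 'yes' set; all other candidates come from the 'no' set.
    picked_yes = rng.sample(yes_images, num_examples + 1)
    examples, intended = picked_yes[:-1], picked_yes[-1]
    distractors = rng.sample(no_images, num_candidates - 1)

    candidates = distractors + [intended]
    rng.shuffle(candidates)
    return {
        "question": question,
        "examples": examples,
        "candidates": candidates,
        "answer": candidates.index(intended),
    }
```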
Unsolved Puzzles Analysis #
The subset of puzzles in the GQA VDP dataset that could not be solved by our tool is presented under the Unsolved Puzzles menu. These puzzles fall into three major categories:
- Multiple labels for the same object class in the scene graph. For example, in a puzzle featuring umbrellas, objects from the same category are labeled ‘umbrella’ in some images and ‘umbrellas’ in others. Because we use unmodified scene graphs with no other knowledge, these two labels are treated as different object classes, so no discriminator can identify a distinct candidate (see the first sketch after this list).
- Discriminators that cannot be captured by information in the scene graph. For example, consider a puzzle where the discriminator is ‘There is a bag in the bottom portion of the image’. Scene graphs are models of the world captured by the image, and therefore do not record where objects lie relative to the image frame itself. As such, we do not solve these puzzles.
- Discriminators not expressible in the FO Scene Logic fragment. For example, discriminators like ‘There is a dog that is not white’ involve negation, which is not expressible in the guarded conjunctive scene logic (see the second sketch after this list).
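To make the first failure mode concrete, the sketch below searches for a single-label discriminator: an object class that appears in every example scene graph but not in a candidate’s. The scene-graph format is the simplified GQA-style JSON layout and the search is a toy stand-in, not our FO Scene Logic synthesis.

```python
def object_classes(scene_graph):
    """Set of object-class labels appearing in a (simplified) scene graph."""
    return {obj["name"] for obj in scene_graph["objects"].values()}

def single_label_discriminators(example_graphs, candidate_graph):
    """Labels present in every example scene graph but absent from the candidate's."""
    common = set.intersection(*(object_classes(g) for g in example_graphs))
    return common - object_classes(candidate_graph)

# With unmodified scene graphs, 'umbrella' and 'umbrellas' are distinct labels,
# so the image labeled 'umbrellas' removes 'umbrella' from `common` and no
# label-based discriminator is found, even though every example contains umbrellas.
examples = [
    {"objects": {"0": {"name": "umbrella"}, "1": {"name": "person"}}},
    {"objects": {"2": {"name": "umbrellas"}, "3": {"name": "person"}}},
]
candidate = {"objects": {"4": {"name": "person"}}}
print(single_label_discriminators(examples, candidate))  # -> set()
```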
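For the third category, the contrast can be written out explicitly. The formulas below only illustrate the kind of negation-free existential-conjunctive sentences such a fragment admits; the precise syntax of FO Scene Logic is given in the paper.

```latex
% Expressible: a purely existential-conjunctive discriminator.
\exists x.\; \mathit{dog}(x) \wedge \mathit{white}(x)

% Not expressible: `a dog that is not white' requires negation,
% which lies outside the guarded conjunctive fragment.
\exists x.\; \mathit{dog}(x) \wedge \neg\,\mathit{white}(x)
```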