DINOgraph: Towards Enhancing Robustness through Panoptic Scene Graphs

Supervisor(s): Wei Herng Choong, Chingyu Kao
Status: finished
Topic: Others
Author: Houcemeddine Ben Ayed
Submission: 2025-09-10
Type of Thesis: Master's thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

While panoptic scene graph generation (PSG) aims to provide structured scene understanding,
relation prediction remains prone to errors, as minor perturbations in the input can lead to
inconsistent results. This thesis investigates whether a two-stage PSG pipeline can enhance
relation recall compared to existing baselines and whether structured outputs can help
with adversarial robustness. We present DINOgraph, which combines Mask DINO’s panoptic
segmentation with a VCTree head that fuses mask- and box-level features to improve predicate
prediction. We also introduce a proof-of-concept LLM auditing pipeline that identifies scene
graph inconsistencies caused by a simulated label flip on a single mask. In our experiments,
DINOgraph improves relation recall by ∼30% over two-stage baselines such as VCTree and
by ∼50% over one-stage models such as PSGFormer. The auditing pipeline is able
to detect clear contradictions. However, it exhibits non-negligible false positive and false
negative rates and remains sensitive to calibration. Consequently, we currently treat it as a
proof of concept rather than a deployable system. We also propose an approach to
overcome these limitations and advance the pipeline toward deployability.
Keywords: panoptic segmentation, panoptic scene graph generation, adversarial robustness,
adversarial patch attacks, large language models, LLM auditing.
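
For readers unfamiliar with the relation recall metric referred to in the description, the following is a minimal sketch of Recall@K over scene graph triplets. It is not the evaluation code of the thesis: the function name relation_recall_at_k, the triplet layout (subject segment id, predicate, object segment id), and the exact-match criterion are assumptions made for illustration. PSG evaluations usually also require the predicted masks to overlap the ground-truth masks sufficiently, which this sketch leaves out.

```python
from typing import List, Tuple

# Hypothetical triplet layout: (subject segment id, predicate, object segment id).
Triplet = Tuple[int, str, int]


def relation_recall_at_k(
    predicted: List[Tuple[Triplet, float]],
    ground_truth: List[Triplet],
    k: int,
) -> float:
    """Fraction of ground-truth triplets found among the k highest-scoring predictions.

    Simplifying assumption: a prediction counts as a hit only if it equals a
    ground-truth triplet exactly; mask matching is not modelled here.
    """
    unique_gt = set(ground_truth)
    if not unique_gt:
        return 0.0
    # Keep the k predictions with the highest confidence score.
    top_k = sorted(predicted, key=lambda item: item[1], reverse=True)[:k]
    top_k_triplets = {triplet for triplet, _ in top_k}
    hits = sum(1 for gt in unique_gt if gt in top_k_triplets)
    return hits / len(unique_gt)


# Toy example with made-up segments and predicates.
gt = [(0, "on", 1), (2, "beside", 0)]
preds = [((0, "on", 1), 0.9), ((2, "holding", 0), 0.8), ((2, "beside", 0), 0.3)]
print(relation_recall_at_k(preds, gt, k=2))  # 0.5
print(relation_recall_at_k(preds, gt, k=3))  # 1.0
```

In the toy example, only one of the two ground-truth relations is ranked within the top two predictions, so the recall at K=2 is 0.5. Widening the cut-off to K=3 recovers the second relation.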