Vulnerability Detection in Source Code using Graph Neural Networks on Code Property Graphs

Supervisor(s): Daniel Kowatsch, Tobias Specht
Status: finished
Topic: Others
Author: Philip Haitzer
Submission: 2025-04-14
Type of Thesis: Master's Thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

This thesis introduces a modular Graph Neural Network framework designed for C/C++
vulnerability detection using Code Property Graphs, supporting architectures such as
GraphSAGE and GAT. The framework is pre-trained in a self-supervised manner with feature
masking on a diverse Debian codebase and then fine-tuned for node-level classification of
CWE-457 (Use of Uninitialized Variable) on the Juliet test suite. Our evaluation addresses
the effectiveness of the GNN approach and the specific benefit derived from pre-training.
Experiments with GraphSAGE demonstrate considerable effectiveness in detecting CWE-457
patterns. Notably, pre-training led to improved performance on the test set, primarily by
enhancing classifier precision with minimal impact on recall, resulting in a higher overall
F1-score. While qualitative analysis reveals that pre-training fosters more structured embeddings,
it also highlights increased sensitivity to code context. The findings affirm the viability
of GNNs on CPGs for vulnerability detection and demonstrate a clear, positive impact of the
employed pre-training strategy for this specific downstream task.
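
The relationship between precision, recall, and F1-score mentioned above can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and chosen only for illustration; they are not results from the thesis. The sketch shows how fewer false positives (higher precision) with almost unchanged recall produce a higher F1-score.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from node-level confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts (assumption, NOT taken from the thesis):
# the "pre-trained" classifier raises far fewer false alarms
# while finding almost the same number of vulnerable nodes.
baseline = precision_recall_f1(tp=90, fp=30, fn=10)
pretrained = precision_recall_f1(tp=89, fp=10, fn=11)

for name, (p, r, f1) in (("baseline", baseline), ("pre-trained", pretrained)):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a sizeable gain in precision outweighs a small loss in recall, which matches the pattern described in the abstract.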