Description
This thesis introduces a modular Graph Neural Network (GNN) framework for C/C++ vulnerability detection on Code Property Graphs (CPGs), supporting architectures such as GraphSAGE and GAT. The framework is first pre-trained in a self-supervised manner with feature masking on a diverse Debian codebase and then fine-tuned for node-level classification of CWE-457 (Use of Uninitialized Variable) on the Juliet test suite. Our evaluation addresses two questions: how effective the GNN approach is, and what specific benefit pre-training provides. Experiments with GraphSAGE demonstrate considerable effectiveness in detecting CWE-457 patterns. Notably, pre-training improved performance on the test set, primarily by raising classifier precision with minimal impact on recall, which resulted in a higher overall F1-score. Qualitative analysis shows that pre-training fosters more structured embeddings, but it also reveals increased sensitivity to code context. The findings affirm the viability of GNNs on CPGs for vulnerability detection and demonstrate a clear, positive impact of the employed pre-training strategy on this specific downstream task.
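To make the relationship between precision, recall, and F1-score concrete, the following minimal Python sketch computes F1 as the harmonic mean of the two. The function name `f1_score` and the numeric values are hypothetical and chosen only for illustration; they are not results reported in the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values for illustration only; NOT results from the thesis.
# Precision rises noticeably while recall drops slightly.
baseline = f1_score(precision=0.80, recall=0.90)
pretrained = f1_score(precision=0.90, recall=0.89)

print(f"baseline F1   = {baseline:.3f}")
print(f"pretrained F1 = {pretrained:.3f}")
print("F1 improved:", pretrained > baseline)
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, so a gain in the weaker metric (here precision) can outweigh a small loss in the stronger one.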