Description

Traditional and deep learning methods have been widely used for detecting software
vulnerabilities, but they often struggle to provide consistent and reliable automated
 solutions. Recent advancements in large language models (LLMs) have demonstrated remarkable
 capabilities in understanding complex patterns within both natural language
 and code. This thesis explores the potential of fine-tuning state-of-the-art open-source
 LLMs for the specific task of vulnerability detection.
 A major challenge in training models for this purpose is the limited availability of
 high-quality, large-scale datasets. To address this, we explore the construction of an
 extensive and well-curated dataset by combining existing real-world and synthetic
datasets. Through careful selection, preprocessing, merging, and additional cleaning,
we construct a dataset designed to better support the effective training and evaluation of
LLMs for vulnerability detection.
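As a rough illustration of the kind of dataset-construction step described above, the following Python sketch merges a real-world and a synthetic source into a single deduplicated corpus. The file names, column names, and cleaning rules are assumptions made for the sketch, not the exact pipeline used in this thesis.

    # Hypothetical sketch: merging real-world and synthetic vulnerability
    # datasets into one cleaned corpus. Paths, column names, and the
    # cleaning rules are illustrative assumptions.
    import hashlib
    import pandas as pd

    def normalize(code: str) -> str:
        # Collapse whitespace so trivially different copies hash the same.
        return " ".join(code.split())

    def load_source(path: str, code_col: str, label_col: str) -> pd.DataFrame:
        df = pd.read_csv(path)
        df = df.rename(columns={code_col: "code", label_col: "label"})[["code", "label"]]
        return df.dropna(subset=["code", "label"])

    # Example source files (hypothetical paths and column layouts).
    sources = [
        ("real_world.csv", "func", "target"),       # e.g. functions mined from commits
        ("synthetic.csv", "snippet", "vulnerable"), # e.g. generated test cases
    ]

    merged = pd.concat([load_source(p, c, l) for p, c, l in sources], ignore_index=True)

    # Deduplicate on a hash of the normalized code so near-identical samples
    # from different sources cannot leak between training and evaluation splits.
    merged["code_hash"] = merged["code"].map(
        lambda c: hashlib.sha256(normalize(c).encode()).hexdigest()
    )
    merged = merged.drop_duplicates(subset="code_hash").drop(columns="code_hash")

    merged.to_csv("combined_vulnerability_dataset.csv", index=False)

Deduplicating on a hash of normalized code is one simple way to reduce overlap between sources before the data is split for training and evaluation.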
 We evaluate the effectiveness of fine-tuned LLMs in detecting vulnerabilities in
 software code, comparing models of different sizes and architectures. Furthermore,
 we analyze the impact of dataset composition, examining how class balance and data
 complexity affect model performance.
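The sketch below shows one way such a fine-tuning setup could look using the Hugging Face Trainer API. The checkpoint, hyperparameters, and data split are placeholders: the thesis targets larger open-source LLMs, while a small encoder checkpoint is used here only to keep the example self-contained and runnable.

    # Hypothetical sketch: fine-tuning a pretrained model for binary
    # vulnerability classification on the merged dataset. Checkpoint and
    # hyperparameters are placeholders, not the thesis's configuration.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    checkpoint = "microsoft/codebert-base"  # placeholder for an open-source checkpoint
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    # Expects a CSV with "code" and "label" columns, e.g. the merged dataset above.
    data = load_dataset("csv", data_files="combined_vulnerability_dataset.csv")["train"]
    data = data.train_test_split(test_size=0.1, seed=42)

    def tokenize(batch):
        return tokenizer(batch["code"], truncation=True, max_length=512)

    tokenized = data.map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir="vuln-detector",
        per_device_train_batch_size=8,
        num_train_epochs=3,
        learning_rate=2e-5,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        tokenizer=tokenizer,  # enables dynamic padding via the default collator
    )
    trainer.train()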
Our results indicate that while fine-tuned LLMs can learn certain patterns and
identify some vulnerabilities with high confidence, distinguishing more complex cases
remains a challenge, particularly when detection hinges on subtle code modifications.
Additionally, we observe that model size alone is not a determining
 factor for improved performance in vulnerability detection, highlighting the importance
 of dataset size and quality for task-specific fine-tuning.