Similarity-Based Deduplication of Fuzzing Crashes

Supervisor(s): Vincent Ahlrichs, Dr. Julian Horsch
Status: finished
Topic: Others
Author: Ridvan Acilan
Submission: 2022-11-15
Type of Thesis: Master's thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

Fuzzing has proven itself as a practical and effective software testing method. By generating random inputs
and running the target program on them, it reveals bugs that would otherwise go unnoticed. An input that causes
a crash is called a Proof-of-Concept test case (PoC).
The PoCs are the output of a fuzzing campaign and are saved for developers.
Even though fuzzers incorporate deduplication techniques, a single campaign often produces thousands of PoCs,
which overwhelms developers.
Commonly used deduplication techniques rely on either coverage profiles or stack hashes.
This thesis focuses on stack hashing, which still overestimates bug counts, but by one to two orders of magnitude
less than coverage-profile techniques do.
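
As an illustration of why stack hashing overestimates bug counts, the following is a minimal sketch and not the thesis implementation. It assumes a stack trace is a list of function names (innermost first) and that the hash covers the top three frames; the names `stack_hash`, `bucket_by_stack_hash`, and the example traces are hypothetical.

```python
import hashlib
from collections import defaultdict


def stack_hash(frames, top_n=3):
    """Hash the top_n innermost frames of a stack trace (assumed: function names only)."""
    key = "\n".join(frames[:top_n])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def bucket_by_stack_hash(crashes, top_n=3):
    """Group PoC identifiers whose traces produce the same stack hash."""
    buckets = defaultdict(list)
    for poc_id, frames in crashes.items():
        buckets[stack_hash(frames, top_n)].append(poc_id)
    return dict(buckets)


# Hypothetical traces: all three PoCs crash in the same place (memcpy called
# from parse_chunk), but poc-3 reaches it through a different caller.
crashes = {
    "poc-1": ["memcpy", "parse_chunk", "read_image", "main"],
    "poc-2": ["memcpy", "parse_chunk", "read_image", "main"],
    "poc-3": ["memcpy", "parse_chunk", "read_thumbnail", "main"],
}

buckets = bucket_by_stack_hash(crashes)
print(len(buckets))  # number of "unique" crashes reported by exact hashing
```

Because the hash requires an exact match, a single differing frame puts poc-3 into its own bucket, so one underlying bug is reported as two.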
We develop a novel, highly scalable approach to deduplicate fuzzing crashes by measuring stack trace similarity
using similarity digests and a two-phase clustering process to accurately estimate the actual bug count of a PoC set.
We use a ground-truth benchmark with 340,000 PoCs from 14 real-world targets to evaluate our approach.
We show that our approach performs better than state-of-the-art solutions for crash deduplication at negligible
computational cost.
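
The general idea of grouping crashes by stack trace similarity rather than by exact hash match can be sketched as follows. This is a simplified stand-in under stated assumptions, not the method of the thesis: Jaccard similarity over the set of frame names replaces the similarity digests, a single greedy pass replaces the two-phase clustering, and the threshold of 0.5, the function names, and the example traces are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two stack traces, treated as sets of frame names."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cluster_traces(crashes, threshold=0.5):
    """Greedy single-pass clustering: a PoC joins the first cluster whose
    representative trace is at least `threshold` similar, else starts a new one."""
    clusters = []  # list of (representative_frames, [poc_ids])
    for poc_id, frames in crashes.items():
        for representative, members in clusters:
            if jaccard(representative, frames) >= threshold:
                members.append(poc_id)
                break
        else:
            clusters.append((frames, [poc_id]))
    return [members for _, members in clusters]


# Hypothetical traces (innermost frame first).
crashes = {
    "poc-1": ["memcpy", "parse_chunk", "read_image", "main"],
    "poc-2": ["memcpy", "parse_chunk", "read_image", "main"],
    "poc-3": ["memcpy", "parse_chunk", "read_thumbnail", "main"],
    "poc-4": ["free", "close_stream", "main"],
}

clusters = cluster_traces(crashes)
print(clusters)
```

In this sketch, poc-3 shares three of five distinct frames with poc-1 and therefore lands in the same cluster, while poc-4 shares only `main` and forms its own cluster. The estimated bug count is the number of clusters.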