Smart Mutations for Library Fuzzing

Supervisor(s): Fabian Kilger
Status: finished
Topic: Others
Author: Alexandru Sasu
Submission: 2023-04-17
Type of Thesis: Master's thesis

Description

Library fuzzing is crucial for software security, as a vulnerable code library compromises all the projects that use it. Unfortunately, fuzzing spends a lot of time on unpromising inputs, and the specific use of mutations in the context of library fuzzing has been overlooked by the research community. In this thesis, we propose a novel system that uses neural networks to filter mutations during the fuzzing process in order to speed up new path discovery.

New path discovery is an important part of mutation-based fuzzing, and the way mutations are selected heavily impacts its speed and efficacy. While current methods for creating and choosing mutations show good results, we suggest that path discovery can be sped up by filtering mutations.

We train a neural network to predict which mutations will not be useful in the long term, so that we can discard them. We developed two approaches for our design: one in which the model returns, for every seed, a heatmap of mutation relevance against which mutations can be compared, and one in which the model predicts how many coverage-increasing mutations will be generated if the current seed is saved. We also propose multiple neural network architectures, including long short-term memory (LSTM) and convolutional neural networks. In addition, we modified RULF, a fuzz-target generator for Rust libraries, so that the fuzz-target input is more structured, which should make it easier for the neural networks to learn. Furthermore, we modified AFL++ to query our neural networks before saving a seed and to decide, based on the answer, whether to save it.
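
The two filtering decisions described above can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the thresholds, the per-byte form of the heatmap, and the handling of offsets outside the heatmap are all assumptions made for illustration, and the neural network is replaced by values passed in directly.

```python
from typing import List, Sequence


def changed_positions(seed: bytes, mutated: bytes) -> List[int]:
    """Byte offsets at which the mutated input differs from its seed."""
    shared = min(len(seed), len(mutated))
    diff = [i for i in range(shared) if seed[i] != mutated[i]]
    # Bytes added or removed at the tail also count as changed.
    diff.extend(range(shared, max(len(seed), len(mutated))))
    return diff


def keep_by_heatmap(seed: bytes, mutated: bytes,
                    heatmap: Sequence[float], threshold: float = 0.5) -> bool:
    """Approach 1 (assumed form): keep a mutation only if it touches a
    position whose predicted relevance reaches the threshold."""
    for i in changed_positions(seed, mutated):
        # Assumption: offsets the model never scored are treated as relevant.
        score = heatmap[i] if i < len(heatmap) else 1.0
        if score >= threshold:
            return True
    return False


def keep_by_predicted_yield(predicted_useful: float,
                            min_expected: float = 1.0) -> bool:
    """Approach 2 (assumed form): save the seed only if the model expects it
    to yield at least `min_expected` coverage-increasing mutations."""
    return predicted_useful >= min_expected


# Hypothetical heatmap for a 4-byte seed: only offset 2 is marked relevant.
heatmap = [0.1, 0.2, 0.9, 0.0]
touches_relevant = keep_by_heatmap(b"abcd", b"abXd", heatmap)
touches_irrelevant = keep_by_heatmap(b"abcd", b"Xbcd", heatmap)
```

In this sketch, the mutation that changes offset 2 is kept and the one that changes offset 0 is discarded. In a fuzzer, a check of this kind would run at the point where a new seed is about to be saved, with the heatmap or the predicted count supplied by the trained model.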

We evaluated both versions of our proposed system by running the modified fuzzers and the unmodified fuzzer on a set of five Rust libraries and comparing the results. We showed that while both approaches fall short for three of the five targets, and the results are inconclusive for one target, both approaches increased the speed of new path discovery for the library url. Of the approximately 6800 edges found in total after 4 hours of testing with all three fuzzers, the unmodified fuzzer reached 6700 after 53 minutes on average, the bitmask fuzzer after 31 minutes, and the coverage prediction fuzzer after 27 minutes. This shows that the performance of the system is highly dependent on factors such as the target library and its structure. Through our results, we highlight the potential for improving the efficiency of library fuzzing through the proposed system.
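
To put the url timings in perspective, the short calculation below derives the relative speedups from the figures reported above. The variable names are illustrative, and because the total of 6800 edges is itself approximate, the results should be read as rough factors rather than exact measurements.

```python
# Figures reported for the url library: average minutes until 6700 of the
# approximately 6800 edges were reached.
TOTAL_EDGES = 6800
REACHED_EDGES = 6700
MINUTES_TO_REACH = {
    "unmodified": 53,
    "bitmask": 31,
    "coverage prediction": 27,
}


def speedup(baseline_minutes: float, candidate_minutes: float) -> float:
    """How many times faster the candidate reached the same edge count."""
    return baseline_minutes / candidate_minutes


fraction_reached = REACHED_EDGES / TOTAL_EDGES
baseline = MINUTES_TO_REACH["unmodified"]
speedups = {
    name: speedup(baseline, minutes)
    for name, minutes in MINUTES_TO_REACH.items()
    if name != "unmodified"
}

print(f"edge threshold: {fraction_reached:.1%} of all edges found")
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x faster than the unmodified fuzzer")
```

With these inputs, the threshold corresponds to about 98.5% of the edges found, and the bitmask and coverage prediction fuzzers reach it roughly 1.71 and 1.96 times faster than the unmodified fuzzer, respectively.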