Analyzing Binary Similarity Models with Explainable AI

Supervisor(s): Daniel Kowatsch
Status: finished
Topic: Others
Author: Achraf Flah
Submission: 2022-10-17
Type of Thesis: Bachelor's Thesis
Thesis topic in cooperation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

In recent years, Machine Learning (ML) models have demonstrated their ability to automate
the detection of similar binaries, a crucial cybersecurity task with several real-world
applications, ranging from vulnerability detection to malware tracing. However, since the
decisions made by such models carry weight, they should be explainable to security experts
and end users. ML models, particularly neural networks, are known for their opacity: most
of their decisions are hard to interpret, which hinders their adoption in real-world
scenarios. To mitigate this issue, explainable AI (XAI) methods have been developed that
aim to interpret AI models and identify the components that influence their predictions
the most.
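
As a minimal illustrative sketch (not taken from the thesis), the following Python code shows how an attribution method such as Integrated Gradients can highlight which input features drive a binary-similarity model's prediction. It assumes PyTorch and the Captum library; the toy SiameseEncoder, the similarity function, and the random "binary feature vectors" are hypothetical stand-ins for a real model and real data.

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

class SiameseEncoder(nn.Module):
    """Toy encoder mapping a fixed-size feature vector to an embedding."""
    def __init__(self, in_dim=64, emb_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                 nn.Linear(32, emb_dim))
    def forward(self, x):
        return self.net(x)

encoder = SiameseEncoder()

def similarity(x1, x2):
    # Cosine similarity of the two embeddings: the model's "decision".
    return nn.functional.cosine_similarity(encoder(x1), encoder(x2))

# Two hypothetical feature vectors extracted from a pair of binaries.
x1 = torch.rand(1, 64, requires_grad=True)
x2 = torch.rand(1, 64)

# Integrated Gradients attributes the similarity score to the features of
# x1 while holding x2 fixed; large magnitudes mark influential features.
ig = IntegratedGradients(lambda a: similarity(a, x2))
attributions = ig.attribute(x1, baselines=torch.zeros_like(x1))
top = attributions.abs().squeeze().topk(5).indices
print("Most influential feature indices:", top.tolist())

In a real setting, a security expert would map the highlighted features back to the underlying binary artifacts (e.g., instructions or functions) to judge whether the model's reasoning is plausible.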
In this work, we present existing AI models that can detect similar binaries, along with
XAI methods that explain their decisions. We then select models and methods based on
defined selection criteria and use the methods to analyze the models: providing
explanations for the models' decisions, capturing unusual behavior, and uncovering the
reasons for the models' failures. We also describe how these methods should be applied
and how their results should be interpreted, depending on the use case.
We show that, with the right setup and depending on the model and its use case, XAI
methods can be used to analyze the model. Such an analysis can interpret the model's
output, debug its failures, and suggest ways to improve its performance. Moreover, it
can help security experts understand which parts of the data contribute most to the
model's output and how the model reacts to its input data. This can improve both the
development and the real-world deployment of such models. Finally, the results of the
analysis can motivate future research in this area by revealing which components should
be altered or improved.