
Analyzing Malware Detection Models with Explainable AI

Supervisor(s): Daniel Kowatsch
Status: finished
Topic: Others
Author: Achraf Flah
Submission: 2024-03-31
Type of Thesis: Guided Research
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

Malware, short for malicious software, is one of the most harmful threats in the digital world. It can cause severe
damage, ranging from Denial of Service (DoS) to the theft of sensitive data. Consequently, malware detection is a crucial area of
cybersecurity. Malware is also hard to detect, because malware development and obfuscation techniques keep getting more advanced.
On the other hand, Machine Learning (ML) methods have shown great capability for automating and improving malware
detection. However, ML-based methods should be transparent to security experts because they are used in a critical area, yet
most of these methods, especially deep learning models, are known for their opacity. To mitigate this challenge, explainable artificial
intelligence (XAI) methods were developed to interpret ML models.
In this work, we provide an overview of ML-based methods that perform malware detection or classification,
as well as of the XAI methods that can analyze them. We then analyze multiple models covering different model families using
XAI. The analysis provides explanations for each model’s decisions, captures unusual behavior, and uncovers the reasons for
its failures. We show that, given an appropriate configuration and depending on the model and its specific use case, XAI
techniques can be employed for model analysis. This analysis facilitates the interpretation of the model’s results, identifies
failure points for debugging, and offers recommendations to improve its overall performance. For instance, it revealed that
the model uses the compilation date of the binary as an indicator. Furthermore, the analysis can support the task of
malware analysis by highlighting the most interesting parts of the malware, such as the code blocks responsible for malicious
activities.