Dynamic Malware Analysis with Reinforcement Learning

Supervisor(s): Daniel Kowatsch, Konstantin Böttinger
Status: finished
Topic: Others
Author: Achraf Flah
Submission: 2026-01-15
Type of Thesis: Master's Thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

Malware, which stands for malicious software, represents one of the biggest threats to
the digital world. It can disrupt services, steal data, and manipulate systems. Malware
analysis investigates how malware works, either to design better defense mechanisms
against it or to analyze incidents during digital forensics.
As malware becomes increasingly complex due to technological advancements, so does
its analysis. Security researchers therefore need more automated analysis and defense
technologies to counter these threats.
At the same time, Machine Learning has been successfully applied to a wide range of
complex tasks. More specifically, Reinforcement Learning is a paradigm of Machine
Learning in which an agent interacts with an environment using a set of actions to
reach a specific goal. This formulation maps naturally onto dynamic malware analysis,
in which an agent interacts with malware during its execution to discover its
behavior and capabilities. Although already applied to different areas of cyber security,
Reinforcement Learning has not been extensively applied to malware analysis. Most of
the existing literature uses it as an adversary to spot weaknesses in other analysis tools.
In this thesis, we build upon a recent proof-of-concept study that uses Reinforcement
Learning to dynamically analyze evasive malware. Evasive malware
is a type of malware that is capable of hindering its analysis by hiding its malicious
intent. We first provide a review and an analysis of the application of
Reinforcement Learning methods to dynamic malware analysis. Afterwards, we review
and implement the approach proposed in that study. In the next step, we
propose and implement improvements based on the latest Machine Learning and
Natural Language Processing techniques to make it capable of working with more
realistic malware samples. Finally, we propose a multi-agent Reinforcement Learning
setup for dynamic analysis of malware.
Our work shows that Reinforcement Learning is a promising technique that can
aid human experts during dynamic analysis of malware, provided that the required
resources are available. We also show that a combination of the latest advancements in
Machine Learning, and especially Natural Language Processing, improves the efficacy
and efficiency of Reinforcement Learning in solving this task.
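
The agent-environment interaction described above can be illustrated with a small toy sketch. It is not the method of the thesis or of the study it builds on. The sandbox, the two evasion checks, the action names, and the reward values are all hypothetical, and tabular Q-learning is used only because it is the simplest Reinforcement Learning algorithm that fits in a few lines. The simulated sample reveals its payload only once the agent has both hidden the debugger and faked user activity.

```python
import random

# Hypothetical action set: ways an analysis agent might manipulate the sandbox.
ACTIONS = ["hide_debugger", "fake_user_activity", "noop"]


class ToySandbox:
    """Toy stand-in for a sandbox running an evasive sample (illustrative only)."""

    def reset(self):
        # State: (debugger hidden?, user activity faked?)
        self.state = (False, False)
        return self.state

    def step(self, action):
        hidden, active = self.state
        if action == "hide_debugger":
            hidden = True
        elif action == "fake_user_activity":
            active = True
        self.state = (hidden, active)
        # The simulated sample shows its behavior only if both checks pass.
        done = hidden and active
        reward = 1.0 if done else -0.1  # assumed reward shaping
        return self.state, reward, done


def train(episodes=200, max_steps=10, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    env = ToySandbox()
    q = {}  # maps (state, action) to an estimated return
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(
                q.get((next_state, a), 0.0) for a in ACTIONS
            )
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
            if done:
                break
    return q


def greedy_rollout(q, max_steps=10):
    """Follow the learned policy without exploration and record the actions."""
    env = ToySandbox()
    state = env.reset()
    trace = []
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        trace.append(action)
        state, _, done = env.step(action)
        if done:
            break
    return trace
```

Calling `greedy_rollout(train())` returns the sequence of actions the trained agent takes. In this toy setting the agent learns to apply both countermeasures and to skip the useless `noop` action. Real samples have far larger state and action spaces, which is where the resource requirements mentioned above come in.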