Malware Detection with hybrid Control Flow Graph and Graph Embedding

Supervisor(s): Peng Xu
Status: finished
Topic: Anomaly Detection
Author: Youyi Zhang
Submission: 2019-10-15
Type of Thesis: Master's thesis
Proof of Concept useful

Abstract:

Malicious software, widely known as malware, is one of the biggest threats to our interconnected society. Cyber-criminals use malware to carry out their nefarious tasks. To address this issue, malware analysts develop detection systems that detect malware and prevent it from infecting a machine. Unfortunately, these systems have two significant limitations. First, they frequently target one specific platform or architecture, so they cannot be deployed ubiquitously. Second, code obfuscation techniques used by malware authors can degrade their performance. In this paper, we design and implement a multi-platform malware detection system based on control flow graphs to tackle both problems. In more detail, we use a graph neural network method to convert the control flow graphs of executables into vectors and then use a machine learning-based classifier on those vectors to detect malware. We evaluate our framework on real samples from multiple platforms, including Linux (x86, x64, and ARM-32) and Windows (x86 and x64). Our results outperform most existing work, with an accuracy of 96.8% on Linux and 93.9% on Windows. To the best of our knowledge, our work is the first to consider graph neural networks in the malware detection field.
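
The step of turning a control flow graph into a vector can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a simplified message-passing scheme with fixed, untrained mixing weights in place of a trained graph neural network, and the function name `embed_cfg`, the block names, and the two block features are hypothetical. It only shows how graphs of any size can be mapped to a fixed-length vector that a conventional classifier could consume.

```python
import math


def embed_cfg(successors, features, rounds=2):
    """Return a fixed-length vector for one control flow graph.

    successors: dict mapping basic-block id -> list of successor block ids
    features:   dict mapping basic-block id -> list of numeric block features
    """
    dim = len(next(iter(features.values())))

    # Treat edges as undirected for aggregation (an assumption of this sketch).
    neighbors = {block: set() for block in features}
    for src, dsts in successors.items():
        for dst in dsts:
            neighbors[src].add(dst)
            neighbors[dst].add(src)

    # One hidden state per basic block, initialised to zero.
    hidden = {block: [0.0] * dim for block in features}
    for _ in range(rounds):
        updated = {}
        for block in features:
            # Sum the neighbours' current hidden states.
            agg = [sum(hidden[n][i] for n in neighbors[block]) for i in range(dim)]
            # Fixed weights (1.0 and 0.5) stand in for learned matrices.
            updated[block] = [
                math.tanh(features[block][i] + 0.5 * agg[i]) for i in range(dim)
            ]
        hidden = updated

    # Sum-pool the block states into a single graph-level vector.
    return [sum(hidden[b][i] for b in hidden) for i in range(dim)]


# Toy graph of a small loop; the feature values are made up for illustration.
cfg = {"entry": ["check"], "check": ["body", "exit"], "body": ["check"], "exit": []}
feats = {
    "entry": [0.3, 0.0],
    "check": [0.1, 0.0],
    "body": [0.5, 1.0],
    "exit": [0.1, 0.0],
}
vector = embed_cfg(cfg, feats)
print(len(vector))
```

The output length depends only on the number of features per block, not on the number of blocks, and renaming the blocks leaves the vector unchanged. In a real system the mixing weights would be learned, and the resulting vectors would be passed to a trained classifier.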