Semantics-Aware Neural Network-Based Malware Classification Model

Supervisor(s): Paul Muntean
Status: finished
Topic: Machine Learning Methods
Author: Gehrig Matthias
Submission: 2018-09-17
Type of Thesis: Master's thesis

Description

The absolute number of malware samples per year is growing exponentially.
Traditional antimalware tools struggle to keep up with this rapid development because
they rely mainly on signature-based detection. The industry therefore needs automatic
systems capable of large-scale malware analysis. In this thesis, the author proposes a
classification approach based on convolutional neural networks combined with semantic
features. Its purpose is to classify malware variants into their respective
families. The ground truth for the malware families is generated by unsupervised
clustering of VirusTotal results. The presented framework
mainly uses static features extracted from malware binaries. These features include
common ones such as bag of words and n-grams, among others, and are mainly engineered
from so-called gadgets, which are short sequences of assembly instructions.
The author furthermore shows that adding semantic features derived from control flow
graphs improves the neural network's classification results with respect to the chosen
metrics. The classification performance of
the proposed approach is compared with popular machine learning reference algorithms
such as Gradient Boosting Trees and Support Vector Machines.
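
To make the gadget-based bag-of-words and n-gram features mentioned above more concrete, here is a minimal Python sketch. It is not the thesis implementation. It assumes that each gadget has already been extracted and reduced to a list of opcode mnemonics, and the function names (opcode_ngrams, ngram_counts, to_vector) and the sample gadgets are hypothetical, chosen only for illustration.

```python
from collections import Counter


def opcode_ngrams(gadget, n):
    """Return all contiguous opcode n-grams (as tuples) of one gadget."""
    return [tuple(gadget[i:i + n]) for i in range(len(gadget) - n + 1)]


def ngram_counts(gadgets, n):
    """Count n-grams over all gadgets of one binary (n=1 gives a bag of words)."""
    counts = Counter()
    for gadget in gadgets:
        counts.update(opcode_ngrams(gadget, n))
    return counts


def to_vector(counts, vocabulary):
    """Map n-gram counts onto a fixed vocabulary order to get a feature vector."""
    return [counts.get(term, 0) for term in vocabulary]


# Hypothetical gadgets: short instruction sequences reduced to their opcodes.
gadgets = [
    ["pop", "pop", "ret"],
    ["mov", "pop", "ret"],
    ["xor", "ret"],
]

bag_of_words = ngram_counts(gadgets, n=1)  # unigram counts
bigrams = ngram_counts(gadgets, n=2)       # 2-gram counts

# In practice the vocabulary would be fixed across the whole data set;
# here it is derived from this single sample for brevity.
vocabulary = sorted(bigrams)
vector = to_vector(bigrams, vocabulary)
```

A fixed-length vector of this kind could be fed to a classifier such as a neural network or a Support Vector Machine. How the thesis actually encodes and combines its features is not specified in this description.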