
Neural Network-based Malware Classification

Malware variants continuously evolve in severity, sophistication, and malicious activity. In this thesis, we propose a deep neural network-based approach that automatically classifies malware variants into their respective families. We present a classification framework that extracts static features from malware executables: executable metadata, import features, and assembly opcode features. The framework uses a deep neural network classifier that combines two network architectures, fully connected and convolutional. The fully connected network learns suspicious aspects inherent in a particular executable, while the convolutional network captures the semantics of the instruction usage patterns on which the executable relies. The convolutional architecture treats the executable code as a sequence of opcodes. It better captures context features and generates highly non-linear code semantic features that are invariant to reordering of instructions in the executable code. The combined approach achieved an F1-score of 0.92 in several experiments on a data set of 22,694 malware executables. We also compare our approach against a reference support vector machine classifier to measure the performance gain.
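
As a rough illustration of the combined architecture described above, the following minimal Python sketch runs one forward pass through a two-branch classifier: a fully connected branch over static features and a convolutional branch over an opcode sequence, concatenated and mapped to family probabilities. It is not the thesis implementation. The function names (`dense`, `conv_max_pool`, `classify`, `random_params`), the layer sizes, and the random untrained weights are all assumptions made for illustration. Global max pooling is likewise an assumption; it is one common way to make convolutional features insensitive to where an opcode pattern occurs.

```python
import math
import random


def dense(inputs, weights, biases, relu=True):
    """Fully connected layer: one weight row and one bias per output unit."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def conv_max_pool(opcode_ids, embeddings, filters):
    """Embed the opcode sequence, apply each filter to every window, and
    keep the strongest ReLU response per filter (global max pooling).
    Assumes the sequence is at least as long as the filter width."""
    seq = [embeddings[i] for i in opcode_ids]
    pooled = []
    for filt in filters:  # filt holds one weight vector per window position
        width = len(filt)
        responses = []
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            act = sum(w * x
                      for w_vec, x_vec in zip(filt, window)
                      for w, x in zip(w_vec, x_vec))
            responses.append(max(0.0, act))
        pooled.append(max(responses))
    return pooled


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(static_features, opcode_ids, params):
    """Concatenate both branches and map them to family probabilities."""
    fc_out = dense(static_features, params["fc_w"], params["fc_b"])
    conv_out = conv_max_pool(opcode_ids, params["emb"], params["filters"])
    logits = dense(fc_out + conv_out, params["out_w"], params["out_b"],
                   relu=False)
    return softmax(logits)


def random_params(n_static, n_opcodes, n_families, seed=0,
                  emb_dim=4, n_filters=3, width=2, hidden=5):
    """Hypothetical helper: untrained random weights, for illustration only."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-1, 1) for _ in range(n)]

    def mat(rows, cols):
        return [vec(cols) for _ in range(rows)]

    return {
        "fc_w": mat(hidden, n_static), "fc_b": vec(hidden),
        "emb": mat(n_opcodes, emb_dim),
        "filters": [mat(width, emb_dim) for _ in range(n_filters)],
        "out_w": mat(n_families, hidden + n_filters),
        "out_b": vec(n_families),
    }


# Toy input: 3 static features, 5 opcodes from a vocabulary of 6, 4 families.
params = random_params(n_static=3, n_opcodes=6, n_families=4)
print(classify([0.2, 1.0, 0.0], [1, 2, 3, 1, 4], params))
```

With width-2 filters, the opcode sequences `[1, 2, 3, 1]` and `[2, 3, 1, 2]` contain the same opcode pairs in a different order, so `conv_max_pool` returns identical features for both. This is a limited form of order insensitivity and may differ from the invariance the thesis reports.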

Supervisor(s): Bojan Kolosnjaji
Status: finished
Topic: Anomaly Detection
Author: Ghadir Eraisha
Submission: 2016-05-13
Type of Thesis: Master's thesis
Proof of Concept: No
