Deep Learning for Classification of Malware System Call Sequences

29th Australasian Joint Conference on Artificial Intelligence (AI)

Authors: Bojan Kolosnjaji, Apostolis Zarras, George Webster, and Claudia Eckert
Year/month: 2016/12
Booktitle: 29th Australasian Joint Conference on Artificial Intelligence (AI)
Fulltext: deeplearning.pdf

Abstract

The increase in the number and variety of malware samples amplifies the need for better automatic detection and classification of malware variants. Machine learning is a natural choice for coping with this increase, because it addresses the need to discover underlying patterns in large-scale datasets. Neural network methodology has now matured to the point where it can overcome limitations of earlier machine learning methods, such as Hidden Markov Models and Support Vector Machines. As a consequence, neural networks now offer superior classification accuracy in many domains, such as computer vision and natural language processing. This improvement comes from the possibility of constructing neural networks with a larger number of potentially diverse layers, an approach known as Deep Learning. In this paper, we attempt to transfer these performance improvements to modeling malware system call sequences for the purpose of malware classification. We construct a neural network based on convolutional and recurrent layers in order to obtain the best features for classification. This yields a hierarchical feature extraction architecture that combines convolution of n-grams with full sequential modeling. Our evaluation results demonstrate that our approach outperforms previously used methods in malware classification, achieving an average precision of 85.6% and an average recall of 89.4% with this combined neural network architecture.
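The abstract describes a hierarchy: convolution over n-grams of system calls, followed by recurrent modeling of the whole sequence, followed by classification. The following is a minimal, untrained sketch of that data flow in plain Python. It assumes one-hot encoded call IDs, a single convolution layer, non-overlapping max pooling, and a plain Elman recurrent layer standing in for a gated cell such as an LSTM. The vocabulary, the trace, the layer sizes and all function names (`conv1d_relu`, `max_pool`, `elman_rnn`, `classify`) are hypothetical illustrations, not the authors' implementation. Because the weights are random, the output probabilities carry no meaning.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.5) -> Matrix:
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def one_hot(indices: List[int], vocab_size: int) -> List[Vector]:
    """Encode each system call ID as a one-hot vector."""
    return [[1.0 if v == i else 0.0 for v in range(vocab_size)] for i in indices]


def conv1d_relu(xs: List[Vector], filters: List[Matrix], bias: Vector) -> List[Vector]:
    """Slide each k x vocab filter over windows of k consecutive calls (k-grams)."""
    k = len(filters[0])
    out = []
    for t in range(len(xs) - k + 1):
        window = xs[t:t + k]
        feats = []
        for f, w in enumerate(filters):
            s = bias[f] + sum(w[j][v] * window[j][v]
                              for j in range(k) for v in range(len(window[j])))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


def max_pool(xs: List[Vector], size: int) -> List[Vector]:
    """Non-overlapping max pooling along the time axis (incomplete tail dropped)."""
    return [[max(col) for col in zip(*xs[t:t + size])]
            for t in range(0, len(xs) - size + 1, size)]


def elman_rnn(xs: List[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> List[Vector]:
    """h_t = tanh(W_x x_t + W_h h_{t-1} + b); returns every hidden state."""
    hidden = len(b)
    h = [0.0] * hidden
    states = []
    for x in xs:
        h = [math.tanh(b[i]
                       + sum(w_x[i][j] * x[j] for j in range(len(x)))
                       + sum(w_h[i][j] * h[j] for j in range(hidden)))
             for i in range(hidden)]
        states.append(h)
    return states


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(seq: List[int], vocab: int, n_classes: int, seed: int = 0) -> Vector:
    """Untrained conv -> pool -> recurrent -> mean -> softmax forward pass."""
    rng = random.Random(seed)
    n_filters, k, pool, hidden = 4, 3, 2, 5  # hypothetical layer sizes
    if len(seq) < k + pool - 1:
        raise ValueError("sequence too short for one convolution + pooling window")
    filters = [rand_matrix(rng, k, vocab) for _ in range(n_filters)]
    w_x = rand_matrix(rng, hidden, n_filters)
    w_h = rand_matrix(rng, hidden, hidden)
    w_out = rand_matrix(rng, n_classes, hidden)

    feats = max_pool(conv1d_relu(one_hot(seq, vocab), filters, [0.0] * n_filters), pool)
    states = elman_rnn(feats, w_x, w_h, [0.0] * hidden)
    mean_h = [sum(col) / len(states) for col in zip(*states)]  # one vector per trace
    logits = [sum(w[j] * mean_h[j] for j in range(hidden)) for w in w_out]
    return softmax(logits)


# Hypothetical toy vocabulary and trace; IDs index into SYSCALLS.
SYSCALLS = ["open", "read", "write", "close", "mmap", "connect"]
trace = [0, 1, 1, 2, 3, 0, 4, 5, 1, 3]
probs = classify(trace, vocab=len(SYSCALLS), n_classes=3)
print([round(p, 3) for p in probs])
```

A filter of width k = 3 responds to 3-grams of calls. Pooling shortens the sequence before the recurrent layer reads it, and averaging the hidden states yields one fixed-size vector per trace regardless of its length. In practice the weights would be learned with a deep learning framework rather than drawn at random.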

Bibtex:

@inproceedings{kolosnjaji2016deep,
  author = {Bojan Kolosnjaji and Apostolis Zarras and George Webster and Claudia Eckert},
  title = {Deep Learning for Classification of Malware System Call Sequences},
  year = {2016},
  month = {December},
  booktitle = {29th Australasian Joint Conference on Artificial Intelligence (AI)},
  url = {https://www.sec.in.tum.de/i20/publications/deep-learning-for-classification-of-malware-system-call-sequences/@@download/file/deeplearning.pdf}
}