Machine Learning Techniques in Detection of Malicious Web Traffic

Supervisor(s): Roman Kruszelnicki
Status: finished
Topic: Machine Learning Methods
Author: Ching-Yu Kao
Submission: 2016-08-15
Type of Thesis: Master's thesis
Proof of Concept: No
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Abstract:

This research shows how to apply deep learning techniques to malware classification, specifically to determine whether a web page is malicious. The data types we examine are HTML, JavaScript, and CSS: the building blocks of the modern web and also typical vehicles for delivering malicious code. Various deep learning techniques, including convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and combinations of CNN and LSTM, are applied and compared to determine which is the most effective for our situation. We also compare our results with those of conventional machine learning methods, such as support vector machines (SVMs) and k-nearest neighbors (k-NN). To maximize training speed, we use a distributed environment with synchronous updates during the training phase.

Index terms: Malware classification, distributed deep learning, convolutional neural network, long short-term memory