A Scalable Framework to Automate Advanced Malware Analytics

The amount of malware invading our networks and personal devices is growing exponentially. In response, the security community is shifting away from having specialized security analysts manually dissect samples and toward automated triage systems. These systems automatically analyze samples to group similar ones together and to identify when a complex sample needs human intervention. The automated part of this process is becoming increasingly complex due to the growing sophistication of machine learning and graph analysis. This paradigm shift poses new challenges for how to effectively manage and automate these analytic methods. In this work, we present a scalable framework for our in-house triage platform, Holmes Processing, that orchestrates the execution of multiple machine learning frameworks and algorithms with support for micro-batch and stream processing. Our framework introduces a way to encapsulate analytic services into small packages that can be easily deployed and redistributed between different installations without the need for manual configuration. The planner can efficiently schedule both repetitive and one-off execution of jobs, as well as chain multiple services together to build powerful algorithms on the fly. We evaluate the scalability and reliability of our concept using a large-scale deployment with multiple terabytes of malware samples.

Supervisor(s): Thomas Kittel
Status: finished
Topic: Anomaly Detection
Author: Christan von Pentz
Submission: 2017-12-15
Type of Thesis: Master's Thesis
Proof of Concept: No