Code Virtualization for Software Protection: Analysis and Countermeasures

Supervisor(s): Julian Kirsch, Clemens Jonischkeit
Status: finished
Topic: Others
Author: Ludwig Peuckert
Submission: 2020-03-16
Type of Thesis: Master's Thesis


Code-Virtualization for Software Protection: Analysis and Countermeasures

March 16, 2020

The number of new malware samples detected every year is immense. In particular, industry and mobile devices are targets of interest. To keep their malicious functionality under the radar of antivirus software, malware developers obfuscate critical code sections. In recent years, a technique called code virtualization has gained popularity. The malicious code, which is often short, is translated into a different instruction set architecture. At runtime, the virtual code is interpreted by a virtual machine attached to the original program.
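To make the idea of interpreting translated code concrete, the following is a minimal sketch of a virtual machine with a fetch-decode-dispatch loop. The opcodes (`PUSH`, `ADD`, `HALT`), the stack-based design, and the function name `run` are assumptions made up for illustration; they do not describe any particular protector or the virtual machines studied in this thesis.

```python
# Toy illustration only: a hypothetical three-instruction stack VM.
PUSH, ADD, HALT = 0x10, 0x20, 0xFF  # made-up opcodes


def run(bytecode):
    """Interpret bytecode with a fetch-decode-dispatch loop."""
    vpc = 0      # virtual program counter: index into the bytecode
    stack = []
    while True:
        opcode = bytecode[vpc]            # fetch
        if opcode == PUSH:                # decode and dispatch to a handler
            stack.append(bytecode[vpc + 1])
            vpc += 2
        elif opcode == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            vpc += 1
        elif opcode == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at vpc={vpc}")


# The native computation `2 + 3`, expressed in the made-up instruction set.
program = bytes([PUSH, 2, PUSH, 3, ADD, HALT])
result = run(program)
```

An analyst who sees only `program` faces opaque bytes until the opcode mapping is recovered, while a native execution trace is dominated by the repeated loop body rather than by the protected logic.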

Code virtualization is a challenging obfuscation, and few approaches for reverse engineering it exist. The translation to an unknown instruction set architecture is effectively close to encryption. Apart from that, interpretation introduces a large overhead. The combination of these facts renders many deobfuscation approaches ineffective.

The cost of approaches like symbolic execution and taint analysis depends heavily on input size. Therefore, detecting virtualization boundaries is a key step during deobfuscation. Many approaches assume that the virtual boundaries are known beforehand. However, finding them is a non-trivial task.

In this work, we present a new approach to identify virtualized sections. We exploit characteristic patterns in memory access, register usage, and jump structure. However, different virtual machines show different patterns, rendering matching against known patterns ineffective. To overcome this, we introduce a modified version of autocorrelation, capable of detecting recurring patterns of any shape. With the boundaries detected, we identify the virtual program counter to filter false positives and fulfill the assumption of other approaches.
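As a rough intuition for why autocorrelation can find recurring patterns without knowing their shape, the sketch below computes a plain autocorrelation over a sequence of categorical trace events. It is not the modified autocorrelation introduced in this thesis; the event labels, the equality-based match rate, and the function names `match_rate` and `dominant_period` are assumptions chosen for illustration.

```python
def match_rate(trace, lag):
    """Fraction of positions i where trace[i] equals trace[i + lag]."""
    pairs = len(trace) - lag
    if pairs <= 0:
        return 0.0
    hits = sum(1 for i in range(pairs) if trace[i] == trace[i + lag])
    return hits / pairs


def dominant_period(trace, max_lag):
    """Smallest lag with the highest match rate, or None for tiny traces."""
    upper = min(max_lag, len(trace) - 1)
    if upper < 1:
        return None
    # max() returns the first maximal element, i.e. the smallest such lag.
    return max(range(1, upper + 1), key=lambda lag: match_rate(trace, lag))


# Hypothetical trace events: an interpreter loop repeats the same four steps,
# while straight-line native code shows no repetition at any lag.
loop_trace = ["load", "decode", "jmp", "exec"] * 6
native_trace = ["a", "b", "c", "d", "e", "f", "g", "h"]

period = dominant_period(loop_trace, max_lag=10)
```

Here the repeating trace peaks at a lag equal to the loop length, whereas the non-repeating trace scores zero at every lag. No template of the pattern is needed, only its recurrence.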

We implement our approach as an overestimation to minimize false negatives. We test our approach on unseen virtual machines and are able to detect most of their virtualized sections. An evaluation of false positives shows that the original trace can be reduced by up to 99% in many cases.