Extracting Handler Semantics of Virtual-Machine Obfuscated Code with Static Analysis

Supervisor(s): Fabian Kilger
Status: finished
Topic: Others
Author: Johannes Franz Maier
Submission: 2026-04-29
Type of Thesis: Master's thesis

Description

Virtual machine obfuscation protects software and malware by translating source code
into bytecode executed by a custom interpreter. For hardening, obfuscators apply
diversification techniques and generate a random virtual instruction set for every
program. State-of-the-art automated deobfuscation tools like VMSpec rely on dynamic
approaches such as instruction trace analysis or dynamic symbolic execution (DSE),
but they remain vulnerable to path explosion, even when analyzing single instruction
handlers. To remove this limitation, we propose using static analysis to extract
handler semantics from obfuscated binaries. In four stages, we lift the code to an
intermediate representation, apply common and custom dataflow and control-flow
analyses, replace VM state accesses with variables, and generate a decompiler processor
plugin. We implemented our proof of concept with Miasm as the analysis framework
and Ghidra as the decompiler. We further propose the optimization Plugin Merging to
reduce decompilation times by decompiling multiple virtualized functions in a single
decompiler session. We evaluated our approach for correctness and performance using
55 toy and real-world programs obfuscated with Tigress in six different configurations.
We show that our approach correctly deobfuscates all samples in a reasonable timeframe.
Additionally, we compare the performance to the DSE-based approach VMSpec and
observe an overhead of a factor between two and three. Lastly, we examine correctness and
performance on a real-world sample obfuscated using CodeVirtualizer and measure an
overhead of a factor of three. Overall, we have shown that our static analysis approach
can be successfully used for correct extraction of handler semantics.
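
To make the setting described above concrete, the following is a minimal sketch of a virtualized program: bytecode run by a small interpreter whose opcode numbers are chosen at random per program. Every name here (`HANDLERS`, `make_instruction_set`, `assemble`, `run`) and the stack-based design are assumptions made for illustration only; they are not taken from the thesis, Tigress, CodeVirtualizer, or VMSpec. The handler bodies are the "handler semantics" a deobfuscator would need to recover.

```python
import random

# Hypothetical toy VM for illustration; not the design of any real obfuscator.

def handler_push(state):
    # The operand is the bytecode word that follows the opcode.
    state["stack"].append(state["code"][state["pc"]])
    state["pc"] += 1

def handler_add(state):
    b = state["stack"].pop()
    a = state["stack"].pop()
    state["stack"].append(a + b)

def handler_mul(state):
    b = state["stack"].pop()
    a = state["stack"].pop()
    state["stack"].append(a * b)

def handler_halt(state):
    state["running"] = False

HANDLERS = {
    "push": handler_push,
    "add": handler_add,
    "mul": handler_mul,
    "halt": handler_halt,
}

def make_instruction_set(seed):
    """Assign a distinct random opcode to each handler (per-program diversification)."""
    rng = random.Random(seed)
    opcodes = rng.sample(range(256), len(HANDLERS))
    return dict(zip(HANDLERS, opcodes))

def assemble(program, isa):
    """Translate (mnemonic, *operands) tuples into a flat bytecode list."""
    code = []
    for mnemonic, *operands in program:
        code.append(isa[mnemonic])
        code.extend(operands)
    return code

def run(code, isa):
    """Fetch-decode-dispatch loop: the custom interpreter."""
    dispatch = {opcode: HANDLERS[name] for name, opcode in isa.items()}
    state = {"code": code, "pc": 0, "stack": [], "running": True}
    while state["running"]:
        opcode = state["code"][state["pc"]]
        state["pc"] += 1
        dispatch[opcode](state)
    return state["stack"][-1]

# (2 + 3) * 4, virtualized under two different random instruction sets.
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",), ("halt",)]
isa_a = make_instruction_set(seed=1)
isa_b = make_instruction_set(seed=2)
result_a = run(assemble(program, isa_a), isa_a)
result_b = run(assemble(program, isa_b), isa_b)
```

Both runs compute the same value even though the bytecode differs between the two instruction sets. Without knowing what each handler does, the bytecode alone carries no meaning, which is why handler semantics have to be extracted first.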