Description
Virtual machine obfuscation protects software and malware by translating source code into bytecode executed by a custom interpreter. For hardening, obfuscators apply diversification techniques and generate a random virtual instruction set for every program. State-of-the-art automated deobfuscation tools like VMSpec rely on dynamic approaches such as instruction trace analysis or dynamic symbolic execution (DSE). However, these approaches remain vulnerable to path explosion, even when analyzing single instruction handlers. To remove this limitation, we propose using static analysis to extract handler semantics from obfuscated binaries. In four stages, we lift the code to an intermediate representation, apply common and custom dataflow and control-flow analyses, replace VM state accesses with variables, and generate a decompiler processor plugin. We implemented our proof of concept with Miasm as the analysis framework and Ghidra as the decompiler. We further propose an optimization, Plugin Merging, which reduces decompilation times by decompiling multiple virtualized functions in a single decompiler session. We evaluated our approach for correctness and performance using 55 toy and real-world programs obfuscated with Tigress in six different configurations. We show that our approach correctly deobfuscates all samples in a reasonable timeframe. Additionally, we compare the performance to the DSE-based approach VMSpec and observe an overhead by a factor of two to three. Lastly, we examine correctness and performance on a real-world sample obfuscated using CodeVirtualizer and measure an overhead by a factor of three. Overall, we have shown that our static analysis approach can be successfully used for correct handler semantic extraction.
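To make the step of replacing VM state accesses with variables more concrete, the following is a minimal sketch of the general idea, not the implementation described above. It assumes a toy expression encoding based on nested tuples and a hypothetical VM context layout; the names `VM_STATE_LAYOUT`, `replace_vm_state`, `vm_ctx`, `vpc`, `vsp`, and `vacc` are illustrative and are not taken from Miasm, Ghidra, Tigress, or VMSpec.

```python
# Toy illustration only: the expression encoding and all names here are
# hypothetical and are not taken from Miasm, Ghidra, or the described tool.

# Assumed layout of the VM context: byte offset -> virtual register name.
VM_STATE_LAYOUT = {0x00: "vpc", 0x08: "vsp", 0x10: "vacc"}


def replace_vm_state(expr, ctx_reg="vm_ctx"):
    """Rewrite memory loads from the VM context into named variables."""
    if not isinstance(expr, tuple):
        return expr  # constants and names stay unchanged
    # Pattern: ("mem", ("add", ("reg", ctx_reg), <known offset>))
    if expr[0] == "mem":
        addr = expr[1]
        if (
            isinstance(addr, tuple)
            and len(addr) == 3
            and addr[0] == "add"
            and addr[1] == ("reg", ctx_reg)
            and isinstance(addr[2], int)
            and addr[2] in VM_STATE_LAYOUT
        ):
            return ("var", VM_STATE_LAYOUT[addr[2]])
    # Otherwise keep the operator and rewrite its operands recursively.
    return (expr[0],) + tuple(replace_vm_state(e, ctx_reg) for e in expr[1:])


# A made-up handler expression: vacc + value loaded from the top of the
# virtual stack, written as raw accesses relative to the VM context pointer.
handler = (
    "add",
    ("mem", ("add", ("reg", "vm_ctx"), 0x10)),
    ("mem", ("mem", ("add", ("reg", "vm_ctx"), 0x08))),
)
simplified = replace_vm_state(handler)
```

After the rewrite, `simplified` refers to the virtual registers by name, which is the kind of form a decompiler processor description could plausibly consume more easily than raw context-relative memory accesses.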