TUM Logo

Machine Code Obfuscation via Instruction Set Reduction and Control Flow Graph Linearization: Analysis and Countermeasures

Obfuscation is widely used to hide sensitive data of software systems in scenarios where an analyst has full access to a software system. It trans- forms a program to become harder to understand and reverse engineer while preserving the semantics of the original program.Current obfuscation methods try to achieve this by adding complexity to the control flow. Recently, a new approach to obfuscation has been proposed working in the exact opposite way: The control flow of a protected program is linearized, leaving only one continuous stream of machine code instructions containing the entire program. To further strengthen the code against analysis, all instructions of the original program are replaced by a single instruction.In this Bachelor’s Thesis, I will analyze the functionality of the proposed techniques and its applicability as an obfuscation method.The thesis also presents an approach that allows for a complete recovery of the control flow of the original program utilizing a modified version of taint analysis in combination with a satisfiable modulo theory (SMT) solver.We evaluate both, obfuscation and deobfuscation in context of applicability to real-world scenarios. We find that the time and space penalty introduced by this obfuscation method in its current form, add significant overhead to the program. Finally we evaluate our algorithm on several artificial programs to recover the original control flow and show that recovery is possible in most cases.

Machine Code Obfuscation via Instruction Set Reduction and Control Flow Graph Linearization: Analysis and Countermeasures

Supervisor(s): Julian Kirsch
Status: finished
Topic: Software testing
Author: Clemens Jonischkeit
Submission: 2016-03-15
Type of Thesis: Bachelorthesis
Proof of Concept No

Astract:

Obfuscation is widely used to hide sensitive data of software systems in scenarios where an analyst has full access to a software system. It trans- forms a program to become harder to understand and reverse engineer while preserving the semantics of the original program.Current obfuscation methods try to achieve this by adding complexity to the control flow. Recently, a new approach to obfuscation has been proposed working in the exact opposite way: The control flow of a protected program is linearized, leaving only one continuous stream of machine code instructions containing the entire program. To further strengthen the code against analysis, all instructions of the original program are replaced by a single instruction.In this Bachelor’s Thesis, I will analyze the functionality of the proposed techniques and its applicability as an obfuscation method.The thesis also presents an approach that allows for a complete recovery of the control flow of the original program utilizing a modified version of taint analysis in combination with a satisfiable modulo theory (SMT) solver.We evaluate both, obfuscation and deobfuscation in context of applicability to real-world scenarios. We find that the time and space penalty introduced by this obfuscation method in its current form, add significant overhead to the program. Finally we evaluate our algorithm on several artificial programs to recover the original control flow and show that recovery is possible in most cases.