Evaluation of LLVM Lifters

Supervisor(s): Fabian Kilger
Status: finished
Topic: Others
Author: Nico Greger
Submission: 2022-03-15
Type of Thesis: Bachelor's thesis
Proof of Concept: No

Abstract:

LLVM is a powerful compilation and static analysis framework built around an intermediate representation (IR) that is well suited to code analysis and optimization. LLVM lifters are tools that produce IR from an executable binary, effectively reversing the compilation process LLVM performs when it compiles IR to machine code. Recently, there has been a trend toward performing static analyses on intermediate representations of binaries, as this eases the implementation of those analyses. LLVM IR is a well-suited target because powerful analyses already exist as part of the LLVM infrastructure, but using them requires a binary lifter to obtain the necessary IR. Several lifters accomplish this task; however, little research compares their capabilities. This work presents four lifters from industry and academia: Dagger, mctoll, retdec, and reopt. We first describe how each of these lifters produces IR. We then aim to identify possible shortcomings experimentally, on the one hand by creating a set of test binaries, each of which exercises one language feature of C or C++, and on the other hand by using real-world programs from the GNU coreutils package and the json-c library. With the help of a self-written script, we benchmark each lifter's ability to convert these binaries, which differs greatly from lifter to lifter. To assess the suitability of the lifted IR for static analyses, we built LLVM passes that export the instruction count as well as the findings of LLVM's alias analysis, value set analysis, loop analysis, and CFG recovery. We conclude that using LLVM lifters to employ LLVM as a binary analysis framework yields generally good and useful results, although their quality varies with the choice of lifter and the specific analysis. To summarize our findings, we provide a feature matrix detailing the capabilities of every lifter.
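The benchmarking step described in the abstract can be pictured with a minimal sketch. This is not the thesis's script, which is not reproduced on this page. It assumes that each lifter is invoked as a command line taking the binary's path, and that an exit status of 0 means the binary was converted; the function names, the `{binary}` placeholder, and the timeout are all hypothetical choices made for illustration.

```python
import subprocess
from pathlib import Path


def run_lifter(command, binary, timeout=60):
    """Run one lifter command on one binary; True if it exits with status 0.

    Assumption: exit status 0 stands in for "conversion succeeded".
    """
    # Every "{binary}" in the command template is replaced by the binary's path.
    argv = [part.replace("{binary}", str(binary)) for part in command]
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # A lifter that cannot be started or that hangs counts as a failure.
        return False
    return result.returncode == 0


def benchmark(lifters, binaries, timeout=60):
    """Return {lifter_name: {binary_name: succeeded}} for every combination."""
    return {
        name: {Path(b).name: run_lifter(cmd, b, timeout) for b in binaries}
        for name, cmd in lifters.items()
    }


def success_rate(results):
    """Fraction of binaries that each lifter converted without error."""
    return {
        name: (sum(per_binary.values()) / len(per_binary) if per_binary else 0.0)
        for name, per_binary in results.items()
    }
```

Calling `benchmark({"some-lifter": ["some-lifter", "{binary}"]}, ["tests/loops", "tests/structs"])` with a real lifter's actual command line in place of the placeholder would give one row of a success table per lifter, which is the kind of data a feature matrix could be built from.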