
Evaluation of LLVM Lifters

Supervisor(s): Fabian Kilger
Status: open
Topic: Reverse Engineering, Binary Exploitation
Type of Thesis: Master's Thesis, Bachelor's Thesis, Guided Research

Description

Motivation

The LLVM compiler framework is a very popular foundation for program analysis and instrumentation tools. Usually, source code is compiled into LLVM IR (Intermediate Representation), on which multiple program analysis steps are performed to enable safe optimizations. Furthermore, LLVM provides a convenient API for statically analyzing code and adding instrumentation to a program.
This makes it a common choice for many static and dynamic analysis techniques.

In contrast, binary lifting is the process of transforming a binary (i.e. its assembly instructions and data) into an abstract IR, which eases the implementation of binary analysis techniques and makes the analysis independent of the target architecture of the analyzed program. More recently, there has been a trend to lift binaries to LLVM IR in order to apply LLVM's program analysis, instrumentation, and optimization capabilities to binary analysis. Furthermore, an optimal lift to LLVM IR would allow LLVM-based security analyses to be applied to closed-source applications. This would, for example, allow third parties to perform security audits of COTS (commercial off-the-shelf) software and harden it against existing vulnerabilities using available hardening techniques.

However, the applicability and limitations of working with lifted LLVM IR have not yet been evaluated extensively.

Topic

The goal of this research is to evaluate the quality of the LLVM IR produced by different LLVM lifters. It should also result in an extensible benchmark that can be used to evaluate the capabilities of future lifters and thereby also support the development of binary lifters. The work can be summarized in the following steps:

  1. Identify relevant use-cases to evaluate
  2. Identify general difficulties of LLVM lifters and how they might affect the different use-cases
  3. Identify metrics to evaluate the applicability of a specific LLVM IR lifter to the chosen use-cases
  4. Implement an extensible benchmark measuring those metrics
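Step 4 calls for a benchmark that can grow with new lifters and metrics. The following is only a minimal Python sketch of one possible structure, assuming each lifter can be wrapped as a function that takes the path of a binary and returns the lifted LLVM IR as text (or None on failure). Lifters and metrics are registered as plug-ins, so adding either does not require changing the harness. All names (`Benchmark`, `lift_succeeded`, `has_function_definition`, the dummy lifter) are hypothetical, and the two metrics are deliberately crude placeholders, not the metrics this work is meant to identify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Assumption: a lifter is any callable mapping a binary path to LLVM IR text,
# returning None if lifting failed.
Lifter = Callable[[str], Optional[str]]
# Assumption: a metric scores one lifting result as a float (higher is better).
Metric = Callable[[str, Optional[str]], float]


@dataclass
class Benchmark:
    """Hypothetical extensible benchmark: lifters and metrics are plug-ins."""
    lifters: Dict[str, Lifter] = field(default_factory=dict)
    metrics: Dict[str, Metric] = field(default_factory=dict)

    def add_lifter(self, name: str, lifter: Lifter) -> None:
        self.lifters[name] = lifter

    def add_metric(self, name: str, metric: Metric) -> None:
        self.metrics[name] = metric

    def run(self, binaries: List[str]) -> Dict[str, Dict[str, float]]:
        """Return the mean score of every metric for every lifter."""
        results: Dict[str, Dict[str, float]] = {}
        for lifter_name, lift in self.lifters.items():
            scores: Dict[str, List[float]] = {m: [] for m in self.metrics}
            for binary in binaries:
                try:
                    ir = lift(binary)
                except Exception:
                    ir = None  # treat a crashing lifter like a failed lift
                for metric_name, metric in self.metrics.items():
                    scores[metric_name].append(metric(binary, ir))
            results[lifter_name] = {
                m: sum(v) / len(v) if v else 0.0 for m, v in scores.items()
            }
        return results


# Placeholder metric: did the lifter produce any output at all?
def lift_succeeded(binary: str, ir: Optional[str]) -> float:
    return 1.0 if ir else 0.0


# Placeholder metric: does the output contain at least one function definition?
def has_function_definition(binary: str, ir: Optional[str]) -> float:
    return 1.0 if ir and "define " in ir else 0.0


# Usage with a dummy lifter that only "succeeds" on paths ending in ".ok".
bench = Benchmark()
bench.add_lifter(
    "dummy",
    lambda path: "define i32 @main() {\n  ret i32 0\n}\n" if path.endswith(".ok") else None,
)
bench.add_metric("success", lift_succeeded)
bench.add_metric("has_define", has_function_definition)
print(bench.run(["a.ok", "b.bad"]))
# {'dummy': {'success': 0.5, 'has_define': 0.5}}
```

Keeping lifters and metrics as plain callables means that supporting a real lifter only requires an adapter function (for example, one that invokes the lifter's command-line tool); such adapters are not shown here.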

Requirements

  • Programming skills in C/C++ (to understand and possibly write LLVM passes)
  • Programming skills in another language for writing the benchmark (preferably Python, but other languages are possible)
  • Interest in program analysis (including static analysis)
  • Preliminary understanding of at least one topic related to program analysis (e.g. fuzzing, static analysis, symbolic execution)
  • Ability to work self-directed and systematically
  • (optional) Experience using LLVM

Contact

kilger@sec.in.tum.de