TUM Logo

Program Slicing on Code Property Graphs

Program Slicing on Code Property Graphs

Supervisor(s): Konrad Weiss
Status: finished
Topic: Others
Author: Eric Armbruster
Submission: 2020-11-16
Type of Thesis: Bachelorthesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

Program slicing is a source code reduction technique for capturing the
statements that could influence the value of a variable at a specific statement. 
A Code Property Graph (CPG) is a directed, attributed, and labeled multigraph. 
It merges several graphs that represent different aspects of program code into a 
single supergraph. Originally, the graph combines an Abstract Syntax Tree (AST), a
Control-Flow Graph (CFG) and a Program Dependence Graph (PDG) to encode structure,
control-flow, data dependence and control dependence relationships into the graph. By
storing this source code representation in a graph database, efficient traversals can
be employed to search for patterns of structure and behavior in the code. Via these
graph traversals many vulnerability types can be modeled and the graph can be searched
for matching instances of vulnerabilities.
In this thesis, we construct a static program slicer on the basis of an existing CPG
library. Our approach follows graph-reachability based slicing techniques that have
been proposed for the PDG and System Dependence Graph (SDG). In comparison to
previous works we rely on the Evaluation Order Graph (EOG) instead of the CFG
as a foundation for the construction of control- and data dependences. The EOG
is similar to the CFG as both encode control-flow relationships but only the EOG
models evaluation order within expressions. Furthermore, the graph is of operator-
level precision and this enables later to slice at finer granularity. After implementing 
an intraprocedural slicer, a SDG is built to extend dependence analysis across the function
border. A context-sensitive interprocedural slicer is then implemented on top of the
SDG and the implementation is finished with a command-line frontend for both slicers.
As an indirect consequence of the slicer implementation, the CPG library’s analytical
capabilities are extended by dominators, a PDG, and a SDG, which lays the groundwork
for modeling further vulnerability types in future works.
The thesis is finished with a discussion on the limitations of our implementation, a
small evaluation on the quality of our slicers using two case studies, and a performance
measurement on the case studies. The measurement indicates that the performance our
implementation is acceptable fast if compared to the performance of the entire CPG
library implementation.