Robust graph-based static code analysis

Supervisor(s): Dr. Julian Schütte, Dennis Titze
Status: finished
Topic: Others
Author: Samuel Hopstock
Submission: 2019-08-16
Type of Thesis: Bachelor's thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

Automatic code analysis is a widely used technique for finding and
eliminating errors in software projects. Instead of executing the program
and verifying that its behavior is correct, as dynamic analysis does,
static analysis is applied to its source code. Here, we search for
suspicious patterns that are likely to indicate erroneous behavior.

A special class of software bugs are errors that lead to security
vulnerabilities. In this case, attackers may be able to undermine
fundamental security aspects by exfiltrating sensitive user data from
server applications or by assuming control over the machine running the
program in question. Security vulnerabilities in the code can have
drastic consequences, which is why it is important to identify them as
quickly as possible and fix them immediately afterwards.

This thesis extends the concept of Code Property Graphs (CPGs), which
was proposed for static analysis of C/C++ code, so that it can be applied
to programs and incomplete code snippets written in Java. By unifying
Abstract Syntax Trees (ASTs), Control Flow Graphs (CFGs) and Data Flow
Graphs (DFGs) in a single data structure, this approach enables searching
for vulnerabilities whose code patterns are spread across the boundaries
of single methods and classes. These patterns are identified using the
graph query language Cypher, which is provided by the graph database
Neo4j.
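
To make the idea of a unified graph concrete, the following is a minimal
sketch in Java, not the thesis implementation. It keeps AST, CFG and DFG
edges over one shared node set and follows only DFG edges to check whether
a string literal defined in one class reaches a call in another. The class
`MiniCpg`, its node kinds and the example methods `Config.algorithm` and
`Crypto.encrypt` are assumptions made for illustration; the actual work
expresses such searches as Cypher queries against Neo4j rather than as
hand-written traversals.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy code property graph: one node set, edges labeled AST, CFG or DFG. */
final class MiniCpg {
    enum EdgeType { AST, CFG, DFG }

    static final class Node {
        final String kind; // illustrative labels: "Method", "Literal", "Return", "Call"
        final String code;
        Node(String kind, String code) { this.kind = kind; this.code = code; }
    }

    static final class Edge {
        final Node from; final Node to; final EdgeType type;
        Edge(Node from, Node to, EdgeType type) { this.from = from; this.to = to; this.type = type; }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    Node node(String kind, String code) {
        Node n = new Node(kind, code);
        nodes.add(n);
        return n;
    }

    void edge(Node from, Node to, EdgeType type) {
        edges.add(new Edge(from, to, type));
    }

    /** True if target is reachable from source using only edges of the given type. */
    boolean reaches(Node source, Node target, EdgeType type) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>();
        work.push(source);
        while (!work.isEmpty()) {
            Node current = work.pop();
            if (current == target) return true;
            if (!seen.add(current)) continue; // already expanded
            for (Edge e : edges) {
                if (e.type == type && e.from == current) work.push(e.to);
            }
        }
        return false;
    }

    /** Literals whose value flows along DFG edges into a call with the given name. */
    List<Node> literalsFlowingInto(String callName) {
        List<Node> hits = new ArrayList<>();
        for (Node lit : nodes) {
            if (!lit.kind.equals("Literal")) continue;
            for (Node call : nodes) {
                if (call.kind.equals("Call") && call.code.equals(callName)
                        && reaches(lit, call, EdgeType.DFG)) {
                    hits.add(lit);
                    break;
                }
            }
        }
        return hits;
    }

    /** Hypothetical input: Config.algorithm() returns "DES"; Crypto.encrypt() passes it on. */
    static MiniCpg example() {
        MiniCpg g = new MiniCpg();
        Node algorithm = g.node("Method", "Config.algorithm");
        Node literal = g.node("Literal", "\"DES\"");
        Node ret = g.node("Return", "return \"DES\"");
        Node encrypt = g.node("Method", "Crypto.encrypt");
        Node callAlg = g.node("Call", "Config.algorithm");
        Node getInstance = g.node("Call", "Cipher.getInstance");
        Node unrelated = g.node("Literal", "\"hello\"");
        // AST edges: syntactic containment.
        g.edge(algorithm, ret, EdgeType.AST);
        g.edge(ret, literal, EdgeType.AST);
        g.edge(encrypt, getInstance, EdgeType.AST);
        g.edge(getInstance, callAlg, EdgeType.AST);
        g.edge(encrypt, unrelated, EdgeType.AST);
        // CFG edge: the argument is evaluated before the outer call.
        g.edge(callAlg, getInstance, EdgeType.CFG);
        // DFG edges: the literal's value crosses the method and class boundary.
        g.edge(literal, ret, EdgeType.DFG);
        g.edge(ret, callAlg, EdgeType.DFG);
        g.edge(callAlg, getInstance, EdgeType.DFG);
        return g;
    }

    public static void main(String[] args) {
        for (Node hit : example().literalsFlowingInto("Cipher.getInstance")) {
            System.out.println("Literal " + hit.code + " reaches Cipher.getInstance");
        }
    }
}
```

On the example graph, only the `"DES"` literal is reported. The `"hello"`
literal is attached to `Crypto.encrypt` through an AST edge but has no
data flow into the call, which is why the edge type matters for the search.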

In an evaluation run on 100 public GitHub repositories that use
cryptography, this technique identified 135 findings of cryptographic API
misuse. These include the use of insecure algorithms and modes of
operation, such as the Data Encryption Standard (DES) or Electronic
Codebook (ECB) mode, and hardcoded passwords that are used for encryption
purposes.
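
The first two kinds of finding can be illustrated with a small check on
the transformation string passed to `javax.crypto.Cipher.getInstance`,
which has the form `algorithm/mode/padding`. This is a hedged sketch:
`TransformationCheck` and its two rules are hypothetical and far simpler
than a graph query, since they only inspect a string that is already
known. Detecting hardcoded passwords requires data-flow information and is
not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical rule set: flags weak choices in a JCA transformation string. */
final class TransformationCheck {
    static List<String> findings(String transformation) {
        List<String> result = new ArrayList<>();
        // Split "algorithm/mode/padding"; mode and padding may be absent.
        String[] parts = transformation.toUpperCase(Locale.ROOT).split("/");
        String algorithm = parts[0].trim();
        String mode = parts.length > 1 ? parts[1].trim() : "";
        if (algorithm.equals("DES")) {
            result.add("insecure algorithm: DES");
        }
        if (mode.equals("ECB")) {
            result.add("insecure mode: ECB");
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {"DES/ECB/PKCS5Padding", "AES/ECB/PKCS5Padding", "AES/GCM/NoPadding"};
        for (String t : samples) {
            System.out.println(t + " -> " + findings(t));
        }
    }
}
```

A check like this only sees the literal it is given; it does not flag an
algorithm-only string such as `"AES"`, whose effective mode depends on the
provider's default.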