TOWARDS ASSESSING AND ENHANCING MODERN SOFTWARE REVERSE ENGINEERING

PhD Thesis Proposal Defence


Title: "TOWARDS ASSESSING AND ENHANCING MODERN SOFTWARE REVERSE ENGINEERING"

by

Mr. Zhibo LIU


Abstract:

Software reverse engineering is the process of converting incomprehensible
binary software into a human-readable or analysis-friendly high-level
representation. In many security-critical tasks, such as malware analysis,
closed-source software analysis, and legacy code hardening, the source code is
usually unavailable, which hinders automatic analysis and makes further fixing
or hardening impossible. To analyze software in such scenarios, software
reverse engineering methodologies have been developed to extract the essence
of a program while discarding machine-specific details.

Despite decades of continuous development and the investment of enormous
resources, perfect reverse engineering remains beyond reach due to the
complexity of binaries. This thesis presents our work on systematically
testing and evaluating modern reverse engineering tools. Our work provides a
deep and timely understanding of these tools, pointing out existing problems
and possible future directions. Based on these observations, we also extend
reverse engineering techniques to the emerging field of DNN executables.

Our first work tests decompilation correctness to present an up-to-date
understanding of modern C decompilers. Our decompiler testing framework
detected 1,423 inputs that trigger decompilation errors across four popular
decompilers. With extensive manual effort, we identified 13 bugs in two
open-source decompilers. Our findings reveal that modern decompilers are
making promising progress in functional correctness and have been
underestimated by researchers. Nevertheless, we show that some tasks that have
been studied for years in academia, such as variable recovery and type
inference, still prevent C decompilers from generating high-quality output.

Our second work conducts an in-depth study of binary lifters from the
"expressiveness" perspective. We demystify binary lifters and reveal how well
the lifters' output can support security-critical downstream applications. We
study four popular static and dynamic LLVM IR lifters developed by industry or
academia, using a total of 252,063 executables generated across compilers,
optimizations, and architectures. Our findings show that such binary lifters
are suitable for common similarity- or code comprehension-based security
analysis (e.g., binary diffing). However, the lifted IR code appears unsuited
to rigorous static analysis (e.g., pointer analysis). We summarize our
findings and suggest the correct use and further enhancement of binary
lifters. We also explore practical ways to enhance the lifted IR code.

Our third work presents BTD (Bin to DNN), a decompiler for deep neural network
(DNN) executables. BTD takes DNN executables as input and recovers DNN model
specifications, including DNN operators, network topology, dimensions, and
parameters. BTD delivers a practical framework for processing DNN executables
that are compiled by different deep learning compilers with full optimizations
enabled on x86 platforms.


Date:                   Tuesday, 21 February 2023

Time:                   10:00am - 12:30pm

Venue:                  Room 5501
                        (Lifts 25/26)

Committee Members:      Dr. Shuai Wang (Supervisor)
                        Dr. Charles Zhang (Chairperson)
                        Dr. Lionel Parreaux
                        Dr. Jiasi Shen


**** ALL are Welcome ****