Enhancing Reliability and Performance of Data-Centric Systems with Static Analysis

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Enhancing Reliability and Performance of Data-Centric Systems with
Static Analysis"

By

Mr. Chengpeng WANG


Abstract:

In the era of big data, data-centric systems have emerged as the fundamental
infrastructure for processing, storing, and transmitting various forms of data,
providing diverse services in our daily lives. The widespread adoption of
data-centric systems highlights the critical need for enhancing their
reliability and performance. Unreliable or inefficient data-centric systems can
result in unanticipated economic losses and unnecessary consumption of
computational resources, jeopardizing property safety and compromising the
overall service experience.

This thesis provides a comprehensive analysis of data-centric systems using
static analysis techniques. The research is centered around three critical
components, namely the application, library, and database sides of the systems.
By delving into data organization, propagation, manipulation, and validation,
our techniques can successfully identify vulnerabilities and optimize
computation, leading to enhanced reliability and performance of data-centric
systems in a holistic manner.

The first part of our research focuses on ubiquitous data structures, called
containers, and improves the system performance by organizing data with
efficient container types. We introduce Cres, a synthesizer designed to replace
inefficient container types for a given program. Cres statically identifies
container usage and selects methods with lower time complexity for each
container method call, finally discovering a more efficient container type for
each container object. Cres reduces execution time by 8.1% on average in our
experimental subjects while theoretically preserving program behavior.
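
As a rough illustration of the kind of replacement described above (not Cres
itself, whose synthesis procedure is not detailed in this abstract), the sketch
below swaps a list used only for membership tests with a set. The function
names are hypothetical and chosen for this example only.

```python
def dedupe_with_list(items):
    """Keep first occurrences; 'x not in seen' on a list is O(n) per call."""
    seen = []
    out = []
    for x in items:
        if x not in seen:
            seen.append(x)
            out.append(x)
    return out


def dedupe_with_set(items):
    """Same observable behavior; 'x not in seen' on a set is O(1) on average."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


sample = [3, 1, 3, 2, 1]
result_list = dedupe_with_list(sample)
result_set = dedupe_with_set(sample)
```

Both functions return the same output for the same input because `seen` is
only ever queried for membership, so its iteration order never matters. This
sketch assumes hashable elements; a replacement like this is only safe when
every use of the container is compatible with the new type.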

The second part of our research investigates data propagation in the
application code to improve system reliability. Tracking erroneous values that
propagate through containers requires precise and efficient reasoning about
container memory layout. To address this, we introduce Anchor, which utilizes
memory orientation analysis to apply strong updates to container memory layouts
and conducts reachability analysis. Anchor detects 20 null pointer exceptions
with a false-positive ratio of only 9.1% and finishes analyzing 5 MLoC within
five hours. Its high precision and efficiency in bug detection demonstrate its
potential to improve system reliability from the application side.
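
To make the term "strong update" concrete, here is a minimal sketch of the
general idea from abstract interpretation. It is not Anchor's algorithm: the
abstract state, the slot name "m[k]", and all function names are assumptions
made for this example.

```python
NULL, NONNULL = "null", "nonnull"


def weak_update(state, slot, value):
    """Add 'value' to the slot's possible values, keeping the old ones."""
    new = dict(state)
    new[slot] = state.get(slot, frozenset()) | {value}
    return new


def strong_update(state, slot, value):
    """Overwrite the slot: the old values are known to be gone."""
    new = dict(state)
    new[slot] = frozenset({value})
    return new


def may_be_null(state, slot):
    """Unknown slots are conservatively treated as possibly null."""
    return NULL in state.get(slot, frozenset({NULL}))


# Hypothetical abstract state: container slot m[k] currently holds null.
state = {"m[k]": frozenset({NULL})}
weak = weak_update(state, "m[k]", NONNULL)
strong = strong_update(state, "m[k]", NONNULL)
```

After the weak update, the slot may still be null, so a later dereference
would be reported even though the store overwrote the null value. After the
strong update, the null is gone and the spurious report disappears. A strong
update is only sound when the analysis can show that the write targets exactly
one concrete slot.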

The third part of our research investigates the data manipulation conducted by
library APIs. We propose DAInfer, an algorithm that identifies store-load APIs
and derives API aliasing specifications from library documentation. Equipped
with NLP models, DAInfer effectively interprets informal semantic information
and infers API aliasing specifications efficiently, with a precision of 79.78%
and a recall of 82.29%. The inferred specifications can effectively
benefit downstream analyses to derive fundamental program facts, such as
value-flow and alias facts, further promoting bug detection and program
optimization.
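
The sketch below illustrates what a store-load API pair and an aliasing
specification for it might look like. The `Registry` class and the `SPEC`
format are hypothetical and invented for this example; the abstract does not
state how DAInfer represents its specifications.

```python
class Registry:
    """A toy library class with one store API and one load API."""

    def __init__(self):
        self._slots = {}

    def put(self, key, value):
        # Store API: saves a reference to 'value' under 'key'.
        self._slots[key] = value

    def get(self, key):
        # Load API: returns the stored reference, or None if absent.
        return self._slots.get(key)


# Hypothetical specification format: (store API, load API) -> aliasing fact.
SPEC = {
    ("Registry.put", "Registry.get"):
        "get(k) may alias the second argument of an earlier put(k, v)",
}

payload = []
registry = Registry()
registry.put("a", payload)
loaded = registry.get("a")
loaded.append(1)  # Mutation through the alias is visible via 'payload'.
```

A client analysis that knows this fact can connect the value passed to `put`
with the value returned by `get` without analyzing the library body.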

The final part of our research shifts to domain-specific programs, named data
constraints, to investigate data validation in databases. While data
constraints play a crucial role in ensuring data correctness, the presence of
equivalent ones can waste computational resources. To tackle this issue, we
present EqDAC, an efficient decision procedure that utilizes two lightweight
analyses to refute or prove data constraint equivalence in polynomial time.
EqDAC discovers 11,538 equivalent pairs from 30,801 data constraints at Ant
Group and uncovers 7,842 redundant data constraints. It successfully alleviates
redundant data validation and reduces CPU time by 15.48% on the database side.


Date:                   Monday, 11 December 2023

Time:                   4:00pm - 6:00pm

Venue:                  Room 5566
                        Lifts 27/28

Chairman:               Prof. Nian LIN (PHYS)

Committee Members:      Prof. Charles ZHANG (Supervisor)
                        Prof. Shing-Chi CHEUNG
                        Prof. Shuai WANG
                        Prof. Jack CHENG (CIVL)
                        Prof. Xiangyu ZHANG (Purdue University)


**** ALL are Welcome ****