Enhancing Reliability and Performance of Data-Centric Systems with Static Analysis
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Enhancing Reliability and Performance of Data-Centric Systems with Static Analysis"

By

Mr. Chengpeng WANG

Abstract:

In the era of big data, data-centric systems have emerged as the fundamental infrastructure for processing, storing, and transmitting various forms of data, providing diverse services in our daily lives. The widespread adoption of data-centric systems highlights the critical need to enhance their reliability and performance. Unreliable or inefficient data-centric systems can result in unanticipated economic losses and unnecessary consumption of computational resources, jeopardizing property safety and compromising the overall service experience.

This thesis provides a comprehensive analysis of data-centric systems using static analysis techniques. The research is centered around three critical components, namely the application, library, and database sides of the systems. By delving into data organization, propagation, manipulation, and validation, our techniques can identify vulnerabilities and optimize computation, leading to enhanced reliability and performance of data-centric systems in a holistic manner.

The first part of our research focuses on ubiquitous data structures, called containers, and improves system performance by organizing data with efficient container types. We introduce Cres, a synthesizer designed to replace inefficient container types in a given program. Cres statically identifies container usage and selects methods with lower time complexity for each container method call, finally discovering a more efficient container type for each container object. Cres reduces execution time by 8.1% on average on our experimental subjects while theoretically preserving program behavior.
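As a rough illustration of the kind of replacement described above, the sketch below shows a container that is only inserted into and queried for membership. This is not Cres itself or its output: the function names and the choice of Python are assumptions made for this example. Because the program never observes element order or duplicates, swapping the list for a set leaves the result unchanged while lowering the cost of each membership test from linear to expected constant time.

```python
# Hypothetical example: the same computation with two container types.
# Neither function is taken from Cres; both are illustrative only.

def count_known_with_list(known_ids, queries):
    seen = []                      # container type: list
    for k in known_ids:
        seen.append(k)             # insertion
    # Membership test on a list scans the elements: O(n) per query.
    return sum(1 for q in queries if q in seen)


def count_known_with_set(known_ids, queries):
    seen = set()                   # replacement container type: set
    for k in known_ids:
        seen.add(k)                # insertion
    # Membership test on a set is a hash lookup: expected O(1) per query.
    return sum(1 for q in queries if q in seen)


if __name__ == "__main__":
    ids = [1, 2, 3, 2]
    qs = [2, 5, 3, 2]
    # Only insertion and membership are used, so both versions agree.
    print(count_known_with_list(ids, qs) == count_known_with_set(ids, qs))
```

The replacement is only behavior-preserving because the container is used through insertion and membership alone; if the program iterated over the container in insertion order or relied on duplicates, a set would not be a safe substitute.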

The second part of our research investigates data propagation in the application code to improve system reliability. Erroneous values propagating through containers necessitate precise and efficient reasoning about container memory layout. To address this, we introduce Anchor, which utilizes memory orientation analysis to apply strong updates to container memory layouts and conducts reachability analysis. Anchor detects 20 null pointer exceptions with a false-positive ratio of only 9.1% and finishes analyzing 5 MLoC within five hours. Its high precision and efficiency in bug detection demonstrate its potential to improve system reliability from the application side.

The third part of our research investigates the data manipulation conducted by library APIs. We propose DAInfer, an algorithm that identifies store-load APIs and derives API aliasing specifications from library documentation. Equipped with NLP models, DAInfer effectively interprets informal semantic information and achieves efficient API aliasing specification inference with a precision of 79.78% and a recall of 82.29%. The inferred specifications can benefit downstream analyses in deriving fundamental program facts, such as value-flow and alias facts, further promoting bug detection and program optimization.

The final part of our research shifts to domain-specific programs, named data constraints, to investigate data validation in databases. While data constraints play a crucial role in ensuring data correctness, the presence of equivalent ones can waste computational resources. To tackle this issue, we present EqDAC, an efficient decision procedure that utilizes two lightweight analyses to refute or prove data constraint equivalence in polynomial time. EqDAC discovers 11,538 equivalent pairs from 30,801 data constraints at Ant Group and uncovers 7,842 redundant data constraints. It alleviates redundant data validation and reduces CPU time by 15.48% from the database side.

Date: Monday, 11 December 2023

Time: 4:00pm - 6:00pm

Venue: Room 5566
       Lifts 27/28

Chairman: Prof. Nian LIN (PHYS)

Committee Members: Prof. Charles ZHANG (Supervisor)
                   Prof. Shing-Chi CHEUNG
                   Prof. Shuai WANG
                   Prof. Jack CHENG (CIVL)
                   Prof. Xiangyu ZHANG (Purdue University)

**** ALL are Welcome ****