Observable and Economical Dataflow Computation in Datacenters
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Observable and Economical Dataflow Computation in Datacenters"

By

Mr. Huangshi TIAN

Abstract

With the proliferation of data emerges a myriad of dataflow frameworks. When they are deployed in a datacenter and productized as a service, their performance and cost become two primary concerns. However, performance issues prevail in dataflow computation. Their diagnosis is complicated by the heterogeneity of dataflow frameworks, which differ in underlying design, application domain, and computation complexity. This heterogeneity makes it hard for service providers and users to debug and locate problems. A side effect of performance issues is higher resource cost: the datacenter operator cannot easily determine an allocation that guarantees stable performance, which leads to resource waste.

To tackle the challenges of performance and cost, the dissertation first characterizes dataflow computation in a large datacenter by analyzing a recently released workload trace. It examines the static properties of job DAGs and the runtime characteristics of their task execution. Statically, the DAGs are found to exhibit high artificiality when compared with random graphs. At runtime, dependent tasks may vary significantly in resource usage and duration, even when the tasks are recurring. The results confirm the challenge of performance debugging and resource allocation.

To diagnose performance issues, the dissertation enables resource observability in dataflow computation by proposing CrystalPerf, a new approach that learns to characterize the performance of dataflow computation based on code analysis. It requires no code instrumentation and applies to a wide variety of dataflow frameworks.
Our key insight is that the source code of an operation contains learnable syntactic and semantic patterns that reveal how it uses resources. Our approach establishes a performance-resource model that, given a dataflow program, uses machine learning techniques to infer automatically how much time each operation has spent on each resource (e.g., CPU, network, disk) from past execution traces and the program source code. Extensive evaluations and real-world case studies show that CrystalPerf can predict job performance and accurately detect runtime bottlenecks of DAG jobs.

To reduce resource costs, the dissertation proposes Owl, an overcommitted scheduler for executing dataflow computation on serverless platforms. It achieves high utilization without compromising performance through a dual approach. (1) For less-invoked functions, Owl allocates resources to the sandboxes using a usage-based heuristic, keeps monitoring their performance, and remedies any detected degradation. (2) For frequently-invoked functions, Owl profiles the interference patterns among collocated functions and places the sandboxes under the guidance of the profiles. Owl further consolidates idle sandboxes to reduce resource waste. We prototype Owl in our production system and implement a representative benchmark suite to evaluate it. The results demonstrate that the prototype could reduce VM cost by 43.80% and effectively mitigate latency degradation, with negligible overhead incurred.

Date: Tuesday, 19 July 2022

Time: 1:00pm - 3:00pm

Zoom Meeting: https://hkust.zoom.us/j/99656972022?pwd=SzI1R1hTa2xIR0tqTWNqTDNkQThHZz09

Chairperson: Prof. Jidong ZHAO (CIVL)

Committee Members: Prof. Wei WANG (Supervisor)
                   Prof. Bo LI
                   Prof. Shuai WANG
                   Prof. Jiang XU (ECE)
                   Prof. Chuan WU (HKU)

**** ALL are Welcome ****