More about HKUST
I/O-Efficient Algorithms and Data Structures
Speaker: Ke YI Department of Computer Science Duke University Title: "I/O-Efficient Algorithms and Data Structures" Date: Wednesday, 1 March 2006 Time: 3:30pm - 4:30pm Venue: Room 2404 (via lift nos. 17/18) HKUST ABSTRACT: As the massive data sets we face today far exceed memory limit, traditional RAM algorithms do not scale due to excessive page faults. In recent years, I/O-efficient algorithms have been a successive and active direction to better scalability on massive data. In this talk, I will discuss two topics from my thesis work on I/O-efficient algorithms and data structures, with their applications in large spatial databases. The union-find problem is one of the most fundamental algorithmic problems. However, no I/O-efficient algorithm is known despite extensive study over the last four decades. I will present the first I/O-efficient algorithm for the batched (off-line) union-find problem, and illustrate how it can be used in some important tasks in geographic information systems. Experimental results show that the new algorithm gives order-of-magnitude improvement over previous methods on large data sets that do not fit in memory. The R-tree is a disk-based data structure that is commonly used in spatial databases. Many R-tree variants have been proposed in the past 20 years since its invention. However, all existing R-trees are based on heuristics and have poor performance on bad data sets. I will present the priority R-tree, which is the first R-tree variant that always answers a window query using the optimal number of I/Os. Not only being theoretically optimal, experiments show that the priority R-tree is also quite competitive compared with other popular R-trees on real-life and relatively nicely distributed data, and outperforms them significantly on more extreme data. ********************* Biography: Ke YI got his B.E. in Computer Science from Tsinghua University, China, in 2001. He is currently a Ph.D. candidate in the Department of Computer Science at Duke University, and is expected to complete all degree requirements by August, 2006. During his Ph.D. study he also spent two summers as an intern at IBM T.J. Watson Research Center and AT&T Labs -Research, respectively. His primary research interest is algorithms, with a focus on I/O-efficient algorithms, but he is also interested in many problems that arise from database systems.