Title: Optimal Sampling Algorithms for Frequency Estimation in Distributed Data Speaker: Zengfeng Huang, HKUST Time/Date: Friday, Mar 25, 11-12 Location: Room 3416 Abstract: Consider a distributed system with $n$ nodes where each node holds a multiset of items. In this paper, we design sampling algorithms that allow us to estimate the global frequency of any item with a standard deviation of $\eps N$, where $N$ denotes the total cardinality of all these multisets. Our algorithms have a communication cost of $O(n+\sqrt{n}/\eps)$, which is never worse than the $O(n+1/\eps^2)$ cost of uniform sampling, and could be much better when $n \ll 1/\eps^2$. In addition, we prove that one version of our algorithm is {\em instance-optimal} in a fairly general sampling framework. We also design algorithms that achieve optimality on the bit level, by combining Bloom filters of various granularities.