Range Counting Under Differential Privacy

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Range Counting Under Differential Privacy"

By

Mr. Ziyue HUANG


Abstract

The past few years have seen major advances in machine learning and data 
analysis due to the access to substantial amounts of user data. However, the 
collected personal information is often quite sensitive. Differential privacy 
is a rigorous mathematical definition for privacy-preserving data publishing, 
which has been widely adopted nowadays.

This thesis mainly focuses on the fundamental problem of range counting under 
differential privacy, i.e., computing how many points are lying inside any 
specific region from a certain range space, and its applications. The query 
family we consider consists of geometric ranges of a certain type, e.g., 
points, rectangles, halfspaces, spheres, or arbitrarily-shaped regions. First, 
we study the range counting problem in the centralized model, where a trusted 
curator holds the entire dataset. Existing lower bounds based on discrepancy 
theory suggest that large errors have to be introduced in order to preserve 
privacy: Essentially for any range space (except axis-parallel rectangles), the 
error has to be polynomial. We show that by allowing a standard notion of 
geometric approximation where points near the boundary of the range may or may 
not be counted, the error can be reduced to logarithmic. Next, we study the 
mean estimation problem under differential privacy, where worst-case optimal 
mechanisms do not offer meaningful utility guarantees in practice when the 
global sensitivity is very large. We take a principled approach, yielding a 
mechanism that is instance-optimal in a strong sense. The mechanism is based on 
private quantile estimation that relies on range counting mechanisms. Finally, 
we study the frequency estimation problem under both privacy and communication 
constraints in the multiparty model, where the (one-shot or streaming) data is 
distributed among multiple parties. Our protocols achieve optimality (up to 
polylogarithmic factors) permissible by the more stringent of the two 
constraints.


Date:			Tuesday, 18 January 2022

Time:			10:00am - 12:00noon

Venue:			Room 3494
 			Lifts 25/26

Chairperson:		Prof. Weichuan YU (ECE)

Committee Members:	Prof. Ke YI (Supervisor)
 			Prof. Sunil ARYA
 			Prof. Siu Wing CHENG
 			Prof. Yuan YAO (MATH)
 			Prof. Hubert CHAN (HKU)


**** ALL are Welcome ****