PRACTICAL DIFFERENTIAL PRIVACY

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "PRACTICAL DIFFERENTIAL PRIVACY"

By

Mr. Georgios KELLARIS


Abstract

In this thesis we focus on publishing statistics on a private database with 
ε-differential privacy. We target at three scenarios; (i) when the statistics 
are computed over a static database, (ii) when we wish to publish histograms 
over sensitive data, and (iii) when the statistics are published continuously 
over data that are updated by a stream.

For the first scenario, we address one-time publishing of non-overlapping 
counts. These statistics are useful in a wide and important range of 
applications, including transactional, traffic and medical data analysis. Prior 
work on ε-differential privacy publishes such statistics with prohibitively low 
utility in several practical scenarios. Towards this end, we present GS, a 
method that pre-processes the counts by elaborately grouping and smoothing them 
via averaging. This step acts as a form of preliminary perturbation that 
diminishes sensitivity, and enables GS to achieve ε-differential privacy 
through low Laplace noise injection. The grouping strategy is dictated by a 
sampling mechanism, which minimizes the smoothing perturbation. We demonstrate 
the superiority of GS over its competitors, and confirm its practicality, via 
extensive experiments on real datasets.

For the second scenario, we focus on the problem of differentially private 
histogram publication, for range-sum query answering. Specifically, we derive a 
histogram from a given dataset, such that (i) it satisfies ε-differential 
privacy, and (ii) it achieves high utility for queries that request the sum of 
contiguous histogram bins. Existing schemes are distinguished into two 
categories: fast but oblivious to utility optimizations that exploit the data 
characteristics, and data-aware but slow. We are the first to address this 
problem with emphasis on both efficiency and utility. Towards this goal, we 
formulate a principled approach, which defines a small set of simple modules, 
based on which we can devise a variety of more complex schemes. We first 
express the state-of-theart methods in terms of these modules, which allows us 
to identify the performance bottlenecks. Next, we design novel efficient and 
effective schemes based on non-trivial module combinations. We experimentally 
evaluate all mechanisms on three real datasets with diverse characteristics, 
and demonstrate the benefits of our proposals over previous work.

For the third scenario, we address continuously publishing of non-overlapping 
counts. Numerous applications require continuous publication of statistics for 
monitoring purposes, such as real-time traffic analysis, timely disease 
outbreak discovery, and social trends observation. These statistics may be 
derived from sensitive user data and, hence, necessitate privacy preservation. 
Although ε-differential privacy is a notable paradigm for offering strong 
privacy guarantees in statistics publishing, there is limited literature that 
adapts this concept to settings where the statistics are computed over an 
infinite stream of “events” (i.e., data items generated by the users), and 
published periodically. These works aim at hiding a single event over the 
entire stream. We argue that, in most practical scenarios, sensitive 
information is revealed from multiple events occurring at contiguous time 
instances. Towards this end, we put forth the novel notion of w-event privacy 
over infinite streams, which protects an event sequence occurring in w 
successive time instants. We first formulate our privacy concept, motivate its 
importance, and introduce a methodology for achieving it. We next design two 
instantiations, whose utility is independent of the stream length. Finally, we 
confirm the practicality of our solutions experimenting with real data.


Date:			Thursday, 25 June 2015

Time:			1:00pm - 3:00pm

Venue:			Room 2132C
 			Lift 19

Chairman:		Prof. Prithviraj Chattopadhyay (MGMT)

Committee Members:	Prof. Dimitris Papadias (Supervisor)
 			Prof. Wilfred Ng
 			Prof. Raymond Wong
 			Prof. Daniel Palomar (ECE)
 			Prof. Li Xiong (Emory Univ., USA)


**** ALL are Welcome ****