ACCURATE PROBABILITY ESTIMATES FROM LARGE-SCALE DATA IN THE APPLICATIONS OF DISPLAY ADVERTISING

MPhil Thesis Defence


Title: "ACCURATE PROBABILITY ESTIMATES FROM LARGE-SCALE DATA IN THE 
APPLICATIONS OF DISPLAY ADVERTISING"

By

Miss Liya JI


Abstract

Class membership probability estimates are important for many 
applications, especially Click-Through Rate (CTR) prediction in online 
advertising, in which classification outputs are combined with other 
sources, such as bid price, for decision-making. Existing calibration 
models can learn a mapping function from predicted probabilities to 
empirical CTRs and thus reduce the systematic bias (the difference 
between the average predicted and observed CTRs on some slices of data). 
Yet current methods have some theoretical issues, and the classifier used 
in display advertising has some special properties. In this thesis, to 
address these limitations, we propose a model called Calibration Trees 
(CT), a post-processing step that corrects the bias of predictions. CT 
scales to large data sets and is robust to extremely imbalanced data. 
Experimental results on two data sets from display advertising systems 
show that our model significantly outperforms state-of-the-art 
calibration models in terms of accuracy and calibration quality. An 
advanced version of CT, called Calibration Forest, can also be 
implemented in a distributed system and further improves prediction 
performance.
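
As a rough illustration of the systematic bias defined in the abstract 
(average predicted CTR minus observed CTR on a slice of data), the 
quantity could be computed along the following lines. This is only a 
minimal sketch, not the Calibration Trees method itself: the function 
name slice_bias, the slice labels, and the toy numbers are all 
hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def slice_bias(predictions, clicks, slices):
    """Return {slice label: average predicted CTR - observed CTR}.

    Hypothetical helper for illustration; not part of the thesis.
    """
    pred_sum = defaultdict(float)   # sum of predicted probabilities per slice
    click_sum = defaultdict(int)    # number of observed clicks per slice
    count = defaultdict(int)        # number of impressions per slice
    for p, y, s in zip(predictions, clicks, slices):
        pred_sum[s] += p
        click_sum[s] += y
        count[s] += 1
    return {s: pred_sum[s] / count[s] - click_sum[s] / count[s] for s in count}


# Toy, made-up data: four impressions split into two slices.
predictions = [0.5, 0.5, 0.25, 0.25]   # predicted click probabilities
clicks = [1, 0, 0, 0]                  # 1 = clicked, 0 = not clicked
slices = ["a", "a", "b", "b"]          # assumed slice label per impression

bias = slice_bias(predictions, clicks, slices)
print(bias)
```

A positive value means the classifier over-predicts clicks on that slice, 
a negative value means it under-predicts. A calibration model aims to 
bring these per-slice values close to zero.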


Date:			Tuesday, 5 May 2015

Time:			3:00pm - 5:00pm

Venue:			Room 3501
 			Lifts 25/26

Committee Members:	Prof. Qiang Yang (Supervisor)
 			Dr. Raymond Wong (Chairperson)
 			Prof. Dit-Yan Yeung


**** ALL are Welcome ****