SOME RESEARCH ISSUES IN HASH FUNCTION LEARNING

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "SOME RESEARCH ISSUES IN HASH FUNCTION LEARNING"

By

Mr. Yi ZHEN


Abstract

Over the past decade, hashing-based methods for large-scale similarity search 
have sparked considerable research interest in the database, data mining and 
information retrieval communities. These methods achieve very fast search speed 
by indexing data with binary codes. Although many hash functions for various 
similarity metrics have been proposed, they often generate very long codes due 
to their data-independent nature. In recent years, machine learning techniques 
have been applied to learn hash functions from data, forming a new research 
topic called hash function learning.
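
As an illustration of indexing data with binary codes, the following is a 
minimal sketch of a data-independent hash function (random hyperplane hashing 
for cosine similarity) with Hamming-distance ranking. It is not taken from the 
thesis; the function names, the integer bit-packing and the fixed seed are 
assumptions made for this example only.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes (data-independent)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(x, hyperplanes):
    """Pack one bit per hyperplane (1 if w . x >= 0) into an integer code."""
    code = 0
    for w in hyperplanes:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def search(query, database_codes, hyperplanes, k=1):
    """Return indices of the k database codes closest to the query in Hamming distance."""
    q = hash_code(query, hyperplanes)
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(q, database_codes[i]))
    return ranked[:k]


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=16, dim=3)
    data = [[1.0, 0.2, -0.5], [-0.9, 0.1, 0.4], [0.8, 0.3, -0.6]]
    codes = [hash_code(x, planes) for x in data]
    print(search([1.0, 0.25, -0.5], codes, planes, k=2))
```

Because the hyperplanes here are drawn at random rather than fitted to the 
data, many bits may be needed for good accuracy, which is the limitation that 
hash function learning aims to address.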

In this thesis, we study two important issues in hash function learning. On one 
hand, existing supervised or semi-supervised hash function learning methods, 
which learn hash functions from labeled data, can be regarded as passive 
because they assume that the labeled data are provided in advance. Given that 
data labeling can be very costly in practice and that labeled data points can 
differ considerably in how much they contribute to hash function learning, it 
may be more cost-effective for hash function learning methods to select the 
data to be labeled. To this end, we propose a novel framework, termed 
active hashing, to actively select the most informative data to label for hash 
function learning. Under the framework, we develop a simple method that 
queries the labels of the data points about which the current hash functions 
are most uncertain. Experiments conducted on two real data sets show a clear 
improvement of our active hashing algorithm over previous passive hashing 
methods. On the other 
hand, most existing hash function learning methods work only on unimodal data, 
an assumption that does not hold in many applications, e.g., multimedia 
retrieval and cross-lingual document analysis. To apply hash function learning 
to multimodal data, we develop three methods under the framework of multimodal 
hashing, which hashes data points of multiple modalities into one common 
Hamming space. For aligned data, the first method is based on spectral analysis 
of the correlation of the multimodal data. For graph data, the second method falls 
into the category of latent feature models and the hash codes can be obtained 
through Bayesian inference. For general data, we propose a boosted 
co-regularization model which can be efficiently solved by stochastic 
gradient-based algorithms. The effectiveness of our models is validated through 
extensive comparative studies on cross-modal multimedia retrieval.
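
The uncertainty-based querying idea behind active hashing, mentioned above, 
can be sketched as follows. This is a hypothetical illustration, not the 
criterion used in the thesis: it assumes linear hash functions of the form 
sign(w . x) and treats a point as uncertain when its projections lie close to 
the decision boundaries. All names are invented for the example.

```python
def total_margin(x, weights):
    """Sum of |w . x| over all hash functions; a small value means x lies
    near the decision boundaries, so its bits are uncertain (assumed measure)."""
    return sum(abs(sum(wi * xi for wi, xi in zip(w, x))) for w in weights)


def select_queries(unlabeled, weights, budget):
    """Return indices of the `budget` unlabeled points with the smallest
    total margin, i.e. the points to send for labeling next."""
    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: total_margin(unlabeled[i], weights))
    return ranked[:budget]


if __name__ == "__main__":
    # Two hypothetical linear hash functions in a 2-D feature space.
    weights = [[1.0, 0.0], [0.0, 1.0]]
    pool = [[5.0, 5.0], [0.1, -0.2], [1.0, -3.0]]
    print(select_queries(pool, weights, budget=2))
```

In an active learning loop, the selected points would be labeled, added to the 
training set, and the hash functions retrained before the next round of 
selection.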


Date:			Monday, 9 July 2012

Time:			3:00pm – 5:00pm

Venue:			Room 3501
 			Lifts 25/26

Chairman:		Prof. Man-Yu Wong (MATH)

Committee Members:	Prof. Dit-Yan Yeung (Supervisor)
 			Prof. James Kwok
 			Prof. Nevin Zhang
 			Prof. Weichuan Yu (ECE)
 			Prof. Irwin King (Comp. Sci. & Engg., CUHK)


**** ALL are Welcome ****