Source-Free Transfer Learning with Web 2.0

PhD Thesis Proposal Defence


Title: "Source-Free Transfer Learning with Web 2.0"

by

Mr. Wei XIANG


ABSTRACT:

Transfer learning can adapt and reuse knowledge from auxiliary domains
even when the data distributions and feature spaces differ. It is
gaining popularity in diverse application domains, ranging from
Web search to collaborative filtering. One particular challenge in
applying transfer learning is the need to identify one or more
suitable source domains from which knowledge can be transferred. In the
past, source domains have often been specified by human experts, which has
become a major impediment to the application of transfer learning in the
real world. In this proposal, we present a novel transfer learning
framework that requires no specific source data to be given; instead, the
system finds a set of appropriate data subsets to use as the auxiliary
data for transfer learning. It does this by searching an extremely large
collection on the World Wide Web for data that can benefit a
given target learning task.

In our approach, known as source-free transfer learning (SFTL), we are
given a target task to learn, for which the training data may contain
only an insufficient amount of labeled data. To build a model, SFTL turns
to a series of very large, open information sources, such as the Web, and
identifies a portion of their knowledge as the potential source
data. In our preliminary feasibility study on text classification, we
tested the idea of selecting source data from Wikipedia to assist
several text categorization tasks. When we open up to Web scale, we face
several new problems. First, since a Web-scale online knowledge source
is huge, it is a challenge to automatically generate task-related queries
that find the "right" source data. Second, once the potentially diverse
source data are found, how to unify these heterogeneous data sources to
enable effective transfer learning for the specific target data remains an
open issue. We also need to design a mechanism to acquire the
right distance measures for these data and models. Third, we need
different selection strategies to integrate the diverse data as the
potential source data for solving the target learning problem in an
effective and efficient manner, particularly as these collections can be
large.

Compared with previous work on transfer learning, the SFTL framework has a
major advantage in its source-free nature: users of a learning task no
longer need to find the necessary source data to start learning. Another
advantage is scalability: unlike many previous transfer learning
approaches, which are difficult to scale up to Web-scale knowledge
transfer, our approach would be highly scalable in both the training and
the prediction stages. We will use two real-world learning tasks, namely
classification and link prediction, to carry out this research. We will
demonstrate how our SFTL framework can be instantiated for these two
different learning problems. The proposal will discuss some difficulties
that have been tackled by related work and by our preliminary feasibility
study, and then point out some ongoing research issues for further
investigation.


Date:                   Thursday, 15 December 2011

Time:                   10:00am - 12:00noon

Venue:                  Room 3311
                        lifts 17/18

Committee Members:      Prof. Qiang Yang (Supervisor)
                        Dr. Huamin Qu (Chairperson)
                        Prof. Dit-Yan Yeung
                        Dr. Ke Yi


**** ALL are Welcome ****