Source-Free Transfer Learning with Web 2.0
PhD Thesis Proposal Defence
Title: "Source-Free Transfer Learning with Web 2.0"
by
Mr. Wei XIANG
ABSTRACT:
Transfer learning can adapt and reuse knowledge from auxiliary domains
even when the data distributions and feature spaces are different. It is
gaining increasing popularity in diverse application domains, ranging from
Web search to collaborative filtering. One particular challenge in the
application of transfer learning is the need to identify one or more
proper source domains from which knowledge can be transferred. In the
past, source domains have often been specified by human experts, which has
become a major impediment to the application of transfer learning in the
real world. In this proposal, we present a novel transfer learning
framework that does not require specific source data to be given; instead,
the system can find a set of appropriate data subsets to use as the
auxiliary data for transfer learning. It does this by searching an
extremely large collection on the World Wide Web for data that can be
beneficial to a given target learning task.
In our approach, known as source-free transfer learning (SFTL), we are
given a target task set to learn, where the training data may contain only
an insufficient amount of labeled data. To build a model, SFTL turns to
a series of very large, open information sources, such as the Web, for
help, identifying a portion of their knowledge as the potential source
data. In our preliminary feasibility study on text classification, we
tested the idea of selecting source data from Wikipedia to assist
several text categorization tasks. When we open up to Web scale, we face
several new problems. First, since the Web-scale online knowledge source
is huge, it is a challenge to automatically generate task-related queries
to find the "right" source data. Second, once the potentially diverse
source data are found, it remains an open issue how to unify these
heterogeneous data sources to enable effective transfer learning for the
specific target data. We also need to design a mechanism to acquire the
right distance measures for these data and models. Third, we need
different selection strategies to integrate the diverse data as the
potential source data for solving the target learning problem
effectively and efficiently, particularly as these collections can be
large.
Compared to previous work on transfer learning, the SFTL framework has a
major advantage in its source-free nature: users of a learning task no
longer need to find the necessary source data to start learning. Another
advantage is scalability: unlike many previous transfer learning
approaches, which are difficult to scale up to Web-scale knowledge
transfer, our approach would be highly scalable in both the training and
the prediction stages. We will use two real-world learning tasks, namely
classification and link prediction, to carry out this research. We will
demonstrate how our SFTL framework can be instantiated for these two
different learning problems. The proposal will discuss some difficulties
that have been tackled by related work and by our preliminary feasibility
study, and then point out some ongoing research issues for further
investigation.
Date: Thursday, 15 December 2011
Time: 10:00am - 12:00 noon
Venue: Room 3311
Lifts 17/18
Committee Members: Prof. Qiang Yang (Supervisor)
Dr. Huamin Qu (Chairperson)
Prof. Dit-Yan Yeung
Dr. Ke Yi
**** ALL are Welcome ****