Source-Free Transfer Learning with Web 2.0
PhD Thesis Proposal Defence

Title: "Source-Free Transfer Learning with Web 2.0"

by

Mr. Wei XIANG

Abstract:

Transfer learning can adapt and reuse knowledge from auxiliary domains even when the data distributions and feature spaces are different. It is gaining popularity in diverse application domains, ranging from Web search to collaborative filtering. One particular challenge in applying transfer learning is the need to identify one or more proper source domains from which knowledge can be transferred. In the past, source domains have often been specified by human experts, which has become a major impediment to the application of transfer learning in the real world.

In this proposal, we present a novel transfer learning framework that does not require any specific source data to be given; instead, the system finds appropriate subsets of data to use as the auxiliary data for transfer learning. It does this by searching an extremely large collection on the World Wide Web for data that can benefit a given target learning task. In our approach, known as source-free transfer learning (SFTL), we are given a target task to learn, for which the training data may contain only an insufficient amount of labeled data. To build a model, SFTL turns to a series of very large, open information sources, such as the Web, for help, by identifying a portion of the knowledge as the potential source data. In our preliminary feasibility study on text classification, we tested the idea of selecting source data from Wikipedia to assist several text categorization tasks.

When we open up to Web scale, we face several new problems. First, since the Web-scale online knowledge source is huge, it is a challenge to automatically generate task-related queries to find the "right" source data. Second, once the potentially diverse source data are found, it is an open issue how to unify these heterogeneous data sources to enable effective transfer learning for the specific target data. We also need to design a mechanism to acquire the right distance measures for these data and models. Third, we need different selection strategies to integrate the diverse data as the potential source data for solving the target learning problem effectively and efficiently, particularly as these collections can be large.

Compared to previous work on transfer learning, the SFTL framework has a major advantage in its source-free nature: users of a learning task no longer need to find the necessary source data to start learning. Another advantage is scalability: unlike many previous transfer learning approaches, which are difficult to scale up to Web-scale knowledge transfer, our approach would be highly scalable in both the training and the prediction stages.

We will use two real-world learning tasks, namely classification and link prediction, to carry out this research, and we will demonstrate how our SFTL framework can be instantiated for these two different learning problems. The proposal will discuss some difficulties that have been tackled by related work and by our preliminary feasibility study, and then point out some ongoing research issues for extensive investigation.

Date: Thursday, 15 December 2011

Time: 10:00am - 12:00 noon

Venue: Room 3311 (lifts 17/18)

Committee Members:
Prof. Qiang Yang (Supervisor)
Dr. Huamin Qu (Chairperson)
Prof. Dit-Yan Yeung
Dr. Ke Yi

**** ALL are Welcome ****