A Crowdsourced Probabilistic Approach on Data Fusion Refinement

MPhil Thesis Defence


Title: "A Crowdsourced Probabilistic Approach on Data Fusion Refinement"

By

Mr. Yunfan CHEN


Abstract

Data fusion plays an important role in data mining because many
applications require high-quality data. Because online data may be
out of date, and errors may propagate as sources copy from and refer to
one another, simply applying existing data fusion methods to Web data
rarely gives satisfactory results. To understand the current state of
research, we first present an extensive survey of the data fusion field.
Since we use crowdsourcing as a tool to solve the problem, we then survey
crowdsourcing research as well.

In this thesis, we make use of the crowd to achieve high-quality data
fusion. We design a framework that selects a set of tasks to ask the crowd
in order to improve the confidence of the data. Since data are correlated
and the crowd may provide incorrect answers, selecting a proper set of
tasks to ask the crowd is a very challenging problem. We prove that the
problem is NP-hard and therefore design an approximation solution to
address these challenges. To further improve efficiency, we design a
pruning strategy and a preprocessing method, which effectively improve the
performance of the approximation solution.

Furthermore, we find that under certain scenarios, we are not interested 
in all the facts, but only a specific set of facts. Thus, for these 
specific scenarios, we also develop another approximation solution which 
is much faster than the general approximation solution.

We then verify the solutions with extensive experiments on a real
crowdsourcing platform. We apply multiple existing machine-based data
fusion methods and then apply our refinement method to their results to
show that our method is general enough to work with many of them.

In the conclusion, we further analyse the methods that are incompatible
with our method and discuss possible future research on this topic.


Date:			Thursday, 11 May 2017

Time:			2:00pm – 4:00pm

Venue:			Room 3494
 			Lifts 25/26

Committee Members:	Prof. Lei Chen (Supervisor)
 			Dr. Yangqiu Song (Chairperson)
 			Dr. Ke Yi


**** ALL are Welcome ****