A Crowdsourced Probabilistic Approach on Data Fusion Refinement
MPhil Thesis Defence

Title: "A Crowdsourced Probabilistic Approach on Data Fusion Refinement"

By Mr. Yunfan CHEN

Abstract

Data fusion plays an important role in data mining because many applications require high-quality data. Online data may be out of date, and errors may propagate as sources copy from and refer to one another, so it is hard to achieve satisfactory results by merely applying existing data fusion methods to Web data. To understand the current studies, we first present an extensive survey of the data fusion field. Since we use crowdsourcing as a tool to solve the problem, we then survey crowdsourcing research as well.

In this work, we make use of the crowd to achieve high-quality data fusion. We design a framework that selects a set of tasks to ask the crowd in order to improve the confidence of the data. Since data are correlated and the crowd may provide incorrect answers, selecting a proper set of tasks to ask the crowd is a very challenging problem. We prove that the problem is NP-hard and therefore design an approximation solution to address these challenges. To further improve efficiency, we design a pruning strategy and a preprocessing method, which effectively improve the performance of the approximation solution.

Furthermore, we find that in certain scenarios we are not interested in all the facts, but only in a specific set of facts. For these scenarios, we develop another approximation solution that is much faster than the general one.

We then verify the solutions with extensive experiments on a real crowdsourcing platform. We apply multiple existing machine-based data fusion methods and then apply our refinement method to their results to show that our method is general enough to work with many of them. In the concluding part, we further analyze the methods that are incompatible with our method and discuss possible further research on this topic.
Date: Thursday, 11 May 2017
Time: 2:00pm – 4:00pm
Venue: Room 3494 Lifts 25/26

Committee Members:
Prof. Lei Chen (Supervisor)
Dr. Yangqiu Song (Chairperson)
Dr. Ke Yi

**** ALL are Welcome ****