A Probabilistic Framework on Machine-Crowd Collaboration and its Applications on Data Integration
PhD Thesis Proposal Defence

Title: "A Probabilistic Framework on Machine-Crowd Collaboration and its Applications on Data Integration"

by

Mr. Chen ZHANG

Abstract:

Recently, the popularity of crowdsourcing has brought a new opportunity to engage human intelligence in the process of data analysis. Existing work on crowdsourcing has developed sophisticated methods that use the crowd as a new kind of processor, also known as an HPU. One drawback of this work is that it treats the crowd as the sole information source for human-intrinsic queries. However, in many applications, such human-intrinsic queries can also be answered by machine-only systems (i.e., CPUs). On the one hand, the latency of using HPUs to answer queries is much longer than that of CPUs, and the monetary cost of HPUs is often high (e.g., crowdsourcing on Amazon Mechanical Turk). On the other hand, the answers obtained from CPUs often have high uncertainty because machines cannot recognize human-intrinsic semantics. Therefore, it is natural to ask whether we can combine the power of CPUs and the wisdom of HPUs to answer human-intrinsic queries accurately and quickly, which is exactly the motivation of this work.

To summarize, our study covers the following aspects:

1) We propose three new human-machine hybrid systems, each in a different application setting, to improve data quality;
2) We design a novel crowd-machine hybrid system for uncertain data cleaning;
3) We study the classic problem of schema mapping from the new crowdsourcing perspective.

We validate our solutions through extensive experiments and discuss several interesting research directions for CPU and HPU hybrid systems in data integration.

Date: Monday, 4 May 2015

Time: 5:30pm - 7:30pm

Venue: Room 3501 (lifts 25/26)

Committee Members:
Dr. Lei Chen (Supervisor)
Dr. Pan Hui (Chairperson)
Dr. Raymond Wong
Dr. Ke Yi

**** ALL are Welcome ****