Incomplete Data Analysis in Smart City Applications
PhD Thesis Proposal Defence

Title: "Incomplete Data Analysis in Smart City Applications"

by

Mr. Siyuan Liu

ABSTRACT:

In my work, incomplete data is defined as data for which only an extremely limited number of samples is observed, which poses major challenges for data mining. Such limited samples lead to severe bias and inaccurate results. Given only one in ten thousand of the vehicles in a city, how can we still recover the vehicle distribution and detect the hot spots (crowded areas) in the city? Traditional density-based clustering methods do not work well because the vehicle density and location information is very limited and error-prone. Hence we need new algorithms that handle such incomplete data with both accuracy and scalability. In addition, vehicle traces are typical spatio-temporal data, which require efficient processing approaches.

In this work, we make the observation that vehicle speed can indicate the crowdedness of a given area. In other words, if an area is very crowded, the speed of vehicles in it is low; if the area is not crowded, the speed of vehicles in it tends to be high. The mobility of the samples is thus naturally incorporated, and a novel non-density-based clustering method, called mobility-based clustering, is developed. Several key factors other than vehicle crowdedness that affect speed have been identified, and techniques to compensate for their effects are proposed.

We evaluate the performance of mobility-based clustering on real traffic situations. Experimental results show that, using 0.3% of vehicles as samples, mobility-based clustering can accurately identify hot spots that can hardly be obtained by the latest representative algorithm, UMicro.

Date: Tuesday, 28 June 2011

Time: 3:30pm - 5:30pm

Venue: Room 4483 (lifts 25/26)

Committee Members:
Prof. Lionel Ni (Supervisor)
Prof. Shing-Chi Cheung (Chairperson)
Dr. Qiong Luo
Dr. Raymond Wong

**** ALL are Welcome ****