Feature Selection for Big Data with Trillion Dimensions
Speaker: Dr. Ivor Tsang, Nanyang Technological University, Singapore
Title: "Feature Selection for Big Data with Trillion Dimensions"
Date: Monday, 3 March 2014
Time: 4:00pm - 5:00pm
Venue: Lecture Theater F (near lifts 25/26), HKUST

Abstract:

The world continues to generate quintillions of bytes of data daily, creating a pressing need for new efforts to deal with the grand challenges brought by big data. Today, there is consensus among the machine learning and data mining communities that data volume presents an immediate scalability challenge. However, when addressing volume in big data analytics, researchers have studied only one side of it: the number of instances. The flip side of volume, the dimensionality of the data, has received much less attention.

In this talk, I will present an attempt to fill this gap, with a special focus on the relatively under-explored topic of ultrahigh dimensionality. Specifically, I first reformulate the resulting non-convex feature selection problem as a convex semi-infinite programming (SIP) problem, and then present an efficient feature generating paradigm to solve it. The proposed feature generating paradigm is guaranteed to converge globally under mild conditions. In addition, it can achieve lower feature selection bias than L1-regularized methods. To speed up training on big data (w.r.t. dataset size), several speedup strategies are explored under the proposed feature generating paradigm. Comprehensive experiments on a wide range of synthetic and real-world datasets with tens of millions of data points and O(10^14) dimensions demonstrate that the proposed method achieves superb performance compared with state-of-the-art feature selection methods in terms of generalization performance and training efficiency.

*****************

Biography:

Ivor W. Tsang will join the Centre for Quantum Computation & Intelligent Systems (QCIS), University of Technology, Sydney (UTS) as an Australian Future Fellow and Associate Professor. Before joining UTS, he was the Deputy Director of the Center for Computational Intelligence, Nanyang Technological University, Singapore. He received his Ph.D. degree in computer science from the Hong Kong University of Science and Technology in 2007. His research focuses on kernel methods, transfer learning, feature selection, big data analytics for data with millions of dimensions, and their applications to computer vision and pattern recognition. He has more than 100 research papers published in refereed international journals and conference proceedings, including 4 JMLR, 8 T-PAMI, 18 T-NN, 12 ICML, NIPS, UAI, AISTATS, SIGKDD, IJCAI, AAAI, ICCV, CVPR, ECCV, etc.

Dr. Tsang received the prestigious Australian Research Council Future Fellowship in 2013, the IEEE Transactions on Neural Networks Outstanding 2004 Paper Award in 2006, and the second class prize of the National Natural Science Award, China in 2009. His research also earned him the Best Student Paper Award at CVPR'10, the Best Paper Award at ICTAI'11, and the Best Poster Honorable Mention at ACML'12. He was also awarded the Microsoft Fellowship in 2005.