Machine Learning for Spam Detection
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

FYT Presentation and Demonstration

Title: "Machine Learning for Spam Detection"
by Mr. Ho Wai Pang, Tony

Abstract

Spam mails not only annoy users but also waste bandwidth on the Internet. This problem poses a serious threat to email services. Although many content-based spam filters have been developed, they alone cannot solve the problem, especially now that spammers apply various tricks to modify the content of spam mails and fool the filters. In this study, we propose a server-side spam filter that plays a complementary role to a content-based filter: it filters out some spam mails at the server level using features other than those extracted from the mail content. Using a naïve Bayes classifier with the reject option, we show that 18 features based on URL and mail header information enable half of the emails to be classified with a low false positive rate (<1%). We also address the URL information hiding problem. Moreover, an online survey has been conducted to understand user preferences regarding the use of spam filters. The survey results show that the maximum tolerance for missing legitimate emails should never exceed 5%. The implications of the survey results for our future research will also be discussed.

Date: 28 April 2008, Monday
Time: 10am to 11am
Venue: Room 3304
Advisor: Dr. D.Y. Yeung
2nd Reader: Dr. Brian Mak
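
The abstract mentions a naïve Bayes classifier with the reject option. As a rough illustration of that idea only, and not the method presented in the talk, the sketch below trains a Bernoulli naïve Bayes model on binary features and leaves a message undecided whenever the spam posterior falls between two thresholds. The three feature names, the toy training data, and the threshold values are all assumptions made for this example; the abstract does not list the 18 features or the thresholds actually used.

```python
import math

# Hypothetical binary features (not taken from the abstract):
#   [url_is_raw_ip, received_header_looks_forged, sender_domain_mismatch]
X_train = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 1],  # spam examples
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 0],  # legitimate examples
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = legitimate (ham)


def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and per-feature probabilities with Laplace smoothing."""
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = len(rows) / len(X)
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (prior, probs)
    return model


def spam_posterior(model, x):
    """Return P(spam | x) under the naive conditional-independence assumption."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, p in zip(x, probs):
            score += math.log(p if xj else 1.0 - p)
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


def classify_with_reject(model, x, spam_threshold=0.9, ham_threshold=0.1):
    """Label a message only when the posterior is confident; otherwise reject."""
    p = spam_posterior(model, x)
    if p >= spam_threshold:
        return "spam"
    if p <= ham_threshold:
        return "ham"
    return "reject"  # defer to another filter, e.g. a content-based one


model = train_bernoulli_nb(X_train, y_train)
for message in ([1, 1, 1], [0, 0, 0], [1, 0, 0]):
    print(message, classify_with_reject(model, message))
```

On this toy data the first message is labeled spam, the second ham, and the third is rejected because its posterior is close to 0.5. In a real deployment the thresholds would presumably be tuned on held-out data to meet a target false positive rate, and rejected messages would be passed on to a complementary filter.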