Machine Learning for Spam Detection

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


FYT Presentation and Demonstration


Title: "Machine Learning for Spam Detection"

by

Mr. Yau Wing Pong, Patrick


Abstract

Many current spam filters use machine learning techniques to
classify emails as either spam or legitimate mails before filtering out
the spam mails.  However, some emails, known as gray mails, cannot easily
be classified as spam or legitimate mails, partly because different
email users have different preferences regarding these emails.  This poses
a major challenge to the design of spam filters.

In this paper, we explore the feasibility of using a two-stage spam filter
to tackle gray mails.  In the first stage, the spam filter detects
gray mails and then classifies them as either spam or legitimate mails
using a naïve Bayes classifier. In the second stage, all the remaining
gray mails are classified using a support vector machine. We have
performed extensive experiments to compare the performance of one-stage
and two-stage filters with respect to both accuracy and efficiency.
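
The two-stage idea can be illustrated with a minimal Python sketch. It is
not the implementation evaluated in this work, and several details are
assumptions made only for illustration: the abstract does not say how gray
mails are detected, so the sketch treats a mail as gray when the naïve Bayes
spam probability falls between two hypothetical thresholds (0.2 and 0.8).
The toy training mails, the class and function names, and the Pegasos-style
linear support vector machine trained on all labelled mails are likewise
hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (1 = spam, 0 = legitimate)."""
    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def spam_probability(self, doc):
        total_docs = sum(self.n_docs.values())
        log_post = {}
        for y in (0, 1):
            total = sum(self.counts[y].values())
            lp = math.log(self.n_docs[y] / total_docs)
            for tok in tokenize(doc):
                if tok in self.vocab:  # words never seen in training are ignored
                    lp += math.log((self.counts[y][tok] + 1) / (total + len(self.vocab)))
            log_post[y] = lp
        m = max(log_post.values())  # subtract the max for numerical stability
        e0, e1 = math.exp(log_post[0] - m), math.exp(log_post[1] - m)
        return e1 / (e0 + e1)

class LinearSVM:
    """Linear SVM without bias, trained by Pegasos-style subgradient descent."""
    def fit(self, docs, labels, epochs=200, lam=0.01):
        self.w = {}
        t = 0
        for _ in range(epochs):
            for doc, y in zip(docs, labels):
                t += 1
                eta = 1.0 / (lam * t)
                sign = 1 if y == 1 else -1
                x = Counter(tokenize(doc))
                margin = sign * self.score(x)
                for k in self.w:  # regularisation step: shrink every weight
                    self.w[k] *= 1 - eta * lam
                if margin < 1:    # hinge loss is active: move towards the example
                    for k, v in x.items():
                        self.w[k] = self.w.get(k, 0.0) + eta * sign * v
        return self

    def score(self, x):
        return sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def predict(self, doc):
        return 1 if self.score(Counter(tokenize(doc))) > 0 else 0

def classify(mail, nb, svm, low=0.2, high=0.8):
    """Stage 1: naive Bayes decides confident cases. Stage 2: SVM decides gray mails."""
    p = nb.spam_probability(mail)
    if p >= high:
        return 1, "naive_bayes"
    if p <= low:
        return 0, "naive_bayes"
    return svm.predict(mail), "svm"

# Hypothetical toy training set (1 = spam, 0 = legitimate).
train_docs = [
    "win cash prize now",
    "cheap pills win cash now",
    "newsletter sale win discount",
    "meeting agenda for monday",
    "project report for review",
    "newsletter project update monday",
]
train_labels = [1, 1, 1, 0, 0, 0]

nb = NaiveBayes().fit(train_docs, train_labels)
svm = LinearSVM().fit(train_docs, train_labels)

for mail in ["win cash prize", "meeting agenda for monday", "newsletter"]:
    label, stage = classify(mail, nb, svm)
    print(mail, "->", "spam" if label == 1 else "legitimate", "(decided by", stage + ")")
```

In this sketch the more expensive second stage runs only on the mails that
the first stage cannot decide with confidence, which is one way a two-stage
filter could trade accuracy against efficiency.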


Date		:	28 April 2008, Monday

Time		:	11am to 12pm

Venue		:	Room 3304

Advisor		: 	Dr. D.Y. Yeung

2nd Reader	:	Dr. Brian Mak