Towards Accurate and Efficient Classification: A Discriminative and Frequent Pattern-based Approach

Speaker:  Hong CHENG
	  Department of Computer Science
	  University of Illinois at Urbana-Champaign

Title:	"Towards Accurate and Efficient Classification:
	 A Discriminative and Frequent Pattern-based Approach"

Date:	Wednesday, 27 February 2008

Time:	3:00pm - 4:00pm

Venue:	Lecture Theatre G
	(Chow Tak Sin Lecture Theatre, near lift nos. 25/26)
	HKUST

Abstract:

Classification is an essential theme widely studied in machine learning,
statistics, and data mining.  Many classification methods have been
proposed in the literature, most of which assume that the input data is in
a feature vector representation.  However, in many applications, it is
desirable to construct accurate classification models on complex
structural data that has no initial feature vector representation,
including transactions, sequences, graphs, semi-structured data, and
texts.  A primary question is how to construct a discriminative and
compact feature set on which classification can be performed directly.
A concrete example is classifying chemical compounds into classes (e.g.,
toxic vs. nontoxic, active vs. inactive).  Basic features such as atoms
and links are too simple to preserve the structural information, while
graph kernels make the classifiers hard to interpret.

My goal is to use discriminative frequent patterns to characterize complex
structural data and thus enhance classification power.  Theoretical
analysis is provided to justify the discriminative power of frequent
patterns.  Two efficient search strategies have also been designed to
directly mine the most discriminative patterns.  Based on these results, I
developed a framework for discriminative frequent pattern-based
classification that can lead to a highly accurate, efficient, and
interpretable classifier on complex data.  The proposed pattern-based
classification has been shown to be useful in applications such as
chemical compound classification, text categorization, and software
engineering.
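
As a rough illustration only, and not the speaker's algorithm, the idea of
using discriminative frequent patterns as features can be sketched on toy
transaction data.  The sketch assumes brute-force enumeration of itemsets
up to a fixed length and information gain as the discriminative measure.
It mines first and ranks afterwards, whereas the talk describes strategies
that mine the most discriminative patterns directly.  All function names,
parameters, and data below are hypothetical.

```python
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())


def info_gain(pattern, transactions, labels):
    """Information gain of the binary feature 'transaction contains pattern'."""
    hit = [y for t, y in zip(transactions, labels) if pattern <= t]
    miss = [y for t, y in zip(transactions, labels) if not pattern <= t]
    n = len(labels)
    conditional = len(hit) / n * entropy(hit) + len(miss) / n * entropy(miss)
    return entropy(labels) - conditional


def frequent_patterns(transactions, min_support, max_len=2):
    """Brute-force list of itemsets (size <= max_len) with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            support = sum(1 for t in transactions if pattern <= t)
            if support >= min_support:
                result.append(pattern)
    return result


def top_discriminative(transactions, labels, min_support, k):
    """Keep the k frequent patterns with the highest information gain."""
    patterns = frequent_patterns(transactions, min_support)
    patterns.sort(key=lambda p: info_gain(p, transactions, labels), reverse=True)
    return patterns[:k]


# Hypothetical toy data: six transactions with binary class labels.
transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"},
    {"c", "d"}, {"a", "c"}, {"b", "d"},
]
labels = [1, 1, 1, 0, 0, 0]

best = top_discriminative(transactions, labels, min_support=2, k=1)
```

On this toy data neither item "a" nor item "b" alone separates the two
classes, but the combined pattern {a, b} does, which is the kind of
structural information that single-item features lose.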


********************
Biography:

Hong CHENG is currently a Ph.D. candidate in the Department of Computer
Science at the University of Illinois at Urbana-Champaign.  She received
her M.Phil. degree from the Hong Kong University of Science and Technology
in 2003 and her B.S. degree from Zhejiang University in 2001, both in
Computer Science.  Her research interests include data mining, machine
learning, and database systems.  She has published over 20 research papers
in international conferences, journals, and book chapters, including
SIGKDD, SDM, VLDB, ICDE, ICDM, ACM Transactions on KDD, and Data Mining
and Knowledge Discovery, and received research paper awards at ICDE'07,
SIGKDD'06, and SIGKDD'05.