Course Code: COMP 5331, Fall 2016

Course Title: Knowledge Discovery and Data Mining

Time and Venue: MW 10:30AM - 11:50AM, Rm 5583


Lei Chen (send e-mail for questions regarding the class and for arranging individual meetings)


Zheng Liu (


Course Description

Data mining has emerged as a major frontier field of study in recent years. Aimed at extracting useful and interesting patterns and knowledge from large data repositories such as databases and the Web, the field integrates techniques from databases, statistics, and artificial intelligence. This course provides a broad overview of the field and prepares students to conduct research in it.


  1. Association
  2. Clustering
  3. Classification
  4. Data Warehouse
  5. Data Mining over Social Networks
  6. Outliers
  7. Frequent Itemset

Reference Book/Materials

Grading Scheme

Project Presentation Score Bonus Marks    

1. Project Presentation Scores: presentation scores are given by the audience (5%) and the instructor (5%).

2. Participation and Marking Bonus: all students are strongly encouraged to attend the project presentation sessions and mark the presenters (score sheets will be distributed at the beginning of each session). I will give you a bonus of 0.25 marks for each completed score sheet.

3. Question Bonus: all students are encouraged to ask questions during the Question/Answer session after each presentation. Each student may ask one question per paper presentation. I will give you a bonus of 0.5 marks for each question asked. Staple the bonus coupon we give you to your completed score sheet.

Please note that questions such as “Can you explain more?” or “I cannot understand, can you repeat?” will not be counted. At most 3 questions are allowed in each paper presentation’s Q/A session.
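As a worked illustration of the bonus arithmetic above, the following Python sketch totals one student's bonus marks. It assumes the rules exactly as stated: 0.25 marks per completed score sheet, 0.5 marks per counted question, and at most one counted question per student per presentation. The function name presentation_bonus and its parameters are hypothetical, and no overall cap is applied because none is stated. Questions that would not be counted (such as “Can you explain more?”) are assumed to be excluded before calling it.

```python
from typing import Sequence

# Hypothetical helper illustrating the stated bonus rules (not an official tool).
SHEET_BONUS = 0.25     # marks per completed score sheet
QUESTION_BONUS = 0.5   # marks per counted question


def presentation_bonus(score_sheets: int, questions_per_presentation: Sequence[int]) -> float:
    """Return one student's total bonus marks.

    score_sheets: number of completed score sheets handed in.
    questions_per_presentation: questions the student asked in each presentation;
    values above 1 are capped at 1, since only one question per presentation counts.
    """
    if score_sheets < 0 or any(q < 0 for q in questions_per_presentation):
        raise ValueError("counts must be non-negative")
    counted_questions = sum(min(q, 1) for q in questions_per_presentation)
    return score_sheets * SHEET_BONUS + counted_questions * QUESTION_BONUS


# Example: 6 score sheets; questions asked in 3 of 6 presentations,
# with a second attempt in one presentation that is not counted.
print(presentation_bonus(6, [1, 0, 2, 0, 1, 0]))  # 6*0.25 + 3*0.5 = 3.0
```

Here the 6 score sheets contribute 1.5 marks and the 3 counted questions contribute another 1.5 marks.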


Please check the project page.

Important Dates:


Welcome to COMP5331 


Group information confirmed.


Presentation Signup Sheet (

Midterm Exam: Oct 12th 10:30am-12:00pm

Final Exam: 16 December 2016, 16:30-19:30, CYTG010






Tentative Schedule




Lecture Slides

Textbook


M, Sept 5th

Introduction to the Course and Data Mining (PPT (1, 2))

HK, Chapter


M, Sept 12th

Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods (PPT, PDF)

Example of Apriori Algorithm and FP-tree

TSK, Chapter 6

HK, Chapter 6


M, Sept 19th

Advanced Frequent Pattern Mining (PPT, PDF)

Guest Lecture by Prof. Yangqiu Song

Incorporating Structured World Knowledge into Unstructured Text via Heterogeneous Information Networks (PPT, PDF)

TSK, Chapter 7

HK, Chapter 7

M, Sept 26th

Classification: Basic Concepts (PPT, PDF)

Ensemble Methods (PPT, PDF)

HK, Chapter 9


M, Oct 3rd

Classification: Advanced Methods (PPT, PDF)

HK, Chapter 10

W, Oct 12th

Mid-term Exam (in Class)

HK, Chapter 10

M, Oct 17th

Cluster Analysis: Basic Concepts and Methods (PPT, PDF)

HK, Chapter 11


M, Oct 24th

Cluster Analysis: Advanced Methods (PPT, PDF)


M, Oct 31st

Outlier Analysis (PPT, PDF), LOF example (PDF)


TSK, Chapter 10

HK, Chapter 12

LOF Example

M, Nov. 7th

Data Cube and OLAP (PPT, PDF)

Introduction to Web Data Mining  (PPT, PDF)

Final Review (PPT)

HK, Chapters 4 and 5



M, Nov. 14th

Project Presentation

Mining Approximate TopK Non-redundant Rules (PPT)

Exploring the Skyline Pattern Mining (PPT)

Group 11 (LEUNG Chung Yin)

Group 12 (Wu Ziming, Sun Mingfei)


M, Nov. 21st

Project Presentation

Clustering Graph Streams (PPT)

Discovering Relations between Culture and Personal Background based on OkCupid Database (PPT)

Given Word representation to Rare Word (PPT)


Batch incremental mining with loose FP-Tree and FP-growth (PPT)

A study on Novel Recommendation based on Personal Popularity Tendency (PDF)

Mining Utility Functions (PPT)


M, Nov. 28th

Recommender Systems with Trust-based Social Networks (PDF)

Discovering sentiments by text mining and statistical analysis of media review (PPT)

UP-Growth (PPT)


Food recommendation using personal popularity tendency (PPT)

The Comparison of Recommendation Approaches in Movielens (PDF)
