This course homepage is accessible from http://www.cs.ust.hk/~dlee/4321/

COMP 4321 Search Engines for Web and Enterprise Data [3-0-1:3]

Spring 2020

Course and Instructor/TA Information

Instructor: 

Prof. Dik Lun Lee

Email: 

dlee@cse.ust.hk

Office:

3534 (Lift 25/26)

Office Hours:

Email is the best way to get a quick response from me. If you want to meet but cannot make the office hours, please make an appointment with me by email.

Lectures: 

Tue/Thu 9:00am – 10:20am 

Lecture Room: 

Rm 4620 (Lift 31/32)

TA: 

NI Wangze wniab@ust.hk

YU Manli myuae@connect.ust.hk

CHEN Hongkai hchencf@connect.ust.hk

Lab LA1:

Monday 7:00pm – 7:50pm, Rm 4213 (Lift 19)

Lab Schedule: Please refer to the page on Canvas

Term (Group) Project >>

Detailed Course Topics >> [use student ID as username and password]

Grading Scheme: Please refer to the page on Canvas

Late Submission Policy

  1. If you submit a homework assignment or project phase after the due date, your score will be reduced by 20% for each day late.
  2. Submissions will not be accepted 2 days after the due date.
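
The policy can be expressed as a small Python helper. This is a hypothetical sketch, not an official calculator: it assumes that any part of a day counts as a full day, that the 20% is taken off the earned score (rather than off the full marks), and that a submission up to 2 days late is still accepted. The name late_adjusted_score is made up for illustration.

```python
import math

PENALTY_PERCENT_PER_DAY = 20  # from the policy: 20% per day late
MAX_LATE_DAYS = 2             # assumption: up to 2 days late is still accepted


def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Hypothetical helper: apply the late penalty to an earned score.

    Assumptions (not stated in the policy): a partial day counts as a full
    day, and the penalty is a percentage of the earned score.
    """
    if hours_late <= 0:
        return raw_score                       # on time: no penalty
    days_late = math.ceil(hours_late / 24)     # e.g. 5 hours late -> 1 day
    if days_late > MAX_LATE_DAYS:
        return 0.0                             # too late: not accepted
    return raw_score * (100 - PENALTY_PERCENT_PER_DAY * days_late) / 100
```

Under these assumptions, a homework that earned 90 points and was submitted 30 hours late counts as 2 days late and keeps 60% of its score, i.e. 54.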

Course Grade Assignment

Course grades will normally fall within the following percentage bands:

A 15%
B 40%
C 40%
D/F 5%

There is no fixed distribution among the subgrades of a grade, but you can assume they are divided roughly equally.

How are bonus points considered?

Grades are first assigned to all students according to the distribution above, without considering bonus points, and the thresholds between subgrades are fixed at that point. Bonus points are then added to each student's score, and a student's grade is re-assigned (moved up) according to the new score. As a result, students who do not have bonus points are not penalized by other students having bonus points.
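
The two-pass procedure can be sketched in Python. This is a hypothetical illustration, not the instructor's grading code: it uses whole letter grades only (no subgrades), sets each cut-off at the lowest base score inside a band, and does not treat ties at band boundaries specially. All function names and sample scores are made up.

```python
from typing import Dict, List, Tuple

# Percentage bands from the table above, best grade first.
BANDS: List[Tuple[str, float]] = [("A", 0.15), ("B", 0.40), ("C", 0.40), ("D/F", 0.05)]


def cutoffs_from_base(base: Dict[str, float]) -> List[Tuple[str, float]]:
    """Fix grade cut-offs using base scores only (bonus points ignored).

    Each cut-off is the lowest base score that falls inside its band;
    anyone below the last cut-off gets the lowest grade.
    """
    ranked = sorted(base.values(), reverse=True)
    n = len(ranked)
    cutoffs: List[Tuple[str, float]] = []
    start = 0
    for grade, share in BANDS[:-1]:          # the last band takes the rest
        end = min(n, start + round(share * n))
        if end > start:                      # skip bands that end up empty
            cutoffs.append((grade, ranked[end - 1]))
        start = end
    return cutoffs


def grade_for(score: float, cutoffs: List[Tuple[str, float]]) -> str:
    """Return the best grade whose cut-off the score reaches."""
    for grade, cut in cutoffs:
        if score >= cut:
            return grade
    return BANDS[-1][0]


def final_grades(base: Dict[str, float], bonus: Dict[str, float]) -> Dict[str, str]:
    cutoffs = cutoffs_from_base(base)        # thresholds set before any bonus
    return {s: grade_for(base[s] + bonus.get(s, 0.0), cutoffs) for s in base}
```

Because the cut-offs are computed from base scores before any bonus is added, bonus points can only move their recipient up; a student with no bonus gets exactly the grade from the first pass.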

Open-Book/Note Exams

Both the mid-term and final exams are open book. You may bring your lecture notes (slides and notes) and one book to the exam venue. While you do not need to memorize everything (formulas, pseudocode, etc.), the examinations are set assuming you know the material well. That is, the notes/slides are there to help you with questions like “is my cosine similarity formula correct?” and “is the PR formula 1-p… or p – 1 …?”, but flipping through the slides page by page to find the answer to a question wastes too much time, and you will end up without enough time to finish all of the questions. Bear in mind that you still need to study hard!

Course Outcome

On successful completion of this course, students are expected to be able to:

(1) Design and implement a complete and functional search engine.

(2) Test and evaluate the effectiveness of a search engine.

(3) Identify the limitations of search engine technologies and develop solutions to meet application requirements.

Course Outline

  1. Introduction and course overview
  2. Business models
  3. Information retrieval models and Inverted Files
  4. Web-based information retrieval
  5. Pattern matching and extended Boolean model
  6. Retrieval effectiveness, benchmarking
  7. Document preprocessing
  8. Query expansion and relevance feedback
  9. Applications: text summarization
  10. Applications: recommendation systems

Text and Reference Materials

Course Description

Text retrieval models, vector space model, document ranking, performance evaluation; indexing, pattern matching, relevance feedback, clustering; web search engines, authority-based ranking; enterprise data management, content creation, metadata, taxonomy, ontology; semantic web, digital libraries and knowledge management applications.

Course Objective

After completing the course, students will have acquired:

  1. Core techniques for building search engines
  2. Technologies and business models employed in modern web-based search engines
  3. Hands-on experience in building a complete web-based search engine including spider, data storage and search modules
  4. Knowledge of future trends and applications of information retrieval in Web and Enterprise applications and digital libraries.

Pre-requisites/Background needed: COMP 151/151H (prior to 2009-10) or COMP 171/171H (prior to 2009-10) or COMP 2012/2012H

Policy on Academic Misconduct

Homework/lab assignments must be done individually. Collaboration between students is strictly forbidden. Any violation will be referred to the Department's Undergraduate/Postgraduate Studies Committee for assessment, and the outcome may include dismissal from the University.

The term project must be done by each group on its own. Sharing code between groups and copying code from previous projects are not allowed.