COMP5221 Natural Language Processing, Spring 2024, HKUST

Dekai Wu

Course organization



All lectures and tutorials will be held ONLINE LIVE INTERACTIVELY at the regularly scheduled times.

You can find the recurring Zoom meetings for the lectures and tutorials in Canvas. You are strongly encouraged to join the meetings from there. Note that these Zoom meetings only admit authenticated users with ITSC accounts. You can only join the meetings via either of the two paths above. You must register for the lectures and tutorials at the following links. After registering, you will receive a confirmation email containing information about joining the meeting.

After you are registered, you may use the following links to join the lectures and tutorials:

If you haven’t done so, please watch this video to get your HKUST Zoom account ready as soon as possible, not just for this course but also for all other courses at HKUST:

Times and places

Lecture 1: WF 13:30-14:50, Rm 6591 (Lift 31-32).

Office hours: W 15:00-16:00. The TA's office hours are posted at


Course: is the master home page for the course.

Tutorial: contains all information for the tutorials.

Forum: is where all discussion outside class should be done. Always read the forum before asking, posting, or emailing your question. Note that you must register for your account at the first lecture, tutorial, or lab.


Abbreviated course catalog description

COMP 5221. Language modeling from basics to LLMs. Techniques for parsing, interpretation, context modeling, generation. How neural and statistical approaches interact with linguistic constraints. Applications include machine translation, dialogue chatbots, cognitive modeling, and knowledge acquisition.

Course description

Human language technology for processing text and spoken language. Fundamental machine learning, syntactic parsing, semantic interpretation, and context models, algorithms, and techniques. Applications include machine translation, web technologies, text mining, knowledge management, cognitive modeling, intelligent dialog systems, and computational linguistics.

Learning objectives

At the end of the Natural Language Processing course, you will have achieved the following outcomes.

  1. General
    1. Possess a solid understanding of the fundamental concepts of natural language processing
    2. Possess a solid understanding of the fundamental concepts of language modeling, interpretation, and translation, and grasp how these tasks stress-test all aspects of human intelligence and language processing
  2. Transduction
    1. Know foundational input-output formulations of transduction, such as alignment, chunking, classification, dependency relations, and parsing
    2. Understand the relationship between noisy channel and loglinear models of string transduction, and their Bayesian interpretations
  3. Syntax
    1. Understand the relationship between word segmentation and phrasal lexicons, the relationship to transduction and alignment, and associated algorithms
    2. Understand the relationship between traditional grammatical formalisms and stochastic and weighted grammars
    3. Understand the strengths and weaknesses of part-of-speech models, and associated tagging algorithms
    4. Understand the various fundamental approaches to parsing, and how they deal with syntactic ambiguity
  4. Alignment
    1. Understand how bilingual models of syntax generalize upon monolingual models to improve learnability
    2. Understand the combinatorial and empirical trade-offs between various learning models of alignment and compositionality, and their associated algorithms
    3. Understand the core methods for inducing lexicons, translation lexicons, phrasal translation lexicons, as well as permutation and reordering models
  5. Decoding
    1. Understand the combinatorial and empirical trade-offs between various runtime models for translation, and their associated algorithms
    2. Understand how bilingual transduction models generalize upon monolingual parsing models
  6. Semantics
    1. Understand lexical semantics models for word sense disambiguation, their relationship to phrasal lexicons and transduction, and associated ambiguity resolution algorithms
    2. Understand lexical semantics models for semantic frames (predicate-argument structures), and associated semantic role labeling algorithms



Honor policy

To receive a passing grade, you are required to sign an honor statement acknowledging that you understand and will uphold all policies on plagiarism and collaboration.


All materials submitted for grading must be your own work. You are advised against being involved in any form of copying (either copying other people's work or allowing others to copy yours). If you are found to be involved in an incident of plagiarism, you will receive a failing grade for the course and the incident will be reported for appropriate disciplinary actions.

University policy requires that students who cheat more than once be expelled. Please review the cheating topic from your UST Student Guide.

Warning: sophisticated plagiarism detection systems are in operation!


You are encouraged to collaborate in study groups. However, you must write up solutions on your own. You must also acknowledge your collaborators in the write-up for each problem, whether or not they are classmates. Other cases will be dealt with as plagiarism.


Course grading will be adjusted to the difficulty of assignments and exams. Moreover, I guarantee you the following.

Grade guarantees
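If you achieve 85%, you will receive at least an A grade.
If you achieve 75%, you will receive at least a B grade.
If you achieve 65%, you will receive at least a C grade.
If you achieve 55%, you will receive at least a D grade.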

Your grade will be determined by a combination of factors:

Grade weighting
Exams 0% (due to university coronavirus measures)
Pop quizzes ~10%
Class participation ~15%
Forum participation ~10%
Assignments ~65%
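
As a rough illustration of how the weighting and the grade guarantees combine, here is a minimal Python sketch. It assumes the approximate ("~") weights listed above are applied exactly and that each component is scored on a 0-100 scale. The names WEIGHTS, GUARANTEES, overall_percentage, and guaranteed_minimum are hypothetical and are not part of any course system. Because grading is adjusted to the difficulty of the assignments, the result is only the guaranteed minimum and the actual grade may be higher.

```python
# Illustrative only: weights are the approximate figures from the list above,
# treated here as exact. All names in this sketch are hypothetical.
WEIGHTS = {
    "exams": 0.00,
    "pop_quizzes": 0.10,
    "class_participation": 0.15,
    "forum_participation": 0.10,
    "assignments": 0.65,
}

# Guaranteed minimum letter grade for each overall percentage threshold,
# ordered from the highest threshold to the lowest.
GUARANTEES = [(85.0, "A"), (75.0, "B"), (65.0, "C"), (55.0, "D")]


def overall_percentage(scores):
    """Weighted sum of component scores, each given on a 0-100 scale.

    Components missing from `scores` are counted as 0.
    """
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)


def guaranteed_minimum(percentage):
    """Return the guaranteed minimum grade, or None if no guarantee applies."""
    for threshold, grade in GUARANTEES:
        if percentage >= threshold:
            return grade
    return None


# Made-up component scores for a single student.
example = {
    "pop_quizzes": 80.0,
    "class_participation": 90.0,
    "forum_participation": 70.0,
    "assignments": 88.0,
}
total = overall_percentage(example)
print(round(total, 1), guaranteed_minimum(total))
```

With these made-up scores the overall percentage works out to about 85.7%, which meets the 85% threshold for the A guarantee.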


No reading material is allowed during the examinations. No make-ups will be given unless prior approval is granted by the instructor, or you have a medical condition on the day of the examination that is documented by a physician. In addition, being absent from the final examination results in automatic failure of the course according to university regulations, unless prior approval is obtained from the department head.


Science and engineering (including software engineering!) is about communication between people. Good participation in class will count for approximately 15%, and good participation in the online forum will count for approximately 10%.


All assignments must be submitted by 23:00 on the due date. Assignments will be collected electronically using the automated CASS assignment collection system. Late assignments cannot be accepted. Sorry, in the interest of fairness, exceptions cannot be made.

Scheme programming assignments must run under Chicken Scheme on Linux.

Programming assignments will account for a total of approximately 65%.

Required readings

Any linked material (unless labeled "Supplementary references") is required reading that you are responsible for.


Topics will be recorded below.


date wk event topic
20240130 1 Lecture Welcome; introduction; survey; administrivia (honor statement, HKUST classroom conduct)