COMP 6511B: Advanced Software Testing (Spring 2024)
Lecture Details
Instructor: Dongdong She (dongdong@cse.ust.hk)
Office hours: Tuesday (1:30-2:30 pm) Room 3505
TA: Xunguang Wang (xwanghm@cse.ust.hk)
TA Office hours: TBD
Classroom: Room 2503
Class hours: Monday and Wednesday (12:00-1:20 pm)
Course Description
Software vulnerabilities profoundly impact our daily lives, from global ransomware attacks to leaks of sensitive information. Software testing is a program analysis technique for discovering these vulnerabilities. This course will cover classic software testing techniques such as fuzzing, symbolic execution, and formal methods. It also covers the latest trends: using machine learning (e.g., LLMs such as ChatGPT) to assist software testing, and neural-symbolic software testing.
Course Goal
The general goal of this course is to help you gain a solid understanding of software testing techniques such as fuzzing, symbolic execution, and formal methods. You will also learn about recent work on software testing techniques, including ML-based software testing.
Course Format and Student Workload
This course centers on paper readings, presentations, discussions, and a final project. The course readings are research papers selected from top-tier security, software engineering, and systems conferences. We will discuss roughly one or two papers every class meeting. For in-depth discussions to be possible, you will have to read the papers carefully before class.
You have three main responsibilities in the course:
- Read the assigned papers
carefully, before class. One of the main goals of the course
is to have interesting in-class discussions so that students can
understand the topics better. This goal is reflected in
grading: 40% of the total grade will come from class
participation, which includes talking in class as well as how you
do on quizzes and presentations. To truly understand a paper, I
recommend you read it at least three
times: twice very carefully, the last time focusing on the hard
parts. You should also form reading groups and discuss the papers
before class. Reading and thoroughly understanding a paper is not
easy; you may find the reading advice in
the Advice section below helpful.
- Present some of the papers. Students sign up to
present papers to the class. The key is that you need
to really understand the paper and come up with a good way
to explain it. The presentation should cover 1) Background; 2) Problem formulation; 3) Survey of existing work; 4) Methodology; 5) Evaluation; 6) Strengths and weaknesses; 7) Takeaways.
Student presenters must send draft slides three
days before class to get feedback from the teaching staff.
Presenting well is not easy; you may find the presentation advice
in the Advice section below helpful.
- Complete the final project. The final project is
essentially a mini-research project that may involve building a
new software testing tool, designing a new algorithm, improving an existing
technique, or performing a large case study. You are encouraged to
come up with a topic of your own, which I'll help refine;
alternatively, you can choose one of the projects I suggest.
Prerequisite
COMP 3633 Principles of Cybersecurity and COMP 4211 Machine Learning (optional), or equivalents of these two courses.
Grading
- Class participation - 40%
  - In-class discussion - 20%
  - Paper presentation - 20%
- Final project - 60%
  - Project proposal - 15%
  - Midterm demo - 15%
  - Final report - 30%
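As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale and that the total is a plain weighted sum; the function name, the dictionary keys, and the example scores are hypothetical, and the actual grading procedure may differ.

```python
# Weights copied from the grading breakdown (percent of the total grade).
# The key names are illustrative, not official component identifiers.
WEIGHTS = {
    "in_class_discussion": 20,
    "paper_presentation": 20,
    "project_proposal": 15,
    "midterm_demo": 15,
    "final_report": 30,
}


def final_grade(scores):
    """Combine per-component scores (assumed 0-100 each) into a weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "in_class_discussion": 90,
    "paper_presentation": 80,
    "project_proposal": 70,
    "midterm_demo": 85,
    "final_report": 88,
}
print(final_grade(example))
```

The sub-items sum to 100%, with class participation (40%) and the final project (60%) as the two top-level groups.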
Schedule
Date | Topics | Lecture slides & Reading
31/01 | Introduction | slides
05/02 | Real-world security bugs | slides; additional reading: Heartbleed, gotofail, DirtyCOW, Debian randomness fiasco
07/02 | Control flow analysis | slides; additional reading: Control Flow Analysis, Using LLVM to view CFG (Slide 6)
14/02 | Data flow analysis | slides; additional reading: Data Flow Analysis
19/02 | Symbolic execution | slides; additional reading: Symbolic Execution for Software Testing: Three Decades Later (Cadar and Sen); KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs (Cadar et al.); CUTE: A Concolic Unit Testing Engine for C (Sen et al.)
21/02 | Symbolic execution (cont.) | additional reading: DART: Directed Automated Random Testing (Godefroid et al.); Symbolic execution and program testing (King)
26/02 | Fuzzing | slides; additional reading: The fuzzing book. Part II: Lexical Fuzzing
28/02 | Fuzzing (cont.) | additional reading: Fuzzing: The State of the Art (McNally et al.)
04/03 | Taint analysis | slides; additional reading: Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software; All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution
06/03 | Formal methods |
11/03 | Paper reading: ML-based fuzzing | slides; additional reading: Learn&Fuzz: Machine Learning for Input Fuzzing; NEUZZ: Efficient Fuzzing with Neural Program Smoothing
13/03 | Paper reading: RL-based fuzzing | Deep Reinforcement Fuzzing; Effectively Generating Vulnerable Transaction Sequences in Smart Contracts with Reinforcement Learning-guided Fuzzing
18/03 | Paper reading: ML-based symbolic execution | Learning to Explore Paths for Symbolic Execution; Enhancing Symbolic Execution by Machine Learning Based Solver Selection
20/03 | Midterm demo |
25/03 | Paper reading: Neural-symbolic execution | Neuro-Symbolic Execution: Augmenting Symbolic Execution with Neural Constraints
27/03 | Paper reading: Neural-symbolic fuzzing | Large Language Model guided Protocol Fuzzing
08/04 | Paper reading: ML-based compiler fuzzing | Compiler Fuzzing through Deep Learning
10/04 | Paper reading: LLM-based compiler fuzzing | Fuzz4All: Universal Fuzzing with Large Language Models
15/04 | Paper reading: ML-based smart contract fuzzing | Learning to Fuzz from Symbolic Execution with Application to Smart Contracts
17/04 | Paper reading: LLM-based smart contract fuzzing | LLM4Fuzz: Guided Fuzzing of Smart Contracts with Large Language Models
22/04 | Paper reading: ML-based autonomous vehicles fuzzing | Neural Network Guided Evolutionary Fuzzing for Finding Traffic Violations of Autonomous Vehicles
24/04 | Paper reading: LLM-based autonomous vehicles fuzzing | Guided Conditional Diffusion for Controllable Traffic Simulation
29/04 | Paper reading: LLM-generated fuzzer harness | FUDGE: Fuzz Driver Generation at Scale; Fuzz target generation using LLMs
06/05 | Paper reading: Jailbreak LLM using fuzzing | GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
08/05 | Project presentation |
Advice
- Reading
Read this article about how to read a research paper. The take-home
message is that until you can answer a bunch of questions, you are
not done reading a paper. The author, William, lists a number of important
questions. I would add two: 1) What are the reusable
principles/tricks/algorithms presented in this paper? 2) What is the
(authors') insight that drives the research? A systems research paper
often has a bunch of novel tricks. I believe the more such things
you have in your toolbox, the more likely you are to come up with an
elegant/novel system design.
- Presentation
Some advice on how to give a good presentation:
- Be very selective about the talk contents. It's almost
always the case that you have more stuff than your audience can
understand within a short amount of time. You should thus be very
selective about what to include in your talk. What is the important
thing about your proposal? What is neat, unusual, interesting to a
listener? Figure it out, and say it in the talk, more than once.
Do not try to include everything in your talk.
- Repeat the key points. Don't expect your listener to always
follow your talk. It's a good idea to repeat and highlight the key
points several times, for example, once at the beginning, once when
you actually present them, and once at the end. Make sure your
listener won't miss the most important stuff of your talk.
- Use an outline. A good way to keep your audience with
you is to use an outline slide to describe the structure of your
talk. I typically present an outline slide after the introduction
of a talk. Then, as I go from one section to another, I may show
the outline slide again, to let the audience know where we are.
- Get the timing right. Each content slide typically takes
1-3 minutes. Thus, for a 10-minute talk, do not have more than
6-7 slides of real content! Note that the title slide and outline
slides do not count toward this total because they take little time to
present.
- Use visual aids, but do not abuse them. Pictures and
animations are sometimes very handy for explaining complex ideas.
However, use them only for the most important points; otherwise,
they'll distract your listener.
- Above all else, practice, practice, practice. Practicing
is the real key to giving a good talk. I find it much more useful
to practice aloud than to murmur to myself. If you can, try to
give the talk in front of other people. Practicing is certainly
the only way to get the timing right.
Some online advice from others:
Read this paper about things to avoid when giving a talk.
Read this paper about how to give a good conference talk. Many ideas apply to the
mini-talk you'll give.
- Writing
Some suggested readings (to make you a better writer):
Read this paper about how to write a technical paper. Many ideas apply to writing
proposals as well.
Read this paper about how to write sentences, paragraphs, etc.
And, of course, read Strunk and
White. Many times.