COMP5421, Spring 2024
COMPUTER VISION, a new perspective in the year of 2024
Professor Long QUAN, CYT G009B
phone: 2358-7018, quan@cse.ust.hk
http://www.cs.ust.hk/~quan/comp5421/index.html
Teaching Assistant:
Lecture room and time:
Lecture room CYT G009B,
Tuesday and Thursday from 1:30-3:00pm.
Course description:
In this period of abrupt change, this course offers an updated introduction to, and perspective on, the fundamental developments of modern computer vision and beyond, which are at the core of recent developments and achievements in artificial intelligence. It covers a deterministic geometric approach to visual reconstruction, as well as a probabilistic, machine-learning introduction to supervised and self-supervised visual learning and generation.
The content is challenging and intended only for motivated and mature students.
Course outline
1. Introduction
2. Visual features
3. Convolutional networks
4. Self-supervised generative methods
5. Vision geometry
6. 3D reconstruction
7. Perspective
There will be two course projects, each carried out by a group of two students; the grouping must be different for the two projects.
There will be a final exam, which will be a long, closed-book, handwritten exam.
Tentative schedule and notes:
Week |
Date |
Topics/Notes |
Remarks |
|||||
1 |
1 Feb |
What is computer vision? (This first lecture will be very short and will cover the course organization.) |
Start reading the first classic papers. |
|||||
2 |
6 Feb, 8 Feb |
Introduction to computer vision. What are visual features? Edges and Canny detector. Point Features and SIFT |
Read Chapter 1 of the book ‘Vision’ by David Marr, 1982. https://www.cse.ust.hk/~quan/comp5421/notes/marr.vision.chapter1.pdf
Read ‘Chapter 4’ (8 pages), Feature Point, by Long Quan, 2011. https://www.cse.ust.hk/~quan/comp5421/notes/chapter4-longQuan.pdf
This part also reviews the classical, deterministic view of the visual representation of an image as a continuous function f(x), or more exactly f(x,y), or just one x(u), which is the object of study. The mathematical tools are classical signal processing, functional analysis with Fourier, wavelets, sparse sensing, and PDE and scale-space analysis. In this context, it is important to fully understand why and how we approach the traditional low-level vision tasks of filtering, edge detection, and de-noising in the traditional mathematical and engineering framework.
The first lecture will be on edges: read ‘A computational approach to edge detection’ by John Canny, 1986, PAMI. https://www.cse.ust.hk/~quan/comp5421/notes/canny1986.pdf www.cse.ust.hk/~quan/comp5421/notes/edge.ppt
The second lecture will be on scale-space. |
|||||
3 |
15 Feb |
|
||||||
4 |
20 Feb |
In search of visual features with discriminative approaches. A few basic tools. |
Read ‘Distinctive Image Features from Scale-Invariant Keypoints’ by David Lowe, 2004, IJCV. https://www.cse.ust.hk/~quan/comp5421/notes/lowe-ijcv2004.pdf https://www.cse.ust.hk/~quan/comp5421/notes/features.ppt https://www.cse.ust.hk/~quan/comp5421/notes/cnn.pdf |
|||||
5 |
27 Feb |
Convolutional neural networks |
The features are the representations, and the representations are everything; they are the visual features that can be learned with a convolutional neural network in a supervised framework.
Read ‘Gradient-based learning applied to document recognition’ by LeCun et al. 1998 https://www.cse.ust.hk/~quan/comp5421/notes/Lecun98.pdf
Read ‘ImageNet classification with deep convolutional neural networks’ by Krizhevsky et al. 2012 https://www.cse.ust.hk/~quan/comp5421/notes/alexnet2012.pdf
Read ‘Deep residual learning for image recognition’ by He et al. 2015 https://www.cse.ust.hk/~quan/comp5421/notes/resnet2015.pdf
A few important machine learning and statistical topics and methodologies will be revisited throughout the development of supervised learning and discussed in depth. |
|||||
6 |
5 March |
Supervised visual classification and recognition |
|
|||||
7 |
12 March |
Object detection and semantic segmentation |
Read the U-Net paper ‘U-Net: Convolutional Networks for Biomedical Image Segmentation’ by Ronneberger et al. 2015 https://arxiv.org/abs/1505.04597 |
|||||
8 |
19 March |
Generative models and sampling. Read Yang Song’s dissertation on learning to generate data by estimating gradients of the data distribution. |
Probabilistic modeling, estimation, learning, inference, generation.
Read the diffusion paper ‘Deep Unsupervised Learning using Nonequilibrium Thermodynamics’ by Jascha Sohl-Dickstein et al. 2015 https://arxiv.org/pdf/1503.03585.pdf
Read the paper ‘The Power of Depth for Feedforward Neural Networks’ https://proceedings.mlr.press/v49/eldan16.pdf
https://www.cse.ust.hk/~quan/comp5421/notes/song-yang-thesis-submit-augmented.pdf |
|||||
9 |
26 March. Mid-term break and Easter week |
Random sampling |
Read the diffusion summary slides by Bortoli https://vdeborto.github.io/project/generative_modeling/session_3.pdf |
|||||
10 |
9 April, 11 April |
The second project is released. Presentations of the first project on supervised CNNs. |
||||||
11 |
16 April |
MCMC (Markov chain Monte Carlo). Markov chains and discrete diffusion. Continuous Langevin diffusion. |
|
|||||
12 |
23 April, 25 April |
On 25 April: back to deterministic, low-dimensional geometry for 3D reconstruction. 3D reconstruction beyond recognition. Basic geometric concepts. Projective space. Transformations, similarities, and Euclidean geometry. |
https://www.cse.ust.hk/~quan/comp5421/notes/geom.pptx (geom.pptx is from intro.ppt)
Read ‘lecture notes’ Chapters 2 and 3 by Long Quan, 2011. https://www.cse.ust.hk/~quan/comp5421/notes/chap2-3-2015.pdf https://www.cse.ust.hk/~quan/comp5421/notes/single.ppt |
|||||
13 |
30 April, 2 May |
What is a camera, and where is it? Single-view geometry. Two-view geometry. |
https://www.cse.ust.hk/~quan/comp5421/notes/two.ppt https://www.cse.ust.hk/~quan/comp5421/notes/three.ppt |
|||||
14 |
7 May, 9 May |
Robust geometry estimation. 3D reconstruction. New perspectives. |
SFM, dense reconstruction, surface triangulation and refinement https://www.cse.ust.hk/~quan/comp5421/notes/reconstruction2019.pptx |
|||||
|
22 May |
Final exam, closed book, long duration. Room 2465 (lift 25/26), 4:30pm-8:30pm |
|
|||||
Course projects:
COMP5421 contains two required course projects. The first project, on visual recognition, should be completed around the midterm period of the course. The second project, on 3D reconstruction, should be completed around the end of the course. The projects should be demonstrated to the TAs and briefly presented in class.
A group may be a single student or consist of at most two members.
Project 1 Visual Supervised Recognition
We will recognize outdoor scenes using CNN-based semantic segmentation of outdoor images.
We will create a simple data set for training and testing. The TAs will prepare the unlabeled data set, as well as the labeled test data, which is not disclosed to the students. Each group will label a small set of data according to the requirements; all labeled data will then be shared by all groups. After that, each group will implement its own semantic segmentation pipeline.
The project does not necessarily require very large networks or databases, nor is it judged primarily on performance; it focuses on justification and understanding through small-scale experiments. We encourage the creative design of specific experiments that better reveal an understanding of the CNN, statistical and mathematical justifications of the CNN, and, if possible, visualization and interpretation of intermediate results.
Some ideas in Zhang et al. 2016, ‘Understanding deep learning requires rethinking generalization’, can be used to design experiments.
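One experiment from Zhang et al. that scales down well is the randomization test: replace a fraction of the training labels with random ones and observe how training and test accuracy change. A minimal Python sketch of the label-corruption step is given here, assuming the labels are a flat list of integer class ids (for example, a flattened segmentation mask). The function name `corrupt_labels` and its parameters are illustrative assumptions and not part of the course material.

```python
import random


def corrupt_labels(labels, fraction, num_classes, seed=0):
    """Return a copy of `labels` with a random `fraction` of entries redrawn.

    Hypothetical helper for a small-scale randomization experiment.
    Each selected entry is replaced by a class id drawn uniformly from
    range(num_classes); the draw may coincide with the original label,
    so the share of labels that actually change can be below `fraction`.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)          # fixed seed keeps the experiment repeatable
    corrupted = list(labels)           # copy, so the input is left untouched
    n_corrupt = round(fraction * len(corrupted))
    for i in rng.sample(range(len(corrupted)), n_corrupt):
        corrupted[i] = rng.randrange(num_classes)
    return corrupted
```

Training the same network on labels corrupted at several levels (say 0, 0.5, and 1.0) and comparing training accuracy with accuracy on the held-out test data is one possible way to probe memorization versus generalization in a small network.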
Project 2 Visual Self-supervised generation
Area in which the course can be counted:
VG
Background:
Equivalent prerequisites in linear algebra (e.g. COMP3211), object-oriented programming (e.g. COMP2012), and algorithm design and analysis (e.g. COMP171, COMP271) are required. Basic knowledge of image processing and machine learning is helpful.
Course outline/content (by major topics):
1. Introduction
2. Visual features and descriptors (low-level feature detection and description)
3. Visual recognition and CNN
4. Vision geometry (mid-level geometry, projective geometry, cameras, and 3D reconstruction)
5. Visual recognition (high-level object recognition and image understanding)
6. Perspective
Reference books:
* Image-based Modeling, Long Quan, Springer, 2010
* Three-Dimensional Computer Vision, O. Faugeras, MIT Press, 1993
* The Geometry of Multiple Images, Faugeras, Luong, and Papadopoulo
* Multiple View Geometry in Computer Vision, Hartley and Zisserman
* Robot Vision, B.K.P. Horn, MIT Press, 1986
* Computer Vision, D. Ballard and C. Brown, Prentice-Hall, 1982
* Vision, David Marr, Freeman, 1982
* Computer Vision: A Modern Approach, D. Forsyth and J. Ponce
Grading scheme:
The supervised visual recognition project: X%
The self-supervised visual project: Y%
Final Exam (written): Z%