COMP5421, Spring 2025
COMPUTER VISION, yet another newer perspective in the year of 2025
Professor Long QUAN
quan@cse.ust.hk
phone: 2358-7018
office: 3506
http://www.cs.ust.hk/~quan/comp5421/index.html
Teaching Assistants:
Dehao HAO
Kuan LI
Lecture room and time:
Lecture room 2404, Wednesday and Friday from 3:00-4:20pm.
Course description:
In this epoch of abrupt change, this course offers an updated introduction to, and perspective on, the fundamental developments in computer vision, which is at the core of recent developments and achievements in artificial intelligence. It covers a deterministic geometric approach to visual recognition and reconstruction, and proceeds to a probabilistic introduction to, and foundation of, supervised and self-supervised visual learning and generation.
The content is challenging and is reserved for truly motivated postgraduate students and exceptionally mature undergraduate students.
Course outlines
1. Introduction
2. Visual features
3. Convolutional networks
4. Self-supervised generative methods
5. Vision geometry
6. 3D reconstruction
7. Perspective
There will be one course project, carried out in groups of two students. The project has two stages, so it amounts to two connected projects; the second stage continues the first at an advanced level.
There will be a final exam: a long, closed-book, handwritten exam.
Tentative schedule and notes:
Week |
Date |
Topics/Notes |
Remarks and reading materials |
1 |
5 Feb, 7 Feb |
What is computer vision? Introduction to (the classical) computer vision, and a brief historical review of computer vision. |
Start reading the first classics.
Read Chapter 1 of the book ‘Vision’ by David Marr, 1982. https://www.cse.ust.hk/~quan/comp5421/notes/marr.vision.chapter1.pdf
Read ‘Chapter 4’ (8 pages), Feature Point, by Long Quan, 2011. https://www.cse.ust.hk/~quan/comp5421/notes/chapter4-longQuan.pdf
This part also reviews the classical, deterministic view of the visual representation of an image as a continuous function f(x), or more exactly f(x,y), or just x(u), which is the object of study. The mathematical tools are classical signal processing and functional analysis with Fourier analysis, wavelets, sparse sensing, and PDE and scale-space analysis. In this context, it is important to fully understand why and how we approach the traditional low-level vision tasks of filtering, edge detection, and de-noising in the traditional mathematical, engineering and AI framework. |
2 |
12 Feb, 14 Feb (reschedule) |
What are visual features? Edges and Canny detector. |
The first topical lecture will be on edges.
Read ‘A computational approach to edge detection’ by John Canny, 1986, PAMI. https://www.cse.ust.hk/~quan/comp5421/notes/canny1986.pdf www.cse.ust.hk/~quan/comp5421/notes/edge.ppt
We will see how an edge detector, brought into the signal processing framework as a filter, is much like a neuron: linear operators (convolution and differentiation) in function spaces, followed by the nonlinear activation function of thresholding. This also paves the way towards hand-designed filter banks, and later learned filters. |
3 |
19 Feb, 21 Feb |
Point features and SIFT. (Local) (pixel) point matching and (global) image matching: an early stage of visual recognition and understanding. Global features to measure the ‘distance’ between images. |
The second topical lecture will be on scale-space.
Read ‘Distinctive Image Features from Scale-Invariant Keypoints’ by David Lowe, 2004, IJCV. https://www.cse.ust.hk/~quan/comp5421/notes/lowe-ijcv2004.pdf https://www.cse.ust.hk/~quan/comp5421/notes/features.ppt
We will see three important concepts. The first is the importance of scale in images and in any science. The second is the more systematic emergence of descriptors, very much like word embeddings in languages; the descriptor is still hand-designed and small, at 128 dimensions, but it paves the way for more systematic general visual encoding. The descriptors are still local, using local distributions. Lastly, the point feature is the geometric point of 3D vision.
Matching points and key points. More global features: matching whole images, and image retrieval from image databases.
Read ‘A Metric for Distributions with Applications to Image Databases’ by Rubner et al., 1998, on the Earth Mover’s Distance (EMD). https://www.cse.ust.hk/~quan/comp5421/notes/rubnerIccv98.pdf |
4 |
26 Feb |
In search of learned visual features with discriminative approaches |
https://www.cse.ust.hk/~quan/comp5421/notes/cnn.pdf
Features are the representations, and they are everything: they are the visual features that can be learned with a convolutional neural network in a supervised framework.
Read ‘Gradient-based learning applied to document recognition’ by LeCun et al., 1998. https://www.cse.ust.hk/~quan/comp5421/notes/Lecun98.pdf
Read ‘ImageNet classification with deep convolutional neural networks’ by Krizhevsky et al., 2012. https://www.cse.ust.hk/~quan/comp5421/notes/alexnet2012.pdf |
5 |
5 March, 7 March |
Convolutional neural networks. Supervised visual classification and recognition. Visual ‘segmentation’ and U-net. Semantic segmentation and object detection. |
Read ‘Deep residual learning for image recognition’ by He et al., 2015. https://www.cse.ust.hk/~quan/comp5421/notes/resnet2015.pdf
A few important machine learning and statistical topics and methodologies will be revisited throughout the development of supervised learning and discussed in depth.
In addition to visual ‘classification’ by CNN, there is another important visual task: ‘segmentation’. It is per pixel; what matters is that the output is not a label y but another image. This is where the U-net architecture appears, which has some generative elements in the encoding and decoding nature of its architecture.
Read the U-net paper, ‘U-Net: Convolutional Networks for Biomedical Image Segmentation’ by Ronneberger et al., 2015. https://arxiv.org/abs/1505.04597
The segmentation U-net is a convolutional U-net and is a bridge between classification and generation. It is not surprising that the first diffusion implementation worked on a U-net architecture.
Next: going to unsupervised visual learning with a probabilistic view. |
6 |
12 March |
Global descriptions, texture synthesis, and the generative nature of images. From supervised to unsupervised learning, and the generative approach vs the discriminative approach. Unsupervised learning. |
Read these papers for preparation.
Read ‘Auto-Encoding Variational Bayes’ by Kingma and Welling, 2013. https://www.cse.ust.hk/~quan/comp5421/notes/vae-kingma.pdf
Read the diffusion paper ‘Deep Unsupervised Learning using Nonequilibrium Thermodynamics’ by Jascha Sohl-Dickstein et al., 2015. https://arxiv.org/pdf/1503.03585.pdf
Read ‘Denoising Diffusion Probabilistic Models’ (DDPM) by Ho et al., 2020. https://arxiv.org/abs/2006.11239
Read Yang Song’s dissertation, ‘Learning to Generate Data by Estimating Gradients of the Data Distribution’. https://www.cse.ust.hk/~quan/comp5421/notes/song-yang-thesis-submit-augmented.pdf
Some basic concepts from information theory for understanding the probabilistic nature of generation in high dimensions: probability distributions, entropy, typical sets, sampling, and Monte Carlo. |
7 |
19 March |
Probabilistic modeling |
Discussions on the variational autoencoder (VAE). A systematic way of modeling and approximating the distribution p(x), and the classical maximum-likelihood approach to estimating the parameters of the parameterized distribution p_theta(x), in different ways ranging from energy-based methods to ‘flow’-based methods, via autoregressive models for LLMs and latent spaces in VAEs. Diffusion. |
8 |
26 March, 28 March (mid-term progress report). Room 2404 from 3:00pm to 4:20pm, Room 2504 from 4:30pm to 6:00pm |
Mid-term progress reporting and discussions |
|
9 |
Mid-term break week; 9 April, 11 April |
Generation and sampling. MCMC (Markov chain Monte Carlo), Markov chains and discrete diffusion, continuous Langevin diffusion. Transportation. |
Read ‘Flow Matching for Generative Modeling’ by Lipman et al., and the ‘Flow matching tutorial’ (NeurIPS 2024, Flow matching for generative modeling). Flow-based approaches. |
10 |
16 April |
Back to the deterministic and low-dimensional geometry for 3D reconstruction. Basic geometric concepts. Projective space. Transformations, similarities and Euclidean geometry. |
https://www.cse.ust.hk/~quan/comp5421/notes/geom.pptx (geom.pptx is from intro.ppt)
Read ‘lecture notes’ Chapters 2 and 3 by Long Quan, 2011. https://www.cse.ust.hk/~quan/comp5421/notes/chap2-3-2015.pdf |
11 |
23 April |
What is a camera, and where is it? Single view geometry. |
https://www.cse.ust.hk/~quan/comp5421/notes/single.ppt |
12 |
30 April, 2 May |
Two-view geometry |
https://www.cse.ust.hk/~quan/comp5421/notes/two.ppt https://www.cse.ust.hk/~quan/comp5421/notes/three.ppt |
13 |
7 May, 9 May |
Robust geometry estimation. 3D reconstruction. New perspectives. |
SFM, dense reconstruction, surface triangulation and refinement. https://www.cse.ust.hk/~quan/comp5421/notes/reconstruction2019.pptx |
|
|||
|
Final exam, closed book, long hours. Room 2465 (lift 25/26), 4:30pm-8:30pm |
|
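As a small illustration of the week 2 remark that an edge detector resembles a neuron, the sketch below applies a linear operator (convolution with a sampled derivative-of-Gaussian filter) followed by a thresholding nonlinearity to a 1D step signal. It is only a minimal sketch under stated assumptions, not the Canny detector itself: it omits non-maximum suppression and hysteresis, and the function names, kernel radius, and threshold value are hypothetical choices made for this example.

```python
import numpy as np

def gaussian_derivative_kernel(sigma=1.0, radius=4):
    """Sampled first derivative of a Gaussian: smoothing and differentiation in one filter."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    g /= g.sum()                      # normalise the sampled Gaussian
    return -x / sigma**2 * g          # antisymmetric kernel, sums to zero

def edge_neuron(signal, sigma=1.0, radius=4, threshold=0.1):
    """Linear step (convolution) followed by a nonlinear step (thresholding)."""
    kernel = gaussian_derivative_kernel(sigma, radius)
    # Replicate the border values so the signal boundaries do not look like edges.
    padded = np.pad(signal, radius, mode="edge")
    response = np.convolve(padded, kernel, mode="valid")   # linear operator
    return np.abs(response) > threshold                    # nonlinear activation

# A 1D step edge located between index 19 and index 20 (assumed test signal).
signal = np.concatenate([np.zeros(20), np.ones(20)])
edges = edge_neuron(signal)
edge_positions = np.flatnonzero(edges)
print(edge_positions)
```

Replacing the fixed kernel with weights fitted to data is the step from a hand-designed filter to a learned one, which is the direction the later CNN lectures take.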
Course projects:
Visual generation of images or 3D objects
https://dhoho2002.github.io/GenVision/comp5421.html
There will be one course project, carried out in groups of two students.
The project has two stages, so it amounts to two connected projects; the second stage continues the first at an advanced level.
Area in which course can be counted:
VG
Background:
Prerequisites, or their equivalents, are required in linear algebra (e.g. COMP3211 knowledge in linear algebra), object-oriented programming (e.g. COMP2012), and algorithm design and analysis (e.g. COMP171, COMP271). Basic knowledge of image processing and machine learning is helpful.
Course outline/content (by major topics):
1. Introduction
2. Visual features and descriptors (low-level feature detection and description)
3. Visual recognition and CNN
4. Vision geometry (mid-level geometry, projective geometry, cameras, and 3D reconstruction)
5. Visual recognition (high-level object recognition and image understanding)
6. Perspective
Reference books:
* Image-based Modeling, Long Quan, 2010, Springer.
* Three-Dimensional Computer Vision, O. Faugeras, MIT Press, 1993
* The Geometry of Multiple Images, Faugeras, Luong, and Papadopoulo
* Multiple View Geometry in Computer Vision, Hartley and Zisserman
* Robot Vision, B.K.P. Horn, MIT Press, 1986
* Computer Vision, D. Ballard and C. Brown, Prentice-Hall, 1982
* Vision, David Marr, Freeman, 1982
* Computer Vision, A Modern Approach, D. Forsyth and J. Ponce
Grading scheme:
The first stage report of the project: X%
The second stage report of the project: Y%
Mid term exam (written): Z1%
Final Exam (written): Z2%