COMP5421, Spring 2025


COMPUTER VISION, yet another new perspective in the year 2025
 


Professor Long QUAN

quan@cse.ust.hk

phone: 2358-7018

office: 3506
                             

http://www.cs.ust.hk/~quan/comp5421/index.html

 

Teaching Assistants:  

 

Dehao HAO

dhaoab@connect.ust.hk

 

Kuan LI

klibs@cse.ust.hk

 

Lecture room and time: 

Lecture room 2404, Wednesday and Friday from 3:00-4:20pm.


Course description:

In this abruptly changing epoch, this course offers an updated introduction to, and perspective on, the current fundamental developments in computer vision, which lie at the core of recent artificial intelligence developments and achievements. It covers a deterministic geometric approach to visual recognition and reconstruction, and then moves to a probabilistic introduction to, and foundation of, supervised and self-supervised visual learning and generation.

The content is challenging and is reserved for truly motivated postgraduate students and exceptionally mature undergraduate students.

Course outline

1.      Introduction

2.      Visual features

3.      Convolutional networks

4.      Self-supervised generative methods

5.      Vision geometry

6.      3D reconstruction

7.      Perspective

There will be one course project, carried out by a group of two students. The project has two stages, so it is in effect two connected projects: the second stage continues the first at a more advanced level.

There will be a final exam, which will be a long, closed-book, hand-written exam.


Tentative schedules and notes:   

 

Week

Date

Topics/Notes

Remarks and reading materials

1

5 Feb

7 Feb

What is computer vision?

 

Introduction to (classical) computer vision, and a brief historical review of the field.

Start reading the first classics.

 

Read Chapter 1 of the book ‘Vision’ by David Marr, 1982.

https://www.cse.ust.hk/~quan/comp5421/notes/marr.vision.chapter1.pdf

 

Read ‘Chapter 4’ (8 pages), Feature Points, by Long Quan, 2011.

https://www.cse.ust.hk/~quan/comp5421/notes/chapter4-longQuan.pdf

 

This part also reviews the classical, deterministic view of the visual representation of an image as a continuous function f(x), or more exactly f(x,y), or simply a signal x(u), which is the object of study. The mathematical tools are classical signal processing, functional analysis with Fourier transforms, wavelets, sparse sensing, PDEs, and scale-space analysis.

 

In this context, it is important to fully understand why and how we approach the classical low-level vision tasks of filtering, edge detection, and de-noising within the traditional mathematical, engineering, and AI frameworks.
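As a small, hypothetical illustration of treating an image as a sampled function f(x,y) and of low-level filtering/de-noising, the sketch below smooths a noisy synthetic image with a Gaussian filter. It assumes NumPy and SciPy are available; the image and noise level are made up for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Treat the image as samples of a continuous function f(x, y) on a grid.
rng = np.random.default_rng(0)
f = np.zeros((128, 128))
f[32:96, 32:96] = 1.0                              # a simple bright square
noisy = f + 0.2 * rng.standard_normal(f.shape)     # additive Gaussian noise

# Linear low-pass filtering: convolution with a Gaussian of scale sigma.
# A larger sigma removes more noise but also blurs edges (the scale trade-off).
denoised = gaussian_filter(noisy, sigma=2.0)

print("noise std before:", np.std(noisy - f))
print("noise std after :", np.std(denoised - f))
```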

2

12 Feb
 

14 Feb (rescheduled)

 

What are visual features?

 

Edges and the Canny detector.

 

 

The first topical lecture will be on edges:

 

Read ‘A Computational Approach to Edge Detection’ by John Canny, 1986, PAMI.

https://www.cse.ust.hk/~quan/comp5421/notes/canny1986.pdf

 

www.cse.ust.hk/~quan/comp5421/notes/edge.ppt

 

We see how an edge detector, cast in the signal processing framework as a filter, is much like a neuron: linear operators (convolution and differentiation) in function spaces, followed by a nonlinear activation in the form of thresholding. This also paves the way towards hand-designed filter banks and, later, learned filters.
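A minimal sketch of this ‘linear filtering followed by nonlinear thresholding’ view of an edge detector, using derivative-of-Gaussian filters. It is not the full Canny algorithm (which adds non-maximum suppression and hysteresis), and it assumes NumPy/SciPy; the sigma and threshold values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_map(image, sigma=1.5, threshold=0.1):
    """Derivative-of-Gaussian filtering followed by thresholding.

    Linear part: smooth with a Gaussian and take partial derivatives
    (implemented as derivative-of-Gaussian filters).
    Nonlinear part: threshold the gradient magnitude, like an activation.
    """
    gx = gaussian_filter(image, sigma=sigma, order=(0, 1))  # d/dx
    gy = gaussian_filter(image, sigma=sigma, order=(1, 0))  # d/dy
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold

# Example on a synthetic step edge.
img = np.zeros((64, 64))
img[:, 32:] = 1.0
edges = edge_map(img)
print("edge pixels:", int(edges.sum()))
```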

3

19 Feb

21 Feb

Point Features and SIFT

 

(Local) point matching at the pixel level, and (global) image matching: early stages of visual recognition and understanding.

 

Global features to measure the ‘distance’ between images

The second topical lecture will be on scale-space:

Read ‘Distinctive Image Features from Scale-Invariant Keypoints’ by David Lowe, 2004, IJCV.

https://www.cse.ust.hk/~quan/comp5421/notes/lowe-ijcv2004.pdf

 

https://www.cse.ust.hk/~quan/comp5421/notes/features.ppt

 

We will see three important concepts. The first is the importance of scale, in images and in the sciences in general. The second is the more systematic emergence of descriptors, very much like word embeddings in language, yet still hand-crafted and small at 128 dimensions; this paves the way for more systematic, general visual encoding.

The descriptors are still local, built from local distributions.

Lastly, the point feature is the geometric point of 3D vision.
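A small illustrative fragment of the scale-space idea behind SIFT: blur the image at a sequence of scales and take differences of adjacent levels (Difference-of-Gaussians), whose local extrema give candidate keypoints. This is only a sketch assuming NumPy/SciPy, not Lowe's full detector (no sub-pixel refinement, orientation assignment, or 128-d descriptor).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma0=1.6, k=2 ** 0.5, levels=5):
    """Gaussian scale-space and its Difference-of-Gaussians (DoG) levels."""
    blurred = [gaussian_filter(image, sigma0 * k ** i) for i in range(levels)]
    return [blurred[i + 1] - blurred[i] for i in range(levels - 1)]

# Candidate keypoints are the local extrema of the DoG stack across space and
# scale (SIFT then adds refinement, edge rejection, and a 128-d descriptor).
img = np.random.default_rng(1).random((128, 128))
dogs = dog_pyramid(img)
print([float(np.abs(d).max()) for d in dogs])
```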

 

Matching points, key points.

 

More global features: matching whole images, and image retrieval from image databases.

 

Read ‘A Metric for Distributions with Applications to Image Databases’ by Rubner et al., 1998 (the Earth Mover’s Distance, EMD).

https://www.cse.ust.hk/~quan/comp5421/notes/rubnerIccv98.pdf
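As a worked special case (not the general algorithm in the paper): in one dimension, and for histograms of equal total mass on a common grid, the Earth Mover's Distance reduces to the L1 distance between cumulative histograms. A minimal NumPy sketch:

```python
import numpy as np

def emd_1d(p, q, bin_width=1.0):
    """EMD between two 1-D histograms of equal mass on the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    assert np.isclose(p.sum(), q.sum()), "equal total mass required"
    # Work done moving mass = area between the two cumulative histograms.
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum() * bin_width

print(emd_1d([1, 0, 0], [0, 0, 1]))   # one unit of mass moved two bins -> 2.0
```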

 

4

26 Feb
28  Feb

In search of learned visual features with discriminative approaches

 

https://www.cse.ust.hk/~quan/comp5421/notes/cnn.pdf

 

Features are representations, and representations are everything; these visual features can be learned with a convolutional neural network in a supervised framework.
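A minimal sketch of such a supervised convolutional feature learner, assuming PyTorch; the layer sizes and input shape are illustrative (a miniature LeNet-style stack, not the exact architectures of the papers below).

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Convolutional feature extractor followed by a linear classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(          # learned filter banks
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):                        # x: (batch, 1, 28, 28)
        h = self.features(x)
        return self.classifier(h.flatten(1))

model = TinyCNN()
logits = model(torch.randn(4, 1, 28, 28))
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 3]))
loss.backward()                                  # supervised gradient-based learning
print(logits.shape, float(loss))
```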

 

read ‘Gradient-based learning applied to document recognition’ by LeCun et al. 1998

https://www.cse.ust.hk/~quan/comp5421/notes/Lecun98.pdf

 

Read ‘ImageNet Classification with Deep Convolutional Neural Networks’ by Krizhevsky et al., 2012.

https://www.cse.ust.hk/~quan/comp5421/notes/alexnet2012.pdf

 

5

5 March

7 March

Convolutional neural networks

 

Supervised visual classification and recognition

 

 

Visual ‘segmentation’ and U-net

 

Semantic segmentation and object detection

Read ‘Deep Residual Learning for Image Recognition’ by He et al., 2015.

https://www.cse.ust.hk/~quan/comp5421/notes/resnet2015.pdf

 

A few important machine learning and statistical topics and methodologies will be revisited throughout the development of supervised learning and discussed in depth.

 

(Read the paper ‘The Power of Depth for Feedforward Neural Networks’ by Eldan and Shamir, 2016:

https://proceedings.mlr.press/v49/eldan16.pdf)

 

In addition to visual ‘classification’ by CNNs, there is another important visual task: ‘segmentation’. It is per pixel; the important point is that the output is not a label y but another image. This is where the U-net architecture appears, which already has some generative elements, given the encoding and decoding nature of its architecture.

Read the U-net paper:

‘U-Net: Convolutional Networks for Biomedical Image Segmentation’ by Ronneberger et al., 2015.

https://arxiv.org/abs/1505.04597

 

 

The segmentation U-net is a convolutional encoder-decoder that bridges classification and generation. It is not surprising that the first diffusion implementations worked on a U-net architecture.
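A minimal encoder-decoder sketch in the spirit of U-net, with a single down/up level and one skip connection; it is far smaller than the architecture in the paper and assumes PyTorch.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One-level encoder-decoder with a skip connection: image in, image out."""
    def __init__(self, channels=1):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, channels, 1))   # per-pixel output

    def forward(self, x):
        e = self.enc(x)                       # encoder features
        b = self.bottleneck(self.down(e))     # coarser representation
        u = self.up(b)                        # decode back to input resolution
        return self.dec(torch.cat([u, e], dim=1))   # skip connection

out = TinyUNet()(torch.randn(2, 1, 64, 64))
print(out.shape)                              # (2, 1, 64, 64): an image, not a label
```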

 

Going to unsupervised visual learning with a probabilistic view

 

https://www.cse.ust.hk/~quan/comp5421/notes/generative.pdf

6

12 March
14 March

Global descriptions, texture synthesis, generative nature of images

From supervised to unsupervised learning, and generative vs. discriminative approaches

 

Unsupervised learning

 

Read these papers for preparation.

Read ‘Auto-Encoding Variational Bayes’ by Kingma and Welling, 2013.

https://www.cse.ust.hk/~quan/comp5421/notes/vae-kingma.pdf

 

read the diffusion paper

‘Deep Unsupervised Learning using Nonequilibrium Thermodynamics’ by Jascha Sohl-Dickstein et al. 2015

https://arxiv.org/pdf/1503.03585.pdf

 

Read ‘Denoising Diffusion Probabilistic Models’ (DDPM) by Ho et al., 2020.

https://arxiv.org/abs/2006.11239

Read Yang Song’s dissertation, ‘Learning to Generate Data by Estimating Gradients of the Data Distribution’.

https://www.cse.ust.hk/~quan/comp5421/notes/song-yang-thesis-submit-augmented.pdf

 

 

Some basic concepts from information theory for understanding the probabilistic nature of generation in high dimensions: probability distributions, entropy, typical sets, sampling, and Monte Carlo methods.
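A tiny illustration of the sampling and Monte Carlo side of these concepts: estimate the differential entropy of a Gaussian by averaging -log p(x) over samples and compare it with the closed form 0.5 log(2*pi*e*sigma^2). NumPy only, with made-up parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0

# Monte Carlo estimate of differential entropy H = E[-log p(x)] by sampling.
x = rng.normal(0.0, sigma, size=100_000)
log_p = -0.5 * (x / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma ** 2)
h_mc = -log_p.mean()

# Closed form for a 1-D Gaussian: 0.5 * log(2 * pi * e * sigma^2).
h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
print(h_mc, h_exact)   # the two agree up to Monte Carlo error
```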

7

19 March
21 March

Probabilistic modeling

Discussion of the variational autoencoder (VAE)

 

A systematic way of modeling and approximating the distribution p(x): the classical maximum likelihood approach to estimating the parameters of a parameterized distribution p_theta(x), realized in different ways ranging from energy-based methods to ‘flow’-based methods, via autoregressive models (as in LLMs) and the latent space of the VAE.
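A toy sketch of the maximum likelihood view: fit the parameters theta of a simple density p_theta(x) (here a 1-D Gaussian with learnable mean and log-variance) by minimizing the negative log-likelihood with gradient descent. It assumes PyTorch; the data and learning rate are made up for illustration.

```python
import torch

# Data from an unknown distribution that we want to model with p_theta(x).
x = torch.randn(5000) * 1.5 + 3.0

# Parameters theta of a simple Gaussian model.
mu = torch.zeros((), requires_grad=True)
log_var = torch.zeros((), requires_grad=True)
opt = torch.optim.Adam([mu, log_var], lr=0.05)

for _ in range(300):
    # Negative log-likelihood of the data under p_theta (up to a constant).
    nll = 0.5 * (log_var + (x - mu) ** 2 / log_var.exp()).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()

print(float(mu), float(log_var.exp().sqrt()))   # approximately 3.0 and 1.5
```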

 

Diffusion

8

26 March

28 March (Mid term progress report)

 

Room 2404 from 3:00pm to 4:20pm,

Room 2504 from 4:30pm to 6:00pm

Mid term progress reporting and discussions

9

2 April (off)

4 April (off, Ching Ming Festival)

Mid-break week

 

9 April

11 April

Generative and Sampling

MCMC (Markov chain Monte Carlo), Markov chains and discrete diffusion, continuous Langevin diffusion (see the sketch after this week’s readings)

Transportation

(Some basic concepts: read the diffusion summary slides by De Bortoli, in a continuous formulation:

https://vdeborto.github.io/project/generative_modeling/session_3.pdf)

 

Read ‘Flow Matching for Generative Modeling’ by Lipman et al.

‘Flow Matching Tutorial’, NeurIPS 2024, on flow matching for generative modeling.

 

Flow-based approaches
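A minimal NumPy sketch of the continuous Langevin diffusion mentioned above: unadjusted Langevin dynamics that samples from a known 1-D density using only its score (the gradient of the log-density). The target density, step size, and number of steps are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x, mu=2.0, sigma=1.0):
    """Score function grad_x log p(x) of a Gaussian target density."""
    return -(x - mu) / sigma ** 2

# Unadjusted Langevin dynamics: x <- x + (eps/2) * score(x) + sqrt(eps) * noise.
eps = 0.01
x = rng.standard_normal(10_000)          # arbitrary initialization
for _ in range(2000):
    x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(x.shape)

print(x.mean(), x.std())                 # close to the target mean 2.0 and std 1.0
```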

10

16 April

18 April (off, Good Friday)

Back to the deterministic and low-dimensional geometry for 3D reconstruction

 

Basic geometric concepts: projective space

 

Transformations, Similarities and Euclidean geometry

 https://www.cse.ust.hk/~quan/comp5421/notes/geom.pptx

(geom.pptx is from intro.ppt)

Read Chapters 2 and 3 of the ‘lecture notes’ by Long Quan, 2011.

https://www.cse.ust.hk/~quan/comp5421/notes/chap2-3-2015.pdf

11

23 April


25 April (ICLR 2025)

What is a camera,

and where is it?

 

Single view geometry.

https://www.cse.ust.hk/~quan/comp5421/notes/single.ppt
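A minimal sketch of the pinhole camera model underlying single-view geometry: a 3x4 projection matrix P = K [R | t] maps homogeneous 3-D points to homogeneous image points. It assumes NumPy, with made-up intrinsics and pose.

```python
import numpy as np

# Intrinsics K (focal length and principal point) and pose [R | t].
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)                              # camera aligned with the world axes
t = np.array([0.0, 0.0, 5.0])              # world origin 5 units in front
P = K @ np.hstack([R, t[:, None]])         # 3x4 projection matrix

X = np.array([0.5, -0.2, 0.0, 1.0])        # a 3-D point in homogeneous coordinates
x = P @ X
u, v = x[:2] / x[2]                        # perspective division
print(u, v)                                # pixel coordinates of the projection
```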

12

30 April

2 May

Two-view geometry

https://www.cse.ust.hk/~quan/comp5421/notes/two.ppt

https://www.cse.ust.hk/~quan/comp5421/notes/three.ppt
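A minimal sketch of two-view geometry: the epipolar constraint x2^T F x1 = 0 and a plain (unnormalized) eight-point estimate of the fundamental matrix F from point correspondences, checked on synthetic cameras. It assumes NumPy, and omits the normalization and robust estimation used in practice.

```python
import numpy as np

def fundamental_eight_point(x1, x2):
    """Linear eight-point estimate of F from homogeneous correspondences (3xN)."""
    # Each correspondence gives one row of the constraint x2^T F x1 = 0.
    A = np.column_stack([x2[0] * x1[0], x2[0] * x1[1], x2[0],
                         x2[1] * x1[0], x2[1] * x1[1], x2[1],
                         x1[0], x1[1], np.ones(x1.shape[1])])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)            # enforce rank 2 (the epipolar constraint)
    return U @ np.diag([S[0], S[1], 0.0]) @ Vt

# Synthetic check: two cameras observing random 3-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.uniform(-1, 1, (2, 12)), rng.uniform(4, 6, (1, 12)), np.ones((1, 12))])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[1.0], [0.0], [0.0]])])   # translated camera
x1, x2 = P1 @ X, P2 @ X
F = fundamental_eight_point(x1 / x1[2], x2 / x2[2])
print(np.abs(np.sum((x2 / x2[2]) * (F @ (x1 / x1[2])), axis=0)).max())   # ~0
```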

 

13

7 May

9 May

Robust geometry estimation and 3D reconstruction

 

New perspectives

SfM (structure from motion), dense reconstruction, surface triangulation and refinement

https://www.cse.ust.hk/~quan/comp5421/notes/reconstruction2019.pptx

 

 

 

Final exam: a long, closed-book written exam.

Room 2465 (lift 25/26), 4:30pm-8:30pm

 


 


 

Course projects:

Visual generation of images or 3D objects

https://dhoho2002.github.io/GenVision/comp5421.html

 
There will be one course project, carried out by a group of two students.

The project has two stages, so it is in effect two connected projects: the second stage continues the first at a more advanced level.


 
 
 
Area in which course can be counted:

VG

Background:

Equivalent prerequisites in linear algebra (e.g., the linear algebra knowledge of COMP3211), object-oriented programming (e.g., COMP2012), and algorithm design and analysis (e.g., COMP171, COMP271) are required. Basic knowledge of image processing and machine learning is helpful.

Course outline/content (by major topics):

1. Introduction
2. Visual features and descriptors (low level feature detection and description)
3. Visual recognition and CNN
4. Vision geometry (mid-level geometry, projective geometry, cameras, and 3D reconstruction)
5. Visual recognition (high-level object recognition and image understanding).
6. Perspective.

Reference books: 

* Image-based Modeling, Long Quan, 2010, Springer.
* Three-Dimensional Computer Vision, O. Faugeras, MIT Press, 1993
* The Geometry of Multiple Images, Faugeras, Luong, and Papadopoulo
* Multiple View Geometry in Computer Vision, Hartley and Zisserman
* Robot Vision, B.K.P. Horn, MIT Press, 1986
* Computer Vision, D. Ballard and C. Brown, Prentice-Hall, 1982
* Vision, David Marr, Freeman, 1982
* Computer Vision, A Modern Approach, D. Forsyth and J. Ponce

Grading scheme:

The first-stage report of the project: X%
The second-stage report of the project: Y%

Mid-term exam (written): Z1%
Final exam (written): Z2%