COMP5421, Spring 2024


COMPUTER VISION, a new perspective in the year of 2024
 


Professor Long QUAN, CYT G009B

phone: 2358-7018, quan@cse.ust.hk
                             

http://www.cs.ust.hk/~quan/comp5421/index.html

 

Teaching Assistant:  

 

Lecture room and time: 

Lecture room CYT G009B, Tuesday and Thursday from 1:30-3:00pm.


Course description:

In this epoch of abrupt change, this course offers an updated introduction to, and perspective on, the fundamental developments of computer vision beyond the modern era, which are at the core of recent achievements in artificial intelligence. It covers a deterministic geometric approach to visual reconstruction, as well as a probabilistic and machine learning introduction to supervised and self-supervised visual learning and generation.

The content is challenging and intended only for motivated and mature students.

Course outline

1.      Introduction

2.      Visual features

3.      Convolutional networks

4.      Self-supervised generative methods

5.      Vision geometry

6.      3D reconstruction

7.      Perspective

There will be two course projects, each carried out by a group of two students; the groups must be different for the two projects.

There will be a final exam, which will be a long, closed-book, handwritten exam.


Tentative schedule and notes:

 

Week

Date

Topics/Notes

Remarks

1

1 Feb

What is computer vision?

(This first lecture will be very short and will cover the course organization.)

Start reading the first classics.

2

6 Feb
 

8 Feb

Introduction to computer vision.

 

What are visual features?

 

Edges and Canny detector.

Point Features and SIFT

 

read Chapter 1 of the book ‘Vision’ by David Marr, 1982.

https://www.cse.ust.hk/~quan/comp5421/notes/marr.vision.chapter1.pdf

 

read ‘Chapter 4’ (8 pages) Feature Point by Long Quan, 2011

https://www.cse.ust.hk/~quan/comp5421/notes/chapter4-longQuan.pdf

 

This part also reviews the classical, deterministic view of the visual representation of an image as a continuous function f(x), or more exactly f(x,y), or just x(u), which is the object of study. The mathematical tools are classical signal processing, functional analysis with Fourier analysis and wavelets, sparse sensing, PDEs, and scale-space analysis.

 

In this context, it is important to fully understand why and how we approach the traditional low-level vision tasks of filtering, edge detection, and de-noising in the traditional mathematical and engineering framework.
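
As a small illustration of the filtering and edge-detection tasks mentioned above, the following is a minimal sketch, not course material: it treats an image as a sampled function f(x,y), smooths it with a separable Gaussian filter, and estimates the gradient magnitude by central differences. The function names, the 8x8 synthetic step image, and sigma = 1.0 are assumptions chosen only for this example. A full Canny detector would add non-maximum suppression and hysteresis thresholding, which are omitted here.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1D Gaussian kernel as a list of floats."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Filter each row with a 1D kernel, replicating border pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def smooth(image, sigma):
    """Separable Gaussian smoothing: filter rows, then columns."""
    kernel = gaussian_kernel(sigma)
    tmp = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(tmp), kernel))

def gradient_magnitude(image):
    """Central-difference estimate of |grad f| with replicated borders."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            fx = (image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]) / 2.0
            fy = (image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]) / 2.0
            mag[y][x] = math.hypot(fx, fy)
    return mag

# Synthetic 8x8 image with a vertical step edge between columns 3 and 4.
step = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
edges = gradient_magnitude(smooth(step, sigma=1.0))
strongest_column = max(range(8), key=lambda x: edges[4][x])
```

In this sketch the gradient magnitude peaks at the columns adjacent to the step, and smoothing first is what keeps the derivative estimate stable when noise is present.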

 

The first lecture will be on edges:

 

read ‘A computational approach to edge detection’ by John Canny,  1986, PAMI

https://www.cse.ust.hk/~quan/comp5421/notes/canny1986.pdf

 

www.cse.ust.hk/~quan/comp5421/notes/edge.ppt

 

The second lecture will be on scale-space:

 

3

13 Feb (off)

15 Feb

 

 

 

4

20 Feb
22 Feb

In search of visual features with discriminative approaches

 

A few basic tools

read ‘Distinctive Image Features from Scale-Invariant Keypoints’ by David Lowe, 2004, IJCV

https://www.cse.ust.hk/~quan/comp5421/notes/lowe-ijcv2004.pdf

 

https://www.cse.ust.hk/~quan/comp5421/notes/features.ppt

 

https://www.cse.ust.hk/~quan/comp5421/notes/cnn.pdf

 

 

5

27 Feb
29 Feb

Convolutional neural networks

 

 

The features are the representations, and the representations are everything; these are the visual features that can be learned with a convolutional neural network in a supervised framework.

 

read ‘Gradient-based learning applied to document recognition’ by LeCun et al. 1998

https://www.cse.ust.hk/~quan/comp5421/notes/Lecun98.pdf

 

read ‘ImageNet classification with deep convolutional neural networks’ Krizhevsky et al. 2012

https://www.cse.ust.hk/~quan/comp5421/notes/alexnet2012.pdf

 

read ‘Deep residual learning for image recognition’ by He et al. 2015

https://www.cse.ust.hk/~quan/comp5421/notes/resnet2015.pdf

 

A few important machine learning and statistical topics and methodologies will be revisited throughout the development of supervised learning and discussed in depth.

 

6

5 March
7 March

Supervised visual classification and recognition

 

7

12 March
14 March

Object detection and semantic segmentation

read the U-Net paper

‘U-Net: Convolutional Networks for Biomedical Image Segmentation’ by Ronneberger et al. 2015

https://arxiv.org/abs/1505.04597

8

19 March
21 March

Generative models and sampling

 

read Yang Song’s dissertation, ‘Learning to Generate Data by Estimating Gradients of the Data Distribution’.

 

probabilistic modeling

estimation, learning

inference

generation

 

read the diffusion paper

‘Deep Unsupervised Learning using Nonequilibrium Thermodynamics’ by Jascha Sohl-Dickstein et al. 2015

 

https://arxiv.org/pdf/1503.03585.pdf

 

read the paper

‘The Power of Depth for Feedforward Neural Networks’ by Eldan and Shamir, 2016

https://proceedings.mlr.press/v49/eldan16.pdf

 

https://www.cse.ust.hk/~quan/comp5421/notes/song-yang-thesis-submit-augmented.pdf

9

26 March

Mid-term break and Easter week

28 March (off)

Random Sampling

 

 

 

 

 

read the diffusion summary slides by De Bortoli

 

https://vdeborto.github.io/project/generative_modeling/session_3.pdf

10

2 April (off)
4 April (off)

9 April

11 April

The second project is released.

The presentations of the first project of supervised CNN.

11

16 April
18 April (break)

23 April (break)

25 April

MCMC: Markov chain Monte Carlo

Markov chains and discrete diffusion

Continuous Langevin diffusion
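
To make the Langevin topic above concrete, here is a minimal sketch, not taken from the course notes, of the unadjusted Langevin algorithm, a time discretization of Langevin diffusion. It iterates x <- x + eps * score(x) + sqrt(2 * eps) * z, where score(x) is the gradient of the log density and z is standard Gaussian noise. The standard normal target, the step size 0.1, the burn-in length, and all function names are assumptions made only for illustration.

```python
import math
import random

def score_standard_normal(x):
    """Score function d/dx log p(x) for the assumed target p = N(0, 1)."""
    return -x

def langevin_samples(score, n_steps, step_size, x0=0.0, seed=0):
    """Run the unadjusted Langevin algorithm and return the whole chain."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Drift along the score, then inject Gaussian noise.
        x = x + step_size * score(x) + math.sqrt(2.0 * step_size) * noise
        samples.append(x)
    return samples

# Start far from the mode to show that the chain forgets its initial state.
chain = langevin_samples(score_standard_normal, n_steps=20000,
                         step_size=0.1, x0=5.0)
kept = chain[2000:]  # discard an (assumed) burn-in period
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
```

Because the discretization has no accept/reject correction, the chain's stationary variance is slightly larger than 1 for a finite step size; shrinking the step reduces this bias at the cost of slower mixing.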

 

12

23 April

25 April

On 25 April, back to deterministic and low-dimensional geometry for 3D reconstruction

 

3D reconstruction beyond recognition

 

Basic geometric concepts: projective space

 

Transformations, Similarities and Euclidean geometry

 

 

 

https://www.cse.ust.hk/~quan/comp5421/notes/geom.pptx

(geom.pptx is from intro.ppt)

read ‘lecture notes’ Chapters 2 and 3 by Long Quan, 2011.

https://www.cse.ust.hk/~quan/comp5421/notes/chap2-3-2015.pdf

 

 

https://www.cse.ust.hk/~quan/comp5421/notes/single.ppt

 

 

 

13

30 April

2 May

What is a camera,

and where is it?

 

Single view geometry.

Two-view geometry

https://www.cse.ust.hk/~quan/comp5421/notes/two.ppt

https://www.cse.ust.hk/~quan/comp5421/notes/three.ppt

 

14

7 May

9 May

Robust geometry estimation, 3D reconstruction

 

New perspectives

 

 

SFM, dense reconstruction, surface triangulation and refinement

https://www.cse.ust.hk/~quan/comp5421/notes/reconstruction2019.pptx

 

22 May

Final exam: closed book, long duration.

Room 2465 (lift 25/26), 4:30pm-8:30pm

 


 


 

Course projects:

COMP5421 contains two required course projects. The first project, on visual recognition, should be completed around the midterm period of the course. The second project, on 3D reconstruction, should be completed around the end of the course. The projects should be demonstrated to the TAs and briefly presented in class.

A group may be an individual or consist of at most two members.

Project 1 Visual Supervised Recognition

We will recognize outdoor scenes using CNN-based semantic segmentation of outdoor images.

We will create a simple data set for training and testing. The TAs will prepare the unlabeled data set, as well as labeled test data that is not disclosed to the students. Each group will label a small set of data according to the requirements; all labeled data will then be shared among all groups. After that, each group will implement its chosen semantic segmentation pipeline.

The project does not necessarily call for very big networks or large databases, nor is performance the main goal; the focus is on justification and understanding through small-scale experimentation. We encourage the creative design of specific experiments that better reveal an understanding of the CNN, statistical and mathematical justifications of the CNN, and, if possible, visualization and interpretation of intermediate results.

Some ideas in Zhang et al. 2016, ‘Understanding deep learning requires rethinking generalization’, can be used to design experiments.

Project 2 Visual Self-supervised generation

 
 
 
 
 
Area in which course can be counted:

VG

Background:

Equivalent prerequisites in linear algebra (e.g., COMP3211), object-oriented programming (e.g., COMP2012), and algorithm design and analysis (e.g., COMP171, COMP271) are required. Basic knowledge of image processing and machine learning is helpful.

Course outline/content (by major topics):

1. Introduction
2. Visual features and descriptors (low level feature detection and description)
3. Visual recognition and CNN
4. Vision geometry (mid-level geometry, projective geometry, cameras, and 3D reconstruction)
5. Visual recognition (high-level object recognition and image understanding).
6. Perspective.

Reference books: 

* Image-based Modeling, Long Quan, 2010, Springer.
* Three-Dimensional Computer Vision, O. Faugeras, MIT Press, 1993
* The Geometry of Multiple Images, Faugeras, Luong, and Papadopoulo
* Multiple View Geometry in Computer Vision, Hartley and Zisserman
* Robot Vision, B.K.P. Horn, MIT Press, 1986
* Computer Vision, D. Ballard and C. Brown, Prentice-Hall, 1982
* Vision, David Marr, Freeman, 1982
* Computer Vision, A Modern Approach, D. Forsyth and J. Ponce

Grading scheme:

The supervised visual recognition project: X%
The self-supervised visual project: Y%
Final Exam (written): Z%