News
2025/09/05: Registration codes have been sent to the approved students on the waitlist. If you prefer to audit the class, please contact a TA.
2025/08/22: If you are an NTU student and would like to register for the class, please come to the class on Sept. 3rd and add yourself to the waitlist. The waitlist link will be provided then.
Introduction
This course examines the transformative impact of deep learning in computer vision. Students will establish a solid foundation while engaging with state-of-the-art techniques, ranging from convolutional neural networks (CNNs), Transformers, diffusion models, and 3D vision to multimodal large language models (MLLMs). The course emphasizes the design of model architectures, training methodologies, and real-world applications. Through hands-on projects and critical theoretical discussions, students will develop comprehensive expertise in designing, training, and optimizing deep learning models for complex visual tasks. Ultimately, the course prepares students for advanced research and professional careers in the rapidly evolving field of computer vision.
Goals
This course introduces students to cutting-edge research, beginning with the fundamentals of deep learning and extending to its latest advances in vision applications. Students are expected to master key concepts such as neural network architectures, training methodologies, and performance optimization strategies. A central component of the course is the final project, designed to foster critical thinking and problem-solving skills, while preparing students to contribute to frontier research or address real-world challenges. Active participation, completion of hands-on projects, and engagement in theoretical discussions will be essential for success in this course.
Syllabus
| Week | Date | Topic | Course Materials | Remarks |
| --- | --- | --- | --- | --- |
| 1 | 09/03 | Course Logistics; Intro to Neural Nets | | |
| 2 | 09/10 | Convolutional Neural Networks & Self-Supervised Learning | HW #1 out | |
| 3 | 09/17 | Image Segmentation & Object Detection | | |
| 4 | 09/24 | Generative Models (I) - AE, VAE, & Diffusion Model (I) | HW #1 due | |
| 5 | 10/01 | Generative Models (II) - Diffusion Model (II), GAN | HW #2 out | |
| 6 | 10/08 | Recurrent Neural Networks & Transformer (I) | | |
| 7 | 10/15 | Transformer (II); Vision & Language Models | | |
| 8 | 10/22 | ICCV week | HW #2 due; HW #3 out | |
| 9 | 10/29 | Guest Lecture: Vision Language Action Models (VLA) & Reasoning by Dr. Fu-En Yang, NVIDIA | | |
| 10 | 11/05 | Guest Lecture: Toward Efficient LLM Inference by Prof. Kai-Chiang Wu, NYCU; Project Sponsor Introduction: PicCollage | | |
| 11 | 11/12 | Guest Lecture: 3D Vision by Dr. Sheng-Yu Huang, NTU/NVIDIA | HW #3 due; HW #4 out | |
| 12 | 11/19 | Guest Lecture: Linda Huang, Senior Dir., GeValyn Associates | | |
| 13 | 11/26 | Advanced Topics in DLCV; Guest Talk: Dr. Trista Chen, Microsoft | Final Project Announcement | |
| 14 | 12/03 | Guest Lecture (TBD) | HW #4 due; NeurIPS week | |
| 15 | 12/10 | Advanced Topics in Foundation & World Models | | |
| 16 | 12/22 (Mon) | Final Project Presentation | Sponsor: PicCollage | |
Contacts
Teaching Assistants
Hsi-Che Lin: MK-514, TA Hours: Wed. 16:30 ~ 17:20
Bing-Yi Yang: MK-514, TA Hours: Mon. 16:30 ~ 17:20
Kenneth Yang: MK-514, TA Hours: Mon. 16:30 ~ 17:20
Chang-Hsun Wu: MK-514, TA Hours: Thu. 16:30 ~ 17:20
Kuan-Yi Lee: MK-514, TA Hours: Tue. 16:30 ~ 17:20
Ching-Yu Hsu: MK-514, TA Hours: Tue. 16:30 ~ 17:20
Ting-Hsun Chi: MK-514, TA Hours: Tue. 16:30 ~ 17:20
Chi-Tun Hsu: MK-514, TA Hours: Mon. 16:30 ~ 17:20
Han-Jun Ko: MK-514, TA Hours: Thu. 16:30 ~ 17:20