Towards Medical Image Understanding and Interpretation: From Anatomy to Clinical Insights

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Towards Medical Image Understanding and Interpretation: From Anatomy 
to Clinical Insights"

By

Mr. Haibo JIN


Abstract:

Medical image interpretation is a cornerstone of modern healthcare, enabling 
accurate diagnosis and precise clinical decision-making. However, this 
process remains time-consuming, prone to variability, and constrained by 
the growing demand for radiological expertise. Artificial intelligence (AI) 
offers a promising solution, yet existing approaches often address medical 
image analysis tasks in isolation, limiting their clinical applicability. 
This thesis presents a unified framework for automated medical image 
interpretation, advancing from anatomical understanding to diagnostic 
reasoning through deep learning innovations.

First, we address anatomical structure understanding by developing 
semi-supervised landmark detection methods that reduce reliance on labeled 
data. Our approach leverages self-training with domain adaptation and a 
task-level curriculum to refine pseudo-labels, improving scalability and 
generalization across datasets.

Next, we tackle automated report generation with PromptMRG, a novel 
framework that enhances diagnostic accuracy by using disease classification 
outputs as prompts for the text decoder. Cross-modal feature retrieval and 
an adaptive loss function further improve performance, integrating prior 
clinical knowledge and addressing the class imbalance problem, respectively.

To enable fine-grained clinical reasoning, we introduce Chain of Diagnosis 
(CoD), a framework that integrates large language models (LLMs) for accurate 
and explainable report generation. By simulating radiologist workflows 
through diagnostic question-answering (QA) pairs, our method ensures 
clinically accurate descriptions of diagnosed diseases and lesion attributes. 
Moreover, a diagnosis grounding module aligns generated text with evidence 
while a lesion grounding module localizes abnormalities for improved 
workflow efficiency.

Collectively, this work bridges the gap between AI research and clinical 
needs, delivering scalable, interpretable, and robust solutions for medical 
image analysis. We conclude by outlining future directions, including 
generalist diagnostic models, multi-agent collaborative refinement, and 
universal multi-organ models to further advance the field.


Date:                   Tuesday, 20 May 2025

Time:                   2:00pm - 4:00pm

Venue:                  Room 2128B
                        Lift 19

Chairman:               Prof. David Chuen Chun LAM (MAE)

Committee Members:      Dr. Hao CHEN (Supervisor)
                        Dr. Long CHEN
                        Dr. Xiaomin OUYANG
                        Dr. Terence Tsz Wai WONG (CBE)
                        Prof. Jin QIN (PolyU)