Towards Medical Image Understanding and Interpretation: From Anatomy to Clinical Insights
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Towards Medical Image Understanding and Interpretation: From Anatomy to Clinical Insights"

By

Mr. Haibo JIN

Abstract:

Medical image interpretation is a cornerstone of modern healthcare, enabling accurate diagnosis and precise clinical decision-making. However, this process remains time-consuming, prone to variability, and constrained by the growing demand for radiological expertise. Artificial intelligence (AI) offers a promising solution, yet existing approaches often address medical image analysis tasks in isolation, limiting their clinical applicability. This thesis presents a unified framework for automated medical image interpretation, advancing from anatomical understanding to diagnostic reasoning through deep learning innovations. First, we address anatomical structure understanding by developing semi-supervised landmark detection methods that reduce reliance on labeled data. Our approach leverages self-training with domain adaptation and a task-level curriculum to refine pseudo-labels, improving scalability and generalization across datasets. Next, we tackle automated report generation with PromptMRG, a novel framework that enhances diagnostic accuracy by using disease classification outputs as prompts for the text decoder. Cross-modal feature retrieval and an adaptive loss function further improve performance by integrating prior clinical knowledge and addressing the class imbalance problem, respectively. To enable fine-grained clinical reasoning, we introduce Chain of Diagnosis (CoD), a framework that integrates large language models (LLMs) for accurate and explainable report generation. By simulating radiologist workflows through diagnostic question-answering (QA) pairs, our method ensures clinically accurate descriptions of diagnosed diseases and lesion attributes.
Moreover, a diagnosis grounding module aligns generated text with evidence, while a lesion grounding module localizes abnormalities to improve workflow efficiency. Collectively, this work bridges the gap between AI research and clinical needs, delivering scalable, interpretable, and robust solutions for medical image analysis. We conclude by outlining future directions, including a generalist diagnostic model, multi-agent collaborative refinement, and a universal multi-organ model, to further advance the field.

Date: Tuesday, 20 May 2025

Time: 2:00pm - 4:00pm

Venue: Room 2128B, Lift 19

Chairman: Prof. David Chuen Chun LAM (MAE)

Committee Members:
Dr. Hao CHEN (Supervisor)
Dr. Long CHEN
Dr. Xiaomin OUYANG
Dr. Terence Tsz Wai WONG (CBE)
Prof. Jin QIN (PolyU)