Towards Medical Image Understanding and Interpretation: From Anatomy to Clinical Insights

PhD Thesis Proposal Defence


Title: "Towards Medical Image Understanding and Interpretation: From Anatomy 
to Clinical Insights"

by

Mr. Haibo JIN


Abstract:

Automated analysis of medical images plays a pivotal role in modern 
healthcare, enabling faster and more accurate diagnoses while supporting 
clinical decision-making. Despite significant advancements, the field 
continues to face several challenges, including the need for high precision 
in detecting subtle abnormalities, reducing reliance on manual annotations, 
ensuring robust generalization across diverse patient populations and imaging 
modalities, and providing interpretable results that align with clinical 
expectations. Addressing these challenges is crucial for the successful 
integration of AI-driven systems into clinical practice and for improving 
patient outcomes. This thesis explores deep learning methods for the 
automated understanding and interpretation of medical images, progressing 
from anatomical structure understanding to clinical report generation.

The first part of this research focuses on anatomical structure 
understanding, which involves the accurate identification and delineation of 
key anatomical features, such as organs and critical landmarks. To develop 
models in a label-efficient manner, we propose a semi-supervised landmark 
detection method based on self-training. This approach leverages estimated 
pseudo-labels for training while mitigating noise through a task-level 
curriculum. By effectively utilizing additional unlabeled data, the method 
consistently improves landmark detection performance, offering a scalable 
solution for anatomical analysis.
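
As a rough illustration of the self-training idea above, the sketch below 
selects pseudo-labels for one training round. It is only one possible 
reading of a "task-level curriculum": each landmark is treated as a task, 
and tasks are admitted from easy to hard as rounds progress. The function 
name, the mean-confidence difficulty proxy, and the threshold are 
assumptions for illustration, not the method proposed in the thesis.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def select_pseudo_labels(
    predictions: Dict[str, List[Tuple[Point, float]]],
    round_idx: int,
    total_rounds: int,
    min_confidence: float = 0.5,
) -> Dict[str, List[Tuple[int, Point]]]:
    """Pick pseudo-labels for one self-training round (hypothetical helper).

    predictions maps a landmark name (one "task") to a list of
    (predicted point, confidence) pairs, one pair per unlabeled image.
    Returns, per admitted landmark, the (image index, point) pairs kept.
    """
    # Assumed difficulty proxy: landmarks with higher mean confidence are easier.
    mean_conf = {
        name: sum(conf for _, conf in preds) / len(preds)
        for name, preds in predictions.items()
        if preds
    }
    ranked = sorted(mean_conf, key=mean_conf.get, reverse=True)

    # Curriculum: admit a growing share of landmarks each round (integer ceiling).
    n_active = (len(ranked) * (round_idx + 1) + total_rounds - 1) // total_rounds
    active = ranked[: max(1, n_active)]

    # Within admitted landmarks, drop low-confidence (likely noisy) pseudo-labels.
    return {
        name: [
            (idx, point)
            for idx, (point, conf) in enumerate(predictions[name])
            if conf >= min_confidence
        ]
        for name in active
    }
```

Calling this once per round with an increasing round_idx would let a model 
train on confident, easy landmarks first and on harder ones later.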

Next, we tackle the challenge of medical image interpretation through 
automated report generation. We introduce PromptMRG, a novel framework 
designed to enhance the diagnostic accuracy of generated reports. This 
framework utilizes diagnostic results from a disease classification branch 
as prompts to guide the text decoder, ensuring that the generated reports 
are both clinically relevant and diagnostically precise.
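
The idea of using classification results as prompts can be sketched as 
follows. This is a minimal example under stated assumptions: the function 
name, the bracketed token format, and the 0.5 threshold are hypothetical 
and are not taken from PromptMRG itself.

```python
from typing import Dict


def build_diagnostic_prompt(
    probabilities: Dict[str, float], threshold: float = 0.5
) -> str:
    """Turn per-disease classifier probabilities into a text prompt.

    The bracketed token format is an assumption made for this sketch.
    """
    tokens = []
    for disease, prob in probabilities.items():
        # Binarize each prediction so the decoder sees an explicit diagnosis.
        status = "positive" if prob >= threshold else "negative"
        tokens.append(f"[{disease}: {status}]")
    return " ".join(tokens)


# Example with made-up classifier outputs.
prompt = build_diagnostic_prompt({"cardiomegaly": 0.91, "edema": 0.12})
```

A text decoder conditioned on such a prompt receives the classifier's 
decisions explicitly rather than having to infer them from image features.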

To further advance report generation, we integrate a large language model 
(LLM) for fine-grained reasoning. Inspired by the Chain of Thought approach, 
we propose Chain of Diagnosis, a framework that maintains a structured 
diagnostic process to generate clinically accurate and explainable reports. 
The framework begins by generating question-answer (QA) pairs through 
diagnostic conversations to extract key findings, such as disease presence, 
lesion location, and severity. These QA pairs are then used to prompt the 
LLM, enabling precise descriptions of disease attributes. To enhance 
explainability, we design a diagnosis grounding module that aligns QA-based 
diagnoses with generated sentences, providing reference-driven explanations. 
Additionally, a lesion grounding module is introduced to localize 
abnormalities in the image, further improving radiologist workflow 
efficiency.
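
The flow from QA pairs to an LLM prompt, and from generated sentences back 
to the supporting QA pair, might look like the sketch below. All names here 
are hypothetical, and the grounding step uses plain substring matching as a 
stand-in; the diagnosis grounding module described above is presumably 
learned rather than rule-based.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QAPair:
    question: str
    answer: str
    finding: str  # finding the pair is about (assumed field for this sketch)


def build_llm_prompt(pairs: List[QAPair]) -> str:
    """Serialize diagnostic QA pairs into a text prompt for an LLM."""
    lines = [f"Q: {p.question} A: {p.answer}" for p in pairs]
    return "\n".join(lines) + "\nWrite the report findings:"


def ground_sentence(sentence: str, pairs: List[QAPair]) -> Optional[QAPair]:
    """Return the first QA pair whose finding is mentioned in the sentence.

    Substring matching is a simple placeholder for a learned grounding module.
    """
    lowered = sentence.lower()
    for pair in pairs:
        if pair.finding.lower() in lowered:
            return pair
    return None
```

Linking each report sentence to a QA pair in this way gives a reader a 
reference for why the sentence was written.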

Collectively, the approaches presented in this thesis advance the field of 
medical image analysis by delivering scalable, reliable, and interpretable 
solutions that bridge the gap between research innovations and clinical 
needs. Finally, we discuss potential future directions, including the 
development of universal models, multimodal data integration, longitudinal 
analysis, and the use of synthetic data.


Date:                   Wednesday, 19 March 2025

Time:                   2:00pm - 4:00pm

Venue:                  Room 2408
                        Lifts 17/18

Committee Members:      Dr. Hao Chen (Supervisor)
                        Dr. Dan Xu (Chairperson)
                        Dr. Xiaomin Ouyang