PhD Thesis Proposal Defence
Title: "Towards Medical Image Understanding and Interpretation: From Anatomy
to Clinical Insights"
by
Mr. Haibo JIN
Abstract:
Automated analysis of medical images plays a pivotal role in modern
healthcare, enabling faster and more accurate diagnoses while supporting
clinical decision-making. Despite significant advancements, the field
continues to face several challenges, including detecting subtle
abnormalities with high precision, reducing reliance on manual annotations,
ensuring robust generalization across diverse patient populations and imaging
modalities, and providing interpretable results that align with clinical
expectations. Addressing these challenges is crucial for the successful
integration of AI-driven systems into clinical practice and for improving
patient outcomes. This thesis explores a range of deep learning methodologies
aimed at advancing the automated understanding and interpretation of medical
images in a progressive and systematic manner.
The first part of this research focuses on anatomical structure
understanding, which involves the accurate identification and delineation of
key anatomical features, such as organs and critical landmarks. To develop
models in a label-efficient manner, we propose a semi-supervised landmark
detection method based on self-training. This approach leverages estimated
pseudo-labels for training while mitigating noise through a task-level
curriculum. By effectively utilizing additional unlabeled data, the method
consistently improves landmark detection performance, offering a scalable
solution for anatomical analysis.
Next, we tackle the challenge of medical image interpretation through
automated report generation. We introduce PromptMRG, a novel framework
designed to enhance the diagnostic accuracy of generated reports. This
framework utilizes diagnostic results from a disease classification branch
as prompts to guide the text decoder, ensuring that the generated reports
are both clinically relevant and diagnostically precise.
To further advance report generation, we integrate a large language model
(LLM) for fine-grained reasoning. Inspired by the Chain of Thought approach,
we propose Chain of Diagnosis, a framework that maintains a structured
diagnostic process to generate clinically accurate and explainable reports.
The framework begins by generating question-answer (QA) pairs through
diagnostic conversations to extract key findings, such as disease presence,
lesion location, and severity. These QA pairs are then used to prompt the
LLM, enabling precise descriptions of disease attributes. To enhance
explainability, we design a diagnosis grounding module that aligns QA-based
diagnoses with generated sentences, providing reference-driven explanations.
Additionally, a lesion grounding module is introduced to localize
abnormalities in the image, further improving radiologist workflow
efficiency.
Collectively, the approaches presented in this thesis advance the field of
medical image analysis by delivering scalable, reliable, and interpretable
solutions that bridge the gap between research innovations and clinical
needs. Finally, we discuss future directions for the field, including the
development of universal models, multimodal data integration, longitudinal
analysis, and the use of synthetic data.
Date: Wednesday, 19 March 2025
Time: 2:00pm - 4:00pm
Venue: Room 2408
       Lifts 17/18
Committee Members: Dr. Hao Chen (Supervisor)
                   Dr. Dan Xu (Chairperson)
                   Dr. Xiaomin Ouyang