Towards Medical Image Understanding and Interpretation: From Anatomy to Clinical Insights
PhD Thesis Proposal Defence

Title: "Towards Medical Image Understanding and Interpretation: From Anatomy to Clinical Insights"

by Mr. Haibo JIN

Abstract:

Automated analysis of medical images plays a pivotal role in modern healthcare, enabling faster and more accurate diagnoses while supporting clinical decision-making. Despite significant advancements, the field continues to face several challenges, including the need for high precision in detecting subtle abnormalities, reducing reliance on manual annotations, ensuring robust generalization across diverse patient populations and imaging modalities, and providing interpretable results that align with clinical expectations. Addressing these challenges is crucial for the successful integration of AI-driven systems into clinical practice and for improving patient outcomes. This thesis explores a range of deep learning methodologies aimed at advancing the automated understanding and interpretation of medical images in a progressive and systematic manner.

The first part of this research focuses on anatomical structure understanding, which involves the accurate identification and delineation of key anatomical features, such as organs and critical landmarks. To develop models in a label-efficient manner, we propose a semi-supervised landmark detection method based on self-training. This approach leverages estimated pseudo-labels for training while mitigating noise through a task-level curriculum. By effectively utilizing additional unlabeled data, the method consistently improves landmark detection performance, offering a scalable solution for anatomical analysis.

Next, we tackle the challenge of medical image interpretation through automated report generation. We introduce PromptMRG, a novel framework designed to enhance the diagnostic accuracy of generated reports. This framework utilizes diagnostic results from a disease classification branch as prompts to guide the text decoder, ensuring that the generated reports are both clinically relevant and diagnostically precise.

To further advance report generation, we integrate a large language model (LLM) for fine-grained reasoning. Inspired by the Chain of Thought approach, we propose Chain of Diagnosis, a framework that maintains a structured diagnostic process to generate clinically accurate and explainable reports. The framework begins by generating question-answer (QA) pairs through diagnostic conversations to extract key findings, such as disease presence, lesion location, and severity. These QA pairs are then used to prompt the LLM, enabling precise descriptions of disease attributes. To enhance explainability, we design a diagnosis grounding module that aligns QA-based diagnoses with generated sentences, providing reference-driven explanations. Additionally, a lesion grounding module is introduced to localize abnormalities in the image, further improving radiologist workflow efficiency.

Collectively, the approaches presented in this thesis advance the field of medical image analysis by delivering scalable, reliable, and interpretable solutions that bridge the gap between research innovations and clinical needs. Finally, we discuss potential future directions to further propel the field forward, including the development of universal models, multimodal data integration, longitudinal analysis, and the use of synthetic data.

Date: Wednesday, 19 March 2025
Time: 2:00pm - 4:00pm
Venue: Room 2408 (Lifts 17/18)

Committee Members:
Dr. Hao Chen (Supervisor)
Dr. Dan Xu (Chairperson)
Dr. Xiaomin Ouyang