Research and Technology Forum 2025

Introduction
The Research and Technology Forum (RTF) 2025, hosted by the Department of Computer Science and Engineering, is designed to enhance collaboration between academia and industry. Our focus is on showcasing the innovative research areas that our department is exploring and on providing industry partners with insights into potential collaboration opportunities with HKUST.
Participants will have the chance to engage with faculty members who are leading groundbreaking research projects across various domains, from artificial intelligence and machine learning to cybersecurity and data analytics. The forum will feature presentations and discussions that highlight our ongoing projects, demonstrating how they address real-world challenges and contribute to technological advancement. The event will serve as a platform for networking, allowing industry representatives to connect with our researchers and explore avenues for partnership. By fostering dialogue between academia and industry, we aim to drive innovation and create impactful solutions that benefit society.
Join us for this exciting opportunity to learn more about the cutting-edge research at HKUST and discover how we can work together to shape the future of technology.
Event Details [Poster]
Date: April 30, 2025 (Wednesday)
Time: 10:30 - 14:00 (Registration starts at 10:10)
Venue: Lam Woo Lecture Theater (LT-B), The Hong Kong University of Science and Technology (location map)
Video: How to get to Lam Woo Lecture Theater (LT-B) from the North Gate of HKUST
Registration Deadline: April 15, 2025 (Tuesday)
A notification will be sent via email to confirm the registration.
Program Rundown
Time | Event Rundown |
---|---|
10:10 | Registration Starts |
10:30 - 10:45 | Opening Remarks and Introduction; Overview of CSE Labs and Research Highlights. By Prof. Xiaofang ZHOU, Head of Department of CSE |
10:45 - 11:00 | Presentation by the Office of Knowledge Transfer (OKT), HKUST. By Dr. Jacob MOK, Head (AI and Electronics) of OKT, and Mr. Wenlei ZHUANG, Head (Entrepreneur in Residence for Engineering Technology) of OKT |
11:00 - 13:00 | Research and Project Presentation. By CSE Professors |
13:00 - 14:00 | Poster Exhibition and AI Project Demonstration (and Refreshment Session). By CSE Postgraduate & Undergraduate Students |
Remarks
- All are welcome. Online registration is required.
- Free admission.
- Each registration admits one person only.
Titles and Abstracts: Research and Project Presentations by CSE Professors
Harnessing Large AI Models for Transforming Healthcare
The rapid advancements in large AI models are poised to revolutionize healthcare, offering unprecedented opportunities to enhance diagnosis, treatment, and patient outcomes. From accelerating drug discovery and personalizing medicine to improving operational efficiency and predictive analytics, these models are unlocking new frontiers in medical innovation. This talk explores the transformative potential of large AI models in healthcare, highlighting real-world applications, challenges, and ethical considerations. Join us to discover how cutting-edge AI technologies can reshape the future of healthcare, empowering providers and patients alike to achieve better, faster, and more equitable care.
Scaling Human-Centric Trustworthy Foundation Model Reasoning
In recent years, multimodal large language models have achieved remarkable progress, excelling across diverse tasks and demonstrating impressive few-shot learning capabilities. However, ensuring these models align with principles of trustworthiness, robustness, and human-centric reasoning remains an open challenge. In this talk, we present a roadmap for enhancing foundation models’ reasoning capabilities, with a focus on improving their knowledge boundary awareness and reasoning robustness through instruction tuning. Beyond textual modality, we introduce two benchmarks, VLM²-Bench and V²R-Bench, for evaluating large vision-language models (LVLMs)' ability to visually link matching cues and assessing the robustness of LVLMs to fundamental visual variations, respectively. We conclude with a discussion on scalable self-supervised learning approaches and emerging research directions that promise to make foundation models more interpretable, resilient, and aligned with human reasoning, paving the path towards AI agents with advanced reasoning capabilities that operate reliably in dynamic, real-world environments.
SimpleRL: On Emerging Reasoning of Open Models in the Wild
Recently, DeepSeek-R1 has shown that long chain-of-thought (CoT) reasoning can naturally emerge through a simple reinforcement learning (RL) framework with rule-based rewards alone, where training may start directly from the base model, a paradigm referred to as zero training. While DeepSeek-R1 achieved these results using a 671B model and relatively large-scale data, several critical questions remain unanswered: (1) Can various smaller base models exhibit similar emergent reasoning with limited, simple data? (2) Does increased CoT length always accompany the emergence of cognitive phenomena such as self-reflection (i.e., the "aha moment")? (3) What are the essential designs that enable the emergence of long CoT? In this talk, we take mathematical reasoning as an example and try to answer these questions. We conduct extensive zero-training experiments across base models of different families and sizes, including Llama3.1-8B, Mistral-7B/24B, DeepSeekMath-7B, and Qwen2.5-0.5B/1.5B/7B/14B/32B. Through several key insights and designs, such as varying the format reward and controlling the query difficulty, we produce significant improvements in both reasoning accuracy and CoT length across all settings. We share all the key findings, insights, and practices in this talk.
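As a toy illustration of what a rule-based reward (as opposed to a learned reward model) can look like, here is a minimal sketch; the specific rules and reward values below are hypothetical, not the ones used by SimpleRL or DeepSeek-R1:

```python
import re

def rule_based_reward(response: str, gold_answer: str) -> float:
    """Toy rule-based reward in the spirit of zero training: no learned
    reward model, only string rules. Hypothetical design: a small format
    reward for putting the answer in \\boxed{...}, plus a correctness
    reward when the boxed answer matches the gold answer exactly."""
    reward = 0.0
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match:
        reward += 0.1  # format reward: answer appears in \boxed{...}
        if match.group(1).strip() == gold_answer.strip():
            reward += 1.0  # correctness reward, decided by the rule alone
    return reward
```

In an RL loop, this scalar would be the only training signal; varying the format term is one of the design knobs the talk discusses.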
Designing, Developing, and Democratizing Guidance for Visual Analytics
Given the scale and complexity of today's datasets, people face a critical challenge during visual data analysis: an overwhelming number of decisions. These decisions, made while interacting with data—about what, where, when, and how to visualize—all influence the overall quality of the analysis process and outcomes. For instance, a user may need to decide which attributes to visualize, whether to apply a filter, or which chart type best represents the data. Despite their importance, these decisions often lack intelligent system support, which can lead to biased interpretations, missed insights, and flawed conclusions. In this talk, I will demonstrate guidance-enriched visual analytics tools wherein AI-powered systems and human-users "guide" each other during analysis, increasing user awareness of biased analytic behaviors thereby aiding decision making, while facilitating system learning. I will also demonstrate an open-source library of intelligent UI controls that helps visualization developers build their own guidance-enriched tools, thereby broadening access.
Advancing AI in IoT Systems for Smart Health
Nowadays, Internet of Things (IoT) and mobile devices are everywhere in our daily lives, offering in-situ sensing and networked computing across connected devices. However, deploying artificial intelligence (AI) on resource-constrained devices and in real-world applications still faces many practical challenges. Our work aims to empower IoT and mobile devices with AI to create ambient intelligence, facilitating real-world applications, particularly in smart health. First, I will introduce how we address real-world challenges in advancing AI within practical IoT systems, such as distributed and imperfect IoT data, limited resources, and system dynamics. Next, I will discuss our work in mobile health systems and applications. We design and deploy end-to-end systems that leverage AI and sensor devices to detect behavioral biomarkers of chronic diseases like Alzheimer's Disease in natural living environments. Additionally, I will talk about how we can leverage large language models to generate personalized health interventions based on multimodal sensor data collected by mobile devices.
Knowledge Understanding for Data Analytics
Data analysis involves data understanding, data modelling, and output generation refined through user feedback. In this talk, we focus on four components of data analytics: data understanding, data visualization, recommendation systems, and user feedback. Data understanding analyzes the structure of a dataset to support better analysis. Data visualization displays results in visual forms such as bar charts and pie charts. A recommendation system suggests records of potential interest to the user. User feedback closes the loop of data analysis, so that the outputs generated by the system meet users' expectations.
Clearblue: X-ray for Enterprise-Scale Software
Modern software is enormous, evolves fast, is integrated through long supply chains, and places heavy demands on the strength and versatility of the analytics that tools must support. We believe a technology similar to X-ray, producing accurate internal knowledge fast and non-intrusively, is fundamental to taming the complexity of understanding software. Clearblue is a tool that understands software in its opaque form, exploits efficient algorithms to decompose and store its behavior as data, and provides extension points to enable versatile downstream applications. This talk covers the challenges Clearblue sets out to tackle, an outline of how Clearblue works, and selected discussions of specific Clearblue mechanisms.
Poster by CSE Students
The posters, prepared by CSE PG and UG students, will be exhibited outside LT-B and the authors will be present to answer questions and discuss their posters during 13:00-14:00. Refreshments will be served during the poster session.
Haoxi LI - PhD(CSE) student
Title: Causally Motivated Sycophancy Mitigation for Large Language Models
Author(s): Haoxi Li, Xueyang Tang, Jie ZHANG, Song Guo, Sikai Bai, Peiran Dong, Yue Yu
Abstract:
Incorporating user preferences into large language models (LLMs) can enhance the personalization and reliability of model outputs and facilitate the application of LLMs to real-world scenarios. However, leveraging user preferences can be a double-edged sword. Recent studies have found that improper utilization can incur sycophancy, where LLMs prioritize alignment with user preferences over the correctness of their outputs. To address sycophancy in LLMs, we analyze and model the problem through the lens of structural causal models (SCMs), attributing sycophancy to LLMs' reliance on spurious correlations between user preferences and model outputs. Based on the proposed SCMs, we develop a novel framework, termed CAUSM, to mitigate sycophancy in LLMs by exploiting a significant causal signature. Specifically, we eliminate the spurious correlations embedded in the intermediate layers of LLMs through causally motivated head reweighting, and then calibrate the intra-head knowledge along the causal representation direction. Extensive experiments across diverse language tasks demonstrate the superiority of our method over state-of-the-art competitors in mitigating sycophancy in LLMs.
Ziqi JIANG - PhD(CSE) student
Title: CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
Author(s): Ziqi Jiang, Zhen Wang, Long Chen
Abstract:
Precise and flexible image editing remains a fundamental challenge in computer vision. Based on the areas modified, most editing methods can be divided into two main types: global editing and local editing. In this paper, we examine the two most common editing approaches (i.e., text-based editing and drag-based editing) and analyze their drawbacks. Specifically, text-based methods often fail to describe the desired modifications precisely, while drag-based methods suffer from ambiguity. To address these issues, we propose CLIPDrag, a novel image editing method that is the first to combine text and drag signals for precise and ambiguity-free manipulation on diffusion models. To fully leverage these two signals, we treat text signals as global guidance and drag points as local information. We then introduce a novel global-local motion supervision method that integrates text signals into existing drag-based methods by adapting a pre-trained vision-language model such as CLIP. Furthermore, we address the problem of slow convergence in CLIPDrag by presenting a fast point-tracking method that encourages drag points to move in the correct directions. Extensive experiments demonstrate that CLIPDrag outperforms existing methods that use drag or text signals alone.
Wei CHEN - PhD(CSE) student
Title: CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Author(s): Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen
Abstract:
Interleaved image-text generation has emerged as a crucial multimodal task, aiming at creating sequences of interleaved visual and textual content given a query. Despite notable advancements in recent multimodal large language models (MLLMs), generating integrated image-text sequences that exhibit narrative coherence and entity and style consistency remains challenging due to poor training data quality. To address this gap, we introduce CoMM, a high-quality Coherent interleaved image-text MultiModal dataset designed to enhance the coherence, consistency, and alignment of generated multimodal content. Initially, CoMM harnesses raw data from diverse sources, focusing on instructional content and visual storytelling, establishing a foundation for coherent and consistent content. To further refine the data quality, we devise a multi-perspective filter strategy that leverages advanced pre-trained models to ensure the development of sentences, consistency of inserted images, and semantic alignment between them. Various quality evaluation metrics are designed to prove the high quality of the filtered dataset. Meanwhile, extensive few-shot experiments on various downstream tasks demonstrate CoMM's effectiveness in significantly enhancing the in-context learning capabilities of MLLMs. Moreover, we propose four new tasks to evaluate MLLMs' interleaved generation abilities, supported by a comprehensive evaluation framework. We believe CoMM opens a new avenue for advanced MLLMs with superior multimodal in-context learning and understanding ability.
Haodong WANG - PhD(CSE) student
Title: D2MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
Author(s): Haodong Wang, Qihua Zhou, Hongzi Cong, Song Guo
Abstract:
The mixture of experts (MoE) model is a sparse variant of large language models (LLMs), designed to hold a better balance between intelligent capability and computational overhead. Despite its benefits, MoE is still too expensive to deploy on resource-constrained edge devices, especially with the demands of on-device inference services. Recent research efforts often apply model compression techniques, such as quantization, pruning and merging, to restrict MoE complexity. Unfortunately, due to their predefined static model optimization strategies, they cannot always achieve the desired quality-overhead trade-off when handling multiple requests, finally degrading the on-device quality of service. These limitations motivate us to propose the D2MoE, an algorithm-system co-design framework that matches diverse task requirements by dynamically allocating the most proper bit-width to each expert. Specifically, inspired by the nested structure of matryoshka dolls, we propose the matryoshka weight quantization (MWQ) to progressively compress expert weights in a bit-nested manner and reduce the required runtime memory. On top of it, we further optimize the I/O-computation pipeline and design a heuristic scheduling algorithm following our hottest-expert-bit-first (HEBF) principle, which maximizes the expert parallelism between I/O and computation queue under constrained memory budgets, thus significantly reducing the idle temporal bubbles waiting for the experts to load. Evaluations on real edge devices show that D2MoE improves the overall inference throughput by up to 1.39X and reduces the peak memory footprint by up to 53% over the latest on-device inference frameworks, while still preserving comparable serving accuracy as its INT8 counterparts.
Kai CHEN - PhD(CSE) student
Title: EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Author(s): Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, Wei Zhang, Qun Liu, Jun Yao, Lanqing Hong, Lu Hou, Hang Xu
Abstract:
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, text, and speech end-to-end with publicly available data remains challenging for the open-source community. Existing vision-language models rely on external tools for speech processing, while speech-language models still suffer from limited, or even absent, vision-understanding abilities. To address this gap, we propose EMOVA (EMotionally Omni-present Voice Assistant), which equips Large Language Models with end-to-end speech capabilities while maintaining leading vision-language performance. With a semantic-acoustic disentangled speech tokenizer, we surprisingly find that omni-modal alignment can further enhance vision-language and speech abilities compared with the corresponding bi-modal aligned counterparts. Moreover, a lightweight style module is proposed for flexible speech style control (e.g., emotions and pitches). For the first time, EMOVA achieves state-of-the-art performance on both vision-language and speech benchmarks, while supporting omni-modal spoken dialogue with vivid emotions.
Hei Ching IU - BSc(COSC) student
Title: Enhancing Neurocognitive Disorder Screening with Conversational AI: A Comparative Analysis of Prompting Strategies and Language Effects
Author(s): IU Hei Ching, ZHANG Li
Abstract:
This study explores strategies to improve the efficacy of conversational AI in neurocognitive disorder screening by comparing prompting methods and the influence of language on prediction outcomes.
Our current system uses a structured chain of prompts delivered to an LLM to respond to user input. With reference to the current algorithm, we explore another strategy, direct prompting, which simplifies the algorithm structure, and compare the effectiveness of the model responses.
We also analyse and compare the performance of two LLMs, GPT-4o and Qwen, and for each we investigate the influence of language on prediction outcomes by using Chinese and English prompts. In addition to comparing the accuracy of categorical predictions, we investigate the effects of automatic speech recognition errors and prediction accuracy across different user inputs.
Cheng JIN - PhD(CSE) student
Title: Generalizable Cervical Cancer Screening via Large-scale Pretraining and Test-Time Adaptation
Author(s): Hao Jiang, Cheng Jin, Huangjing Lin, Yanning Zhou, Xi Wang, Jiabo Ma, Li Ding, Jun Hou, Runsheng Liu, Zhizhong Chai, Luyang Luo, Huijuan Shi, Yinling Qian, Qiong Wang, Changzhong Li, Anjia Han, Ronald Cheong Kin Chan, Hao Chen
Abstract:
Cervical cancer is a leading malignancy of the female reproductive system. While AI-assisted cytology offers a cost-effective and non-invasive screening solution, current systems struggle with generalizability in complex clinical scenarios. To address this issue, we introduce Smart-CCS, a generalizable Cervical Cancer Screening paradigm based on pretraining and adaptation for building robust and generalizable screening systems. To develop and validate Smart-CCS, we curated a large-scale, multi-center dataset named CCS-127K, which comprises a total of 127,471 cervical cytology whole-slide images collected from 48 medical centers. By leveraging large-scale self-supervised pretraining, our CCS models are equipped with strong generalization capability, potentially generalizing across diverse scenarios. We then incorporated test-time adaptation to optimize the trained CCS model for complex clinical settings; it adapts and refines predictions, improving real-world applicability. For evaluation, Smart-CCS was tested in both retrospective and prospective centers, achieving AUCs of 0.965 (retrospective internal centers), 0.950 (retrospective external centers), and up to 0.986 in the prospective centers, with high sensitivity validated by histology. Interpretability analysis confirms that the model aligns with clinical practice. These results highlight Smart-CCS as a promising solution for enhancing the reliability and scalability of AI-assisted cervical cancer screening in real-world clinical settings.
Kangyu YUAN - PhD(CSE) student
Title: "I Love the Internet Again": Exploring the Interaction Inception of "TikTok Refugees" Flocking into RedNote
Author(s): Kangyu Yuan, Li Zhang, Hanfang Lyu, Ziqi Pan, Yuanhao Zhang, Junze Li, Bingcan Guo, Jiaxiong Hu, Qingyu Guo, Xiaojuan Ma
Abstract:
The U.S. government's announcement to ban TikTok in January 2025 led to an influx of TikTok users migrating to RedNote, a social media platform dominated by Chinese users. These migrants, self-identified as "TikTok Refugees," engage directly with Chinese natives, overcoming cross-cultural communication barriers. This platform migration provides a unique opportunity to examine the communication behaviors and strategies employed when users from Western cultures integrate into communities primarily composed of users from Eastern cultures. In this study, we analyze 3,510 RedNote posts (mainly posted from the United States (47%) and China (39.3%)) to characterize how TikTok migrants blend into the platform. Through open coding, we identified 12 cross-cultural communication strategies manifested in newcomers' posts. We also discuss the critical role of friendly, high-quality content in facilitating successful interactions and the challenges of maintaining sustainable engagement. Finally, we offer design implications for social media platforms to enhance cross-cultural communication.
Yanghao WANG - PhD(CSE) student
Title: Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification
Author(s): Yanghao Wang, Long Chen
Abstract:
Data Augmentation (DA), i.e., synthesizing faithful and diverse samples to expand the original training set, is a prevalent and effective strategy for improving the performance of data-scarce tasks. With their powerful image generation ability, diffusion-based DA methods have shown strong performance gains on different image classification benchmarks. In this paper, we analyze today's diffusion-based DA methods and argue that they cannot account for both faithfulness and diversity, the two critical factors for generating high-quality samples and boosting classification performance. To this end, we propose a novel diffusion-based DA method: Diff-II. Specifically, it consists of three steps: 1) Category concept learning: learning concept embeddings for each category. 2) Inversion interpolation: calculating the inversion for each image, and conducting circle interpolation between two randomly sampled inversions from the same category. 3) Two-stage denoising: using different prompts to generate synthesized images in a coarse-to-fine manner. Extensive experiments on various data-scarce image classification tasks (e.g., few-shot, long-tailed, and out-of-distribution classification) demonstrate its effectiveness over state-of-the-art diffusion-based DA methods.
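One way to picture the circle interpolation step is spherical interpolation (slerp) between two inversion latents, which moves along an arc rather than the straight chord; the sketch below is a minimal numpy rendering of that idea under this assumption, not the paper's actual formulation:

```python
import numpy as np

def circle_interpolate(z1: np.ndarray, z2: np.ndarray, t: float) -> np.ndarray:
    """Spherical interpolation between two inversion latents z1 and z2.
    Interpolating along the arc (the 'circle') joining them keeps the
    interpolant at a comparable norm, unlike linear interpolation, which
    shrinks toward the chord. This is one plausible reading of 'circle
    interpolation' (an assumption, not the authors' code)."""
    z1f, z2f = z1.ravel(), z2.ravel()
    cos_theta = np.dot(z1f, z2f) / (np.linalg.norm(z1f) * np.linalg.norm(z2f))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if theta < 1e-6:  # nearly parallel latents: fall back to lerp
        return (1 - t) * z1 + t * z2
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * z1 + (np.sin(t * theta) / s) * z2
```

For unit-norm latents the interpolant stays on the unit sphere, which is why slerp is a common choice for traversing diffusion latent spaces.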
Yingxue XU - PhD(CSE) student
Title: Large Foundation Models Empower Full-stack Solutions for Digital and Intelligent Pathology
Author(s): Xu Yingxue, Ma Jiabo, Wang Yihui
Abstract:
Pathological diagnosis in clinical practice heavily relies on scrutinizing slides manually, which is time-consuming and prone to subjective variability. Current digital pathology systems typically employ task-specific models (e.g., for metastasis detection or survival analysis) trained independently. However, their performance and generalizability are limited by the scarcity of annotated data for individual tasks. Therefore, we propose to leverage large pathology foundation models pretrained on over 190 million pathological images to empower oncological tasks, delivering a full-stack solution for whole-slide image (WSI) analysis in clinical decision support. The system, SmartPath, unifies diverse oncological tasks into a single analysis system spanning lesion segmentation, auxiliary diagnosis, treatment response prediction, prognostic risk stratification, and biomarker detection, while providing real-time interaction with a pathology copilot and automating structured report generation. Currently, three major cancers are supported: lung, breast, and gastrointestinal cancers, covering nearly 100 oncological tasks. The system streamlines pathology workflows by automating multi-step analytical processes and standardizing the reporting workflow, demonstrating scalable integration into clinical environments. This holds promise for addressing the critical need for full-stack AI solutions in digital and intelligent pathology, meeting the comprehensive demands of modern precision oncology.
Qianli LIU - PhD(CSE) student
Title: MELL: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management
Author(s): Qianli Liu, Zicong Hong, Peng Li, Fahao Chen and Song Guo
Abstract:
Serving large language models (LLMs) for massive numbers of users is challenged by the significant memory footprint of the transient state, known as the key-value (KV) cache, which scales with sequence length and number of requests. Instead of renting or buying more expensive GPUs, the load imbalance of the KV cache across GPUs, coupled with recent advances in inter-GPU communication, provides an opportunity to serve more requests via request migration. However, high migration overhead and unpredictable request patterns make this challenging. Therefore, this paper proposes MELL, a memory-efficient LLM serving system via multi-GPU KV cache management. It reduces the number of GPUs needed in the system by accounting for the dynamic KV cache load and the cost of request migration. Specifically, we first develop an adaptive request migration mechanism to balance the computational and communication overheads and adapt to diverse resource conditions. Then, we design an online algorithm tailored to a multi-LLM request and multi-GPU scheduling problem with migration enabled. It aims to minimise the number of required GPUs while limiting the number of migrations. Finally, we implement a prototype of MELL and demonstrate that it reduces the number of GPUs needed by up to 31% and increases GPU utilization by up to 43% compared to existing LLM serving systems.
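The load-balancing intuition behind multi-GPU KV cache management can be sketched as a greedy placement of per-request KV cache loads onto GPUs; this is only the balancing idea, not MELL's actual online algorithm, which also weighs migration overhead and adapts to changing loads:

```python
def greedy_kv_placement(request_kv_sizes, num_gpus):
    """Toy illustration of balancing KV-cache memory across GPUs:
    assign each incoming request (KV cache size, e.g. in GB) to the
    currently least-loaded GPU. MELL's real scheduler additionally
    models migration cost and runs online; this shows only why
    balancing lets the same GPUs hold more requests."""
    loads = [0.0] * num_gpus   # total KV-cache memory per GPU
    placement = []             # chosen GPU index per request
    for size in request_kv_sizes:
        gpu = min(range(num_gpus), key=lambda g: loads[g])
        loads[gpu] += size
        placement.append(gpu)
    return placement, loads
```

With balanced loads, the peak per-GPU memory stays near the average, so fewer GPUs hit their memory ceiling; migration extends the same idea to requests that are already running.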
Kashun SHUM - PhD(CSE) student
Title: Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Author(s): Kashun Shum, Yuzhen Huang, Hongjian Zou, Qi Ding, Yixuan Liao, Xiaoxin Chen, Qian Liu, Junxian He
Abstract:
Language model pretraining involves training on extensive corpora, where data quality plays a pivotal role. In this work, we aim to directly estimate the contribution of data during pretraining and select pretraining data in an efficient manner. Specifically, we draw inspiration from recent findings showing that the compression efficiency (i.e., the normalized loss) of diverse models on certain text correlates strongly with their downstream performance when the text domain aligns with the downstream benchmarks (Huang et al., 2024). Building on this observation, we hypothesize that data on which model losses are predictive of downstream abilities also contribute effectively to learning. To leverage this insight, we introduce predictive data selection (PreSelect), a lightweight and efficient data selection method that requires training and deploying only a fastText-based scorer. Through comprehensive experiments with 1B and 3B parameter models, we demonstrate that models trained on 30B tokens selected with PreSelect surpass the performance of a vanilla baseline trained on 300B tokens, achieving a 10x reduction in compute requirements. Furthermore, PreSelect significantly outperforms other competitive data selection baselines, such as DCLM and FineWeb-Edu, at the scale of 3B models trained on 100B tokens. We open-source our trained data selection scorer along with the curated datasets at this https URL.
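The selection step itself is simple once a cheap scorer exists: score every candidate document and keep the top fraction. The sketch below illustrates only that filtering logic; the `scorer` callable stands in for PreSelect's fastText-based model, whose training is the substantive part of the method:

```python
def preselect(documents, scorer, keep_fraction=0.1):
    """Illustrative data-selection step: score each pretraining document
    with a cheap learned scorer (PreSelect uses a fastText-based one;
    here `scorer` is any callable returning a 'predictiveness' score)
    and keep the top fraction. This sketches the selection logic only,
    not how the scorer is trained."""
    scored = sorted(documents, key=scorer, reverse=True)
    k = max(1, int(len(scored) * keep_fraction))
    return scored[:k]
```

Because the scorer is tiny relative to the model being pretrained, scoring a web-scale corpus is cheap compared with the pretraining compute it saves.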
Juyoung BAE - PhD(CSE) student
Title: Revolutionizing Dental Prosthetics: Generative AI in Biomimetic Design
Author(s): Juyoung Bae, Moo-hyun Son, Hao Chen
Abstract:
Partial edentulism significantly impacts oral functionality and aesthetics. Dental implant prostheses are the standard intervention, restoring function and appearance. Accurate prosthesis design is crucial and requires consideration of factors such as occlusal alignment and the condition of adjacent teeth. This demands a high level of expertise and a comprehensive understanding of each patient's oral anatomy, leading to increased time and costs for customized treatments. Recent deep learning advancements show potential for generating missing teeth from intraoral scan (IOS) data. However, existing models are limited to generating a single tooth type, lack flexibility and clinical applicability, and fail to thoroughly evaluate the usability of generated prosthesis designs. In this work, we introduce a novel generative AI model that adaptively creates one or multiple implant crowns based on patient-specific intraoral scan data, with dramatically reduced cost and manual effort. Our method incorporates user-driven control over crown size and placement, allowing practitioners to override and exclude compromised neighboring teeth where necessary. Through extensive expert evaluations and quantitative inter-rater analyses, we demonstrate that our solution achieves clinically viable accuracy, anatomical realism, and time-efficient workflows, advancing the state of the art in AI-powered dental prosthesis design.
Haoyue ZHANG - PhD(CSE) student
Title: ToFe: Lagged Token Freezing and Reusing for Efficient Transformer Inference
Author(s): Haoyue Zhang, Jie Zhang, Song Guo
Abstract:
Token reduction, which discards less important tokens during forward propagation, has been proposed to enhance the efficiency of transformer-based model inference. However, existing methods handle unimportant tokens irreversibly, preventing their reuse in subsequent blocks. Considering that transformers focus on different information between blocks, tokens reduced in early blocks might be useful later. Furthermore, to adapt transformer models for resource-constrained devices, it is crucial to strike a balance between model performance and computational overhead. To address these challenges, in this paper, we introduce a novel Token Freezing and Reusing (ToFe) framework, where we identify important tokens at each stage and temporarily freeze the unimportant ones, allowing their lagged reuse at a later stage. Specifically, we design a prediction module for token identification and an approximate module for recovery of the frozen tokens. By jointly optimizing with the backbone through computation budget-aware end-to-end training, ToFe can adaptively process the necessary tokens at each block, thereby reducing computational cost while maintaining performance. Extensive experiments demonstrate that ToFe reduces the computational cost of the LV-ViT model by 50% with less than 2% drop in Top-1 accuracy, achieving a better trade-off between performance and complexity compared to state-of-the-art methods.
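The key contrast with irreversible token pruning is that frozen tokens are set aside rather than discarded, so a later block can restore them. The numpy sketch below shows only this reversible bookkeeping; in ToFe the importance scores and recovery come from learned prediction and approximation modules, whereas here they are plain inputs:

```python
import numpy as np

def freeze_tokens(tokens, importance, keep_k):
    """Keep the keep_k most important tokens active for the next block
    and set the rest aside (frozen) instead of discarding them.
    Returns the active tokens plus the state needed to restore the rest."""
    order = np.argsort(importance)[::-1]      # most important first
    active_idx = np.sort(order[:keep_k])      # processed by the next block
    frozen_idx = np.sort(order[keep_k:])      # set aside, not lost
    return tokens[active_idx], (tokens[frozen_idx], frozen_idx)

def reuse_tokens(active, frozen_state, total_len):
    """Restore frozen tokens at their original positions for a later block."""
    frozen, frozen_idx = frozen_state
    out = np.zeros((total_len, active.shape[1]))
    active_idx = np.setdiff1d(np.arange(total_len), frozen_idx)
    out[active_idx] = active
    out[frozen_idx] = frozen
    return out
```

Only the active subset pays attention-compute in the intermediate blocks, which is where the ~50% cost reduction comes from.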
Bingnan CHEN - PhD(CSE) student
Title: Yannakakis+: Practical Acyclic Query Evaluation with Theoretical Guarantees
Author(s): Qichen Wang, Bingnan Chen, Binyang Dai, Ke Yi, Feifei Li, Liang Lin
Abstract:
Acyclic conjunctive queries form the backbone of most analytical workloads, and have been extensively studied in the literature from both theoretical and practical angles. However, there is still a large divide between theory and practice. While the 40-year-old Yannakakis algorithm has strong theoretical running time guarantees, it has not been adopted in real systems due to its high hidden constant factor. In this paper, we strive to close this gap by proposing Yannakakis+, an improved version of the Yannakakis algorithm, which is more practically efficient while preserving its theoretical guarantees. Our experiments demonstrate that Yannakakis+ consistently outperforms the original Yannakakis algorithm by 2x to 5x across a wide range of queries and datasets.
Another nice feature of our new algorithm is that it generates a traditional DAG query plan consisting of standard relational operators, allowing Yannakakis+ to be easily plugged into any standard SQL engine. Our system prototype currently supports four different SQL engines (DuckDB, PostgreSQL, SparkSQL, and AnalyticDB from Alibaba Cloud), and our experiments show that Yannakakis+ is able to deliver better performance than their native query plans on 160 out of the 162 queries tested, with an average speedup of 2.41x and a maximum speedup of 47,059x.
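For readers unfamiliar with the original algorithm, the two-pass semi-join reduction at the heart of Yannakakis-style evaluation can be sketched for a path-shaped join tree; this is the illustrative textbook version, not the Yannakakis+ system:

```python
def semijoin(left, right, key):
    """Keep the rows of `left` whose join-key value appears in `right`."""
    keys = {row[key] for row in right}
    return [row for row in left if row[key] in keys]

def yannakakis_path(relations, keys):
    """Classic Yannakakis evaluation for a path-shaped acyclic join
    R1 -keys[0]- R2 -keys[1]- ...: a bottom-up then top-down semi-join
    pass removes all dangling tuples before any join is materialized,
    which is the source of the algorithm's runtime guarantee."""
    rels = [list(r) for r in relations]
    # bottom-up pass: reduce each relation by its right neighbor
    for i in range(len(rels) - 2, -1, -1):
        rels[i] = semijoin(rels[i], rels[i + 1], keys[i])
    # top-down pass: reduce each relation by its left neighbor
    for i in range(1, len(rels)):
        rels[i] = semijoin(rels[i], rels[i - 1], keys[i - 1])
    # join the fully reduced relations; no dangling-tuple blow-up remains
    result = rels[0]
    for i in range(1, len(rels)):
        result = [{**a, **b} for a in result for b in rels[i]
                  if a[keys[i - 1]] == b[keys[i - 1]]]
    return result
```

The high constant factor of this scheme in practice (extra passes, extra materialization) is exactly what Yannakakis+ targets while keeping the guarantees.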
Possible Ways of Collaboration with Industry Partners
- Joint research lab: A long-term collaboration in which companies set up a research funding pool at the university.
- Internship: Our students are encouraged to take internships during their studies. An internship can be as short as six weeks or as long as one year. We maintain a database listing the internship opportunities offered by companies; the database is open to our students.
- Final year project: Our students are required to complete a project in their final year applying what they have learnt. Companies are welcome to let us know, by each March, of topics they are interested in sponsoring. Selected topics will be open for enrollment by students as their final year projects.
- Professional and development course: All students enrolled in our programs are required to take a seminar course, where companies are welcome to give talks introducing the various opportunities and career paths open to IT graduates.
- Research and Technology Forum: We organize a Research and Technology Forum, inviting companies to join a series of 5-minute research highlights by our faculty.
- Innovation and Technology Fund (ITF): The Hong Kong government offers funding to facilitate collaboration between universities and industry. Two popular schemes are the Innovation and Technology Support Programme (ITSP) and the Partnership Research Programme (PRP). Besides tax relief, companies are eligible for a rebate of 40% of their cash sponsorship of these projects from the government via the Research and Development Cash Rebate Scheme (CRS).
Enquiry
Ms. Sylvia Mak ()