Research and Technology Forum 2024


The Research and Technology Forum 2024 of the Department of Computer Science and Engineering aims to give industry partners a better understanding of the department's prospective research areas and the collaboration channels provided by HKUST. Throughout the event, ongoing research projects conducted by our faculty will be showcased, and participants can also learn more about the department.

Event Details

Date: 5 April 2024 (Friday)
Time: 10:00 am - 1:30 pm HKT (Registration starts at 9:40 am)
Venue: Lam Woo Lecture Theater (LT-B), The Hong Kong University of Science and Technology (location map)
Video: How to get to Lam Woo Lecture Theater (LT-B) from the North Gate of HKUST
Registration: closed

Program Rundown

Time (HKT) Event Rundown
9:40 am Registration Starts
10:00 am - 10:15 am Opening Remarks and Introduction
Overview of CSE Labs and Research Highlights
By Prof. Xiaofang ZHOU,
Head of Department of CSE
10:15 am - 10:30 am Presentation by the Office of Knowledge Transfer (OKT), HKUST
By Mr. Pok Man YIU,
Head (Investment Fund Management) of OKT
10:30 am - 12:30 pm Research and Project Presentation
By CSE Professors
12:30 pm - 1:30 pm Poster Exhibition (served with refreshment)


  • All are welcome. Online registration is required.
  • Free admission.
  • Each registration admits one person only.


Title and Abstract
Research and Project Presentations by CSE Professors

Prof. Binhang YUAN

Assistant Professor, Department of CSE

Accommodating LLM Service over Heterogeneous Computational Resources

Serving large-scale foundation models is a crucial component of contemporary AI applications. We focus on deploying such services in heterogeneous and potentially decentralized settings to mitigate the substantial costs typically associated with centralized data centers. Our work relies on carefully designed scheduling algorithms: we model the computation capacity and inter-machine connections precisely and propose an efficient search algorithm to find the optimal allocation. We implement an asymmetric parallel learning framework to accommodate the asymmetric allocation of LLM computation tasklets in both training and inference. Our empirical study suggests that the proposed method can effectively reduce the service cost while preserving service quality.

Presentation Slides

Prof. Dan XU

Assistant Professor, Department of CSE

Joint 2D-3D Multi-Modal and Multi-Task Human and Scene Visual Understanding and Generation

In this talk, I will briefly present several ongoing research directions in our lab, including scene depth estimation and large-scale reconstruction in supervised or self-supervised settings, joint multi-modal multi-task scene perception, open-world video understanding and generation, and end-to-end deep visual SLAM. I will introduce the basic setup of each problem, its research value, and the significant results we have achieved along these directions.

Presentation Slides

Prof. Dongdong SHE

Assistant Professor, Department of CSE

Software Testing in a Data-driven Approach

Software vulnerabilities significantly impact our daily lives, ranging from global ransomware attacks that cause billions of dollars in losses to confidential data breaches in government, military, and industry. Traditional software testing techniques (e.g., fuzzing, symbolic execution) mainly rely on rule-based approaches, i.e., they incorporate a set of static rules and often fail to generalize to diverse programs. Compared with the rule-based approach, the data-driven approach is flexible, adaptive, and effective, because it can dynamically make decisions based on online data rather than static rules. In this talk, I will present a few past projects that leverage data-driven approaches (machine learning, optimization algorithms, social network analysis) to improve software testing. Then, I will introduce two future directions in the era of LLMs: 1) using LLMs to assist software testing and vulnerability detection; 2) detecting and analyzing vulnerabilities in ML systems.

Presentation Slides

Prof. Hao CHEN

Assistant Professor, Department of CSE

Towards Trustworthy Artificial Intelligence for Healthcare

We aim to develop an integrated and versatile multimodal AI foundation platform for trustworthy and scalable biomedical data analytics. This endeavor promises to redefine the boundaries of precision oncology by fully harnessing advanced AI technologies to reduce the burden on hospitals, enhance diagnostic accuracy, tailor personalized treatments for individuals, improve patient outcomes, and, ultimately, save lives at scale and with precision.

Presentation Slides

Prof. Lei CHEN

Chair Professor, Department of CSE

Data Management for Deep Learning

Deep learning (DL) has made significant progress and found wide application in various fields, such as ChatGPT for question answering. However, the success and efficiency of DL models depend on proper data management. Training deep learning-based image classifiers is challenging without labeled data, and efficiency is hindered by large datasets, complex models, and numerous hyperparameters. Lack of validation and explanation limits model applicability. In this presentation, I will discuss three crucial issues in data management for deep learning: 1) effective data preparation for DL, including extraction, integration, and labeling; 2) DL training optimization, involving data compression and computation graph optimization; and 3) the importance of model explanation for robustness and transparency. I will conclude by highlighting future research directions. Moreover, I will demonstrate my group's important industry collaborations on this topic.

Presentation Slides

Prof. Long CHEN

Assistant Professor, Department of CSE

Toward Explainable and Robust Scene Understanding in the Open World

Nowadays, well-trained deep learning models can significantly outperform humans in many computer vision tasks, including complex visual scene understanding (e.g., vision-language tasks). Despite unprecedented attention and great success, today's visual scene understanding models still fail to achieve human-like understanding. By "human-like", we mean that these vision systems should be equipped with two important abilities: 1) Explainable: the model should rely on the right explicit evidence when making decisions; 2) Robust: the model should be robust in situations with only "low-quality" training data (e.g., training samples that are biased, noisy, or limited). In this talk, I will share the research results we have achieved in these directions.

Presentation Slides

Prof. Qifeng CHEN

Assistant Professor, Department of CSE

Open-source Video Generation and Editing

Recently, Sora by OpenAI has demonstrated exceptional quality in text-to-video generation, producing videos up to one minute long. While Sora is currently the best-performing system for video generation, its API and source code are not yet available. We will look into some of our open-source projects in video generation and editing and demonstrate their strengths and weaknesses. We will also discuss the challenges of working on video generation and editing, especially in computational resources and data collection.

Presentation Slides

Prof. Shing-Chi CHEUNG

Chair Professor, Department of CSE

May LLMs tell how my system can go wrong?

In the era of large language models (LLMs), enterprises are exploring the use of these models to help detect undetected problems in their systems. While LLMs have demonstrated promising results on popular coding benchmarks like HumanEval, they are known to suffer from bias and hallucination, and their answers can be incorrect. In this presentation, we will share our experience in using LLMs for test generation and briefly introduce recent research efforts.

Presentation Slides

Prof. Shuai WANG

Assistant Professor, Department of CSE

Building Trustworthy LLM

Large language models (LLMs) have become a buzzword in recent years, and they are increasingly being used in various real-world scenarios, ranging from the financial sector to healthcare devices. With the rapid advancement of LLM technology, the security and reliability risks associated with LLM systems are also increasing. LLMs may be vulnerable to cyber-attacks, which can severely compromise their reliability and integrity. The failure of LLMs may jeopardize the entire computing infrastructure, resulting in significant financial losses, legal implications, and reputational damage. It is essential to ensure that LLMs are developed in a secure and reliable manner, and that proper measures are taken to mitigate the associated risks. In this talk, Dr. Shuai WANG will briefly introduce his group's recent efforts in building trustworthy LLMs. He will present various techniques and strategies that his research group has applied to uncover LLM pitfalls and accordingly enhance their trustworthiness.

Presentation Slides

Prof. Song GUO

Professor, Department of CSE

Towards Edge-Native Foundation Model-as-a-Service: A New AI Paradigm from Algorithms to Systems

Foundation Models (FMs), such as the GPT series, LLaMA, DALL-E, etc., have driven a revolution in modern artificial intelligence in both academia and industry by adapting to many downstream applications in our society. However, the high costs and barriers associated with building, deploying and managing on-premises FMs prevent their widespread adoption by small and medium-sized enterprises and individual users. Recognising that edge nodes of various types (e.g. edge servers, personal PCs, etc.), which account for about 90% of the computing power in the network, have not yet been effectively exploited, our research focuses on a new AI paradigm from algorithms to systems, called Edge-Native FM-as-a-Service. This paradigm exploits the natural advantages of edge AI in terms of cost, latency and privacy to efficiently aggregate distributed, heterogeneous and multi-party edge computing power. It aims to provide a solution for users to access personalised, ubiquitous and secure FM services through application programming interface calls, without costly development or complex management.

Presentation Slides

Prof. Yangqiu SONG

Associate Professor, Department of CSE

E-Commerce Commonsense Knowledge Graphs for Intention-based Recommendation

Presentation Slides

Poster by CSE Postgraduate Students

The posters, prepared by CSE PhD and MPhil students, will be exhibited outside LT-B, and the authors will be present to answer questions and discuss their posters from 12:30 pm to 1:30 pm. Refreshments will be served during the poster session.


Shih-Yang LIU - PhD(CSE) student

Title: Acceleration of LLM inference through low-bit quantization

Author(s): Shih-Yang Liu, Xijie Huang, Zechun Liu, Kwang-Ting Cheng


This poster will present findings from three research projects focused on accelerating Large Language Model (LLM) inference using low-bit quantization. Specifically, we will discuss SDQ (Stochastic Differentiable Quantization with Mixed Precision), a mixed-precision quantization framework for Convolutional Neural Networks. Additionally, we will showcase results from Oscillation-free Quantization for Low-bit Vision Transformers, which enables quantization of Vision Transformers to just 2 bits. Lastly, we will examine LLM-FP4 (4-Bit Floating-Point Quantized Transformers), which represents a pioneering approach capable of quantizing both activation and weight of Large Language Models to only 4 bits using post-training quantization techniques.

Shiwen WU - PhD(CSE) student

Title: Blocker and Matcher Can Mutually Benefit: A Co-Learning Framework for Low-Resource Entity Resolution

Author(s): Shiwen Wu, Qiyu Wu, Honghua Dong, Wen Hua, Xiaofang Zhou


Entity resolution (ER) approaches typically consist of a blocker and a matcher. They share the same goal and cooperate in different roles: the blocker first quickly removes obvious non-matches, and the matcher subsequently determines whether the remaining pairs refer to the same real-world entity. Despite the state-of-the-art performance achieved by deep learning methods in ER, these techniques often rely on a large amount of labeled data for training, which can be challenging or costly to obtain. Thus, there is a need to develop effective ER systems under low-resource settings. In this work, we propose an end-to-end iterative Co-learning framework for ER, aimed at jointly training the blocker and the matcher by leveraging their cooperative relationship. In particular, we let the blocker and the matcher share their learned knowledge with each other via iteratively updated pseudo labels, which broaden the supervision signals. To mitigate the impact of noise in pseudo labels, we develop optimization techniques from three aspects: label generation, label selection and model training. Through extensive experiments on benchmark datasets, we demonstrate that our proposed framework outperforms baselines by an average of 9.13-51.55%. Furthermore, our analysis confirms that our framework achieves mutual benefits between the blocker and the matcher.

Yi LIN - PhD(CSE) student

Title: Boundary Detection in Deep Learning Models for Medical Image Segmentation

Author(s): Yi Lin, Dong Zhang, Xiao Fang, Yufan Chen, Kwang-Ting Cheng, and Hao Chen


Medical image segmentation is a fundamental task in the community of medical image analysis. In this paper, a novel network architecture, referred to as Convolution, Transformer, and Operator (CTO), is proposed. CTO employs a combination of Convolutional Neural Networks (CNNs), Vision Transformer (ViT), and an explicit boundary detection operator to achieve high recognition accuracy while maintaining an optimal balance between accuracy and efficiency. The proposed CTO follows the standard encoder-decoder segmentation paradigm, where the encoder network incorporates a popular CNN backbone for capturing local semantic information, and a lightweight ViT assistant for integrating long-range dependencies. To enhance the learning capacity on boundary, a boundary-guided decoder network is proposed that uses a boundary mask obtained from a dedicated boundary detection operator as explicit supervision to guide the decoding learning process. The performance of the proposed method is evaluated on six challenging medical image segmentation datasets, demonstrating that CTO achieves state-of-the-art accuracy with a competitive model complexity.

Ruibin YUAN - PhD(CSE) student

Title: ChatMusician: Understanding and Generating Music Intrinsically with LLM

Author(s): Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo


While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their abilities have yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is built by continually pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, with music treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer, without any external multi-modal neural structures or tokenizers. Interestingly, endowing it with musical abilities does not harm its language abilities; it even achieves a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music conditioned on texts, chords, melodies, motifs, musical forms, etc., surpassing the GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 in the zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B-token music-language corpus MusicPile, the collected MusicTheoryBench, code, model and demo on GitHub.

Van Quyet DO - MPhil(CSE) student

Title: ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases

Author(s): Quyet V. Do, Tianqing Fang, Shizhe Diao, Zhaowei Wang, Yangqiu Song


Reasoning over Commonsense Knowledge Bases (CSKBs), i.e. CSKB reasoning, has been explored as a way to acquire new commonsense knowledge based on reference knowledge in the original CSKBs and external prior knowledge. Despite the advancement of Large Language Models (LLMs) and prompt engineering techniques on various reasoning tasks, they still struggle with CSKB reasoning. One of the problems is that it is hard for them to acquire the explicit relational constraints in CSKBs from only in-context exemplars, due to a lack of symbolic reasoning capabilities (Bengio et al., 2021). To this end, we propose ConstraintChecker, a plugin over prompting techniques to provide and check explicit constraints. When considering a new knowledge instance, ConstraintChecker employs a rule-based module to produce a list of constraints, then uses a zero-shot learning module to check whether the knowledge instance satisfies all constraints. The constraint-checking result is then aggregated with the output of the main prompting technique to produce the final output. Experimental results on CSKB reasoning benchmarks demonstrate the effectiveness of our method, which brings consistent improvements over all prompting methods.

Junze LI - PhD(CSE) student

Title: Designing Scaffolding Strategies for Conversational Agents in Dialog Tasks of Neurocognitive Disorder Screening

Author(s): Jiaxiong Hu, Junze Li, Yuhang Zeng, Dongjie Yang, Danxuan Liang, Helen Meng, Xiaojuan Ma


Regular screening is critical for individuals at risk of neurocognitive disorders (NCDs) to receive early intervention. Conversational agents (CAs) have been adopted to administer dialog-based NCD screening tests for their scalability compared to human-administered tests. However, unique communication skills are required of CAs during NCD screening; e.g., clinicians often apply scaffolding to ensure subjects' understanding of and engagement in screening tests. Based on scaffolding theories and analysis of clinicians' practices from human-administered test recordings, we designed a scaffolding framework for the CA. In an exploratory Wizard-of-Oz study, the CA, empowered by ChatGPT, administered tasks in the Grocery Shopping Dialog Task with 15 participants (10 diagnosed with NCDs). Clinical experts verified the quality of the CA's scaffolding, and we explored its effects on participants' task understanding. Moreover, we propose implications for the future design of CAs that enable scaffolding for scalable NCD screening.

Zhifeng JIANG - PhD(CSE) student

Title: Dordis: Efficient Federated Learning with Dropout-Resilient Differential Privacy

Author(s): Zhifeng Jiang, Wei Wang, Ruichuan Chen


Federated learning (FL) is increasingly deployed among multiple clients to train a shared model over decentralized data. To address privacy concerns, FL systems need to safeguard the clients' data from disclosure during training and control data leakage through trained models when exposed to untrusted domains. Distributed differential privacy (DP) offers an appealing solution in this regard as it achieves a balanced tradeoff between privacy and utility without a trusted server. However, existing distributed DP mechanisms are impractical in the presence of client dropout, resulting in poor privacy guarantees or degraded training accuracy. In addition, these mechanisms suffer from severe efficiency issues.

We present Dordis, a distributed differentially private FL framework that is highly efficient and resilient to client dropout. Specifically, we develop a novel 'add-then-remove' scheme that enforces a required noise level precisely in each training round, even if some sampled clients drop out. This ensures that the privacy budget is utilized prudently, despite unpredictable client dynamics. To boost performance, Dordis operates as a distributed parallel architecture via encapsulating the communication and computation operations into stages. It automatically divides the global model aggregation into several chunk-aggregation tasks and pipelines them for optimal speedup. Large-scale deployment evaluations demonstrate that Dordis efficiently handles client dropout in various realistic FL scenarios, achieving the optimal privacy-utility tradeoff and accelerating training by up to 2.4× compared to existing solutions.

Jianzhe YU - PhD(CSE) student

Title: DP-SQL: A Differentially Private SQL Engine

Author(s): Wei Dong, Juanru Fang, Dajun Sun, Jianzhe Yu, Ke Yi


Differential privacy (DP) has garnered significant attention from both academia and industry due to its potential for offering robust privacy protection for individual data during analysis. With the increasing volume of sensitive information being collected by organizations and analyzed through SQL queries, the development of a general-purpose query engine capable of supporting a broad range of SQL queries while maintaining DP has become the holy grail of privacy-preserving query release. In a relational database, there are two DP policies: tuple-DP, which protects the privacy of individual tuples in each relation, and user-DP, which protects all data belonging to each user via foreign keys. Under each policy, we have designed DP mechanisms for answering a broad class of queries consisting of selection, projection, aggregation, and join operators. Five papers have been published in tier-1 computer science conferences. Finally, based on these algorithms, we have built a DP-SQL system that significantly outperforms existing systems in terms of both utility and efficiency.

Xi ZHAO - PhD(CSE) student

Title: Efficient Approximate Maximum Inner Product Search over Sparse Vectors

Author(s): Xi Zhao, Zhonghan Chen, Kai Huang, Ruiyuan Zhang, Bolong Zheng, Xiaofang Zhou


The Maximum Inner Product Search (MIPS) problem is a critical operation in high-dimensional vector spaces with numerous applications, particularly in the realm of deep neural network-based embedding models. Existing MIPS methods, such as locality-sensitive hashing (LSH), are well-suited for dense vectors but fall short when dealing with sparse vectors due to the near-orthogonality among them. Current solutions for sparse vectors rely heavily on inverted lists, leading to inefficient query performance, especially with large-scale sparse datasets.

To address these challenges, we present SOSIA, a novel framework designed specifically for sparse vectors. SOSIA introduces the SOS transformation, which converts sparse vectors into a binary space while providing an unbiased estimator of the inner product between any two vectors. This transformation effectively handles the sparsity issue. To further enhance query efficiency, we develop a minHash-based index.

We provide a theoretical analysis of SOSIA's query quality and demonstrate its effectiveness through extensive experiments on real-world sparse datasets. The results show that SOSIA outperforms existing methods in terms of query efficiency and accuracy, making it a promising solution for MIPS over sparse vectors.

Renjie PI - PhD(CSE) student

Title: Finegrained Understanding via Multimodal Large Language Models

Author(s): Renjie Pi, Jipeng Zhang


The integration of visual inputs with large language models (LLMs) has led to remarkable advancements in multi-modal capabilities, giving rise to vision large language models (VLLMs). However, effectively harnessing LLMs for intricate visual perception tasks, such as detection and segmentation, remains a challenge. Conventional approaches achieve this by transforming perception signals (e.g., bounding boxes, segmentation masks) into sequences of discrete tokens, which struggle with precision errors and introduce further complexity for training. We propose techniques to empower VLLMs with fine-grained visual perception abilities.

Kai CHEN - PhD(CSE) student

Title: GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation

Author(s): Kai Chen, Enze Xie, Zhe Chen, Yibo Wang, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung


Diffusion models have attracted significant attention due to their remarkable ability to create content and generate data for tasks like image classification. However, the usage of diffusion models to generate high-quality object detection data remains an underexplored area, where not only image-level perceptual quality but also geometric conditions such as bounding boxes and camera views are essential. Previous studies have utilized either copy-paste synthesis or layout-to-image (L2I) generation with specifically designed modules to encode the semantic layouts. In this paper, we propose GeoDiffusion, a simple framework that can flexibly translate various geometric conditions into text prompts and empower pre-trained text-to-image (T2I) diffusion models for high-quality detection data generation. Unlike previous L2I methods, our GeoDiffusion can encode not only the bounding boxes but also extra geometric conditions such as camera views in self-driving scenes. Extensive experiments demonstrate that GeoDiffusion outperforms previous L2I methods while being 4x faster to train. To the best of our knowledge, this is the first work to adopt diffusion models for layout-to-image generation with geometric conditions and demonstrate that L2I-generated images can be beneficial for improving the performance of object detectors.

Jipeng ZHANG - PhD(CSE) student

Title: LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

Author(s): Shizhe Diao, Rui Pan, Hanze Dong, Kashun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang


Large foundation models have demonstrated a great ability to achieve general human-level intelligence far beyond traditional approaches. As the technique keeps attracting attention from the AI community, more and more large foundation models have become publicly available. However, most of those models exhibit a major deficiency in specialized-task applications, where the step of finetuning is still required to obtain satisfactory performance. As the number of available models and specialized tasks keeps growing, the job of general finetuning becomes highly nontrivial. In this paper, we take the first step to address this issue. We introduce an extensible and lightweight toolkit, LMFlow, which aims to simplify the finetuning and inference of general large foundation models. LMFlow offers a complete finetuning workflow for a large foundation model to support personalized training with limited computing resources. Furthermore, it supports continuous pretraining, instruction tuning, parameter-efficient finetuning, alignment tuning, and large model inference, along with carefully designed and extensible APIs. This toolkit has been thoroughly tested and is available at

Maryam MASOUDIAN - PhD(CSE) student

Title: Mole: Efficient Crash Reproduction in Android Applications with Enforcing Necessary UI Events

Author(s): Maryam Masoudian, Heqing Huang, Morteza Amini, Charles Zhang


To improve the quality of Android apps, developers use automated debugging and testing solutions to determine whether previously found crashes are reproducible. However, existing GUI fuzzing solutions for Android apps struggle to reproduce crashes efficiently based solely on a crash stack trace, which provides the location in the app where the crash occurs. GUI fuzzing solutions currently in use rely on heuristics to generate UI events. Unfortunately, these events are often not aligned with exploring the part of the app's UI event space that reaches a specific code location. Hence, they generate numerous events unrelated to the crash, leading to an event explosion. To address this issue, a precise static UI model of widgets and screens can greatly enhance the efficiency of a fuzzing tool's search. Building such a model requires considering all possible combinations of event sequences, since the execution order of widgets is not statically determined. However, this approach presents challenges, particularly in terms of scalability, making it susceptible to issues when handling large and complex apps. In this paper, we propose a directed fuzzing solution that reduces an app's event domain to the events necessary for triggering a crash. Our insight is that the dependencies between widgets, in their visual presentation and attribute states, provide valuable information for precisely identifying whether an event is necessary for triggering a crash. We propose an attribute-sensitive reachability analysis (ASRA) to track dependent widgets along reachable paths to the crash point. It gives us a precise static UI model of the widgets on which generating an event is necessary for the crash to occur. We leverage instrumentation to inject code that prunes irrelevant events, reducing the event domain to search.
We used four well-known fuzzing tools, Monkey, Ape, Stoat, and FastBot2, to assess the impact of our solution on decreasing crash reproduction time and increasing the likelihood of reproducing a crash. The results show that the average crash reproduction time becomes at least 2x faster, and the success ratio of reproducing a crash rises to 1.0 for half of the analyzed crashes.

Qirui YANG - PhD(CSE) student

Title: Penetrative AI: Making LLMs Comprehend the Physical World

Author(s): Huatao Xu, Liying Han, Qirui Yang, Mo Li, Mani Srivastava


Recent developments in Large Language Models (LLMs) have demonstrated their remarkable capabilities across a range of tasks. Questions, however, persist about the nature of LLMs and their potential to integrate common-sense human knowledge when performing tasks involving information about the real physical world. This paper delves into these questions by exploring how LLMs can be extended to interact with and reason about the physical world through IoT sensors and actuators, a concept that we term "Penetrative AI". The paper explores such an extension at two levels of LLMs' ability to penetrate into the physical world via the processing of sensory signals. Our preliminary findings indicate that LLMs, with ChatGPT as the representative example in our exploration, have considerable and unique proficiency in employing their embedded world knowledge to interpret IoT sensor data and reason over them about tasks in the physical realm. Not only does this open up new applications for LLMs beyond traditional text-based tasks, but it also enables new ways of incorporating human knowledge in cyber-physical systems.

Possible Ways of Collaboration with Industry Partners

  1. Joint research lab
    It is a long-term collaboration relationship in which companies set up a research funding pool at the university.
  2. Internship
    Our students are encouraged to take internships during their studies. An internship can be as short as six weeks or as long as one year. We maintain a database listing the internship opportunities offered by companies. The database is open to our students.
  3. Final year project
    Our students are required to complete a project in their final year applying what they have learnt. Companies are welcome to let us know the topics they are interested in sponsoring by March each year. Selected topics will be open for enrollment by students as their final year projects.
  4. Professional and development course
    All students enrolled in our program are required to take a seminar course, where companies are welcome to give a talk introducing the various opportunities and career paths available to IT graduates.
  5. Research and Technology Forum
    We will organize a Research and Technology Forum, inviting companies to join a series of 5-min research highlights by our faculty.
  6. Innovation and Technology Fund (ITF)
    The Hong Kong Government offers funding to facilitate collaboration between universities and industry. The two popular schemes are the Innovation and Technology Support Programme (ITSP) and the Partnership Research Program (PRP). Besides tax relief, companies are eligible for a rebate of 40% of their cash sponsorship in these projects from the government via the Research and Development Cash Rebate Scheme (CRS).


Ms. Sylvia Mak