Vision-Enhanced Graph Intelligence: Reasoning, Prediction, and Generation
PhD Thesis Proposal Defence
Title: "Vision-Enhanced Graph Intelligence: Reasoning, Prediction, and
Generation"
by
Mr. Yanbin WEI
Abstract:
Graph-structured data plays a fundamental role in modern AI systems, yet
existing graph methods still face key limitations in flexibility, structural
expressiveness, and interpretability across heterogeneous tasks. This thesis
proposal studies a unified direction, termed vision-enhanced graph
intelligence, which integrates visual structural awareness with language- and
message-passing-based learning to improve graph reasoning, prediction, and
retrieval-augmented generation.
The first part presents GITA, a Graph-to-Visual-and-Textual Integration
framework for instruction-based graph reasoning. GITA converts graph
structures into coordinated visual and textual representations, enabling
vision-language models to perform graph reasoning in a unified and
user-friendly paradigm. To support systematic evaluation, we introduce GVLQA,
a large-scale vision-language benchmark for general graph reasoning.
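As a rough illustration of pairing a visual and a textual view of the same graph, the sketch below turns an edge list into a plain-language description and a Graphviz DOT string that a renderer could draw. This is not GITA's actual pipeline or prompt format; the function names `graph_to_text` and `graph_to_dot`, the sentence templates, and the toy graph are all assumptions made for illustration.

```python
from typing import List, Tuple

Edge = Tuple[str, str]


def graph_to_text(edges: List[Edge]) -> str:
    """Describe an undirected graph in plain sentences (hypothetical template)."""
    nodes = sorted({n for edge in edges for n in edge})
    lines = [f"The graph has {len(nodes)} nodes and {len(edges)} undirected edges."]
    lines += [f"Node {u} is connected to node {v}." for u, v in edges]
    return "\n".join(lines)


def graph_to_dot(edges: List[Edge]) -> str:
    """Emit Graphviz DOT source; rendering it to an image is left to an external tool."""
    body = "\n".join(f'  "{u}" -- "{v}";' for u, v in edges)
    return "graph G {\n" + body + "\n}"


# Toy graph (illustrative only): both views are derived from the same edge list,
# so the description and the drawing stay consistent with each other.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
text_view = graph_to_text(edges)
dot_view = graph_to_dot(edges)
print(text_view)
print(dot_view)
```

In a setup like this, the rendered image and the text description would be passed together to a vision-language model along with the task instruction.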
The second part presents GVN and its efficient variant E-GVN for link
prediction. The core idea is to extract visual structural features from local
subgraph renderings and fuse them with message-passing neural network
representations. This design is orthogonal to existing structural-feature
enhancements and consistently improves link prediction performance on both
standard and large-scale benchmarks.
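The idea of working on a local subgraph around a candidate link and then fusing two feature sources can be sketched as follows. This is a minimal sketch under stated assumptions, not the GVN or E-GVN implementation: the k-hop extraction rule, the helper names `k_hop_nodes`, `local_subgraph`, and `fuse`, and fusion by simple concatenation are hypothetical, and the rendering step and visual encoder are omitted (the feature vectors are placeholders).

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def k_hop_nodes(adj: Dict[int, Set[int]], source: int, k: int) -> Set[int]:
    """Return all nodes within k hops of `source`, using breadth-first search."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbour in adj.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


def local_subgraph(adj: Dict[int, Set[int]], u: int, v: int, k: int) -> List[Tuple[int, int]]:
    """Edges among nodes within k hops of either endpoint of the candidate link (u, v)."""
    nodes = k_hop_nodes(adj, u, k) | k_hop_nodes(adj, v, k)
    return sorted((a, b) for a in nodes for b in adj.get(a, ()) if b in nodes and a < b)


def fuse(visual_feat: List[float], mpnn_feat: List[float]) -> List[float]:
    """Assumed fusion: concatenate the two feature vectors."""
    return visual_feat + mpnn_feat


# Toy path graph 0-1-2-3-4-5 (illustrative only).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
sub_edges = local_subgraph(adj, 1, 2, k=1)
fused = fuse([0.1, 0.2], [0.3])  # placeholder visual and message-passing features
print(sub_edges)
print(fused)
```

In practice the extracted subgraph would be drawn as an image and passed through a vision encoder to obtain the visual feature vector before fusion.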
The third part presents VizRAG, a retrieval-augmented generation framework
enhanced by hypergraph visualization. We analyze the advantages and
feasibility of visual hypergraph cues for knowledge-intensive generation,
identify key challenges such as visual congestion and rendering bias, and
introduce HyperViz as a practical toolkit to address them. Extensive
experiments validate that visualized high-order structure improves retrieval
quality and downstream response generation.
Overall, this proposal establishes visual structure awareness as a general
and practical principle for graph AI. By bridging multimodal reasoning and
graph learning, the proposed research advances unified methodologies and
empirical foundations for next-generation graph-centric intelligent systems.
Date: Monday, 15 June 2026
Time: 10:00am - 11:30am
Venue: Room 3494 (Lifts 25/26)
Committee Members: Prof. James Kwok (Supervisor)
Prof. Raymond Wong (Chairperson)
Dr. Yangqiu Song