Vision-Enhanced Graph Intelligence: Reasoning, Prediction, and Generation

PhD Thesis Proposal Defence


Title: "Vision-Enhanced Graph Intelligence: Reasoning, Prediction, and
Generation"

by

Mr. Yanbin WEI


Abstract:

Graph-structured data plays a fundamental role in modern AI systems, yet
existing graph methods still face key limitations in flexibility, structural
expressiveness, and interpretability across heterogeneous tasks. This thesis
proposal studies a unified direction, termed vision-enhanced graph
intelligence, which integrates visual structural awareness with language- and
message-passing-based learning to improve graph reasoning, prediction, and
retrieval-augmented generation.

The first part presents GITA, a Graph-to-Visual-and-Textual Integration
framework for instruction-based graph reasoning. GITA converts structural
graphs into coordinated visual and textual representations, enabling
vision-language models to perform graph reasoning in a unified and
user-friendly paradigm. To support systematic evaluation, we introduce GVLQA,
a large-scale vision-language benchmark for general graph reasoning.

The second part presents GVN and its efficient variant E-GVN for link
prediction. The core idea is to extract visual structural features from local
subgraph renderings and fuse them with message-passing neural network
representations. This design is orthogonal to existing structural-feature
enhancements and consistently improves link prediction performance on both
standard and large-scale benchmarks.

The third part presents VizRAG, a retrieval-augmented generation framework
enhanced by hypergraph visualization. We analyze the advantages and
feasibility of visual hypergraph cues for knowledge-intensive generation,
identify key challenges such as visual congestion and rendering bias, and
introduce HyperViz as a practical toolkit to address them. Extensive
experiments validate that visualized high-order structure improves retrieval
quality and downstream response generation.

Overall, this proposal establishes visual structure awareness as a general
and practical principle for graph AI. By bridging multimodal reasoning and
graph learning, the proposed research advances unified methodologies and
empirical foundations for next-generation graph-centric intelligent systems.


Date:                   Monday, 15 June 2026

Time:                   10:00am - 11:30am

Venue:                  Room 3494
                        Lift 25/26

Committee Members:      Prof. James Kwok (Supervisor)
                        Prof. Raymond Wong (Chairperson)
                        Dr. Yangqiu Song