A Survey on LLM-Based Multi-Agent Systems for Document Mining
PhD Qualifying Examination
Title: "A Survey on LLM-Based Multi-Agent Systems for Document Mining"
by
Mr. Pengze CHEN
Abstract:
Document mining is the process of extracting valuable information and
knowledge from large collections of documents, such as academic literature,
analysis reports, and medical records. Efficient document mining is crucial
for expediting the collection, processing, and utilization of massive
document collections in this era of unprecedented data growth. However, traditional
methods have often been limited by a shallow understanding of semantics,
being largely confined to keyword matching and statistical patterns. The true
potential of document mining is only now being unlocked with the advent of
Large Language Models (LLMs). Their capabilities in semantic understanding,
reasoning, and multimodal processing empower them to comprehend and interact
with documents in a human-like manner. Furthermore, the emergence of
LLM-based multi-agent systems (LMASs) enhances this process through
collaborative execution and mitigation of hallucinations, significantly
boosting its reliability and effectiveness. To systematically investigate the
advancements in this field, this survey organizes the landscape around three
representative capabilities: document retrieval, for precisely locating
relevant documents; document answering, for delivering direct answers; and
document summarization, for creating condensed and coherent summaries of
documents. It first analyzes the requirements and challenges of each task,
and subsequently examines the existing solutions, highlighting their
respective strengths and weaknesses. Finally, it summarizes the existing
works and outlines promising research directions to further advance LMASs for
document mining.
Date: Monday, 29 September 2025
Time: 3:00pm - 5:00pm
Venue: Room 5501 (Lifts 25/26)
Committee Members: Prof. Lei Chen (Supervisor)
                   Prof. Ke Yi (Chairperson)
                   Dr. Xiaomin Ouyang