Constructing Natural Language Interfaces for Data Querying and Visualization
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Constructing Natural Language Interfaces for Data Querying and Visualization"

By

Mr. Yuanfeng SONG

Abstract:

We live in the era of Big Data, and a considerable amount of the world's data is stored in relational databases. Accessing, analyzing, manipulating, and visualizing this data requires writing code in programming languages such as structured query languages and declarative visualization languages. However, these languages usually have a steep learning curve that can deter beginners and users without a technical background. To help end users perform data querying and visualization, automatically translating natural language questions into these programming languages has recently been proposed and extensively studied in the natural language processing (NLP) and database communities, especially with the rapid development and dominant performance of advanced deep neural networks (DNNs). In this thesis, we work towards constructing more intelligent and user-friendly natural language interfaces (NLIs) for data querying and visualization. To meet this goal, we study three critical tasks.

The first task is text-to-vis, which automatically translates natural language questions into data visualizations (DVs). It can be viewed as combining the automatic machine translation problem with the DV problem. To tackle this task, we propose a novel hybrid retrieval-generation framework named RGVisNet, which integrates the retrieval-based and generation-based approaches to combine the merits of both. Specifically, it retrieves the most appropriate DV query candidate as a prototype from a DV query codebase and then revises the prototype to generate the desired DV query.

The second task is CoVis, short for conversational text-to-vis, which combines dialogue systems with DV and aims to compose DVs through a successive series of exchanges between the DV system and the user. Since CoVis is a new task with no closely related existing studies, we first build a benchmark dataset named Dial-NVBench, consisting of dialogue sessions with a sequence of queries (from a user) and responses (from the system). The ultimate goal of each dialogue session is to create a suitable DV, but the process can involve diverse dialogue queries, such as seeking information about the dataset, manipulating parts of the data, and visualizing the data. We then propose a multi-modal neural network named MMCoVisNet to answer these DV-related queries. In particular, MMCoVisNet first fully understands the dialogue context and determines the corresponding response type. It then uses adaptive decoders to provide appropriate replies: a straightforward text decoder produces general responses, a SQL-form decoder synthesizes data-querying responses, and a DV-form decoder constructs the appropriate DVs.

The third task is speech-to-SQL, which goes one step further and directly translates speech-based natural language questions into SQL queries. A naive solution works in a cascaded manner: an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, this requires a high-quality ASR system and suffers from error compounding between the two components, resulting in limited performance.
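As a concrete illustration of this error-compounding issue, consider the following minimal Python sketch of such a cascaded pipeline; the function names and behaviours are hypothetical placeholders for exposition, not components from the thesis:

    # Hypothetical cascaded speech-to-SQL pipeline: ASR followed by text-to-SQL.
    # Any transcription error propagates unchecked into the generated SQL.

    def transcribe(audio: bytes) -> str:
        """Placeholder ASR component (assumed for illustration)."""
        # Suppose the spoken word "singers" is mis-recognized as "singles".
        return "show the names of all singles"

    def text_to_sql(question: str) -> str:
        """Placeholder text-to-SQL component (assumed for illustration)."""
        # A real parser grounds tokens against the database schema; here the
        # mis-recognized word steers it towards a non-existent table.
        table = "singles" if "singles" in question else "singer"
        return f"SELECT name FROM {table}"

    def cascaded_speech_to_sql(audio: bytes) -> str:
        # The text-to-SQL step never sees the original speech, so it
        # cannot recover from the upstream ASR mistake.
        return text_to_sql(transcribe(audio))

    print(cascaded_speech_to_sql(b"..."))  # SELECT name FROM singles (wrong table)

An end-to-end model avoids this failure mode by conditioning SQL generation directly on the speech signal rather than on a possibly corrupted transcript.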
To handle these challenges, we propose a novel end-to-end neural network named SpeechSQLNet that directly converts speech-based questions into SQL statements without an external ASR step. SpeechSQLNet can fully exploit the rich linguistic information present in speech. To the best of our knowledge, this is the first study to synthesize SQL directly from common natural language questions rather than from a natural-language-based version of SQL. To analyze the feasibility of the proposed problem and model, we further construct a dataset named SpeechQL by piggybacking on widely used text-to-SQL datasets. Extensive experimental evaluations on this dataset show that SpeechSQLNet can directly synthesize high-quality SQL queries from human speech, outperforming various competitive counterparts as well as the cascaded methods in exact match accuracy.

Date: Friday, 15 March 2024
Time: 2:00pm - 4:00pm
Venue: Room 5501 (Lifts 25/26)

Chairman: Prof. Ping TAN (ECE)

Committee Members:
Prof. Raymond WONG (Supervisor)
Prof. Xiaojuan MA
Prof. Xiaofang ZHOU
Prof. Kam Tuen LAW (PHYS)
Prof. Jianping WANG (CityU)