The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Constructing Natural Language Interfaces for Data Querying and
Visualization"

By

Mr. Yuanfeng SONG


Abstract:

We live in the era of Big Data, and a considerable amount of the world's data
is stored in relational databases. Accessing, analyzing, manipulating, and
visualizing this data typically requires writing code in programming languages
(e.g., structured query languages and declarative visualization languages).
However, these languages usually have a steep learning curve that can deter
beginners and users without a technical background. To help end users perform
data querying and visualization, automatically translating natural language
questions into these programming languages has recently been proposed and
extensively studied in the natural language processing (NLP) and database
communities, especially with the rapid development and dominant performance of
advanced deep neural networks (DNNs).

In this thesis, we work towards constructing more intelligent and user-friendly
natural language interfaces (NLIs) for data querying and visualization. To meet
this goal, we mainly study three critical tasks. The first task is text-to-vis,
which automatically translates natural language questions into data
visualizations (DVs). It can be viewed as combining the automatic machine
translation problem with the DV problem. To tackle this task, we propose a
novel hybrid retrieval-generation framework named RGVisNet, which integrates
the retrieval-based and generation-based approaches to combine the merits of
both. Specifically, it retrieves the most appropriate DV query candidate as a
prototype from a DV query codebase and then revises the prototype to generate
the desired DV query.
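
A minimal sketch of this retrieve-then-revise idea follows. The bag-of-words
embedding, the toy codebase, and the revise() placeholder are hypothetical
stand-ins for the learned neural encoder and reviser described in the thesis.

    # Retrieve-then-revise sketch (hypothetical stand-ins, not RGVisNet itself).
    from collections import Counter
    import math

    def embed(text):
        # Stand-in for a learned neural encoder: bag-of-words counts.
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # Toy DV query codebase: (NL description, DV query prototype) pairs.
    CODEBASE = [
        ("bar chart of total sales by region",
         "Visualize BAR SELECT region, SUM(sales) FROM t GROUP BY region"),
        ("line chart of revenue over time",
         "Visualize LINE SELECT date, revenue FROM t ORDER BY date"),
    ]

    def retrieve(question):
        # Stage 1: retrieve the most similar prototype from the codebase.
        q = embed(question)
        return max(CODEBASE, key=lambda p: cosine(q, embed(p[0])))[1]

    def revise(prototype, question):
        # Stage 2 (placeholder): a learned model would edit the prototype's
        # chart type and schema items to fit the question.
        return prototype

    question = "show a bar chart of total sales for each region"
    print(revise(retrieve(question), question))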

The second task is CoVis, short for Conversational Text-to-Vis, which combines
dialogue systems with DV and aims to compose DVs through a series of exchanges
between the DV system and the user. Since CoVis is a new task without closely
related existing studies, we first build a benchmark dataset named
Dial-NVBench, consisting of dialogue sessions with a sequence of queries (from
a user) and responses (from the system). The ultimate goal of each dialogue
session is to create a suitable DV; however, the process can involve diverse
dialogue queries, such as seeking information about the dataset, manipulating
parts of the data, and visualizing the data. We then propose a multi-modal
neural network named MMCoVisNet to answer these DV-related queries. In
particular, MMCoVisNet first understands the dialogue context and determines
the appropriate type of response. It then uses adaptive decoders to provide
the replies: a straightforward text decoder produces general responses, a
SQL-form decoder synthesizes data querying responses, and a DV-form decoder
constructs the appropriate DVs.
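
The adaptive-decoder routing can be pictured with the sketch below; the
keyword classifier and the three decoder stubs are hypothetical placeholders
for the learned components in MMCoVisNet.

    # Adaptive-decoder routing sketch (hypothetical stand-ins).
    def classify(query):
        # Stand-in for the learned dialogue-context understanding module
        # that decides which kind of response is needed.
        q = query.lower()
        if any(w in q for w in ("plot", "chart", "visualize")):
            return "dv"
        if any(w in q for w in ("average", "sum", "how many", "filter")):
            return "sql"
        return "text"

    # Stand-ins for the text, SQL-form, and DV-form decoders.
    DECODERS = {
        "text": lambda q: "The dataset covers columns: date, region, sales.",
        "sql":  lambda q: "SELECT AVG(sales) FROM t",
        "dv":   lambda q: "Visualize BAR SELECT region, SUM(sales) "
                          "FROM t GROUP BY region",
    }

    for query in ("What does this dataset contain?",
                  "What is the average of sales?",
                  "Plot total sales per region as a bar chart."):
        kind = classify(query)
        print(f"[{kind}] {DECODERS[kind](query)}")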

The third task is Speech-to-SQL, which goes one step further and directly
translates speech-based natural language questions into SQL queries. A naive
solution works in a cascaded manner: an automatic speech recognition (ASR)
component followed by a text-to-SQL component. However, this requires a
high-quality ASR system and suffers from the error-compounding problem between
the two components, resulting in limited performance. To handle these
challenges, we propose a novel end-to-end neural network named SpeechSQLNet
that directly converts speech-based questions into SQL statements without an
external ASR step, allowing it to fully exploit the rich linguistic
information present in the speech. To the best of our knowledge, this is the
first study to synthesize SQL directly from common natural language questions
rather than from a natural language-based version of SQL. To analyze the
feasibility of the proposed problem and model, we further construct a dataset
named SpeechQL by piggybacking on widely used text-to-SQL datasets. Extensive
experimental evaluations on this dataset show that SpeechSQLNet can directly
synthesize high-quality SQL queries from human speech, outperforming
competitive counterparts, including cascaded methods, in exact-match accuracy.
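
The contrast between the cascaded baseline and the end-to-end approach can be
illustrated as follows; asr(), text_to_sql(), and speech_to_sql() are
hypothetical stubs, not the actual SpeechSQLNet components.

    # Cascaded vs. end-to-end sketch (hypothetical stubs).
    def asr(audio):
        # Stand-in ASR: a transcription error introduced here
        # ("sales" misheard as "sails") propagates downstream,
        # the error-compounding problem of the cascaded design.
        return "show sails in 2019"

    def text_to_sql(text):
        # Stand-in text-to-SQL model; it only sees the (possibly
        # wrong) transcript, so the ASR error survives in the SQL.
        column = text.split()[1]  # naively picks "sails"
        return f"SELECT {column} FROM t WHERE year = 2019"

    def speech_to_sql(audio):
        # End-to-end stand-in: maps speech directly to SQL, with no
        # intermediate transcript to corrupt.
        return "SELECT sales FROM t WHERE year = 2019"

    audio = b"..."  # raw waveform placeholder
    print("cascaded:  ", text_to_sql(asr(audio)))
    print("end-to-end:", speech_to_sql(audio))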


Date:                   Friday, 15 March 2024

Time:                   2:00pm - 4:00pm

Venue:                  Room 5501
                        Lifts 25/26

Chairman:               Prof. Ping TAN (ECE)

Committee Members:      Prof. Raymond WONG (Supervisor)
                        Prof. Xiaojuan MA
                        Prof. Xiaofang ZHOU
                        Prof. Kam Tuen LAW (PHYS)
                        Prof. Jianping WANG (CityU)