CONSTRUCTING NATURAL LANGUAGE INTERFACES FOR DATA VISUALIZATION

PhD Thesis Proposal Defence


Title: "CONSTRUCTING NATURAL LANGUAGE INTERFACES FOR DATA VISUALIZATION"

by

Mr. Yuanfeng SONG


Abstract:

We live in the era of Big Data, and a considerable amount of the world's data 
is stored in relational databases. Accessing, analyzing, and visualizing this 
data usually requires writing code in a declarative visualization language. 
However, these languages tend to have a steep learning curve, which can deter 
beginners and non-technical users. To help end users perform data 
visualization (DV), automatically translating natural language questions into 
DVs has recently been proposed and extensively studied in the natural language 
processing (NLP) and database communities, especially given the rapid 
development and strong performance of advanced deep neural networks.

In this proposal, we work towards constructing more intelligent and 
user-friendly natural language interfaces (NLIs) for DV. To meet this goal, we 
mainly study two critical tasks. The first task is text-to-vis, which 
automatically translates natural language questions into DVs. It can be treated 
as combining the automatic machine translation problem with the DV problem. We 
propose a novel hybrid retrieval-generation framework named RGVisNet to tackle 
this task. RGVisNet integrates retrieval-based and generation-based 
approaches to combine the merits of both. Specifically, it retrieves the 
most relevant DV query candidate as a prototype from the DV query codebase and 
then revises the prototype to generate the desired DV query.
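
The retrieve-then-revise idea can be illustrated with a toy sketch. It is not 
RGVisNet itself, which relies on neural networks: here retrieval is plain token 
overlap and revision is a single chart-type substitution. The codebase entries, 
the DV query syntax, and all function names are hypothetical and chosen only 
for illustration.

```python
import re

# Hypothetical codebase of (question, DV query) pairs.
# The DV query syntax is illustrative only, not taken from the proposal.
CODEBASE = [
    ("show a bar chart of the number of students per department",
     "VISUALIZE bar SELECT department, COUNT(*) FROM students GROUP BY department"),
    ("plot a line chart of total sales by year",
     "VISUALIZE line SELECT year, SUM(sales) FROM orders GROUP BY year"),
]

CHART_TYPES = ("bar", "line", "pie", "scatter")


def tokens(text):
    """Lowercase a string and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question):
    """Return the DV query whose paired question has the highest Jaccard overlap."""
    q = tokens(question)

    def score(pair):
        p = tokens(pair[0])
        return len(q & p) / len(q | p)

    return max(CODEBASE, key=score)[1]


def revise(prototype, question):
    """Toy revision step: use the chart type named in the question, if any."""
    q = tokens(question)
    for chart in CHART_TYPES:
        if chart in q:
            return re.sub(r"^VISUALIZE \w+", f"VISUALIZE {chart}", prototype)
    return prototype


def text_to_vis(question):
    """Retrieve a prototype DV query, then revise it to fit the question."""
    return revise(retrieve(question), question)


print(text_to_vis("draw a pie chart of the number of students per department"))
```

In this sketch the retrieved prototype supplies the query structure, and the 
revision step only adjusts the part that differs from the user's request.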

The second task is CoVis, short for Conversational Text-to-Vis, which combines 
the dialogue system with DV and aims to compose data visualizations through a 
successive series of exchanges between the DV system and its users. Since CoVis 
is a new task with no prior literature, we first build a benchmark dataset named 
Dial-NVBench, consisting of dialogue sessions, each with a sequence of queries 
(from a user) and responses (from the system). The ultimate goal of each dialogue 
session is to create a suitable DV. However, this process can contain diverse 
dialogue queries, such as seeking information about the dataset, manipulating 
parts of the data, and visualizing the data. Then, we propose a multi-modal 
neural network named MMCoVisNet to answer these DV-related queries. In 
particular, MMCoVisNet first fully understands the dialogue context and 
determines the corresponding responses. Then, it uses adaptive decoders to 
provide the appropriate replies: a straightforward text decoder is used to 
produce general responses, a SQL-form decoder is applied to synthesize data 
querying responses, and a DV-form decoder tries to construct the appropriate 
DVs.
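
The routing between decoders can be pictured with a minimal sketch. It is not 
MMCoVisNet: the keyword classifier stands in for the learned step that 
determines the response type, and each decoder returns a fixed placeholder 
string. All names, keywords, and outputs below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    kind: str     # "text", "sql", or "dv"
    content: str  # placeholder output of the chosen decoder


def classify(query):
    """Keyword stand-in for the learned step that picks the response type."""
    q = query.lower()
    if any(w in q for w in ("plot", "chart", "visualize", "draw")):
        return "dv"
    if any(w in q for w in ("how many", "list", "filter")):
        return "sql"
    return "text"


def text_decoder(query):
    # Placeholder for a general text response.
    return Response("text", "The dataset contains one table named students.")


def sql_decoder(query):
    # Placeholder for a data-querying response in SQL form.
    return Response("sql", "SELECT COUNT(*) FROM students")


def dv_decoder(query):
    # Placeholder for a DV-form response (illustrative syntax).
    return Response("dv", "VISUALIZE bar SELECT department, COUNT(*) "
                          "FROM students GROUP BY department")


DECODERS = {"text": text_decoder, "sql": sql_decoder, "dv": dv_decoder}


def respond(query):
    """Dispatch the query to the decoder matching its predicted type."""
    return DECODERS[classify(query)](query)


for q in ("What is this dataset about?",
          "How many students are there?",
          "Draw a bar chart of students per department"):
    print(respond(q).kind)
```

The point of the sketch is only the dispatch pattern: one shared understanding 
step, followed by a decoder selected per query type.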

Few studies in the community have applied advanced NLP techniques to DV. We 
hope this proposal will encourage further work on NLP for DV and promote the 
development of both areas.


Date:			Wednesday, 10 May 2023

Time:			4:30pm - 6:30pm

Venue:			Room 4472
  			lifts 25/26

Committee Members:	Prof. Raymond Wong (Supervisor)
 			Prof. Xiaofang Zhou (Chairperson)
 			Dr. Xiaojuan Ma
 			Dr. Yangqiu Song


**** ALL are Welcome ****