From Dirt to Shovels: Automatic Tools Generation from Ad Hoc Data

Speaker:	Dr. Kenny ZHU
		Princeton University

Title: 		"From Dirt to Shovels: Automatic Tools Generation
		 from Ad Hoc Data"

Date: 		Friday, Feb 13, 2009

Time: 		1:00pm - 2:00pm

Venue: 		Lecture Theater F
		(Leung Yat Sing Lecture Theater, near lifts 25/26)
		HKUST

Abstract:

Ad hoc data is any non-standard, semi-structured data for which no useful
data analysis and transformation tools are readily available. Such data is
pervasive in areas such as scientific repositories, financial data, system
logs and configuration files, and sensor outputs. In this work, we
demonstrate that it is possible to generate a suite of useful data
processing tools directly from the ad hoc data itself, without any human
intervention, and thereby to improve the productivity of data analysts.

The key technical contribution of this work is a multi-phase algorithm that
automatically infers the structure of an ad hoc data source and produces
a format specification in a declarative language called PADS. Such
specifications can be used to generate parsing and printing libraries as
well as other useful tools for processing the data. At the end of the
talk, I will briefly introduce a few new ideas from ongoing work that
further improve the productivity of ad hoc data users.
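The abstract does not spell out the multi-phase algorithm. As a rough
illustration only, the toy Python sketch below shows the kind of first step
structure inference can start from: each record is tokenized into a
sequence of token classes, and the inferred description is either one
struct-like shape shared by every record or a union of the shapes observed.
This is a simplification of our own, not the PADS inference algorithm. The
token classes and the names `shape` and `infer` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical token classes for this toy sketch; order matters, since the
# regex alternation tries IP before INT so "10.0.0.1" is one IP token.
TOKEN_PATTERNS = [
    ("IP", r"\d{1,3}(?:\.\d{1,3}){3}"),
    ("INT", r"\d+"),
    ("WORD", r"[A-Za-z]+"),
    ("WS", r"[ \t]+"),
    ("PUNCT", r"."),  # any other single character
]
LEXER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_PATTERNS))


def shape(record: str) -> tuple:
    """Map a raw record to its token-class sequence.

    Punctuation is kept literally (e.g. '#') because separators usually
    carry structural information.
    """
    out = []
    for m in LEXER.finditer(record):
        kind = m.lastgroup
        out.append(m.group() if kind == "PUNCT" else kind)
    return tuple(out)


def infer(records):
    """Propose a struct if all records share one shape, else a union.

    Union branches are listed with their counts, most frequent first.
    """
    counts = Counter(shape(r) for r in records)
    if len(counts) == 1:
        (only,) = counts
        return ("struct", only)
    return ("union", sorted(counts.items(), key=lambda kv: -kv[1]))


if __name__ == "__main__":
    log = ["10.0.0.1 GET 200", "10.0.0.2 POST 404", "# note"]
    print(infer(log))
```

For example, infer(["10.0.0.1 GET 200", "10.0.0.2 POST 404"]) returns
("struct", ("IP", "WS", "WORD", "WS", "INT")), while adding the comment
line "# note" yields a union with two branches. A practical system would
need far more than this, such as handling variable-length fields and
refining the description against the full data source.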


*****************
Biography:


Kenny Zhu is a Postdoctoral Researcher at Princeton University. He
received a B.Eng. in Electrical Engineering and a Ph.D. in Computer
Science, both from the National University of Singapore. Before joining
Princeton in 2007, he was a software design engineer at Microsoft in
Seattle. Kenny's main research interests are languages and systems for
data processing, artificial intelligence, and concurrent/distributed
systems. He has published in top-tier conferences such as POPL, SIGMOD,
ICDE and ICLP, and actively serves as a reviewer for various conferences
and journals. His current research centers on the PADS data
description language.