Probabilistic XML: Survey and Challenges

--------------------------------------------------------------------
The Hong Kong University of Science & Technology
Department of Computer Science and Engineering
Human Language Technology Center
--------------------------------------------------------------------
Speaker:	Dr. Pierre SENELLART
		Computer Science and Networking Department
		Télécom ParisTech
		France

Title:		"Probabilistic XML: Survey and Challenges"

Date:		Tuesday, 10 November 2009

Time:		11:00am - 12 noon

Venue:		Room 2404 (via lifts 17/18), HKUST

Abstract:

Many automatic tasks on real-world data, such as information extraction,
natural language processing, and data mining, generate imprecise results.
Moreover, in many of these tasks, information is represented in a
semi-structured way, either due to an inherent tree-like structure of the
original data, or because it is natural to represent derived information
or knowledge in a hierarchical manner. A number of recent works have dealt
with representing uncertain information in XML. We present a unifying
model for these works, distinguishing two main classes of frameworks,
depending on whether arbitrary probabilistic dependencies are allowed or not.
For these two classes, we discuss expressiveness, query efficiency, and update
capabilities. We also go over more recent work on the use of continuous
probability distributions. Finally, we aim to provide insight into the
important open problems of probabilistic XML, by discussing the connection
with relational database models, the limitations of existing frameworks,
and other topics of interest.


*******************
Biography:

Dr. Pierre Senellart is an Associate Professor in the Computer Science and
Networking Department at Télécom ParisTech, the leading French engineering
school specializing in information technology. He obtained his M.Sc. (2003)
and his Ph.D. (2007) in Computer Science from Université Paris-Sud,
studying under the supervision of Serge Abiteboul. Pierre Senellart has
published in internationally renowned conferences and journals (PODS,
AAAI, VLDB Journal, etc.). He has served on the program committees of
ECML/PKDD, WWW, VLDB, and ICDE and on the repeatability committee of
SIGMOD, and has reviewed for various journals, such as VLDB Journal,
JCSS, DKE, Information Systems, and Communications of the ACM.
His research interests focus on theoretical aspects of database
management systems and the World Wide Web, and more specifically on the
intentional indexing of the deep Web, probabilistic XML databases, and
graph mining. He also has an interest in natural language processing, and
has been collaborating with SYSTRAN, the leading machine translation
company.