Matching Uncertain Strings

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

FYT Presentation and Demonstration

Title: "Matching Uncertain Strings"

by

Mr. YAN Zhepeng

In many modern applications, such as bioinformatics, data integration
and sensor networking, we may collect a lot of uncertain string data.
More precisely, some characters in the string are not deterministic.
Instead, they have a few possible values, each with a certain
probability. In many cases, we are interested in retrieving all pairs
of similar strings. Yet, no known previous work studies reasonable
measures of probabilistic string similarity or efficient algorithms
which perform probabilistic string joins.

In this thesis, we first formulate the probabilistic string join
problem. Then we present two lower-bound filters, based on the q-gram
technique and the minimal edit distance, and an upper-bound filter.
These filters are effective at pruning away strings that are not
similar. Our experiments have verified the effectiveness and
efficiency of our filtering algorithms.


Date		:	14 May 2010 (Friday)

Time		:	9:50am to 10:30am

Venue		:	Room 3416

Advisor		:	Dr. Yi Ke

2nd Reader	:	Dr. Chen Lei