Title: High-dimensional Indexing
Date: Monday, 26 February 2001
Time: 4:00pm - 5:00pm
Venue: Lecture Theater F (Leung Yat Sing Lecture Theater), Academic Concourse (near lift nos. 25/26), HKUST
Abstract:
High-dimensional indexing has been considered as an important means to
facilitate fast query processing in data mining, fast retrieval and
similarity search in image and scientific databases. They are required to
support high-dimensional window/range queries, similarity range and
K-nearest neighbor queries. In this talk, I shall present two
transformation-based indexing schemes, iMinMax(theta) and iDistance. The
iMinMax(theta) method maps points in high-dimensional to
single-dimensional values determined by their maximum or minimum values
among all dimensions. With such representation, we are able to index
high-dimensional data points using a classical B+-tree. By varying the
tuning `knob', theta, we can obtain different family of iMinMax structures
that are optimized for different distributions of data sets. The
metric-based iDistance method is designed to efficiently support
similarity or K-nearest neighbor (KNN) queries. In the iDistance, the
high-dimensional space is split into partitions, and each partition is
associated with an `anchor' point (called reference point) whereby other
points in the same partitions can be made reference to. With such
transformation, the transformed points can be indexed using a B+-tree, and
similarity search in the high-dimensional space is performed as a sequence
of increasingly larger range queries on the single dimensional space. We
conducted extensive experiments using different data sets, and in this
talk, I shall present the results.
The major strengths of both indexes are their simplicity, efficiency, and
the fact that they can be integrated into existing systems effectively. In
fact, both methods are being used in a commercial image system.
Biography:
Ms Cui YU obtained her B. Eng from Nanjing University of Aeronautics and
Astronautics in 1995. She did her PhD at National University of Singapore,
and has recently submitted her thesis. Her research interests include
indexing, clustering techniques and query processing. She has two patents
pending, and has participated in a startup called GeoFoto.