MSc thesis of Dimitrios Nikolopoulos
Supervisor: Ioannis Emiris
Consider a set S of points in a real D-dimensional space RD, where distances are defined using function ∆ : RD × RD → R (the Euclidean metric). Nearest neighbor search is an optimization problem for finding the closest points in S to a given query point q ∈ RD. Given a positive real ε > 0 then a point p ∈ S is a (1+ε)-approximate nearest neighbor of the query point q ∈ RD if dist(q, p) ≤ (1 + ε)dist(q, pnn) where pnn ∈ S is the true nearest neighbor to q. If the data that is expressed in high- dimensional space RD lies closer to an embedded manifold M of dimension d, where d ≪ D, then, we show the data may be preprocessed into the Randomly-oriented RKD-trees structure and we provide a near optimal bound on the number of levels required to reduce the size of its cells by a factor s ≥ 2. We show the data may be preprocessed into the structure in OD · N · log N time and OD · N space, so that given a query point q ∈ RD and ε > 0, a (1+ε)-approximate nearest neighbor of q may ε high-dimensional data with an underlying low-intrinsic dimensional subspace.Defended: March 19, 2014.