UNIGE document Chapitre d'actes
previous document  unige:128619  next document
add to browser collection

Indexability-Based Dataset Partitioning

Hoyos, Angello
Ruiz, Ubaldo
Chávez, Edgar
Published in 12th International Conference on Similarity Search and Applications, SISAP 2019. Newark, NJ (USA) - 2-4 October 2019 - Cham: Springer International Publishing. 2019, p. 143-150
Abstract Indexing exploits assumptions on the inner structures of a dataset to make the nearest neighbor queries cheaper to resolve. Datasets are generally indexed at once into a unique index for similarity search. By indexing a given dataset as a whole, one faces the parameters of its global structure, which may be adverse. A typical well-studied example is a high global dimensionality of the dataset, making any indexing strategy inefficient due to the curse of dimensionality. We conjecture that a dataset may be partitioned into subsets of variable indexability. The strategy is, therefore, to define a procedure to extract parts of the dataset with predictable indexability and to adapt the index structure to this parameter. In this paper, we define and discuss indexability related to the curse of dimensionality and propose a related heuristic to partition the dataset into low-dimensional parts. Each data object is ranked according to its degree centrality, under a connected sparse graph, the Half-Space Proximal Graph (HSP). We postulate centrality measures are good predictors of dimensionality and indexability. In view of validation, we conducted an experiment using the degree centrality of the HSP graph as unique dimensionality/indexability measure. We ranked the data objects by their respective centrality degree under the HSP graph, then extracted the lower dimensional subsets, recomputed the HSP and repeated. Subsets were then indexed with an exact method in increasing, decreasing and random order. We measured the complexity of a fixed set of queries for each of the three arrangements. For each set we used a fixed dataset with 250 queries. The above single experiment demonstrated that the heuristic can extract low dimensional subsets, and also that those subsets are easier to index. This initial results demonstrate the validity of our conjecture and motivate the need for exploring further the notion of indexability and related dataset partitioning strategies.
Keywords IndexabilityDataset partitioningSpanning graphCentrality measureCurse of dimensionality
ISBN: 9783030320461
Full text
Research groups Computer Vision and Multimedia Laboratory
Viper group
(ISO format)
HOYOS, Angello et al. Indexability-Based Dataset Partitioning. In: 12th International Conference on Similarity Search and Applications, SISAP 2019. Newark, NJ (USA). Cham : Springer International Publishing, 2019. p. 143-150. doi: 10.1007/978-3-030-32047-8_13 https://archive-ouverte.unige.ch/unige:128619

75 hits



Deposited on : 2020-01-10

Export document
Format :
Citation style :