en
Proceedings chapter
Open access
English

Indexability-Based Dataset Partitioning

Presented at Newark, NJ (USA), 2-4 October 2019
PublisherCham : Springer International Publishing
Publication date2019
Abstract

Indexing exploits assumptions on the inner structures of a dataset to make the nearest neighbor queries cheaper to resolve. Datasets are generally indexed at once into a unique index for similarity search. By indexing a given dataset as a whole, one faces the parameters of its global structure, which may be adverse. A typical well-studied example is a high global dimensionality of the dataset, making any indexing strategy inefficient due to the curse of dimensionality. We conjecture that a dataset may be partitioned into subsets of variable indexability. The strategy is, therefore, to define a procedure to extract parts of the dataset with predictable indexability and to adapt the index structure to this parameter. In this paper, we define and discuss indexability related to the curse of dimensionality and propose a related heuristic to partition the dataset into low-dimensional parts. Each data object is ranked according to its degree centrality, under a connected sparse graph, the Half-Space Proximal Graph (HSP). We postulate centrality measures are good predictors of dimensionality and indexability. In view of validation, we conducted an experiment using the degree centrality of the HSP graph as unique dimensionality/indexability measure. We ranked the data objects by their respective centrality degree under the HSP graph, then extracted the lower dimensional subsets, recomputed the HSP and repeated. Subsets were then indexed with an exact method in increasing, decreasing and random order. We measured the complexity of a fixed set of queries for each of the three arrangements. For each set we used a fixed dataset with 250 queries. The above single experiment demonstrated that the heuristic can extract low dimensional subsets, and also that those subsets are easier to index. This initial results demonstrate the validity of our conjecture and motivate the need for exploring further the notion of indexability and related dataset partitioning strategies.

Keywords
  • Indexability
  • Dataset partitioning
  • Spanning graph
  • Centrality measure
  • Curse of dimensionality
Citation (ISO format)
HOYOS, Angello et al. Indexability-Based Dataset Partitioning. In: 12th International Conference on Similarity Search and Applications, SISAP 2019. Newark, NJ (USA). Cham : Springer International Publishing, 2019. p. 143–150. doi: 10.1007/978-3-030-32047-8_13
Main files (1)
Proceedings chapter (Accepted version)
accessLevelPublic
Identifiers
ISBN9783030320461
264views
198downloads

Technical informations

Creation01/07/2020 4:32:00 PM
First validation01/07/2020 4:32:00 PM
Update time03/15/2023 6:41:35 PM
Status update03/15/2023 6:41:34 PM
Last indexation01/17/2024 8:41:27 AM
All rights reserved by Archive ouverte UNIGE and the University of GenevaunigeBlack