Doctoral thesis
Open access

Distance and kernel based learning over composite representations

ContributorsWoznica, Adam
Defense date2008-09-03

The goal of this dissertation is to examine various aspects of the distance- and kernel-based learning paradigms applied in relational settings. We start by introducing a multi-relational representation formalism, at the core of which lie the concepts of tuples, sets and lists, which we implemented over the relational algebra language. By combining these data types we are able to model a variety of composite objects, such as trees and graphs. We proceed with the definition of various distance and kernel operators over the representation formalism. We are not constrained to a specific operator, instead, we are free to assign an operator, selected from a set of available operators, to a particular data type. The final operator over the composite objects is given as a recursive combination of operators assigned to the sub-structures which constitute the objects. We focus on mapping-based operators defined over sets. Next, we propose three new and flexible families of set kernels where the overall similarity is based on mappings between the elements of the two sets. These kernels differ from most of the existing set kernels which are based on averaging of the similarities of all the elements of the two sets. Finally, we propose a general framework for adaptively selecting representations of complex data and/or operators over representations. More precisely, our framework assumes a set of predefined representations and operators which are then combined in an optimal way. We focus only on the distance-based paradigm and we exploit previous work on metric learning over vectorial data. We use the optimal combination of different graph decompositions into substructures of specific types to define adaptive graph kernels which address the limitations of the existing kernels over these complex structures. We undertook extensive comparisons of our distance- and kernel-based relational system, which included: an empirical evaluation of various composite distances, with the focus on comparison of set distances based on mappings; an empirical evaluation of different complex kernels, with the focus on comparison of set kernels based on averaging and set kernels based on various mappings; an empirical evaluation of our adaptive framework for the tasks of combination of set distances, and combination of various graph decompositions into substructures of various types. The empirical evaluation of the system has shown that the proposed distance- and kernel-based paradigms are effective over a number of relational benchmark datasets. Additionally, in all of the examined relational problems we achieved state-of-the-art results which are better than the best results obtained using other relational systems.

  • Machine learning
  • Data mining
  • Complex structures
  • Graphs
  • Sets
  • Lists
  • Distance measures
  • Kernels
  • Support vector machines
  • SVM
  • Instance based learning
  • K nearest neighbor
  • KNN
  • Automatic classification of chemical molecules
  • Bioinformatics
Citation (ISO format)
WOZNICA, Adam. Distance and kernel based learning over composite representations. 2008. doi: 10.13097/archive-ouverte/unige:5483
Main files (1)

Technical informations

Creation03/17/2010 4:31:00 PM
First validation03/17/2010 4:31:00 PM
Update time03/14/2023 3:25:19 PM
Status update03/14/2023 3:25:19 PM
Last indexation01/29/2024 6:55:26 PM
All rights reserved by Archive ouverte UNIGE and the University of GenevaunigeBlack