UNIGE document Scientific Article

Design of multimodal dissimilarity spaces for retrieval of multimedia documents

Published in IEEE Transactions on Pattern Analysis and Machine Intelligence. 2008, vol. 30, no. 9, p. 1520-1533
Abstract This paper proposes a novel representation space for multimodal information, enabling fast and efficient retrieval of video data. We suggest describing the documents not directly by selected multimodal features (audio, visual or text), but rather by considering cross-document similarities relative to their multimodal characteristics. This idea leads us to propose a particular form of dissimilarity space that is adapted to the asymmetric classification problem, and in turn to the query-by-example and relevance feedback paradigm widely used in information retrieval. Based on the proposed dissimilarity space, we then define various strategies to fuse modalities through a kernel-based learning approach. The problem of automatic kernel setting to adapt the learning process to the queries is also discussed. The properties of our strategies are studied and validated on artificial data. In a second phase, a large annotated video corpus (i.e., TRECVID-05), indexed by visual, audio and text features, is considered to evaluate the overall performance of the dissimilarity space and fusion strategies. The results confirm the validity of the proposed approach for the representation and retrieval of multimodal information in a real-time framework.
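Illustration (not part of the published abstract): a minimal Python sketch of the general idea of a dissimilarity space, in which a document is described by its distances to a set of reference documents rather than by its raw features. Several choices here are assumptions made only for illustration: the names `dissimilarity_vector` and `multimodal_dissimilarity`, the Euclidean distance, and treating the query examples as prototypes. The modalities are combined by plain concatenation, which is a simple stand-in and not the kernel-based fusion the paper describes.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_vector(features, prototype_features, dist=euclidean):
    """Describe one document by its distances to each prototype."""
    return [dist(features, p) for p in prototype_features]


def multimodal_dissimilarity(doc, prototypes, dist=euclidean):
    """Build one dissimilarity vector per modality and concatenate them.

    `doc` and every prototype are dicts mapping a modality name to a
    feature vector (hypothetical layout, assumed for this sketch).
    Concatenation is a stand-in for the paper's kernel-based fusion.
    """
    fused = []
    for modality in sorted(doc):  # fixed modality order for reproducibility
        protos = [p[modality] for p in prototypes]
        fused.extend(dissimilarity_vector(doc[modality], protos, dist))
    return fused


# Toy data: two query examples used as prototypes, one database document.
prototypes = [
    {"visual": [0.0, 0.0], "audio": [1.0]},
    {"visual": [3.0, 4.0], "audio": [2.0]},
]
document = {"visual": [0.0, 0.0], "audio": [4.0]}

vector = multimodal_dissimilarity(document, prototypes)
print(vector)
```

The resulting vector has one coordinate per (modality, prototype) pair, so its dimension depends on the number of query examples and not on the dimension of the original features.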
Keywords H.2.4.e Multimedia databases; H.5.1 Multimedia Information Systems; H.5.1.f Image/video retrieval; I.2.6.g Machine learning; I.2.6.b Concept learning
Stable URL http://archive-ouverte.unige.ch/unige:1231
Research group Computer Vision and Multimedia Laboratory




Deposited on : 2009-03-26
