Invariant content-based image retrieval using the Fourier-Mellin transform

Published in: S. Singh (Ed.). International Conference on Advances in Pattern Recognition (ICAPR'98). Plymouth (UK), 23-25 November 1998. Springer, 1999, p. 73-82
Abstract: Recent advances in storage, computing and communication technology have created the need for efficient, user-friendly access methods to multimedia archives. In this paper we address the problem of automatically extracting visual descriptions suitable for indexing images and videos in a database. A new method is proposed, and its applicability is shown using a collection of still images extracted from a video archive. In contrast to classical approaches, which describe different image aspects (e.g. color, shape, texture) separately, we take a holistic approach through the use of integral transforms. In this way, a single multidimensional descriptor represents all image aspects, and the user is not required to combine multiple independent rankings. Compared with other holistic approaches, such as those based on the wavelet transform, we seek greater robustness to image transformations such as translation, rotation, and scaling. Invariance to rotation, translation and scaling has been verified for the ideal case of rigid 2D image transformations, as well as on images that have been transformed through camera motion (pan/tilt/rotation) and zooming effects. An experimental database has been created from various TV news clips. Shots presenting considerable camera motion, zooming, and unrestricted subject motion have been detected, and a number of still images have been extracted from each of them, for a total of 2,082 images. This shot-based clustering naturally provides a ground truth for the desired similarity rankings. Experimental results yield on average 67% recall for the 12 top-ranked hits, and 54% precision at 100% recall.
This shows that, although the signature is only designed to be invariant to rigid 2D Euclidean transformations, it is highly resilient to much more complex transformations (projection, arbitrary subject motion, subject appearance/disappearance), and seemingly captures perceptually relevant image features.
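The abstract does not spell out how the descriptor is computed. As an illustration only, the following is a minimal sketch of one common discrete approximation of a Fourier-Mellin style signature, assuming NumPy and a grayscale input. It is not the authors' implementation: the function name `fourier_mellin_signature`, the grid sizes `n_rho` and `n_theta`, and the nearest-neighbour log-polar sampling are all hypothetical choices made for this example. The idea is that the Fourier magnitude discards translation, a log-polar resampling turns rotation and scaling into shifts, and a second Fourier magnitude discards those shifts (exactly for the angular axis, only approximately for the radial axis, which is not periodic).

```python
import numpy as np


def fourier_mellin_signature(img, n_rho=32, n_theta=32):
    """Illustrative signature; names and grid sizes are assumptions of this sketch."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    cy, cx = h // 2, w // 2
    r_max = min(cy, cx) - 1
    if r_max < 1:
        raise ValueError("image is too small for log-polar sampling")

    # Step 1: the Fourier magnitude is unchanged by (circular) translation,
    # because a shift only alters the phase of the spectrum.
    mag = np.fft.fftshift(np.abs(np.fft.fft2(img)))

    # Step 2: sample the magnitude on a log-polar grid around the spectrum centre.
    # Rotation becomes a shift along theta, scaling a shift along log(rho).
    # Half a turn suffices: the magnitude spectrum of a real image is symmetric.
    rho = np.exp(np.linspace(0.0, np.log(r_max), n_rho))
    theta = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(rho, theta, indexing="ij")
    ys = np.rint(cy + rr * np.sin(tt)).astype(int)
    xs = np.rint(cx + rr * np.cos(tt)).astype(int)
    log_polar = mag[ys, xs]  # nearest-neighbour sampling, chosen for brevity

    # Step 3: a second Fourier magnitude discards the shifts introduced in step 2.
    sig = np.abs(np.fft.fft2(log_polar))
    return sig / (np.linalg.norm(sig) + 1e-12)


# Usage: compare an image with a circularly shifted copy and with an unrelated image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
shifted = np.roll(image, shift=(5, -3), axis=(0, 1))
other = rng.random((64, 64))

sig = fourier_mellin_signature(image)
dist_shifted = np.linalg.norm(sig - fourier_mellin_signature(shifted))
dist_other = np.linalg.norm(sig - fourier_mellin_signature(other))
print(dist_shifted, dist_other)
```

In this sketch the distance to the shifted copy should be zero up to floating-point error, while the distance to the unrelated image is clearly larger. Ranking database images by such a distance to a query signature is one plausible way to obtain the single similarity ranking the abstract describes; the distance measure used here is likewise an assumption.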
Research groups: Viper group; Computer Vision and Multimedia Laboratory
MILANESE, Ruggero, CHERBULIEZ, Michel, PUN, Thierry. Invariant content-based image retrieval using the Fourier-Mellin transform. In: S. Singh (Ed.). International Conference on Advances in Pattern Recognition (ICAPR'98). Plymouth (UK). [s.l.] : Springer, 1999. p. 73-82. https://archive-ouverte.unige.ch/unige:47832

