Proceedings chapter
English

Word Distributions for Thematic Segmentation in a Support Vector Machine Approach

Presented at New York (USA), 8-9 June
Publication date 2006
Abstract

We investigate the appropriateness of using a technique based on support vector machines for identifying the thematic structure of text streams. The thematic segmentation task is modeled as a binary classification problem in which the two classes correspond to the presence or the absence of a thematic boundary. Experiments are conducted with this approach using features based on word distributions across the text. We provide empirical evidence that our approach is robust by showing good performance on three different data sets. In particular, substantial improvement is obtained over previously published results of word-distribution-based systems when evaluation is done on a corpus of recorded and transcribed multi-party dialogs.
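The binary-classification framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy utterances, the single cosine-similarity feature, the window size, and the helper names `gap_features`, `train_linear_svm`, and `predict` are all assumptions made for illustration. A linear classifier trained by Pegasos-style subgradient descent on the hinge loss stands in here for a full support vector machine solver.

```python
import math
import random
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def gap_features(utterances, window=2):
    """Build one feature vector per gap between consecutive utterances.

    Each gap is a candidate thematic boundary. The only feature used here
    (an assumption, not the paper's feature set) is the similarity of the
    word distributions in the windows before and after the gap.
    """
    feats = []
    for i in range(1, len(utterances)):
        left = Counter(w for u in utterances[max(0, i - window):i]
                       for w in u.lower().split())
        right = Counter(w for u in utterances[i:i + window]
                        for w in u.lower().split())
        feats.append([cosine(left, right), 1.0])  # similarity plus a bias term
    return feats

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss.

    Labels are +1 (boundary) and -1 (no boundary). The bias weight is
    regularized together with the other weight, a simplification.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularizer)
            if margin < 1:  # hinge loss is active for this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    """Return +1 for a predicted boundary, -1 otherwise."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1

# Hypothetical toy stream: the topic changes after the third utterance.
utterances = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the mouse hid from the cat",
    "stocks fell on the market today",
    "the market closed lower as stocks fell",
    "investors sold stocks in the market",
]
labels = [-1, -1, 1, -1, -1]  # one label per gap

X = gap_features(utterances)
w = train_linear_svm(X, labels)
predictions = [predict(w, x) for x in X]
```

In this toy stream the gap with the lowest similarity is the one at the topic change, which is the kind of signal a word-distribution feature is meant to expose. A realistic system would use richer features and a held-out evaluation set.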

Citation (ISO format)
GEORGESCUL, Maria, CLARK, Alexander, ARMSTRONG, Susan. Word Distributions for Thematic Segmentation in a Support Vector Machine Approach. In: Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X). New York (USA). [s.l.] : [s.n.], 2006. p. 101–108.
Identifiers
  • PID : unige:3490

Technical information

Creation 02/10/2009 11:31:28
First validation 02/10/2009 11:31:28
Update time 14/03/2023 16:14:59
Status update 14/03/2023 16:14:59
Last indexation 29/10/2024 13:19:16
All rights reserved by Archive ouverte UNIGE and the University of Geneva