Proceedings chapter
Open access
English

SIB Text Mining at TREC 2019 Deep Learning Track: Working Note

Published in: The Twenty-Eighth Text REtrieval Conference (TREC 2019) Proceedings, Editors: Voorhees, Ellen M. & Ellis, Angela
Presented at Gaithersburg (Maryland), November 13-15, 2019.
Publisher: National Institute of Standards and Technology (NIST)
Collection
  • NIST Special Publication; 1250
Publication date: 2019
Abstract

The TREC 2019 Deep Learning track aims at studying information retrieval in a large training data regime. It includes two tasks: the document ranking task (1) and the passage ranking task (2). Each task had a full ranking (a) and a reranking (b) subtask. The SIB Text Mining group participated in the full document ranking subtask (1a). To retrieve pertinent documents from the corpus of 3.2 million documents, our strategy was two-fold. First, we used a BM25 model to retrieve a subset of documents relevant to a query. We also tried to improve recall by using query expansion. The second step consisted of reranking the retrieved subset using an original model called query2doc. This model, which was designed to predict whether a query-document pair was a good candidate to be ranked in position #1, was trained using the training dataset provided for the task. Our baseline, which is essentially a BM25 ranking, performed best and achieved a MAP of 0.2892. The results of the query2doc run clearly indicate that the query2doc model could not learn any meaningful relationship. To explain this failure, we hypothesize that using documents returned by our baseline model as negative items confused the model. As future steps, it will be interesting to take into account features such as the document’s BM25 score and the number of times a document’s URL is mentioned in the corpus, and to use them with learning-to-rank algorithms.
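
The first stage described above, BM25 retrieval of a candidate subset, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `bm25_rank`, the whitespace tokenization, the parameter values `k1=1.2` and `b=0.75`, and the toy documents are all assumptions chosen for illustration, and the scoring follows a common Okapi BM25 formulation.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.2, b=0.75, top_k=10):
    """Return (doc_index, score) pairs for the top_k documents under Okapi BM25.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    and k1 / b are common defaults, not values taken from the paper.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))

    # Highest score first; a second-stage model would rerank this subset.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy corpus (illustrative only).
docs = [
    "deep learning for document ranking",
    "cooking pasta at home",
    "ranking passages with neural networks",
]
candidates = bm25_rank("document ranking", docs, top_k=2)
```

In a two-stage setup such as the one in the abstract, the list in `candidates` would be the subset handed to the reranking model; here it simply holds the two best-matching toy documents.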

Affiliation: Not a UNIGE publication
Citation (ISO format)
KNAFOU, Julien David Marc et al. SIB Text Mining at TREC 2019 Deep Learning Track: Working Note. In: The Twenty-Eighth Text REtrieval Conference (TREC 2019) Proceedings. Gaithersburg (Maryland). [s.l.] : National Institute of Standards and Technology (NIST), 2019. (NIST Special Publication)
Main files (1)
Proceedings chapter (Published version)
Identifiers
  • PID : unige:159644