Background: Literature reviews (LRs) identify, evaluate, and synthesize papers relevant to a particular research question to advance understanding and support decision-making. However, LRs, especially traditional systematic reviews, are slow, resource-intensive, and quickly become outdated.
Objective: LiteRev is an enhanced version of an existing automation tool designed to assist researchers in conducting LRs by applying natural language processing and machine learning techniques. In this paper, we present a comprehensive description of LiteRev's capabilities and methodology and evaluate its accuracy and efficiency against a manual LR, highlighting the benefits of using LiteRev.
Methods: Based on the user's query, LiteRev performs an automated search of a wide range of open-access databases and retrieves relevant metadata on the resulting papers, including abstracts or full texts when available. These abstracts (or full texts) are text-processed and represented as a term frequency-inverse document frequency (TF-IDF) matrix. Using dimensionality reduction (pairwise controlled manifold approximation) and clustering (hierarchical density-based spatial clustering of applications with noise) techniques, the corpus is divided into topics, each described by a list of its most important keywords. The user can then select one or several topics of interest, enter additional keywords to refine their search, or provide key papers relevant to the research question. Based on these inputs, LiteRev performs a k-nearest neighbor (k-NN) search and suggests a list of potentially interesting papers. By tagging the relevant ones, the user triggers new k-NN searches until no additional papers are suggested for screening. To assess the performance of LiteRev, we ran it in parallel with a manual LR on the burden of and care for acute and early HIV infection in sub-Saharan Africa. We assessed the performance of LiteRev using true and false predictive values, recall, and work saved over sampling.
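The following is a minimal sketch of such a TF-IDF, dimensionality reduction, clustering, and iterative k-NN pipeline. It assumes the scikit-learn, pacmap, and hdbscan Python packages; all function names, parameters, and thresholds here are illustrative assumptions, not LiteRev's actual implementation.

```python
# Illustrative sketch of a LiteRev-style pipeline (not LiteRev's code):
# TF-IDF representation, PaCMAP dimensionality reduction, HDBSCAN topic
# clustering, and an iterative k-NN search seeded by user-tagged key papers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
import pacmap
import hdbscan


def build_corpus_representation(abstracts):
    """Vectorize abstracts (or full texts) as a TF-IDF matrix."""
    vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
    tfidf = vectorizer.fit_transform(abstracts)  # sparse (n_docs x n_terms)
    return tfidf, vectorizer


def topic_model(tfidf, min_cluster_size=10):
    """Reduce dimensionality with PaCMAP, then cluster with HDBSCAN."""
    embedding = pacmap.PaCMAP(n_components=2).fit_transform(tfidf.toarray())
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embedding)
    return embedding, labels  # label -1 marks noise points


def knn_screening_round(embedding, relevant_idx, k=10):
    """Suggest not-yet-tagged neighbors of the papers tagged as relevant."""
    nn = NearestNeighbors(n_neighbors=k).fit(embedding)
    _, neighbors = nn.kneighbors(embedding[relevant_idx])
    suggested = set(neighbors.ravel()) - set(relevant_idx)
    return sorted(suggested)  # new candidate papers to screen
```

In this sketch, each call to `knn_screening_round` would be followed by the user tagging the relevant suggestions; the enlarged set of relevant indices seeds the next round, and the loop stops once no new papers are returned.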
Results: LiteRev extracted, processed, and transformed the text of 631 unique papers retrieved from PubMed into a TF-IDF matrix. The topic modeling module identified 16 topics and highlighted 2 topics of interest to the research question. Based on 18 key papers, the k-NN module suggested 193 of 613 papers for screening (31.5% of the whole corpus) and correctly identified 64 of the 87 relevant papers found by the manual abstract screening (recall of 73.6%). Compared with the manual full-text screening, LiteRev identified 42 of the 48 papers found manually (recall of 87.5%). This represents a total work saved over sampling of 56%.
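For clarity, the reported figures are consistent with the standard definition of work saved over sampling, WSS = (TN + FN)/N − (1 − recall); the short sketch below reproduces them from the counts above, assuming that definition.

```python
# Recomputing the reported metrics from the counts above,
# assuming WSS = (TN + FN) / N - (1 - recall).
n_total = 613       # papers considered for screening
n_suggested = 193   # papers LiteRev flagged for screening

recall_abstract = 64 / 87   # vs manual abstract screening
recall_fulltext = 42 / 48   # vs manual full-text screening

not_screened = n_total - n_suggested  # 420 papers never shown to the user
wss = not_screened / n_total - (1 - recall_fulltext)

print(f"recall (abstract screening)  = {recall_abstract:.1%}")  # 73.6%
print(f"recall (full-text screening) = {recall_fulltext:.1%}")  # 87.5%
print(f"work saved over sampling     = {wss:.0%}")              # 56%
```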
Conclusions: We presented the features and functionalities of LiteRev, an automation tool that uses natural language processing and machine learning methods to streamline and accelerate LRs and to help researchers obtain quick, in-depth overviews of any topic of interest.