Combining Semantic Matching, Word Embeddings, Transformers, and LLMs for Enhanced Document Ranking: Application in Systematic Reviews

Title:	Combining Semantic Matching, Word Embeddings, Transformers, and LLMs for Enhanced Document Ranking: Application in Systematic Reviews
Authors:	Goran Mitrov; Boris Stanoev; Sonja Gievska; Georgina Mirceva; Eftim Zdravevski
Source:	Big Data and Cognitive Computing, Vol 8, Iss 9, p 110 (2024)
Publisher Information:	MDPI AG
Publication Year:	2024
Collection:	Directory of Open Access Journals: DOAJ Articles
Subject Terms:	document ranking; systematic review; scoping review; rapid review; automated surveys; NLP toolkit; Technology
Description:	The rapid increase in scientific publications has made it challenging to keep up with the latest advancements. Conducting systematic reviews using traditional methods is both time-consuming and difficult. To address this, new review formats such as rapid and scoping reviews have been introduced, reflecting an urgent need for efficient information retrieval. This challenge extends beyond academia to many organizations where numerous documents must be reviewed in relation to specific user queries. This paper focuses on improving document ranking to enhance the retrieval of relevant articles, thereby reducing the time and effort required by researchers. By applying a range of natural language processing (NLP) techniques, including rule-based matching, statistical text analysis, word embeddings, and transformer- and LLM-based approaches such as Mistral LLM, we assess each article’s similarity to user-specific inputs and prioritize the articles according to relevance. We propose a novel methodology, Weighted Semantic Matching (WSM) + MiniLM, which combines the strengths of the different methodologies. For validation, we employ global metrics such as precision at K, recall at K, average rank, and median rank, as well as pairwise comparison metrics, including higher rank count, average rank difference, and median rank difference. Our proposed algorithm achieves optimal performance, with an average recall at 1000 of 95% and an average median rank of 185 for selected articles across the five datasets evaluated. These results are promising for pinpointing relevant articles and reducing manual work.
Document Type:	article in journal/newspaper
Language:	English
Relation:	https://www.mdpi.com/2504-2289/8/9/110; https://doaj.org/toc/2504-2289; https://doaj.org/article/6e06b9175bf8445788678d5737f8112d
DOI:	10.3390/bdcc8090110
Availability:	https://doi.org/10.3390/bdcc8090110; https://doaj.org/article/6e06b9175bf8445788678d5737f8112d
Accession Number:	edsbas.BF9A05C3
Database:	BASE
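
Note on the global metrics named in the Description: the record does not give formulas, so the following is only a minimal sketch under common textbook definitions of precision at K, recall at K, average rank, and median rank. The function names, document IDs, and toy ranking are hypothetical and are not taken from the paper, whose exact definitions may differ.

```python
from statistics import mean, median


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def relevant_ranks(ranked_ids, relevant_ids):
    """1-based positions of the relevant documents in the ranking."""
    return [
        position
        for position, doc_id in enumerate(ranked_ids, start=1)
        if doc_id in relevant_ids
    ]


# Hypothetical toy data: a ranking of six documents, three of them relevant.
ranking = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d7", "d4", "d2"}

ranks = relevant_ranks(ranking, relevant)
p_at_3 = precision_at_k(ranking, relevant, 3)
r_at_5 = recall_at_k(ranking, relevant, 5)
average_rank = mean(ranks)
median_rank = median(ranks)
```

Under these definitions, a lower average or median rank means the relevant articles sit closer to the top of the ranked list, which is why the Description reports median rank alongside recall at 1000.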