| Title: |
RAG pipeline for private well contamination guidance: A comparative study of retrieval and generation strategies. |
| Authors: |
El Moussaoui, Yasmine¹ (AUTHOR) yasmine.elmoussaoui@uqar.ca; Adda, Mehdi¹ (AUTHOR); Lessard, Lily² (AUTHOR); Langlais, Tamari³ (AUTHOR); Turcotte, Stéphane⁴ (AUTHOR) |
| Source: |
Procedia Computer Science. 2025, Vol. 272, p335-342. 8p. |
| Subject Terms: |
Public health; Escherichia coli; Information storage & retrieval systems; Information retrieval; Language models; Industrial contamination; Water supply |
| Abstract: |
Access to safe drinking water remains a fundamental public health priority, particularly in rural and semi-urban areas where private wells are a primary water source but often lack proper monitoring. This lack of monitoring exposes users to microbiological risks such as E. coli and coliform bacteria. Although large language models (LLMs) hold promise for delivering accessible guidance, their performance in specialized, low-resource domains remains limited. In this study, we develop a domain-adapted Retrieval-Augmented Generation (RAG) system tailored to support private well owners with contamination concerns. Starting from a naive RAG baseline, we explore key enhancements: fine-tuning the BGE-M3 embedding model on synthetic question-answer (QA) pairs, query rewriting, and an adaptive reranking technique. Evaluation combines LLM-as-judge metrics computed with the deepeval framework, statistical significance testing, and expert review of the generated answers. Adaptive reranking with Llama delivered the highest performance (86.34% answer relevancy, 91.6% faithfulness), improved contextual relevancy, and received the highest expert-rated technical accuracy, demonstrating its advantage in factual correctness. [ABSTRACT FROM AUTHOR] |
| Database: |
Supplemental Index |