
Progressive Entity Matching: A Design Space Exploration

Title: Progressive Entity Matching: A Design Space Exploration
Authors: Maciejewski, Jakub; Nikoletos, Konstantinos; Papadakis, George; Velegrakis, Yannis; Sub Data Intensive Systems
Publication Year: 2025
Description: Entity Resolution (ER) is typically implemented as a batch task that processes all available data before identifying duplicate records. However, applications with time or computational constraints, e.g., those running in the cloud, require a progressive approach that produces results in a pay-as-you-go fashion. Numerous algorithms for Progressive ER have been proposed in the literature. In this work, we propose a novel framework for Progressive Entity Matching that organizes the relevant techniques into four consecutive steps: (i) filtering, which reduces the search space to the most likely candidate matches; (ii) weighting, which associates every candidate pair with a similarity score; (iii) scheduling, which orders the candidate pairs for execution so that real duplicates precede non-matching pairs; and (iv) matching, which applies a complex matching function to the pairs in the order defined by the previous step. We associate each step with existing and novel techniques, showing that our framework as a whole generates a superset of the main existing works in the field. We select the most representative combinations derived from our framework and fine-tune them on 10 established datasets for Record Linkage and 8 for Deduplication; our results indicate that our taxonomy yields a wide range of high-performing progressive techniques in terms of both effectiveness and time efficiency.
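The four-step pipeline described above (filtering, weighting, scheduling, matching) can be illustrated with a minimal Python sketch for Deduplication over a single collection. This is not the authors' framework or any specific technique from the paper. Token blocking (filtering), Jaccard similarity (weighting), a single global descending sort (scheduling) and a `difflib.SequenceMatcher` ratio with an assumed threshold of 0.8 (matching) are illustrative stand-ins, and the function names and the `budget` parameter are hypothetical. The matcher is a generator with a comparison budget, so matches are emitted pay-as-you-go and the run can stop at any point.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, Iterator, List, Set, Tuple

Pair = Tuple[int, int]  # (smaller record id, larger record id)


def tokens(text: str) -> Set[str]:
    """Lower-cased whitespace tokens of a record (a deliberately simple representation)."""
    return set(text.lower().split())


def filtering(records: List[str]) -> Set[Pair]:
    """Step (i), hypothetical stand-in: token blocking.
    Only pairs of records that share at least one token become candidates."""
    blocks: Dict[str, List[int]] = defaultdict(list)
    for rid, text in enumerate(records):
        for tok in tokens(text):
            blocks[tok].append(rid)  # ids arrive in increasing order
    candidates: Set[Pair] = set()
    for ids in blocks.values():
        candidates.update(combinations(ids, 2))  # each pair is (i, j) with i < j
    return candidates


def weighting(records: List[str], candidates: Set[Pair]) -> Dict[Pair, float]:
    """Step (ii), hypothetical stand-in: Jaccard similarity of the token sets.
    The union is never empty because every candidate pair shares a token."""
    toks = [tokens(r) for r in records]
    return {(i, j): len(toks[i] & toks[j]) / len(toks[i] | toks[j])
            for i, j in candidates}


def scheduling(weights: Dict[Pair, float]) -> List[Pair]:
    """Step (iii), hypothetical stand-in: one global order, highest weight first
    (ties broken by pair id so the order is deterministic)."""
    return sorted(weights, key=lambda p: (-weights[p], p))


def matching(records: List[str], schedule: List[Pair], budget: int,
             threshold: float = 0.8) -> Iterator[Pair]:
    """Step (iv), hypothetical stand-in for an expensive matcher: a character-level
    SequenceMatcher ratio with an assumed threshold. Pairs are compared in schedule
    order, matches are yielded as soon as they are found, and at most `budget`
    comparisons are executed."""
    for i, j in schedule[:budget]:
        ratio = SequenceMatcher(None, records[i].lower(), records[j].lower()).ratio()
        if ratio >= threshold:
            yield (i, j)


# Usage on made-up records (illustrative data only).
records = [
    "john smith new york",
    "jon smith new york",
    "mary jones boston",
    "mary jones boston ma",
    "peter brown chicago",
]
schedule = scheduling(weighting(records, filtering(records)))
for pair in matching(records, schedule, budget=2):
    print("match:", pair)
# prints "match: (2, 3)" then "match: (0, 1)"
```

Because the pair with the highest weight is compared first, a budget of a single comparison already returns (2, 3). In a progressive setting, the quality of the scheduling step determines how much recall is lost when the budget runs out early.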
Document Type: article in journal/newspaper
File Description: application/pdf
Language: English
ISSN: 2836-6573
Relation: https://dspace.library.uu.nl/handle/1874/475481
Availability: https://dspace.library.uu.nl/handle/1874/475481
Rights: info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.4605202
Database: BASE