DetCat: Detecting Categorical Outliers in Relational Datasets

Title: DetCat: Detecting Categorical Outliers in Relational Datasets
Authors: Zylinski, Arthur; Qahtan, Abdulhakim A.; Sub Data Intensive Systems; Data Intensive Systems
Publication Year: 2024
Subject Terms: categorical values; outliers; similarity metrics; syntactic structure; Taverne; General Business, Management and Accounting; General Decision Sciences
Description: Poor data quality significantly affects data analytics tasks, leading to inaccurate decisions and poor predictions by machine learning models. Outliers are among the most common data glitches that degrade data quality. While outlier detection in numerical data has been studied extensively, few attempts have been made to detect categorical outliers. In this paper, we introduce DetCat, a tool for detecting categorical outliers in relational datasets by exploiting the syntactic structure of the values. For a given attribute, DetCat identifies the set of patterns that represents the majority of the values; these are the dominating patterns. Data values that cannot be generated by the dominating patterns are declared outliers. The demo will show the effectiveness of our tool in detecting categorical outliers and discovering syntactic data patterns.
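The record does not specify DetCat's pattern language or how dominating patterns are selected. The Python sketch below illustrates only the general idea described above, under explicit assumptions. Each value is generalized to a pattern by mapping characters to classes (`A` uppercase, `a` lowercase, `9` digit, other characters kept literally) and collapsing repeated classes. The most frequent patterns are then kept until they cover a hypothetical `coverage` fraction (0.9 by default) of the column. All function names and the threshold are illustrative and are not taken from DetCat.

```python
from collections import Counter
from itertools import groupby


def char_class(ch: str) -> str:
    """Map one character to a coarse class symbol (assumed pattern alphabet)."""
    if ch.isdigit():
        return "9"
    if ch.isalpha():
        return "A" if ch.isupper() else "a"
    return ch  # punctuation and spaces are kept literally


def to_pattern(value: str) -> str:
    """Generalize a value by collapsing runs of the same character class.

    Example: "AB-1234" -> "A-9", "Jan 5" -> "Aa 9".
    """
    classes = [char_class(ch) for ch in value]
    return "".join(symbol for symbol, _ in groupby(classes))


def dominating_patterns(values, coverage=0.9):
    """Take the most frequent patterns until they jointly cover `coverage` of the values.

    `coverage` is a hypothetical threshold, not a parameter documented for DetCat.
    """
    counts = Counter(to_pattern(v) for v in values)
    total = sum(counts.values())
    chosen, covered = set(), 0
    for pattern, count in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.add(pattern)
        covered += count
    return chosen


def detect_outliers(values, coverage=0.9):
    """Flag values whose pattern is not one of the dominating patterns."""
    dominant = dominating_patterns(values, coverage)
    return [v for v in values if to_pattern(v) not in dominant]


if __name__ == "__main__":
    column = ["AB-1234", "CD-5678", "EF-9012", "GH-3456", "IJ-7890",
              "KL-2345", "MN-6789", "OP-0123", "QR-4567", "n/a"]
    print(detect_outliers(column))  # ['n/a']
```

In this toy column, nine values share the pattern `A-9`, which alone reaches the 90% coverage threshold. The placeholder `n/a` (pattern `a/a`) cannot be produced by any dominating pattern, so it is flagged. A real system would likely need a richer pattern language and a more principled selection rule than this greedy frequency cutoff.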
Document Type: book part
File Description: application/pdf
Language: English
ISSN: 2155-0751
Relation: https://dspace.library.uu.nl/handle/1874/482520
Availability: https://dspace.library.uu.nl/handle/1874/482520
Rights: info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.97CD4F41
Database: BASE