| Title: |
FIONA: Detecting Syntactical Outliers in Attributes with Categorical Values |
| Authors: |
Tsiamis, Thanos; Qahtan, Hakim; Data Intensive Systems; Sub Data Intensive Systems |
| Publication Year: |
2025 |
| Subject Terms: |
Categorical outliers; generalization tree; patterns; similarity measures; syntactic structure; Taverne |
| Description: |
Outlier detection is crucial for data cleaning, influencing analysis and decision-making. While numerical outlier detection is well-studied, identifying outliers in relational data with categorical attributes is more challenging because a suitable similarity measure is difficult to define. Current approaches for detecting categorical outliers rely on encoding the categorical values as numerical values, using value frequency as an indicator of the outlierness score, and extracting predefined syntactic structures from the values. In this paper, we propose FIONA (FInding Outliers iN Attributes) to detect outliers in attributes with categorical values. Since categorical values in the relational model usually follow specific syntactic structures, FIONA defines a similarity measure that reveals the hidden patterns and identifies a set of dominant patterns in the data. Values that do not conform to the dominant patterns are declared outliers. In comparison to alternative tools, FIONA accurately identifies outliers and dominant patterns within datasets and provides a clear explanation for declaring a given value as an outlier. |
| Document Type: |
conference object |
| File Description: |
application/pdf |
| Language: |
English |
| Relation: |
https://dspace.library.uu.nl/handle/1874/463123 |
| Availability: |
https://dspace.library.uu.nl/handle/1874/463123 |
| Rights: |
info:eu-repo/semantics/OpenAccess |
| Accession Number: |
edsbas.E416F23 |
| Database: |
BASE |
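
The description's core idea (values that do not conform to the dominant syntactic patterns are flagged) can be illustrated with a minimal sketch. This is not FIONA's algorithm: the record does not specify its similarity measure or generalization tree, so the sketch substitutes a simple character-class encoding and a frequency threshold. The names `generalize`, `find_outliers`, and `min_support`, and the symbols `D`, `U`, `L`, are assumptions made for illustration only.

```python
import re
from collections import Counter


def generalize(value: str) -> str:
    """Map a value to a coarse syntactic pattern (hypothetical encoding).

    Digits become D, uppercase letters U, other letters L; any other
    character is kept literally. Runs of one class collapse to "X+".
    """
    symbols = []
    for ch in value:
        if ch.isdigit():
            symbols.append("D")
        elif ch.isalpha():
            symbols.append("U" if ch.isupper() else "L")
        else:
            symbols.append(ch)
    # Collapse repeated class symbols, e.g. "DDDD" -> "D+".
    return re.sub(r"([DUL])\1+", r"\1+", "".join(symbols))


def find_outliers(values, min_support=0.2):
    """Return (dominant_patterns, outliers).

    A pattern is treated as dominant when at least `min_support` of the
    values share it (an assumed rule, not taken from the paper). Each
    outlier is reported with its pattern, which serves as the explanation.
    """
    patterns = [generalize(v) for v in values]
    counts = Counter(patterns)
    total = len(values)
    dominant = {p for p, c in counts.items() if c / total >= min_support}
    outliers = [(v, p) for v, p in zip(values, patterns) if p not in dominant]
    return dominant, outliers


if __name__ == "__main__":
    column = ["2021-01-05", "2020-12-31", "2019-07-04",
              "1999-11-30", "05/01/2021", "n/a"]
    dominant, outliers = find_outliers(column)
    print("dominant patterns:", dominant)
    for value, pattern in outliers:
        print(f"outlier {value!r}: pattern {pattern!r} is not dominant")
```

Reporting the pattern alongside each flagged value mirrors, in a very reduced form, the explanation-oriented output the description attributes to FIONA; a realistic implementation would need a principled way to choose the generalization level and the dominance threshold.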