| Description: |
The linked open data (LOD) cloud hosts many interlinked knowledge graphs spanning domains such as government, media, and life sciences. These graphs are either manually curated or automatically extracted using information extraction techniques (e.g. YAGO, Yet Another Great Ontology), and they are used in applications such as data governance, fraud detection, and fact checking. Although the graphs in LOD are widely used, they do not carry metadata about their representativeness (the distribution of key features). Since most of the graphs are automatically extracted, bias can arise from sensitive features and their causal influences, or from the under- or overrepresentation of certain entities (e.g. people) and relations (e.g. president-of, works-for). The aim of this work is to develop a system that automatically generates bias profiles (metadata about the representativeness of the data) for knowledge graphs. Users can then rely on this metadata as a guide when choosing bias-free (balanced) datasets for their studies. Moreover, it lets researchers quickly gauge the relevance of a graph for the problem at hand (e.g. a classification task). |