Detecting abnormal data for ontology based information integration

by Yang Yu, Jeff Heflin
Abstract:
To better support information integration on Semantic Web data with varying degrees of quality, this paper proposes an approach to detect triples which reflect some sort of error. In particular, erroneous triples may occur due to factual errors in the original data source, misuse of the ontology by the original data source, or errors in the integration process. Although diagnosing such errors is a difficult problem, we propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. We detect such “abnormal triples” by learning probabilistic rules from the reference data and checking to what extent these rules agree with the triples. The system consists of two components for two types of abnormal relational descriptions that a Semantic Web statement could have, whether accidentally or maliciously: a statement could relate two resources that are unlikely to have anything in common or an inappropriate predicate could be used to describe the relation between the two resources. The classification technique is adopted to learn statistical characteristics for detecting a suspect resource pair, i.e. there is no significant relation between the subject and the object in the statement. For the suspect usages of a predicate, the system learns semantic patterns for each predicate from indirect semantic connections between the subject / object pairs.
Reference:
Detecting abnormal data for ontology based information integration (Yang Yu, Jeff Heflin), In 2011 International Conference on Collaboration Technologies and Systems (CTS), IEEE, 2011.
Bibtex Entry:
@inproceedings{Yu2011,
abstract = {To better support information integration on Semantic Web data with varying degrees of quality, this paper proposes an approach to detect triples which reflect some sort of error. In particular, erroneous triples may occur due to factual errors in the original data source, misuse of the ontology by the original data source, or errors in the integration process. Although diagnosing such errors is a difficult problem, we propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. We detect such “abnormal triples” by learning probabilistic rules from the reference data and checking to what extent these rules agree with the triples. The system consists of two components for two types of abnormal relational descriptions that a Semantic Web statement could have, whether accidentally or maliciously: a statement could relate two resources that are unlikely to have anything in common or an inappropriate predicate could be used to describe the relation between the two resources. The classification technique is adopted to learn statistical characteristics for detecting a suspect resource pair, i.e. there is no significant relation between the subject and the object in the statement. For the suspect usages of a predicate, the system learns semantic patterns for each predicate from indirect semantic connections between the subject / object pairs.},
author = {Yu, Yang and Heflin, Jeff},
booktitle = {2011 International Conference on Collaboration Technologies and Systems (CTS)},
doi = {10.1109/CTS.2011.5928721},
isbn = {978-1-61284-638-5},
keywords = {SML-LIB-BIBLIO,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
month = may,
pages = {431--438},
publisher = {IEEE},
title = {{Detecting abnormal data for ontology based information integration}},
url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5928721},
year = {2011}
}