Meta similarity

On, Byung-Won; Lee, Ingyu

doi:10.1007/s10489-010-0226-3

Abstract:

To determine whether two given strings match, various string similarity metrics have been employed, and these can be categorized into three classes: (a) Edit-distance-based similarities, (b) Token-based similarities, and (c) Hybrid similarities. In essence, since different types of string similarities have different pros and cons in measuring the similarity between two strings, string similarity metrics in each class are likely to work well for particular data sets. To address this problem, we propose a novel Meta Similarity that both (i) outperforms the existing similarity metrics and (ii) is the least affected by a variety of data sets. Our claim is empirically validated through extensive experimental tests: our proposal improves average recall by up to 20% over the best case of the existing similarity metrics, and our method is the most stable, with average recall ranging from 0.95 to 1.0 across all the data sets.

Reference:

Meta similarity (Byung-Won On, Ingyu Lee), in Applied Intelligence, volume 35, number 3, pages 359-374, 2010.

Bibtex Entry:

@article{On2010,
abstract = {To see if two given strings are matched, various string similarity metrics have been employed and these string similarities can be categorized into three classes: (a) Edit-distance-based similarities, (b) Token-based similarities, and (c) Hybrid similarities. In essence, since different types of string similarities have different pros and cons in measuring the similarity between two strings, string similarity metrics in each class are likely to work well for particular data sets. Toward this problem, we propose a novel Meta Similarity that both (i) outperforms the existing similarity metrics and (ii) is the least affected by a variety of data sets. Our claim is empirically validated through extensive experimental tests—our proposal shows an improvement to the largest 20\% average recall, compared to the best case of the existing similarity metrics and our method is the most stable, showing from 0.95 to 1.0 average recall range in all the data sets.},
author = {On, Byung-Won and Lee, Ingyu},
doi = {10.1007/s10489-010-0226-3},
issn = {0924-669X},
journal = {Applied Intelligence},
keywords = {SML-LIB-BIBLIO,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
month = mar,
number = {3},
pages = {359--374},
title = {{Meta similarity}},
url = {http://www.springerlink.com/index/10.1007/s10489-010-0226-3},
volume = {35},
year = {2010}
}