Exploiting hierarchical domain structure to compute similarity

Ganesan, Prasanna; Garcia-Molina, Hector; Widom, Jennifer

doi:10.1145/635484.635487

Abstract:

The notion of similarity between objects finds use in many contexts, for example, in search engines, collaborative filtering, and clustering. Objects being compared often are modeled as sets, with their similarity traditionally determined based on set intersection. Intersection-based measures do not accurately capture similarity in certain domains, such as when the data is sparse or when there are known relationships between items within sets. We propose new measures that exploit a hierarchical domain structure in order to produce more intuitive similarity scores. We extend our similarity measures to provide appropriate results in the presence of multisets (also handled unsatisfactorily by traditional measures), for example, to correctly compute the similarity between customers who buy several instances of the same product (say milk), or who buy several products in the same category (say dairy products). We also provide an experimental comparison of our measures against traditional similarity measures, and report on a user study that evaluated how well our measures match human intuition.

Reference:

Exploiting hierarchical domain structure to compute similarity (Prasanna Ganesan, Hector Garcia-Molina, Jennifer Widom), In ACM Transactions on Information Systems, ACM, volume 21, number 1, pages 64-93, 2003.

Bibtex Entry:

@article{Ganesan2003,
abstract = {The notion of similarity between objects finds use in many contexts, for example, in search engines, collaborative filtering, and clustering. Objects being compared often are modeled as sets, with their similarity traditionally determined based on set intersection. Intersection-based measures do not accurately capture similarity in certain domains, such as when the data is sparse or when there are known relationships between items within sets. We propose new measures that exploit a hierarchical domain structure in order to produce more intuitive similarity scores. We extend our similarity measures to provide appropriate results in the presence of multisets (also handled unsatisfactorily by traditional measures), for example, to correctly compute the similarity between customers who buy several instances of the same product (say milk), or who buy several products in the same category (say dairy products). We also provide an experimental comparison of our measures against traditional similarity measures, and report on a user study that evaluated how well our measures match human intuition.},
author = {Ganesan, Prasanna and Garcia-Molina, Hector and Widom, Jennifer},
doi = {10.1145/635484.635487},
issn = {1046-8188},
journal = {ACM Transactions on Information Systems},
keywords = {SML-LIB-BIBLIO,lang:ENG},
month = jan,
number = {1},
pages = {64--93},
publisher = {ACM},
title = {{Exploiting hierarchical domain structure to compute similarity}},
url = {http://ilpubs.stanford.edu:8090/498/1/2001-27.pdf http://portal.acm.org/citation.cfm?id=635487 http://portal.acm.org/citation.cfm?doid=635484.635487},
volume = {21},
year = {2003}
}