Semantic Measures and Related Metrics
This section presents the semantic similarity and relatedness measures, and the related metrics, supported by the Semantic Measures Library.
It can be quite technical for readers who are not familiar with semantic measures; in that case you will probably need to learn more about semantic measures first.
As specified in the documentation section, the library currently focuses mostly on graph-based semantic measures. Not all implemented measures may be listed in this documentation; developers can consult the source code for the complete and up-to-date list of measures. Users can also ask questions regarding specific implementations or measures on the mailing list.
Two statuses are used to characterise the measures/metrics:
- Supported: the implementation has been tested and can be used for algorithm definitions or semantic measure evaluation.
- Experimental: the implementation has not been heavily tested in order to ensure it is in perfect agreement with the original definition.
Two flags are specified for each semantic measure; they are used to refer to a measure in the source code or when using the toolkit:
- Flag: The string which must be specified in the source code or in the XML configuration file.
Using the flag in source code (other examples):
    ...
    // The Flag is used to specify the measure to use.
    SMconf smConf = new SMconf(SMConstants.FLAG_SIM_PAIRWISE_DAG_NODE_LIN_1998);
    double sim = engine.computePairwiseSim(smConf, concept_1, concept_2);
Using the flag in the XML configuration (other examples):
    <measures type = "pairwise">
        <measure id = 'dice' flag = "SIM_FRAMEWORK_DAG_SET_DICE_1945" />
    </measures>
- CLI Flag: The string which must be specified when using SML-Toolkit profiles through the Command Line Interface (CLI). Not all measures are supported through the command line interface; only those that are supported have a CLI Flag specified. Please refer to this documentation section for examples of use of CLI Flags.
Measures and metrics which can be used with the library and the toolkit are listed in a public spreadsheet.
Graph-based Semantic measures (Supported)
The measures listed below are currently supported by the library.
- Measures which can be used to compare
- Two concepts defined in a taxonomy (pairwise measures)
- Two groups of concepts defined in a taxonomy (groupwise measures)
- Measures used to assess the information content (specificity) of a concept defined in a taxonomy
Pairwise measures: Semantic measures between two concepts
- Edge-based measures
- Node-based measures
- Hybrid measures
Edge-based measures
Rada 1989
Shortest Path based semantic similarity measure.
Reference: Rada R, Mili H, Bicknell E, Blettner M: Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics 1989, 19:17-30.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_RADA_1989
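The shortest-path idea behind this measure can be sketched as follows. This is a minimal illustration, not the library's implementation: the class name `ShortestPathSketch`, the edge-array input, and the choice to ignore edge direction are all assumptions, and how the path length is turned into a similarity score is left out.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ShortestPathSketch {
    // Number of edges on a shortest path between two concepts, ignoring edge
    // direction; returns -1 when the concepts are not connected.
    static int shortestPath(String[][] isaEdges, String from, String to) {
        Map<String, Set<String>> adj = new HashMap<>();
        for (String[] e : isaEdges) { // e[0] is-a e[1]
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) { // breadth-first search
            String cur = queue.poll();
            if (cur.equals(to)) return dist.get(cur);
            for (String next : adj.getOrDefault(cur, Collections.emptySet())) {
                if (dist.putIfAbsent(next, dist.get(cur) + 1) == null) queue.add(next);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[][] edges = {{"cat", "mammal"}, {"dog", "mammal"}, {"mammal", "animal"}, {"bird", "animal"}};
        System.out.println(shortestPath(edges, "cat", "dog"));  // prints 2
        System.out.println(shortestPath(edges, "cat", "bird")); // prints 3
    }
}
```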
Rada 1989 LCA
Shortest Path based semantic similarity measure. In this measure the shortest path is constrained to pass through the Least Common Ancestors of the compared concepts.
Reference: Rada R, Mili H, Bicknell E, Blettner M: Development and application of a metric on semantic nets.
IEEE Transactions on Systems, Man, and Cybernetics 1989, 19:17-30.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_RADA_LCA_1989
Wu & Palmer 1994
Shortest Path based semantic similarity measure.
Reference: Wu Z, Palmer M: Verb semantics and lexical selection.
In 32nd. Annual Meeting of the Association for Computational Linguistics. 1994:133–138.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_WU_PALMER_1994
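For orientation, the Wu and Palmer formula is commonly written as 2·N3 / (N1 + N2 + 2·N3), where N1 and N2 are the distances from the two concepts to their common ancestor and N3 is the distance from that ancestor to the root. The sketch below assumes distances counted in edges and a hypothetical class name; the library may use a different counting convention.

```java
final class WuPalmerSketch {
    // n1, n2: number of edges from each concept up to their common ancestor;
    // n3: number of edges from that ancestor up to the root.
    static double sim(int n1, int n2, int n3) {
        int denominator = n1 + n2 + 2 * n3;
        // Assumption: when every distance is 0 (both concepts are the root),
        // the concepts are treated as identical.
        return denominator == 0 ? 1.0 : (2.0 * n3) / denominator;
    }

    public static void main(String[] args) {
        System.out.println(sim(1, 1, 1)); // prints 0.5
    }
}
```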
Leacock and Chodorow 1998
Shortest Path based semantic similarity measure.
Reference: Leacock C, Chodorow M: Combining Local Context and WordNet Similarity for Word Sense Identification.
In WordNet: An electronic lexical database. Edited by Fellbaum C. MIT Press; 1998:265–283.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_LEACOCK_CHODOROW_1998
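The Leacock and Chodorow score is usually given as -log(len / (2·D)), with len the shortest path length between the concepts and D the maximum depth of the taxonomy. The sketch below assumes both are counted in nodes and uses the natural logarithm; the class name is hypothetical and the library's conventions may differ.

```java
final class LeacockChodorowSketch {
    // pathLength: shortest path between the two concepts, counted in nodes (>= 1);
    // maxDepth: maximum depth of the taxonomy, counted in nodes (>= 1).
    static double sim(int pathLength, int maxDepth) {
        if (pathLength < 1 || maxDepth < 1) {
            throw new IllegalArgumentException("pathLength and maxDepth must be >= 1");
        }
        return -Math.log(pathLength / (2.0 * maxDepth));
    }

    public static void main(String[] args) {
        System.out.println(sim(1, 8)); // identical concepts in a taxonomy of depth 8
        System.out.println(sim(5, 8));
    }
}
```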
Stojanovic 2001
Shortest Path based semantic similarity measure.
Reference: Stojanovic N, Maedche A, Staab S, Studer R, Sure Y:
SEAL - A Framework for Developing SEmantic PortALs. In Proceedings of the International Conference on Knowledge Capture, 2097/2001.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_STOJANOVIC_2001
Pekar and Staab 2002
Shortest Path based semantic similarity measure.
Reference: Pekar V, Staab S: Taxonomy learning: factoring the structure of a taxonomy into a semantic classification decision.
In COLING ’02 Proceedings of the 19th international conference on Computational linguistics.
Association for Computational Linguistics; 2002, 2:1–7.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_PEKAR_STAAB_2002
Kyogoku 2011
Basic implementation of the Kyogoku et al. measure, without edge weight estimation.
Reference: Kyogoku R, Fujimoto R, Ozaki T, Ohkawa T: A method for supporting retrieval of articles on protein structure analysis considering users’ intention.
BMC Bioinformatics 2011, 12:S42. p 2
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_KYOGOKU_BASIC_2011
IC-based measures
Resnik 1995
Resnik semantic similarity measure.
Reference: Resnik P: Using Information Content to Evaluate Semantic Similarity in a Taxonomy.
In Proceedings of the 14th International Joint Conference on Artificial Intelligence IJCAI. 1995, 1:448-453.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_RESNIK_1995
CLI Flag: resnik
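Resnik's measure scores two concepts by the information content (IC) of their most informative common ancestor. A minimal sketch, assuming the IC values and the inclusive ancestor sets are already available as plain Java collections (the class and parameter names are hypothetical):

```java
import java.util.Map;
import java.util.Set;

final class ResnikSimSketch {
    // anc1 / anc2: ancestors of each concept, the concept itself included.
    // Returns the highest IC found among the common ancestors (0 if none).
    static double resnik(Map<String, Double> ic, Set<String> anc1, Set<String> anc2) {
        double best = 0.0; // assumption: IC values are non-negative
        for (String ancestor : anc1) {
            if (anc2.contains(ancestor)) {
                best = Math.max(best, ic.getOrDefault(ancestor, 0.0));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> ic = Map.of("animal", 0.0, "mammal", 1.5, "cat", 3.0, "dog", 2.5);
        System.out.println(resnik(ic,
                Set.of("cat", "mammal", "animal"),
                Set.of("dog", "mammal", "animal"))); // prints 1.5
    }
}
```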
Lin 1998
Lin semantic similarity measure.
Reference: Lin D: An Information-Theoretic Definition of Similarity.
In 15th International Conference of Machine Learning. Madison, WI: 1998:296-304.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_LIN_1998
CLI Flag: lin
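Lin's measure normalizes the IC of the most informative common ancestor (MICA) by the ICs of the compared concepts: 2·IC(MICA) / (IC(c1) + IC(c2)). The sketch below takes the three IC values as inputs; the class name and the handling of a zero denominator are assumptions.

```java
final class LinSketch {
    static double sim(double ic1, double ic2, double icMica) {
        double sum = ic1 + ic2;
        // Assumption: if both concepts have IC 0 they are treated as identical.
        return sum == 0.0 ? 1.0 : (2.0 * icMica) / sum;
    }

    public static void main(String[] args) {
        System.out.println(sim(3.0, 1.0, 1.0)); // prints 0.5
    }
}
```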
Jiang and Conrath 1997
Jiang and Conrath semantic distance.
Reference: Jiang J, Conrath D: Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy.
In International Conference Research on Computational Linguistics (ROCLING X). 1997, cmp-lg/970:15.
Status: supported
Flag: DIST_PAIRWISE_DAG_NODE_JIANG_CONRATH_1997
CLI Flag: jc
Jiang and Conrath Normalized 1997
Jiang and Conrath measure normalized.
Reference: Jiang J, Conrath D: Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy.
In International Conference Research on Computational Linguistics (ROCLING X). 1997, cmp-lg/970:15.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_JIANG_CONRATH_1997_NORM
CLI Flag: simjc_norm
Adaptation of JC that normalizes values to [0,1], based on the normalization discussed in:
Seco N, Veale T, Hayes J: An Intrinsic Information Content Metric for Semantic Similarity in WordNet.
In 16th European Conference on Artificial Intelligence. IOS Press; 2004, 16:1–5.
It is a reformulation of:
Pesquita C, Faria D, Bastos H, et al.: Metrics for GO based protein semantic similarity: a systematic evaluation. BMC bioinformatics 2008, 9 Suppl 5:S4.
The normalization makes sense only if the ICs of the compared concepts are themselves normalized in [0,1].
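As a rough illustration, the Jiang and Conrath distance is IC(c1) + IC(c2) - 2·IC(MICA), and the normalization given by Seco et al. turns it into a similarity as 1 - distance / 2. The sketch below assumes IC values already normalized in [0,1]; the class and method names are hypothetical and this is not taken from the library source.

```java
final class JiangConrathSketch {
    // Jiang and Conrath distance from the ICs of the two concepts and of
    // their most informative common ancestor (MICA).
    static double distance(double ic1, double ic2, double icMica) {
        return ic1 + ic2 - 2.0 * icMica;
    }

    // With ICs in [0,1] the distance lies in [0,2], so this value lies in [0,1].
    static double normalizedSim(double ic1, double ic2, double icMica) {
        return 1.0 - distance(ic1, ic2, icMica) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(distance(0.75, 0.5, 0.25));      // prints 0.75
        System.out.println(normalizedSim(0.75, 0.5, 0.25)); // prints 0.625
    }
}
```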
Schlicker SimRel 2006
SimRel semantic similarity measure. In the original definition SimRel is a groupwise measure (i.e. the measure is defined to compare two sets of concepts).
This implementation only refers to the measure defined to compare a pair of concepts.
Reference: Schlicker, Andreas, et al. "A new measure for functional similarity of gene products based on Gene Ontology." BMC bioinformatics 7.1 (2006): 302.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_SCHLICKER_2006
CLI Flag: schliker
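In its pairwise form, SimRel is generally described as Lin's measure weighted by a relevance factor 1 - p(MICA), where p(MICA) is the probability of occurrence of the most informative common ancestor. A minimal sketch under that reading (hypothetical names; the probability is passed in directly rather than derived from the IC):

```java
final class SimRelSketch {
    // pMica: probability of occurrence of the most informative common ancestor, in [0,1].
    static double sim(double ic1, double ic2, double icMica, double pMica) {
        double sum = ic1 + ic2;
        if (sum == 0.0) return 0.0; // both concepts carry no information
        double lin = (2.0 * icMica) / sum;
        return lin * (1.0 - pMica);
    }

    public static void main(String[] args) {
        System.out.println(sim(3.0, 1.0, 1.0, 0.5)); // prints 0.25
    }
}
```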
SimIC 2010
SimIC semantic similarity measure.
Reference: Li B, Wang JZ, Feltus FA, Zhou J, Luo F: Effectively integrating information content and structural relationship to improve the GO-based similarity measure between proteins.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_SIM_IC_2010
Jaccard IC
Semantic similarity measure relying on an IC-based formulation of the set-based Jaccard measure.
Reference: None
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_JACCARD_IC
Jaccard 3W IC
Semantic similarity measure relying on an IC-based formulation of the Jaccard 3W measure.
Reference: None
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_JACCARD_3W_IC
Gower and Legendre IC
Gower and Legendre IC-based abstract expression.
Reference: see Blanchard E, Harzallah M, Kuntz P: A generic framework for comparing semantic similarities on a subsumption hierarchy. 2008:20–24.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_GL
Tversky IC
Semantic similarity measure based on an IC formulation of the Tversky ratio model.
Reference: see Blanchard E, Harzallah M, Kuntz P: A generic framework for comparing semantic similarities on a subsumption hierarchy. 2008:20–24.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_TVERSKY_IC
Set-based measures
Jaccard 1901
Set-based measures
Reference: Jaccard P: Distribution de la flore alpine dans le bassin des Dranses et dans quelques regions voisines.
Bulletin de la Societe Vaudoise des Sciences Naturelles 1901, 37:241 - 272.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_JACCARD_1901
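The Jaccard index is the size of the intersection of two sets divided by the size of their union. The sketch below assumes each concept is represented by a set of features, for instance its ancestors in the taxonomy; which sets the library actually compares is not stated here, so treat that choice and the class name as assumptions.

```java
import java.util.HashSet;
import java.util.Set;

final class JaccardSetSketch {
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0; // assumption: two empty sets score 0
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(jaccard(
                Set.of("cat", "mammal", "animal"),
                Set.of("dog", "mammal", "animal"))); // prints 0.5
    }
}
```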
Braun Blanquet 1932
Set-based measures
Reference: Braun-Blanquet J: Plant sociology: the study of plant communities. McGraw-Hill; 1932:439.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BRAUN_BLANQUET_1932
Dice 1945
Set-based measures
Reference: Dice LR: Measures of the Amount of Ecologic Association Between Species. Ecology 1945, 26:297-302.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_DICE_1945
Ochiai 1957
Set-based measures
Reference: Ochiai A: Zoogeographic studies on the soleoid fishes found in Japan and its neighbouring regions.
Bulletin of the Japanese Society of Scientific Fisheries 1957, 22:526-530.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_OCHIAI_1957
Simpson 1960
Set-based measures
Reference: Simpson GG: Notes on the measurement of faunal resemblance.
American Journal of Science 1960, 258A:300-311.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_SIMPSON_1960
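The Braun-Blanquet, Dice, Ochiai and Simpson coefficients listed above differ only in how the size of the intersection is normalized. The sketch below shows the textbook forms computed from set cardinalities; the class name is hypothetical and both sets are assumed to be non-empty.

```java
final class OverlapCoefficientsSketch {
    // sizeA, sizeB: cardinalities of the two sets (both > 0);
    // common: cardinality of their intersection.
    static double braunBlanquet(int sizeA, int sizeB, int common) {
        return (double) common / Math.max(sizeA, sizeB);
    }

    static double dice(int sizeA, int sizeB, int common) {
        return (2.0 * common) / (sizeA + sizeB);
    }

    static double ochiai(int sizeA, int sizeB, int common) {
        return common / Math.sqrt((double) sizeA * sizeB);
    }

    static double simpson(int sizeA, int sizeB, int common) {
        return (double) common / Math.min(sizeA, sizeB);
    }

    public static void main(String[] args) {
        // A has 2 elements, B has 8, and they share 2.
        System.out.println(braunBlanquet(2, 8, 2)); // prints 0.25
        System.out.println(dice(2, 8, 2));          // prints 0.4
        System.out.println(ochiai(2, 8, 2));        // prints 0.5
        System.out.println(simpson(2, 8, 2));       // prints 1.0
    }
}
```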
Sokal & Sneath 1963
Set-based measures
Reference: Sokal RR, Sneath PHA: Principles of numerical taxonomy. San Francisco: W. H. Freeman and Company; 1963:359.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_SOKAL_SNEATH_1963
Tversky 1977 Abstract Model
Set-based measures
Reference: Tversky A: Features of similarity. Psychological Review 1977, 84:327-352.
Implementation of the contrast model in a set-based manner.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_TVERSKY_1977
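Tversky's contrast model scores two feature sets as θ·f(A ∩ B) - α·f(A \ B) - β·f(B \ A). The sketch below uses set cardinality as the salience function f; the weights theta, alpha and beta are illustrative parameters and the class name is hypothetical, so this is one possible reading rather than the library's code.

```java
import java.util.HashSet;
import java.util.Set;

final class TverskyContrastSketch {
    static double contrast(Set<String> a, Set<String> b,
                           double theta, double alpha, double beta) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);   // features shared by both
        Set<String> onlyA = new HashSet<>(a);
        onlyA.removeAll(b);    // features specific to a
        Set<String> onlyB = new HashSet<>(b);
        onlyB.removeAll(a);    // features specific to b
        return theta * common.size() - alpha * onlyA.size() - beta * onlyB.size();
    }

    public static void main(String[] args) {
        System.out.println(contrast(
                Set.of("a", "b", "c"), Set.of("b", "c", "d", "e"),
                1.0, 0.5, 0.5)); // prints 0.5
    }
}
```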
Korbel 2002
Set-based measures
Reference: Korbel JO, Snel B, Huynen MA, Bork P: SHOT: a web server for the construction of genome phylogenies. Trends in Genetics 2002, 18:158-62.
Cited by Lin C, Cho Y-rae, Hwang W-chang, Pei P, Zhang A: Clustering methods in protein-protein interaction network.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_KORBEL_2002
Maryland Bridge 2003
Set-based measures
Reference: Mirkin B, Koonin E: A top-down method for building genome classification trees with linear binary hierarchies.
In Bioconsensus: DIMACS Working Group Meetings on Bioconsensus: October 25-26, 2000 and October 2-5, 2001, DIMACS Center.
Amer Mathematical Society; 2003, 61:97.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_MARYLAND_BRIDGE_2003
Bader 2003
Set-based measures
Reference: Bader G, Hogue C: An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics 2003, 4:2.
Cited by Lin C, Cho Y-rae, Hwang W-chang, Pei P, Zhang A: Clustering methods in protein-protein interaction network.
In Knowledge Discovery in Bioinformatics: Techniques, Methods and Application 2006.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BADER_2003
Knappe 2004
Set-based measures
Reference: Knappe R, Bulskov H, Andreasen T: Perspectives on ontology-based querying. International Journal of Intelligent Systems 2004, 22:739-761.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_KNAPPE_2004
Batet 2010
Set-based measures
Reference: Batet M, Sanchez D, Valls A: An ontology-based measure to compute semantic similarity in biomedicine. Journal of biomedical informatics 2010, 44:118-125.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BATET_2010
Hybrid measures
Groupwise measures: Semantic measures between groups of concepts
Direct Groupwise measures
Jaccard 1901
Set-based measures
Reference: Jaccard P: Distribution de la flore alpine dans le bassin des Dranses et dans quelques regions voisines.
Bulletin de la Societe Vaudoise des Sciences Naturelles 1901, 37:241 - 272.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_JACCARD_1901
Braun Blanquet 1932
Set-based measures
Reference: Braun-Blanquet J: Plant sociology: the study of plant communities. McGraw-Hill; 1932:439.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BRAUN_BLANQUET_1932
Dice 1945
Set-based measures
Reference: Dice LR: Measures of the Amount of Ecologic Association Between Species. Ecology 1945, 26:297-302.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_DICE_1945
Ochiai 1957
Set-based measures
Reference: Ochiai A: Zoogeographic studies on the soleoid fishes found in Japan and its neighbouring regions.
Bulletin of the Japanese Society of Scientific Fisheries 1957, 22:526-530.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_OCHIAI_1957
Simpson 1960
Set-based measures
Reference: Simpson GG: Notes on the measurement of faunal resemblance.
American Journal of Science 1960, 258A:300-311.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_SIMPSON_1960
Sokal & Sneath 1963
Set-based measures
Reference: Sokal RR, Sneath PHA: Principles of numerical taxonomy. San Francisco: W. H. Freeman and Company; 1963:359.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_SOKAL_SNEATH_1963
Tversky 1977 Abstract Model
Set-based measures
Reference: Tversky A: Features of similarity. Psychological Review 1977, 84:327-352.
Implementation of the contrast model in a set-based manner.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_TVERSKY_1977
Korbel 2002
Set-based measures
Reference: Korbel JO, Snel B, Huynen MA, Bork P: SHOT: a web server for the construction of genome phylogenies. Trends in Genetics 2002, 18:158-62.
Cited by Lin C, Cho Y-rae, Hwang W-chang, Pei P, Zhang A: Clustering methods in protein-protein interaction network.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_KORBEL_2002
Maryland Bridge 2003
Set-based measures
Reference: Mirkin B, Koonin E: A top-down method for building genome classification trees with linear binary hierarchies.
In Bioconsensus: DIMACS Working Group Meetings on Bioconsensus: October 25-26, 2000 and October 2-5, 2001, DIMACS Center.
Amer Mathematical Society; 2003, 61:97.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_MARYLAND_BRIDGE_2003
Bader 2003
Set-based measures
Reference: Bader G, Hogue C: An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics 2003, 4:2.
Cited by Lin C, Cho Y-rae, Hwang W-chang, Pei P, Zhang A: Clustering methods in protein-protein interaction network.
In Knowledge Discovery in Bioinformatics: Techniques, Methods and Application 2006.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BADER_2003
Knappe 2004
Set-based measures
Reference: Knappe R, Bulskov H, Andreasen T: Perspectives on ontology-based querying. International Journal of Intelligent Systems 2004, 22:739-761.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_KNAPPE_2004
Batet 2010
Set-based measures
Reference: Batet M, Sanchez D, Valls A: An ontology-based measure to compute semantic similarity in biomedicine. Journal of biomedical informatics 2010, 44:118-125.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BATET_2010
Ali 2009
Direct groupwise measures
Reference: Ali W, Deane CM. Functionally guided alignment of protein interaction networks for module detection. Bioinformatics 2009;25:3166–73.
Status: supported
Flag: SIM_GROUPWISE_DAG_ALI_DEANE
CLI Flag: ali_and_deane
Sim GIC
Direct groupwise measures
Reference: Pesquita C, Faria D, Bastos H: Evaluating GO-based semantic similarity measures. Proc. 10th Annual Bio- 2007:1-4.
Status: supported
Flag: SIM_GROUPWISE_DAG_GIC
CLI Flag: gic
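SimGIC is usually described as an IC-weighted Jaccard index: the summed IC of the concepts shared by the two ancestor-extended sets, divided by the summed IC of their union. A minimal sketch with hypothetical names, assuming the sets passed in already include all ancestors:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SimGicSketch {
    static double simGic(Map<String, Double> ic, Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        double common = 0.0;
        double all = 0.0;
        for (String concept : union) {
            double value = ic.getOrDefault(concept, 0.0);
            all += value;
            if (a.contains(concept) && b.contains(concept)) common += value;
        }
        return all == 0.0 ? 0.0 : common / all; // assumption: score 0 when no IC at all
    }

    public static void main(String[] args) {
        Map<String, Double> ic = Map.of("animal", 0.0, "mammal", 1.0, "cat", 2.0, "dog", 1.0);
        System.out.println(simGic(ic,
                Set.of("cat", "mammal", "animal"),
                Set.of("dog", "mammal", "animal"))); // prints 0.25
    }
}
```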
SimLP
Direct groupwise measures
Reference: Gentleman R: Visualizing and distances using GO. Retrieved Jan. 10th 2007.
http://www.bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOvis.pdf
Status: supported
Flag: SIM_GROUPWISE_DAG_LP
CLI Flag: lp
Lee 2004
Direct groupwise measures
Reference: Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P: Coexpression analysis of human genes across many microarray data sets. Genome Research 2004, 14:1085.
Status: supported
Flag: SIM_GROUPWISE_DAG_LEE_2004
CLI Flag: lee
Term Overlap
Direct groupwise measures
Reference: Mistry M, Pavlidis P: Gene Ontology term overlap as a measure of gene functional similarity. BMC bioinformatics 2008, 9:327.
Status: supported
Flag: SIM_GROUPWISE_DAG_TO
CLI Flag: to
Normalized Term Overlap
Direct groupwise measures
Reference: Mistry M, Pavlidis P: Gene Ontology term overlap as a measure of gene functional similarity. BMC bioinformatics 2008, 9:327.
Status: supported
Flag: SIM_GROUPWISE_DAG_NTO
CLI Flag: nto
Normalized (Max) Term Overlap
Direct groupwise measures
Reference: Mistry M, Pavlidis P: Gene Ontology term overlap as a measure of gene functional similarity. BMC bioinformatics 2008, 9:327.
Status: supported
Flag: SIM_GROUPWISE_DAG_NTO_MAX
CLI Flag: nto_max
Sim UI
Direct groupwise measures
Reference: Gentleman R: Visualizing and distances using GO. Retrieved Jan. 10th 2007.
http://www.bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOvis.pdf
Status: supported
Flag: SIM_GROUPWISE_DAG_UI
CLI Flag: ui
Indirect Groupwise measures
These measures aggregate the scores of all the pairwise comparisons that can be made between the concepts of the two sets.
Min
Status: supported
Flag: SIM_GROUPWISE_MIN
CLI Flag: min
Average
Status: supported
Flag: SIM_GROUPWISE_AVERAGE
CLI Flag: avg
Max
Status: supported
Flag: SIM_GROUPWISE_MAX
CLI Flag: max
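The Min, Average and Max strategies can be pictured as simple reductions over the matrix of pairwise scores, where entry [i][j] is the score between the i-th concept of the first set and the j-th concept of the second. The sketch below is illustrative only; the class name and the matrix representation are assumptions, and the matrix is assumed to be non-empty.

```java
final class PairwiseAggregationSketch {
    static double min(double[][] scores) {
        double result = Double.POSITIVE_INFINITY;
        for (double[] row : scores) for (double v : row) result = Math.min(result, v);
        return result;
    }

    static double max(double[][] scores) {
        double result = Double.NEGATIVE_INFINITY;
        for (double[] row : scores) for (double v : row) result = Math.max(result, v);
        return result;
    }

    static double average(double[][] scores) {
        double sum = 0.0;
        int count = 0;
        for (double[] row : scores) {
            for (double v : row) {
                sum += v;
                count++;
            }
        }
        return sum / count;
    }

    public static void main(String[] args) {
        double[][] scores = {{1.0, 0.5}, {0.25, 0.25}};
        System.out.println(min(scores));     // prints 0.25
        System.out.println(average(scores)); // prints 0.5
        System.out.println(max(scores));     // prints 1.0
    }
}
```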
Best Match Average (BMA)
Indirect groupwise measures
Reference: Schlicker A, Domingues FS, Rahnenführer J, Lengauer T: A new measure for functional similarity of gene products based on Gene Ontology. BMC Bioinformatics 2006, 7:302.
Status: supported
Flag: SIM_GROUPWISE_BMA
CLI Flag: bma
Best Match Max (BMM)
Indirect groupwise measures
Reference: Schlicker A, Domingues FS, Rahnenführer J, Lengauer T: A new measure for functional similarity of gene products based on Gene Ontology. BMC Bioinformatics 2006, 7:302.
Status: supported
Flag: SIM_GROUPWISE_BMM
CLI Flag: bmm
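Both best-match strategies start from the best score of each concept against the other set. In the sketch below, BMA is taken as the mean of the two directional averages and BMM as their maximum; this is one common formulation, stated here as an assumption, and other definitions of BMA exist (for example, dividing the sum of all best matches by the total number of concepts). The class name is hypothetical and the score matrix is assumed to be non-empty and rectangular.

```java
final class BestMatchSketch {
    // Average, over the rows, of the best score found in each row.
    static double meanOfRowMaxima(double[][] scores) {
        double sum = 0.0;
        for (double[] row : scores) {
            double best = Double.NEGATIVE_INFINITY;
            for (double v : row) best = Math.max(best, v);
            sum += best;
        }
        return sum / scores.length;
    }

    // Average, over the columns, of the best score found in each column.
    static double meanOfColumnMaxima(double[][] scores) {
        int columns = scores[0].length;
        double sum = 0.0;
        for (int j = 0; j < columns; j++) {
            double best = Double.NEGATIVE_INFINITY;
            for (double[] row : scores) best = Math.max(best, row[j]);
            sum += best;
        }
        return sum / columns;
    }

    static double bma(double[][] scores) {
        return (meanOfRowMaxima(scores) + meanOfColumnMaxima(scores)) / 2.0;
    }

    static double bmm(double[][] scores) {
        return Math.max(meanOfRowMaxima(scores), meanOfColumnMaxima(scores));
    }

    public static void main(String[] args) {
        double[][] scores = {{1.0, 0.5}, {0.25, 0.25}};
        System.out.println(bma(scores)); // prints 0.6875
        System.out.println(bmm(scores)); // prints 0.75
    }
}
```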
Graph-based Semantic measures (Experimental)
Experimental measures are those located in a package named experimental. Please consult the source code.
Information Content measures (Supported)
Extrinsic
Resnik
Reference: Resnik P: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence IJCAI. Citeseer; 1995, 1:448–453.
Status: supported
Flag: IC_ANNOT_RESNIK_1995
CLI Flag: resnik
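Resnik's extrinsic IC is the negative logarithm of the probability of encountering a concept, estimated from usage counts (annotations or corpus occurrences) that include the counts of its descendants. A minimal sketch, assuming the counts are already propagated and using the natural logarithm (both assumptions, as is the class name):

```java
final class ResnikIcSketch {
    // occurrences: usages of the concept, including those of its descendants (> 0);
    // total: usages of the root, i.e. of any concept.
    static double ic(long occurrences, long total) {
        if (occurrences <= 0 || occurrences > total) {
            throw new IllegalArgumentException("expected 0 < occurrences <= total");
        }
        return -Math.log((double) occurrences / total);
    }

    public static void main(String[] args) {
        System.out.println(ic(25, 100));
        System.out.println(ic(100, 100)); // the root carries no information
    }
}
```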
Intrinsic
Sanchez et al.
Reference: Sanchez D, Batet M, Isern D: Ontology-based information content computation. Knowledge-Based Systems 2011, 24:297-303.
Status: supported
Flag: ICI_SANCHEZ_2011
CLI Flag: sanchez
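The intrinsic IC of Sanchez et al. is commonly written as -log((|leaves(c)| / |subsumers(c)| + 1) / (max_leaves + 1)). The sketch below follows that form, assuming subsumers(c) includes c itself and that a leaf concept has zero leaves beneath it; the class name is hypothetical.

```java
final class SanchezIcSketch {
    // leaves: number of leaf concepts subsumed by the concept (0 for a leaf);
    // subsumers: number of ancestors of the concept, itself included (>= 1);
    // maxLeaves: total number of leaves in the taxonomy.
    static double ic(int leaves, int subsumers, int maxLeaves) {
        double ratio = (double) leaves / subsumers + 1.0;
        return -Math.log(ratio / (maxLeaves + 1.0));
    }

    public static void main(String[] args) {
        System.out.println(ic(0, 3, 7)); // a leaf with 3 subsumers
        System.out.println(ic(7, 1, 7)); // the root
    }
}
```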
Seco et al.
Reference: Seco N, Veale T, Hayes J: An Intrinsic Information Content Metric for Semantic Similarity in WordNet. In 16th European Conference on Artificial Intelligence. IOS Press; 2004, 16:1-5.
Status: supported
Flag: ICI_SECO_2004
CLI Flag: seco
Zhou et al.
The parameter k is set to 0.5. This can be changed using the XML interface.
Reference: Zhou Z, Wang Y, Gu J: A New Model of Information Content for Semantic Similarity in WordNet. In FGCNS ’08 Proceedings of the 2008 Second International Conference on Future Generation Communication and Networking Symposia Volume 03. IEEE Computer Society; 2008:85-89.
Status: supported
Flag: ICI_ZHOU_2008
CLI Flag: zhou
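The Seco et al. and Zhou et al. definitions are closely related: Seco's IC is 1 - log(hypo(c) + 1) / log(max_nodes), and Zhou's IC mixes that term with a depth term, k·IC_Seco(c) + (1 - k)·log(depth(c)) / log(max_depth). The sketch below assumes depths counted from 1 at the root and taxonomies with more than one node and more than one level; all names are hypothetical.

```java
final class IntrinsicIcSketch {
    // hyponyms: number of descendants of the concept;
    // maxNodes: total number of concepts in the taxonomy (> 1).
    static double seco(int hyponyms, int maxNodes) {
        return 1.0 - Math.log(hyponyms + 1.0) / Math.log(maxNodes);
    }

    // depth: depth of the concept, root = 1; maxDepth: maximum depth (> 1);
    // k: weight of the hyponym-based term, in [0,1].
    static double zhou(int hyponyms, int maxNodes, int depth, int maxDepth, double k) {
        double depthTerm = Math.log(depth) / Math.log(maxDepth);
        return k * seco(hyponyms, maxNodes) + (1.0 - k) * depthTerm;
    }

    public static void main(String[] args) {
        System.out.println(seco(0, 10));            // prints 1.0 (a leaf)
        System.out.println(zhou(0, 10, 4, 4, 0.5)); // prints 1.0 (a deepest leaf)
        System.out.println(zhou(2, 10, 2, 4, 0.5));
    }
}
```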
Depth Max Non-Linear
Reference: See Blanchard thesis
Status: supported
Flag: ICI_DEPTH_MAX_NONLINEAR
CLI Flag: depth_max_non_linear
Depth Min Non-linear
Reference: See Blanchard thesis
Status: supported
Flag: ICI_DEPTH_MIN_NONLINEAR
CLI Flag: depth_min_non_linear