Semantic Measures and Related Metrics

This section presents the semantic similarity and relatedness measures, and the related metrics, supported by the Semantic Measures Library.
It is fairly technical: readers who are not familiar with semantic measures will probably need to read more about them first.

As specified in the documentation section, the library currently focuses mostly on graph-based semantic measures. Not all implemented measures are necessarily listed in this documentation; developers can consult the source code for the complete, up-to-date list. Users can also ask questions about specific implementations or measures on the mailing list.

Two statuses are used to characterise the measures/metrics:

  • Supported: the implementation has been tested and can be used for algorithm definitions or semantic measure evaluation.
  • Experimental: the implementation has not yet been tested thoroughly enough to ensure that it agrees with the original definition.

Two flags are specified for each semantic measure. They are used to refer to the measure in the source code or when using the toolkit:

  • Flag: The string which must be specified in the source code or in the XML configuration file.
    Using the flag in source code (other examples):
        ...
        // The flag specifies which measure to use.
        SMconf smConf = new SMconf(SMConstants.FLAG_SIM_PAIRWISE_DAG_NODE_LIN_1998);
        double sim = engine.computePairwiseSim(smConf, concept_1, concept_2);
        

    Using the flag in the XML configuration (other examples):
        <measures type="pairwise">
            <measure id="dice" flag="SIM_FRAMEWORK_DAG_SET_DICE_1945" />
        </measures>
    	
  • CLI Flag: The string which must be specified on the Command Line Interface (CLI) when using SML-Toolkit profiles. Not all measures are available through the command line interface; only those that are have a CLI Flag specified. Please refer to this documentation section for examples of CLI Flag usage.

Measures and metrics which can be used with the library and the toolkit are listed in a public spreadsheet.
An embedded version is provided below; use the tabs to switch between the different types of metrics.

More detailed documentation is provided below.

Graph-based Semantic measures (Supported)

The measures listed below are currently supported by the library.

Pairwise measures: Semantic measures between two concepts

Edge-based measures
Rada 1989

Shortest Path based semantic similarity measure.
Reference: Rada R, Mili H, Bicknell E, Blettner M: Development and application of a metric on semantic nets. Ieee Transactions On Systems Man And Cybernetics 1989, 19:17-30.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_RADA_1989
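
To illustrate the idea behind this measure, the sketch below computes the shortest path (in number of edges) between two concepts of a toy taxonomy using a breadth-first search. It does not use the library: the class RadaSketch, the toy taxonomy and the conversion 1 / (1 + distance) from distance to similarity are assumptions made for this example, and the library's own implementation may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the library.
final class RadaSketch {

    // Adds an undirected edge between two concepts.
    static void addEdge(Map<String, List<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        graph.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Toy taxonomy: animal has children mammal and bird; mammal has children dog and cat.
    static Map<String, List<String>> toyTaxonomy() {
        Map<String, List<String>> graph = new HashMap<>();
        addEdge(graph, "animal", "mammal");
        addEdge(graph, "animal", "bird");
        addEdge(graph, "mammal", "dog");
        addEdge(graph, "mammal", "cat");
        return graph;
    }

    // Breadth-first search: number of edges on the shortest path, or -1 if unreachable.
    static int shortestPath(Map<String, List<String>> graph, String from, String to) {
        if (from.equals(to)) {
            return 0;
        }
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            List<String> neighbours = graph.getOrDefault(cur, Collections.<String>emptyList());
            for (String next : neighbours) {
                if (dist.containsKey(next)) {
                    continue;
                }
                dist.put(next, dist.get(cur) + 1);
                if (next.equals(to)) {
                    return dist.get(next);
                }
                queue.add(next);
            }
        }
        return -1;
    }

    // Assumed distance-to-similarity conversion (one common choice among several).
    static double similarity(int distance) {
        return 1.0 / (1.0 + distance);
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = toyTaxonomy();
        int d = shortestPath(graph, "dog", "cat");
        System.out.println("distance(dog, cat) = " + d);
        System.out.println("similarity(dog, cat) = " + similarity(d));
    }
}
```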

Rada 1989 LCA

Shortest Path based semantic similarity measure. In this measure, the shortest path is constrained to pass through the Least Common Ancestors of the compared concepts.
Reference: Rada R, Mili H, Bicknell E, Blettner M: Development and application of a metric on semantic nets. Ieee Transactions On Systems Man And Cybernetics 1989, 19:17-30.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_RADA_LCA_1989

Wu & Palmer 1994

Shortest Path based semantic similarity measure.
Reference: Wu Z, Palmer M: Verb semantics and lexical selection. In 32nd. Annual Meeting of the Association for Computational Linguistics. 1994:133–138.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_WU_PALMER_1994
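
For reference, the formula usually given for this measure is 2 * N3 / (N1 + N2 + 2 * N3), where N1 and N2 are the path lengths from each concept to their least common subsumer and N3 is the path length from that subsumer to the root. The sketch below only evaluates this formula from lengths supplied by the caller; the class name WuPalmerSketch and the convention of returning 1.0 when all lengths are zero are assumptions of this example, and depth conventions in the library may differ.

```java
// Hypothetical helper class for illustration; not part of the library.
final class WuPalmerSketch {

    // n1, n2: path lengths from each concept to their least common subsumer.
    // n3: path length from the least common subsumer to the root.
    static double wuPalmer(int n1, int n2, int n3) {
        if (n1 < 0 || n2 < 0 || n3 < 0) {
            throw new IllegalArgumentException("Path lengths must be non-negative");
        }
        int denominator = n1 + n2 + 2 * n3;
        if (denominator == 0) {
            // Both concepts are the root: treated as identical (assumed convention).
            return 1.0;
        }
        return 2.0 * n3 / denominator;
    }

    public static void main(String[] args) {
        // Two siblings whose parent is one edge below the root.
        System.out.println(wuPalmer(1, 1, 1));
    }
}
```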

Leacock and Chodorow 1998

Shortest Path based semantic similarity measure.
Reference: Leacock C, Chodorow M: Combining Local Context and WordNet Similarity for Word Sense Identification. In WordNet: An electronic lexical database. Edited by Fellbaum C. MIT Press; 1998:265–283.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_LEACOCK_CHODOROW_1998

Stojanovic 2001

Shortest Path based semantic similarity measure.
Reference: Stojanovic N, Alexander M, Staab S, Rudi S, York S: SEAL - A Framework for Developing SEmantic PortALs. In Proceedings of the International Conference on Knowledge Capture. , 2097/2001,.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_STOJANOVIC_2001

Pekar and Staab 2002

Shortest Path based semantic similarity measure.
Reference: Pekar V, Staab S: Taxonomy learning: factoring the structure of a taxonomy into a semantic classification decision. In COLING ’02 Proceedings of the 19th international conference on Computational linguistics. Association for Computational Linguistics; 2002, 2:1–7.
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_PEKAR_STAAB_2002

Kyogoku 2011

Basic implementation of the Kyogoku et al. measure, without edge weight estimation.
Reference: Kyogoku R, Fujimoto R, Ozaki T, Ohkawa T: A method for supporting retrieval of articles on protein structure analysis considering users’ intention. BMC Bioinformatics 2011, 12:S42. p 2
Status: supported
Flag: SIM_PAIRWISE_DAG_EDGE_KYOGOKU_BASIC_2011

IC-based measures
Resnik 1995

Resnik semantic similarity measure.
Reference: Resnik P: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence IJCAI. 1995, 1:448-453.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_RESNIK_1995
CLI Flag: resnik

Lin 1998

Lin semantic similarity measure.
Reference: Lin D: An Information-Theoretic Definition of Similarity. In 15th International Conference of Machine Learning. Madison,WI: 1998:296-304.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_LIN_1998
CLI Flag: lin

Jiang and Conrath 1997

Jiang and Conrath semantic distance.
Reference: Jiang J, Conrath D: Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. In International Conference Research on Computational Linguistics (ROCLING X). 1997, cmp-lg/970:15.
Status: supported
Flag: DIST_PAIRWISE_DAG_NODE_JIANG_CONRATH_1997
CLI Flag: jc
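
The three measures above (Resnik, Lin, and Jiang and Conrath) are all functions of the information content (IC) of the two compared concepts and of their most informative common ancestor (MICA). The sketch below evaluates the standard formulas from IC values supplied by the caller. The class name IcPairwiseSketch is hypothetical, and returning 0 for Lin when both ICs are zero is a convention chosen for this example only.

```java
// Hypothetical helper class for illustration; not part of the library.
final class IcPairwiseSketch {

    // Resnik: the similarity is the IC of the most informative common ancestor.
    static double resnik(double icMica) {
        return icMica;
    }

    // Lin: 2 * IC(MICA) / (IC(a) + IC(b)).
    static double lin(double icA, double icB, double icMica) {
        double sum = icA + icB;
        if (sum == 0.0) {
            return 0.0; // assumed convention when both concepts carry no information
        }
        return 2.0 * icMica / sum;
    }

    // Jiang and Conrath distance: IC(a) + IC(b) - 2 * IC(MICA).
    static double jiangConrathDistance(double icA, double icB, double icMica) {
        return icA + icB - 2.0 * icMica;
    }

    public static void main(String[] args) {
        double icA = 2.0;
        double icB = 4.0;
        double icMica = 1.5;
        System.out.println("Resnik = " + resnik(icMica));
        System.out.println("Lin = " + lin(icA, icB, icMica));
        System.out.println("JC distance = " + jiangConrathDistance(icA, icB, icMica));
    }
}
```

Note that Jiang and Conrath is a distance (0 for identical concepts), which is why its flag starts with DIST rather than SIM.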

Jiang and Conrath Normalized 1997

Jiang and Conrath measure normalized.
Reference: Jiang J, Conrath D: Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. In International Conference Research on Computational Linguistics (ROCLING X). 1997, cmp-lg/970:15.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_JIANG_CONRATH_1997_NORM
CLI Flag: simjc_norm
Adaptation of the Jiang and Conrath measure which normalizes values to [0,1], applying the normalization discussed in Seco N, Veale T, Hayes J: An Intrinsic Information Content Metric for Semantic Similarity in WordNet. In 16th European Conference on Artificial Intelligence. IOS Press; 2004, 16:1–5.
This is a reformulation of:
Pesquita C, Faria D, Bastos H, et al.: Metrics for GO based protein semantic similarity: a systematic evaluation. BMC bioinformatics 2008, 9 Suppl 5:S4.
The normalization only makes sense if the IC values of the compared concepts are normalized to [0,1].
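
As a rough sketch of the normalization described by Seco et al., the Jiang and Conrath distance can be turned into a similarity with 1 - (IC(a) + IC(b) - 2 * IC(MICA)) / 2, which stays in [0,1] when all IC values are themselves in [0,1]. The class name JcNormSketch and the input validation are assumptions of this example; the library's implementation may differ in details.

```java
// Hypothetical helper class for illustration; not part of the library.
final class JcNormSketch {

    // Rejects IC values outside [0,1], since the normalization assumes normalized ICs.
    static void requireNormalized(double ic) {
        if (ic < 0.0 || ic > 1.0) {
            throw new IllegalArgumentException("IC must be in [0,1]: " + ic);
        }
    }

    // 1 - (IC(a) + IC(b) - 2 * IC(MICA)) / 2
    static double simJcNorm(double icA, double icB, double icMica) {
        requireNormalized(icA);
        requireNormalized(icB);
        requireNormalized(icMica);
        double distance = icA + icB - 2.0 * icMica;
        return 1.0 - distance / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(simJcNorm(0.8, 0.6, 0.5));
    }
}
```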

Schlicker SimRel 2006

SimRel semantic similarity measure. In the original definition, SimRel is a groupwise measure (i.e. the measure is defined to compare two sets of concepts). This implementation only refers to the measure defined to compare a pair of concepts.
Reference: Schlicker, Andreas, et al. "A new measure for functional similarity of gene products based on Gene Ontology." BMC bioinformatics 7.1 (2006): 302.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_SCHLICKER_2006
CLI Flag: schliker

SimIC 2010

SimIC semantic similarity measure.
Reference: Effectively integrating information content and structural relationship to improve the GO-based similarity measure between proteins. Bo Li, James Z. Wang, F. Alex Feltus, Jizhong Zhou, Feng Luo
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_SIM_IC_2010

Jaccard IC

Semantic similarity measure relying on an IC-based formulation of the set-based Jaccard measure.
Reference: None
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_JACCARD_IC

Jaccard 3W IC

Semantic similarity measure relying on an IC-based formulation of the Jaccard 3W measure.
Reference: None
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_JACCARD_3W_IC

Gower and Legendre IC

Gower and Legendre IC-based abstract expression.
Reference: see Blanchard E, Harzallah M, Kuntz P: A generic framework for comparing semantic similarities on a subsumption hierarchy. 2008:20–24.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_GL

Tversky IC

Semantic similarity measure based on an IC formulation of Tversky's ratio model.
Reference: see Blanchard E, Harzallah M, Kuntz P: A generic framework for comparing semantic similarities on a subsumption hierarchy. 2008:20–24.
Status: supported
Flag: SIM_PAIRWISE_DAG_NODE_TVERSKY_IC

Set-based measures
Jaccard 1901

Set-based measures
Reference: Jaccard P: Distribution de la flore alpine dans le bassin des Dranses et dans quelques regions voisines. Bulletin de la Societe Vaudoise des Sciences Naturelles 1901, 37:241 - 272.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_JACCARD_1901

Braun Blanquet 1932

Set-based measures
Reference: Braun-Blanquet J: Plant sociology: the study of plant communities. McGraw-Hill; 1932:439.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BRAUN_BLANQUET_1932

Dice 1945

Set-based measures
Reference: Dice LR: Measures of the Amount of Ecologic Association Between Species. Ecology 1945, 26:297-302.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_DICE_1945

Ochiai 1957

Set-based measures
Reference: Ochiai A: Zoogeographic studies on the soleoid fishes found in Japan and its neighbouring regions. Bulletin of the Japanese Society of Scientific Fisheries 1957, 22:526-530.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_OCHIAI_1957

Simpson 1960

Set-based measures
Reference: Simpson GG: Notes on the measurement of faunal resemblance. American Journal of Science 1960, 258A:300-311.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_SIMPSON_1960
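
The five coefficients listed so far in this group (Jaccard, Braun-Blanquet, Dice, Ochiai and Simpson) differ only in how the size of the intersection of two sets is normalized. The sketch below applies their standard definitions to two sets of concepts, for instance the ancestor sets of the compared concepts. The class name SetMeasuresSketch is hypothetical, and returning 0 when a denominator is zero is a convention chosen for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class for illustration; not part of the library.
final class SetMeasuresSketch {

    static <T> int intersectionSize(Set<T> a, Set<T> b) {
        Set<T> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    // Returns 0 when the denominator is 0 (assumed convention).
    static double ratio(double numerator, double denominator) {
        return denominator == 0.0 ? 0.0 : numerator / denominator;
    }

    // |A ∩ B| / |A ∪ B|
    static <T> double jaccard(Set<T> a, Set<T> b) {
        int inter = intersectionSize(a, b);
        return ratio(inter, a.size() + b.size() - inter);
    }

    // |A ∩ B| / max(|A|, |B|)
    static <T> double braunBlanquet(Set<T> a, Set<T> b) {
        return ratio(intersectionSize(a, b), Math.max(a.size(), b.size()));
    }

    // 2 |A ∩ B| / (|A| + |B|)
    static <T> double dice(Set<T> a, Set<T> b) {
        return ratio(2.0 * intersectionSize(a, b), a.size() + b.size());
    }

    // |A ∩ B| / sqrt(|A| * |B|)
    static <T> double ochiai(Set<T> a, Set<T> b) {
        return ratio(intersectionSize(a, b), Math.sqrt((double) a.size() * b.size()));
    }

    // |A ∩ B| / min(|A|, |B|)
    static <T> double simpson(Set<T> a, Set<T> b) {
        return ratio(intersectionSize(a, b), Math.min(a.size(), b.size()));
    }

    public static void main(String[] args) {
        Set<String> dog = new HashSet<>(Arrays.asList("animal", "mammal", "carnivore", "dog"));
        Set<String> whale = new HashSet<>(Arrays.asList("animal", "mammal", "whale"));
        System.out.println("Jaccard = " + jaccard(dog, whale));
        System.out.println("Dice = " + dice(dog, whale));
    }
}
```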

Sokal & Sneath 1963

Set-based measures
Reference: Sokal RR, Sneath PHA: Principles of numerical taxonomy. San Francisco: W. H. Freeman and Company; 1963:359.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_SOKAL_SNEATH_1963

Tversky 1977 Abstract Model

Set-based measures
Reference: Tversky A: Features of similarity. Psychological Review 1977, 84:327-352. Implementation of the contrast model in a set-based manner.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_TVERSKY_1977

Korbel 2002

Set-based measures
Reference: Korbel JO, Snel B, Huynen M a, Bork P: SHOT: a web server for the construction of genome phylogenies. Trends in genetics : TIG 2002, 18:158-62. Cited by Lin C, Cho Y-rae, Hwang W-chang, Pei P, Zhang A: Clustering methods in protein-protein interaction network.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_KORBEL_2002

Maryland Bridge 2003

Set-based measures
Reference: Mirkin B, Koonin E: A top-down method for building genome classification trees with linear binary hierarchies. In Bioconsensus: DIMACS Working Group Meetings on Bioconsensus: October 25-26, 2000 and October 2-5, 2001, DIMACS Center. Amer Mathematical Society; 2003, 61:97.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_MARYLAND_BRIDGE_2003

Bader 2003

Set-based measures
Reference: Bader G, Hogue C: An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics 2003, 4:2. Cited by Lin C, Cho Y-rae, Hwang W-chang, Pei P, Zhang A: Clustering methods in protein-protein interaction network. In Knowledge Discovery in Bioinformatics: Techniques, Methods and Application 2006.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BADER_2003

Knappe 2004

Set-based measures
Reference: Knappe R, Bulskov H, Andreasen T: Perspectives on ontology-based querying. International Journal of Intelligent Systems 2004, 22:739-761.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_KNAPPE_2004

Batet 2010

Set-based measures
Reference: Batet M, Sanchez D, Valls A: An ontology-based measure to compute semantic similarity in biomedicine. Journal of biomedical informatics 2010, 44:118-125.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BATET_2010

Hybrid measures

Groupwise measures: Semantic measures between groups of concepts

Direct Groupwise measures
Jaccard 1901

Set-based measures
Reference: Jaccard P: Distribution de la flore alpine dans le bassin des Dranses et dans quelques regions voisines. Bulletin de la Societe Vaudoise des Sciences Naturelles 1901, 37:241 - 272.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_JACCARD_1901

Braun Blanquet 1932

Set-based measures
Reference: Braun-Blanquet J: Plant sociology: the study of plant communities. McGraw-Hill; 1932:439.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BRAUN_BLANQUET_1932

Dice 1945

Set-based measures
Reference: Dice LR: Measures of the Amount of Ecologic Association Between Species. Ecology 1945, 26:297-302.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_DICE_1945

Ochiai 1957

Set-based measures
Reference: Ochiai A: Zoogeographic studies on the soleoid fishes found in Japan and its neighbouring regions. Bulletin of the Japanese Society of Scientific Fisheries 1957, 22:526-530.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_OCHIAI_1957

Simpson 1960

Set-based measures
Reference: Simpson GG: Notes on the measurement of faunal resemblance. American Journal of Science 1960, 258A:300-311.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_SIMPSON_1960

Sokal & Sneath 1963

Set-based measures
Reference: Sokal RR, Sneath PHA: Principles of numerical taxonomy. San Francisco: W. H. Freeman and Company; 1963:359.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_SOKAL_SNEATH_1963

Tversky 1977 Abstract Model

Set-based measures
Reference: Tversky A: Features of similarity. Psychological Review 1977, 84:327-352. Implementation of the contrast model in a set-based manner.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_TVERSKY_1977

Korbel 2002

Set-based measures
Reference: Korbel JO, Snel B, Huynen M a, Bork P: SHOT: a web server for the construction of genome phylogenies. Trends in genetics : TIG 2002, 18:158-62. Cited by Lin C, Cho Y-rae, Hwang W-chang, Pei P, Zhang A: Clustering methods in protein-protein interaction network.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_KORBEL_2002

Maryland Bridge 2003

Set-based measures
Reference: Mirkin B, Koonin E: A top-down method for building genome classification trees with linear binary hierarchies. In Bioconsensus: DIMACS Working Group Meetings on Bioconsensus: October 25-26, 2000 and October 2-5, 2001, DIMACS Center. Amer Mathematical Society; 2003, 61:97.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_MARYLAND_BRIDGE_2003

Bader 2003

Set-based measures
Reference: Bader G, Hogue C: An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics 2003, 4:2. Cited by Lin C, Cho Y-rae, Hwang W-chang, Pei P, Zhang A: Clustering methods in protein-protein interaction network. In Knowledge Discovery in Bioinformatics: Techniques, Methods and Application 2006.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BADER_2003

Knappe 2004

Set-based measures
Reference: Knappe R, Bulskov H, Andreasen T: Perspectives on ontology-based querying. International Journal of Intelligent Systems 2004, 22:739-761.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_KNAPPE_2004

Batet 2010

Set-based measures
Reference: Batet M, Sanchez D, Valls A: An ontology-based measure to compute semantic similarity in biomedicine. Journal of biomedical informatics 2010, 44:118-125.
Status: supported
Flag: SIM_FRAMEWORK_DAG_SET_BATET_2010

Ali 2009

Direct groupwise measures
Reference: Ali W, Deane CM. Functionally guided alignment of protein interaction networks for module detection. Bioinformatics 2009;25:3166–73.
Status: supported
Flag: SIM_GROUPWISE_DAG_ALI_DEANE
CLI Flag: ali_and_deane

Sim GIC

Direct groupwise measures
Reference: Pesquita C, Faria D, Bastos H: Evaluating GO-based semantic similarity measures. Proc. 10th Annual Bio- 2007:1-4.
Status: supported
Flag: SIM_GROUPWISE_DAG_GIC
CLI Flag: gic
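
As commonly described, Sim GIC is a Jaccard index weighted by information content: the summed IC of the concepts shared by the two sets divided by the summed IC of their union, where each set is first extended with the ancestors of its concepts. The sketch below assumes the sets have already been extended and that an IC value is available for every concept; the class name SimGicSketch and the zero-denominator convention are assumptions of this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class for illustration; not part of the library.
final class SimGicSketch {

    // Sums the IC of every concept in the set; fails if an IC value is missing.
    static double sumIc(Set<String> concepts, Map<String, Double> ic) {
        double sum = 0.0;
        for (String concept : concepts) {
            Double value = ic.get(concept);
            if (value == null) {
                throw new IllegalArgumentException("No IC for " + concept);
            }
            sum += value;
        }
        return sum;
    }

    // sum of IC over (A ∩ B) divided by sum of IC over (A ∪ B).
    static double simGic(Set<String> a, Set<String> b, Map<String, Double> ic) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        double denominator = sumIc(union, ic);
        return denominator == 0.0 ? 0.0 : sumIc(inter, ic) / denominator;
    }

    public static void main(String[] args) {
        Map<String, Double> ic = new HashMap<>();
        ic.put("root", 0.0);
        ic.put("x", 1.0);
        ic.put("y", 2.0);
        ic.put("z", 3.0);
        Set<String> a = new HashSet<>(Arrays.asList("root", "x", "y"));
        Set<String> b = new HashSet<>(Arrays.asList("root", "x", "z"));
        System.out.println("simGIC = " + simGic(a, b, ic));
    }
}
```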

SimLP

Direct groupwise measures
Reference: Gentleman R: Visualizing and distances using GO. Retrieved Jan. 10th 2007. http://www.bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOvis.pdf
Status: supported
Flag: SIM_GROUPWISE_DAG_LP
CLI Flag: lp

Lee 2004

Direct groupwise measures
Reference: Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P: Coexpression analysis of human genes across many microarray data sets. Genome Research 2004, 14:1085.
Status: supported
Flag: SIM_GROUPWISE_DAG_LEE_2004
CLI Flag: lee

Term Overlap

Direct groupwise measures
Reference: Mistry M, Pavlidis P: Gene Ontology term overlap as a measure of gene functional similarity. BMC bioinformatics 2008, 9:327.
Status: supported
Flag: SIM_GROUPWISE_DAG_TO
CLI Flag: to

Normalized Term Overlap

Direct groupwise measures
Reference: Mistry M, Pavlidis P: Gene Ontology term overlap as a measure of gene functional similarity. BMC bioinformatics 2008, 9:327.
Status: supported
Flag: SIM_GROUPWISE_DAG_NTO
CLI Flag: nto

Normalized (Max) Term Overlap

Direct groupwise measures
Reference: Mistry M, Pavlidis P: Gene Ontology term overlap as a measure of gene functional similarity. BMC bioinformatics 2008, 9:327.
Status: supported
Flag: SIM_GROUPWISE_DAG_NTO_MAX
CLI Flag: nto_max

Sim UI

Direct groupwise measures
Reference: Gentleman R: Visualizing and distances using GO. Retrieved Jan. 10th 2007. http://www.bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOvis.pdf
Status: supported
Flag: SIM_GROUPWISE_DAG_UI
CLI Flag: ui

Indirect Groupwise measures

These measures aggregate the scores of all the pairwise comparisons that can be made between the concepts of the two sets.

Min


Status: supported
Flag: SIM_GROUPWISE_MIN
CLI Flag: min

Average


Status: supported
Flag: SIM_GROUPWISE_AVERAGE
CLI Flag: avg

Max


Status: supported
Flag: SIM_GROUPWISE_MAX
CLI Flag: max

Best Match Average (BMA)

Indirect groupwise measures
Reference: Schlicker A, Domingues FS, Rahnenführer J, Lengauer T: A new measure for functional similarity of gene products based on Gene Ontology. BMC Bioinformatics 2006, 7:302.
Status: supported
Flag: SIM_GROUPWISE_BMA
CLI Flag: bma

Best Match Max (BMM)

Indirect groupwise measures
Reference: Schlicker A, Domingues FS, Rahnenführer J, Lengauer T: A new measure for functional similarity of gene products based on Gene Ontology. BMC Bioinformatics 2006, 7:302.
Status: supported
Flag: SIM_GROUPWISE_BMM
CLI Flag: bmm
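
The indirect strategies above can be pictured as operations on a matrix of pairwise scores, with one row per concept of the first set and one column per concept of the second. The sketch below is one reading of them: Min, Average and Max aggregate all cells, while BMA and BMM combine the mean of the row maxima with the mean of the column maxima (their average and their maximum, respectively). The class name IndirectGroupwiseSketch is hypothetical, the matrix is assumed to be non-empty and rectangular, and the library's own definitions may differ in details.

```java
// Hypothetical helper class for illustration; not part of the library.
final class IndirectGroupwiseSketch {

    static double min(double[][] m) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] row : m) for (double v : row) best = Math.min(best, v);
        return best;
    }

    static double max(double[][] m) {
        double best = Double.NEGATIVE_INFINITY;
        for (double[] row : m) for (double v : row) best = Math.max(best, v);
        return best;
    }

    static double average(double[][] m) {
        double sum = 0.0;
        int count = 0;
        for (double[] row : m) for (double v : row) { sum += v; count++; }
        return sum / count;
    }

    // Mean over rows of the best score found in each row.
    static double rowBestMean(double[][] m) {
        double sum = 0.0;
        for (double[] row : m) {
            double best = Double.NEGATIVE_INFINITY;
            for (double v : row) best = Math.max(best, v);
            sum += best;
        }
        return sum / m.length;
    }

    // Mean over columns of the best score found in each column.
    static double colBestMean(double[][] m) {
        int cols = m[0].length;
        double sum = 0.0;
        for (int j = 0; j < cols; j++) {
            double best = Double.NEGATIVE_INFINITY;
            for (double[] row : m) best = Math.max(best, row[j]);
            sum += best;
        }
        return sum / cols;
    }

    static double bestMatchAverage(double[][] m) {
        return (rowBestMean(m) + colBestMean(m)) / 2.0;
    }

    static double bestMatchMax(double[][] m) {
        return Math.max(rowBestMean(m), colBestMean(m));
    }

    public static void main(String[] args) {
        double[][] scores = { {0.9, 0.2}, {0.4, 0.6}, {0.1, 0.3} };
        System.out.println("BMA = " + bestMatchAverage(scores));
        System.out.println("BMM = " + bestMatchMax(scores));
    }
}
```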

Graph-based Semantic measures (Experimental)

Experimental measures are those located in a package named experimental. Please consult the source code.

Information Content measures (Supported)

Extrinsic

Resnik

Reference: Resnik P: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence IJCAI. Citeseer; 1995, 1:448–453.
Status: supported
Flag: IC_ANNOT_RESNIK_1995
CLI Flag: resnik
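
Resnik's information content is the negative logarithm of the probability of encountering a concept, where occurrences of a concept also count for all of its ancestors. The sketch below estimates that probability from annotation counts supplied by the caller. The class name ResnikIcSketch is hypothetical and the natural logarithm is an assumption of this example (the base only rescales the values).

```java
// Hypothetical helper class for illustration; not part of the library.
final class ResnikIcSketch {

    // occurrences: annotations of the concept or any of its descendants.
    // total: annotations of the root, i.e. all annotations.
    static double ic(long occurrences, long total) {
        if (total <= 0 || occurrences <= 0 || occurrences > total) {
            throw new IllegalArgumentException("Expected 0 < occurrences <= total");
        }
        double probability = (double) occurrences / total;
        return -Math.log(probability);
    }

    public static void main(String[] args) {
        System.out.println("IC of the root = " + ic(100, 100));
        System.out.println("IC of a concept seen in 25% of annotations = " + ic(25, 100));
    }
}
```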

Intrinsic

Sanchez et al.

Reference: Sanchez D, Batet M, Isern D: Ontology-based information content computation. Knowledge-Based Systems 2011, 24:297-303.
Status: supported
Flag: ICI_SANCHEZ_2011
CLI Flag: sanchez

Seco et al.

Reference: Seco N, Veale T, Hayes J: An Intrinsic Information Content Metric for Semantic Similarity in WordNet. In 16th European Conference on Artificial Intelligence. IOS Press; 2004, 16:1-5.
Status: supported
Flag: ICI_SECO_2004
CLI Flag: seco
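
The intrinsic IC of Seco et al. depends only on the structure of the taxonomy: IC(c) = 1 - log(hypo(c) + 1) / log(max), where hypo(c) is the number of descendants of c and max is the total number of concepts. A minimal sketch of this formula follows; the class name SecoIcSketch and the parameter names are assumptions of this example.

```java
// Hypothetical helper class for illustration; not part of the library.
final class SecoIcSketch {

    // descendants: number of concepts below c (c itself excluded).
    // totalConcepts: number of concepts in the taxonomy (must be > 1).
    static double ic(int descendants, int totalConcepts) {
        if (totalConcepts <= 1 || descendants < 0 || descendants >= totalConcepts) {
            throw new IllegalArgumentException("Expected 0 <= descendants < totalConcepts, totalConcepts > 1");
        }
        return 1.0 - Math.log(descendants + 1.0) / Math.log(totalConcepts);
    }

    public static void main(String[] args) {
        System.out.println("IC of a leaf = " + ic(0, 100));
        System.out.println("IC of the root = " + ic(99, 100));
    }
}
```

With this definition a leaf has an IC of 1 and the root, which subsumes every other concept, has an IC of 0.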

Zhou et al.

k is set to 0.5. This can be changed using the XML interface.
Reference: Zhou Z, Wang Y, Gu J: A New Model of Information Content for Semantic Similarity in WordNet. In FGCNS ’08 Proceedings of the 2008 Second International Conference on Future Generation Communication and Networking Symposia Volume 03. IEEE Computer Society; 2008:85-89.
Status: supported
Flag: ICI_ZHOU_2008
CLI Flag: zhou
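
To show the role of the parameter k, the sketch below follows the formula usually given for this measure: IC(c) = k * (1 - log(hypo(c) + 1) / log(max_nodes)) + (1 - k) * log(depth(c)) / log(max_depth). The first term is the descendant-based component and the second is depth-based. The class name ZhouIcSketch, the parameter names and the convention that the root has depth 1 are assumptions of this example, and the library may use different conventions.

```java
// Hypothetical helper class for illustration; not part of the library.
final class ZhouIcSketch {

    // k: weight of the descendant-based term, in [0,1].
    // descendants: number of concepts below c; totalConcepts: size of the taxonomy.
    // depth: depth of c, with the root at depth 1 (assumed); maxDepth: deepest concept.
    static double ic(double k, int descendants, int totalConcepts, int depth, int maxDepth) {
        if (k < 0.0 || k > 1.0) {
            throw new IllegalArgumentException("k must be in [0,1]");
        }
        if (totalConcepts <= 1 || descendants < 0 || descendants >= totalConcepts) {
            throw new IllegalArgumentException("Invalid descendant counts");
        }
        if (maxDepth <= 1 || depth < 1 || depth > maxDepth) {
            throw new IllegalArgumentException("Invalid depths");
        }
        double descendantTerm = 1.0 - Math.log(descendants + 1.0) / Math.log(totalConcepts);
        double depthTerm = Math.log(depth) / Math.log(maxDepth);
        return k * descendantTerm + (1.0 - k) * depthTerm;
    }

    public static void main(String[] args) {
        // k = 0.5 gives equal weight to both terms.
        System.out.println(ic(0.5, 9, 100, 4, 16));
    }
}
```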

Depth Max Non-Linear

Reference: See Blanchard thesis
Status: supported
Flag: ICI_DEPTH_MAX_NONLINEAR
CLI Flag: depth_max_non_linear

Depth Min Non-linear

Reference: See Blanchard thesis
Status: supported
Flag: ICI_DEPTH_MIN_NONLINEAR
CLI Flag: depth_min_non_linear