Library and Toolkit

This section is dedicated to the documentation related to semantic measures. Specific and technical documentation for both the Semantic Measures Library and the Toolkit is available in their respective sections.

Semantic Measures

Numerous definitions of semantic measures have been proposed in the literature. It is nevertheless commonly accepted that semantic measures aim to evaluate the likeness of units of language, concepts or instances based on their meaning. For example, if asked which of the two pairs (Monkey, Phone) and (Monkey, Banana) contains the more closely related concepts, most people will agree that Monkey and Banana are more related. The aim of semantic measures is therefore to give machines the ability to compare units of language or entities (concepts, instances) based on their meaning. To this end, these measures rely on algorithms which analyse text corpora or ontologies (e.g. taxonomies). They are used in a wide variety of applications, e.g. to build information retrieval systems, to develop recommender systems, or to analyse data such as genes and diseases.
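
The Monkey/Banana intuition can be illustrated with a minimal sketch of a corpus-based comparison. Everything below is an assumption made for illustration: the toy corpus, the stop-word list and the function names (context_vector, cosine, relatedness) are hypothetical and are not part of the library or the toolkit. The sketch describes each word by the words found in the same sentences and compares two words through the cosine of their count vectors.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
CORPUS = [
    "the monkey eats a banana",
    "a monkey peels the banana",
    "the banana is eaten by the monkey",
    "she answers the phone",
    "the phone rings in the office",
]

# Hypothetical stop-word list: very frequent words carry little meaning.
STOPWORDS = {"the", "a", "is", "by", "in"}


def context_vector(word, sentences):
    """Count the non-stop words sharing a sentence with `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            counts.update(
                t for t in tokens if t != word and t not in STOPWORDS
            )
    return counts


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(a, b, sentences=CORPUS):
    """Compare two words through the contexts in which they appear."""
    return cosine(context_vector(a, sentences), context_vector(b, sentences))


if __name__ == "__main__":
    print("monkey/banana:", relatedness("monkey", "banana"))
    print("monkey/phone: ", relatedness("monkey", "phone"))
```

On this toy corpus the pair (monkey, banana) scores higher than (monkey, phone), because the first two words share context words while the last two share none. Real measures work on much larger corpora and use more careful weighting schemes.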

  • An extensive survey on semantic measures can be found in:

Semantic Similarity from Natural Language and Ontology Analysis
Sébastien Harispe*, Sylvie Ranwez, Stefan Janaqi and Jacky Montmain
Synthesis Lectures on Human Language Technologies, May 2015, Vol. 8, No. 1, Pages 1-254, doi: 10.2200/S00639ED1V01Y201504HLT027

In order to avoid confusion and to clearly define the current scope of the library, we briefly define each type of semantic measure the library focuses on.
Various types of semantic measures have been proposed; the measures currently supported by the SML are those colored in blue:

  • Distributional semantic measures
    These measures are based on text analysis and are used to evaluate the semantic proximity of terms, sentences or documents. They generally rely on the distributional hypothesis, which assumes that words that frequently co-occur, or that occur in similar contexts, are semantically related.
  • Knowledge-based semantic measures: These measures take advantage of ontologies or Knowledge Organization Systems (such as taxonomies or thesauri) to compare concepts or instances defined in these models. Two subtypes of knowledge-based semantic measures can be further distinguished:
    • Graph-based semantic measures
      They are used to compare concepts or instances defined in a data structure which can be processed as a graph. Generally, these measures are used to compare a pair of concepts (or groups of concepts) defined in a taxonomy or semantic graph, e.g. an RDF(S) graph. They are often used with lightweight ontologies mainly structured through subClassOf (is-a) relationships.
    • Logic-based semantic measures
      These measures are generally used to compare more complex expressions of concepts or instances, e.g. definitions based on description logics. They are generally used when the information defined on the elements to compare (concepts or instances) cannot be reduced to a graph; this is often the case when using heavyweight ontologies expressed in OWL. Note that graph-based semantic measures (see above) can also be used on a reduction of a heavyweight ontology (e.g. based on the inferred taxonomy).
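
To make the graph-based family more concrete, here is a minimal sketch of one classic measure of that kind, the Wu and Palmer similarity, computed on a toy taxonomy. The taxonomy, the helper names (ancestors, depth, lowest_common_subsumer, wu_palmer) and the choice of giving the root a depth of 1 are assumptions made for this illustration; this is not the API of the library. The measure scores two concepts as 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is their deepest common ancestor.

```python
# Hypothetical toy taxonomy: each concept maps to its single parent
# through an is-a (subClassOf) relationship. "thing" is the root.
PARENT = {
    "animal": "thing",
    "mammal": "animal",
    "monkey": "mammal",
    "dog": "mammal",
    "plant": "thing",
    "fruit": "plant",
    "banana": "fruit",
}


def ancestors(concept):
    """Return the path from `concept` up to the root, concept included."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Number of nodes between the root and `concept` (root has depth 1)."""
    return len(ancestors(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(ancestors(b))
    for candidate in ancestors(a):  # walks upward: first hit is the deepest
        if candidate in ancestors_of_b:
            return candidate
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def wu_palmer(a, b):
    """Wu and Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    print("monkey/dog:   ", wu_palmer("monkey", "dog"))
    print("monkey/banana:", wu_palmer("monkey", "banana"))
```

In this toy taxonomy, monkey and dog meet at mammal, whereas monkey and banana only meet at the root, so the first pair obtains the higher score. The sketch assumes a tree in which every concept has exactly one parent; taxonomies with multiple inheritance require considering several paths to the root.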

More information:

Bibliography

A list of references related to semantic measures, semantic relatedness, semantic similarity and semantic distance is provided in the bibliography section.