Other libraries and tools related to Semantic Measures

This section lists existing libraries and toolkits that can also be used to compute semantic similarity or relatedness. Please send us a message if your favorite tool is not listed below.

Libraries

General libraries

Below is a list of existing libraries that can be used to compute semantic measures in a generic manner, i.e. these libraries do not focus on a specific semantic graph.

SimPack

SimPack is intended primarily for research on similarity between concepts in ontologies or between ontologies as a whole. Other possible application areas of SimPack include:

  • the investigation of similarity between software source code, for instance to detect changes between classes in different software releases.
  • the research of similarity between hierarchically-structured data, such as XML, to compare, search, or integrate data from different data sources.

SimPack is used, for example, in iSPARQL, an extension of traditional SPARQL (SPARQL Protocol And RDF Query Language) that supports querying for similar concepts in ontologies (source).

SemMF

SemMF is a flexible framework for calculating semantic similarity between objects that are represented as arbitrary RDF graphs. The framework allows taxonomic and non-taxonomic concept matching techniques to be applied to selected object properties. Moreover, new concept matchers are easily integrated into SemMF by implementing a simple interface, thus making it applicable in a wide range of different use case scenarios (source).

OWLSim

The OWLSim package implements a number of standard semantic similarity methods and includes novel methods for combining these with dynamic selection of anonymous grouping classes (source).

Similarity Library

The Similarity Library is not a general library in the strict sense, since it is dedicated to specific semantic graphs (ontologies, terminologies).

The Similarity Library aims at providing developers with a library for assessing similarity both between words and between sentences. This library is an extension of the JWSL (Java WordNet Similarity Library). In the current implementation, there are two categories of similarity measures between words:

  • measures exploiting ontologies such as WordNet, MeSH or the Gene Ontology.
  • measures exploiting search engines.

Source

Domain specific libraries

WordNet

Libraries dedicated to the computation of semantic measures based on WordNet.
Note that the SML also supports semantic measures computation using WordNet.

WordNet::Similarity

This is a Perl module that implements a variety of semantic similarity and relatedness measures based on information found in the lexical database WordNet. In particular, it supports the measures of Resnik, Lin, Jiang-Conrath, Leacock-Chodorow, Hirst-St.Onge, Wu-Palmer, Banerjee-Pedersen, and Patwardhan-Pedersen (source).
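
To give a feel for what a taxonomy-based measure such as Wu-Palmer computes, here is a minimal Python sketch. It does not use WordNet::Similarity or WordNet data: the `PARENT` taxonomy and all function names are hypothetical, and depth is assumed to be counted in nodes (the root has depth 1), which is one common convention among several.

```python
# Toy is-a taxonomy (child -> parent). Purely illustrative, not WordNet data.
PARENT = {
    "animal": "entity",
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
}

def path_to_root(concept):
    """Return the concepts from `concept` up to the root, both included."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(concept):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(path_to_root(concept))

def lowest_common_subsumer(a, b):
    """Deepest concept that subsumes both a and b in this single-parent tree."""
    ancestors_of_a = set(path_to_root(a))
    for candidate in path_to_root(b):  # walks upward from b
        if candidate in ancestors_of_a:
            return candidate
    raise ValueError("concepts do not share an ancestor")

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("dog", "cat"))   # siblings under "mammal"
print(wu_palmer("dog", "bird"))  # only share "animal"
```

In this toy tree, "dog" and "cat" score higher than "dog" and "bird" because their lowest common subsumer sits deeper in the hierarchy. Real WordNet concepts can have several parents, which this sketch does not handle.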

JWNL

JWNL is a Java API for accessing the WordNet relational dictionary. WordNet is widely used for developing NLP applications, and a Java API such as JWNL will allow developers to more easily use Java for building NLP applications (source).

WS4J

WS4J is a reimplementation of WordNet::Similarity in Java. WS4J provides APIs for several semantic relatedness/similarity algorithms. In theory, any WordNet instance can be used to calculate a relatedness score as long as it implements the ILexicalDatabase interface. The codebase has been mostly ported from WordNet-Similarity-2.05 (source).

UMLS

Libraries dedicated to the computation of semantic measures based on UMLS.
Note that the SML also supports semantic measures computation using the ontologies and terminologies underlying UMLS (e.g. MeSH, SNOMED-CT).

UMLS::Similarity

This is a Perl module that implements a variety of semantic similarity and relatedness measures based on ontologies and terminologies found in the Unified Medical Language System (UMLS). The measures assign numeric values between pairs of medical concepts indicating how similar or related they are (source).

Gene Ontology

FastSemSim

FastSemSim aims to provide a library and a set of tools that make semantic similarity measures (e.g. Resnik, SimGIC) easy to use. So far the library only handles Gene Ontology ontologies, but its modular design is intended to make it easy to extend (source).

GOSim

GOSim computes the functional similarity of genes based on various information-theoretic similarity concepts for GO terms. GOSim extends existing tools by providing additional, recently developed functional similarity measures for genes (source).
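
Gene-level functional similarity is typically obtained by aggregating term-to-term similarities over the GO terms annotating each gene. The Python sketch below shows one common aggregation, the best-match average. It is not GOSim code and makes no claim about which aggregations GOSim provides; the `TERM_SIM` table, the term identifiers and the function names are all hypothetical.

```python
# Hypothetical, precomputed term-to-term similarities in [0, 1].
# In practice these would come from a measure such as Resnik or Lin.
TERM_SIM = {
    frozenset(["t1", "t2"]): 0.8,
    frozenset(["t1", "t3"]): 0.2,
    frozenset(["t2", "t3"]): 0.4,
}

def term_sim(a, b):
    """Symmetric lookup; identical terms are assumed maximally similar."""
    if a == b:
        return 1.0
    return TERM_SIM[frozenset([a, b])]

def best_match_average(terms_a, terms_b):
    """Average, over both directions, of each term's best match in the other set."""
    best_a = [max(term_sim(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(term_sim(a, b) for a in terms_a) for b in terms_b]
    mean_a = sum(best_a) / len(best_a)
    mean_b = sum(best_b) / len(best_b)
    return (mean_a + mean_b) / 2

# Two hypothetical genes described by their annotation terms.
gene1_terms = ["t1", "t3"]
gene2_terms = ["t2"]
print(best_match_average(gene1_terms, gene2_terms))
```

Other aggregations (maximum, plain average over all pairs) are equally simple to write; the best-match average is shown only because it is symmetric and less sensitive to the number of annotations than a plain average.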

GOSemSim

GOSemSim implements five methods, proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively, for estimating GO semantic similarities. It supports many species, including Anopheles, Arabidopsis, Bovine, Canine, Chicken, Chimp, Coelicolor, E. coli strains K12 and Sakai, Fly, Human, Malaria, Mouse, Pig, Rhesus, Rat, Worm, Xenopus, Yeast, and Zebrafish (source).
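
Three of the measures named above (Resnik, Lin and Jiang) rely on the information content (IC) of terms, estimated from annotation frequencies. The following Python sketch illustrates the standard definitions on a toy taxonomy. It is not GOSemSim code (GOSemSim is an R package); the taxonomy, the annotation counts and the function names are hypothetical, and the graph-based method of Wang is not covered.

```python
import math

# Hypothetical single-parent taxonomy (child -> parent) and direct annotation counts.
PARENT = {
    "binding": "function",
    "catalysis": "function",
    "dna_binding": "binding",
    "rna_binding": "binding",
}
DIRECT_COUNTS = {"function": 0, "binding": 2, "catalysis": 4,
                 "dna_binding": 1, "rna_binding": 1}
ROOT = "function"

def ancestors(term):
    """The term itself followed by all of its ancestors up to the root."""
    out = [term]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out

def information_content():
    """IC(t) = -log p(t), where p(t) counts annotations to t or any descendant."""
    cumulative = dict.fromkeys(DIRECT_COUNTS, 0)
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += n
    total = cumulative[ROOT]
    # Assumes every term has a non-zero cumulative count, as in this toy data.
    return {t: -math.log(c / total) for t, c in cumulative.items()}

IC = information_content()

def mica_ic(a, b):
    """IC of the most informative common ancestor of a and b."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(IC[t] for t in common)

def resnik(a, b):
    return mica_ic(a, b)

def lin(a, b):
    denom = IC[a] + IC[b]
    return 1.0 if denom == 0 else 2 * mica_ic(a, b) / denom

def jiang_conrath_distance(a, b):
    """Jiang-Conrath is a distance: 0 means identical, larger means less similar."""
    return IC[a] + IC[b] - 2 * mica_ic(a, b)

print(resnik("dna_binding", "rna_binding"))
print(lin("dna_binding", "rna_binding"))
print(jiang_conrath_distance("dna_binding", "rna_binding"))
```

Note that Resnik's measure is unbounded, Lin's is normalized to [0, 1], and Jiang-Conrath is originally a distance that libraries convert to a similarity in different ways.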

Disease ontology

Libraries dedicated to the computation of semantic measures based on the Disease Ontology (DO).
Note that the SML also supports semantic measures computation for both DO concepts and entities (e.g. genes) annotated by DO concepts.

DOSim package

DOSim is built on DO to 1) measure the similarity between diseases (DO terms), 2) measure the similarity between human genes in terms of diseases, 3) detect DO-driven gene modules and annotate them at multiple layers, namely diseases (DO), functions (GO) and pathways (KEGG), 4) conduct DO enrichment analysis, and 5) visualize and describe DO structures and terms. It focuses on the computation of disease similarity and gene similarity. In addition, its module detection and annotation features are intended to improve understanding of the complex pathogenesis of diseases (source).

Disease Ontology Semantic and Enrichment analysis (DOSE)

DOSE is an R package (Bioconductor v2.14) that implements five methods, proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively, for measuring DO semantic similarities, as well as a hypergeometric test for enrichment analysis. Citation: Yu G and Wang L. DOSE: Disease Ontology Semantic and Enrichment analysis. R package version 2.2.1 (source).
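
The hypergeometric test used for enrichment analysis can be illustrated with a few lines of Python. This is a generic sketch of the one-sided test, not DOSE code; the function name, parameter names and example numbers are hypothetical. It assumes Python 3.8 or later for `math.comb`.

```python
from math import comb

def hypergeom_enrichment_pvalue(k, n, K, N):
    """One-sided enrichment p-value P(X >= k).

    N: number of genes in the background
    K: background genes annotated with the term of interest
    n: number of genes in the query list
    k: query genes annotated with the term
    """
    upper = min(n, K)
    # Count the ways of drawing at least k annotated genes among n draws.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical example: 10 background genes, 4 annotated with the term,
# a query list of 3 genes of which 2 are annotated.
print(hypergeom_enrichment_pvalue(k=2, n=3, K=4, N=10))
```

A small p-value suggests that the term is over-represented in the query list relative to the background. When many terms are tested, the resulting p-values generally need a multiple-testing correction, which this sketch omits.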