Other libraries and tools related to Semantic Measures
This section lists existing libraries and toolkits that can also be used to compute semantic similarity or relatedness. Please send us a message if your favorite tool is not listed below.
Libraries
General libraries
Below is a list of existing libraries that can be used to compute semantic measures in a generic manner, i.e. these libraries do not focus on a specific semantic graph.
SimPack
SimPack is intended primarily for research on similarity between concepts in ontologies or between ontologies as a whole. Other possible application areas of SimPack include:
- the investigation of similarity between pieces of software source code, for instance to detect changes between classes of different software releases.
- research on similarity between hierarchically structured data, such as XML, to compare, search, or integrate data from different data sources.
SimPack is, for example, used in iSPARQL, an extension of traditional SPARQL (SPARQL Protocol And RDF Query Language) that allows querying for similar concepts in ontologies (source).
- Language: Java
- Developer/Reference: Abraham Bernstein, Esther Kaufmann, Christoph Kiefer and Christoph Burki
- Website: https://files.ifi.uzh.ch/ddis/oldweb/ddis/research/simpack/
SemMF
SemMF is a flexible framework for calculating semantic similarity between objects that are represented as arbitrary RDF graphs. The framework allows taxonomic and non-taxonomic concept matching techniques to be applied to selected object properties. Moreover, new concept matchers are easily integrated into SemMF by implementing a simple interface, thus making it applicable in a wide range of different use case scenarios (source).
- Language: Java
- Developer: Radoslaw Oldakowski
- Website: http://semmf.ag-nbi.de/doc/index.html
OWLSim
The OWLSim package provides a number of standard semantic similarity methods and includes novel methods for combining these with dynamic selection of anonymous grouping classes (source).
- Language: Java
- Website: http://code.google.com/p/owltools/wiki/OwlSim
Similarity Library
Strictly speaking, the Similarity Library is not a general library, since it is dedicated to specific semantic graphs (ontologies, terminologies).
The Similarity Library aims at providing developers with a library for assessing similarity both between words and between sentences. This library is an extension of the JWSL (Java WordNet Similarity Library). In the current implementation, there are two categories of similarity measures between words:
- measures exploiting ontologies such as WordNet, MeSH or the Gene Ontology.
- measures exploiting search engines.
Information:
- Language: Java
- Developer: Giuseppe Pirrò
- Website: http://simlibrary.wordpress.com/
Domain specific libraries
WordNet
Libraries dedicated to the computation of semantic measures based on WordNet.
Note that the SML also supports semantic measures computation using WordNet.
WordNet::Similarity
This is a Perl module that implements a variety of semantic similarity and relatedness measures based on information found in the lexical database WordNet. In particular, it supports the measures of Resnik, Lin, Jiang-Conrath, Leacock-Chodorow, Hirst-St.Onge, Wu-Palmer, Banerjee-Pedersen, and Patwardhan-Pedersen (source).
- Language: Perl
- Developer: Ted Pedersen, Siddharth Patwardhan, Jason Michelizzi, Satanjeev Banerjee et al.
- Website: http://wn-similarity.sourceforge.net/
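As a rough illustration of what several of the measures named above compute, here is a minimal Python sketch of the Resnik, Lin, Jiang-Conrath and Wu-Palmer measures on a toy taxonomy. The taxonomy, the corpus counts and every function name are hypothetical and chosen only for illustration. This is not the API of WordNet::Similarity or of any other library listed on this page, and depth is counted in nodes (root depth 1), which is only one of several conventions.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}

# Hypothetical corpus counts of each concept's own occurrences.
COUNTS = {"entity": 0, "animal": 2, "plant": 1, "dog": 4, "cat": 4, "oak": 1}


def ancestors(concept):
    """Return the path from a concept up to the root, concept included."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def information_content(concept):
    """IC(c) = -log p(c), where p(c) covers c and all of its descendants.

    Assumes every concept subsumes at least one counted occurrence.
    """
    total = sum(COUNTS.values())
    subsumed = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return math.log(total / subsumed)


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    anc_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in anc_b)


def resnik(a, b):
    """Similarity as the information content of the least common subsumer."""
    return information_content(lcs(a, b))


def lin(a, b):
    """Resnik score normalised by the information content of both concepts."""
    return 2 * resnik(a, b) / (information_content(a) + information_content(b))


def jiang_conrath_distance(a, b):
    """Jiang-Conrath is a distance: 0 means the concepts are identical."""
    return information_content(a) + information_content(b) - 2 * resnik(a, b)


def wu_palmer(a, b):
    """Path-based similarity using the depth of the least common subsumer."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


print(lcs("dog", "cat"), resnik("dog", "cat"), wu_palmer("dog", "cat"))
```

In this toy setting, two concepts whose only shared ancestor is the root get a Resnik score of 0, because the root carries no information.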
JWNL
JWNL is a Java API for accessing the WordNet relational dictionary. WordNet is widely used for developing NLP applications, and a Java API such as JWNL will allow developers to more easily use Java for building NLP applications (source).
- Language: Java
- Developer: Brett Walenz and John Didion
- Website: http://sourceforge.net/projects/jwordnet/
WS4J
WS4J is a reimplementation of WordNet::Similarity in Java. WS4J provides APIs for several semantic relatedness/similarity algorithms. In theory, any WordNet instance can be used to calculate a relatedness score as long as it implements the ILexicalDatabase interface. The codebase has been mostly ported from WordNet-Similarity-2.05 (source).
- Language: Java
- Developer: Hideki Shima
- Website: http://code.google.com/p/ws4j/
UMLS
Libraries dedicated to the computation of semantic measures based on UMLS.
Note that the SML also supports semantic measures computation using the ontologies and terminologies underlying UMLS (e.g. MeSH, SNOMED-CT).
UMLS::Similarity
This is a Perl module that implements a variety of semantic similarity and relatedness measures based on ontologies and terminologies found in the Unified Medical Language System (UMLS). The measures assign numeric values between pairs of medical concepts indicating how similar or related they are (source).
- Language: Perl
- Developer: McInnes, Pedersen and Pakhomov
- Website: http://umls-similarity.sourceforge.net/
Gene Ontology
FastSemSim
The aim is to develop a library and a set of tools to easily use semantic similarity measures (e.g. Resnik, SimGIC, ...). So far the library only handles Gene Ontology ontologies, but it is easily extendable due to its modularity (source).
- Language: Python
- Developer: Marco Mina
- Website: http://sourceforge.net/p/fastsemsim
GOSim
GOSim calculates the functional similarity of genes based on various information-theoretic similarity concepts for GO terms. GOSim extends existing tools by providing additional, recently developed functional similarity measures for genes (source).
- Language: R
- Developer: Holger Fröhlich
- Website: http://www.dkfz.de/mga2/gosim/
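To make the step from term-level to gene-level similarity concrete, the sketch below combines pairwise term scores with a best-match average, one common aggregation strategy. It is a hedged illustration only: the term identifiers, the score table and the function names are hypothetical, and nothing here is taken from the GOSim API or claims to reproduce its measures.

```python
# Hypothetical pairwise scores between three made-up term identifiers.
SCORES = {
    frozenset(["t1", "t2"]): 0.5,
    frozenset(["t1", "t3"]): 0.2,
    frozenset(["t2", "t3"]): 0.8,
}


def toy_term_sim(a, b):
    """Stand-in for any term-level measure (e.g. Resnik or Lin)."""
    return 1.0 if a == b else SCORES[frozenset([a, b])]


def best_match_average(terms_a, terms_b, term_sim):
    """Combine term-level scores into a single gene-level score.

    Each term is paired with its best-scoring partner in the other
    annotation set, and all of these best scores are averaged.
    """
    if not terms_a or not terms_b:
        raise ValueError("both genes need at least one annotation")
    best_a = [max(term_sim(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(term_sim(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Two hypothetical genes, each annotated with two terms.
gene_x = ["t1", "t2"]
gene_y = ["t2", "t3"]
print(best_match_average(gene_x, gene_y, toy_term_sim))
```

Other aggregation choices, such as taking the maximum or the plain average over all term pairs, fit the same function signature, so the term-level measure and the combination strategy can be varied independently.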
GOSemSim
GOSemSim implements five methods, proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively, for estimating GO semantic similarities. It supports many species, including Anopheles, Arabidopsis, Bovine, Canine, Chicken, Chimp, Coelicolor, E coli strain K12 and Sakai, Fly, Human, Malaria, Mouse, Pig, Rhesus, Rat, Worm, Xenopus, Yeast, and Zebrafish (source).
- Language: R (Bioconductor)
- Developer: Guangchuang Yu
- Website: http://www.bioconductor.org/packages/2.11/bioc/html/GOSemSim.html
Disease ontology
Libraries dedicated to the computation of semantic measures based on the Disease Ontology (DO).
Note that the SML also supports semantic measures computation for both DO concepts and entities (e.g. genes) annotated by DO concepts.
DOSim package
DOSim is developed on DO to 1) measure the similarity between diseases (DO terms), 2) measure the similarity between human genes in terms of diseases, 3) detect DO-driven gene modules and annotate them at multiple layers: diseases (DO), functions (GO) and pathways (KEGG), 4) conduct DO enrichment analysis, and 5) visualize and describe DO structures and terms. It focuses on the computation of disease similarity and gene similarity. In addition, its module detection and annotation would promote our understanding of the complex pathogenesis of diseases (source).
- Language: R
- Developer: Jiang Li
- Website: http://210.46.85.150/platform/dosim/
Disease Ontology Semantic and Enrichment analysis (DOSE)
DOSE is an R package (Bioconductor v2.14) that implements five methods, proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively, for measuring DO semantic similarities, together with a hypergeometric test for enrichment analysis. Citation: Yu G and Wang L. DOSE: Disease Ontology Semantic and Enrichment analysis. R package version 2.2.1 (source).
- Language: R
- Developer: Guangchuang Yu, Li-Gen Wang (maintainer: Guangchuang Yu)
- Website: http://www.bioconductor.org/packages/release/bioc/html/DOSE.html
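The hypergeometric test used for enrichment analysis can be sketched in a few lines of Python. The example below is a minimal sketch under stated assumptions: it computes the one-sided (over-representation) p-value, the function and parameter names are hypothetical, and it is not the DOSE implementation or its R interface.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated with the term of interest
    sample:     number of genes in the list under study
    hits:       genes in the list that carry the annotation
    """
    upper = min(sample, annotated)
    # Count the ways to draw at least `hits` annotated genes ...
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    # ... out of all ways to draw `sample` genes from the background.
    return tail / comb(population, sample)


# Hypothetical numbers: 10 background genes, 4 of them annotated,
# and a list of 3 genes of which 2 carry the annotation.
print(enrichment_p_value(10, 4, 3, 2))
```

A small p-value suggests that the annotation occurs in the gene list more often than expected by chance. In practice the values are usually corrected for multiple testing, since many terms are tested at once.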