
Semantic Measures Library

Library, Source Code and Javadoc

The SML is developed in Java 1.7 and distributed under the open source CeCILL license.
The library and the source code are packaged into JAR files, which can easily be extracted or loaded into your Integrated Development Environment (IDE). The associated Javadoc can also be downloaded or browsed in its HTML version.

The library, the source code and the Javadoc JARs can be downloaded from the download section.

The development version of the library is also available from the dedicated GitHub repository.
See the How to contribute section to learn how to test or improve the latest version of the source code.

How to use the library?

The Semantic Measures Library can be used either from the provided packaged library or from the sources. In both cases, you need to retrieve the library (and its sources) into your IDE. We encourage you to use Maven to do so. The dependency to include in your pom.xml is as follows (replace CURRENT_VERSION with the version number of the latest release):

<dependency>
	<groupId>com.github.sharispe</groupId>
	<artifactId>slib-sml</artifactId>
	<version>CURRENT_VERSION</version>
</dependency>

Alternatively, you can download the required JARs from the download section and load them into your IDE (e.g. Eclipse, NetBeans). Please refer to the code snippets below for examples of how to use the library.

Examples of use

The source code examples presented below are compatible with the latest version of the SML. The examples can be extracted from the slib-example-[version]-sources.jar archive. The code snippets of the development version can also be browsed in the dedicated repository.
Do not hesitate to contact us if you encounter any trouble using these examples.

Generalities

  • Creation of a semantic graph.
    Snippet of code showing how to interact with the graph, i.e. how to add vertices and edges. The example also shows how to retrieve all the ancestors and descendants of a particular vertex.

  • Short example showing the computation of a semantic similarity using the library.
    Example of a semantic measure computation using the Semantic Measures Library. In this snippet we estimate the similarity of two concepts defined in a semantic graph. The semantic graph is expressed in N-Triples. The similarity is estimated using Lin's measure.
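
To make the two items above more concrete without depending on the SML API, here is a minimal, library-independent sketch in plain Java. It builds a small taxonomy, retrieves the ancestors and descendants of a vertex, and computes Lin's measure, sim(a, b) = 2 * IC(mica) / (IC(a) + IC(b)), where mica is the most informative common ancestor. Everything named here (TinyTaxonomy, addSubClassOf, ic, lin) is hypothetical and is not part of the SML. The information content used, IC(c) = -log((descendants(c) + 1) / N), is an assumption chosen for brevity; the definitions used by the library may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; this is NOT the SML API.
class TinyTaxonomy {
    // vertex -> its direct, more general vertices
    private final Map<String, Set<String>> parents = new HashMap<String, Set<String>>();
    // vertex -> its direct, more specific vertices
    private final Map<String, Set<String>> children = new HashMap<String, Set<String>>();

    public void addVertex(String v) {
        if (!parents.containsKey(v)) {
            parents.put(v, new HashSet<String>());
            children.put(v, new HashSet<String>());
        }
    }

    // Adds the edge "child is a kind of parent", creating both vertices if needed.
    public void addSubClassOf(String child, String parent) {
        addVertex(child);
        addVertex(parent);
        parents.get(child).add(parent);
        children.get(parent).add(child);
    }

    public Set<String> ancestors(String v) { return reachable(v, parents); }

    public Set<String> descendants(String v) { return reachable(v, children); }

    // Iterative traversal of an acyclic graph; the start vertex is excluded from the result.
    private static Set<String> reachable(String start, Map<String, Set<String>> next) {
        if (!next.containsKey(start)) {
            throw new IllegalArgumentException("Unknown vertex: " + start);
        }
        Set<String> seen = new HashSet<String>();
        Deque<String> todo = new ArrayDeque<String>(next.get(start));
        while (!todo.isEmpty()) {
            String v = todo.pop();
            if (seen.add(v)) {
                todo.addAll(next.get(v));
            }
        }
        return seen;
    }

    // Assumed structure-based information content: -log((descendants + 1) / N).
    public double ic(String v) {
        double p = (descendants(v).size() + 1.0) / parents.size();
        return -Math.log(p);
    }

    // Lin's measure based on the most informative common ancestor.
    public double lin(String a, String b) {
        Set<String> common = new HashSet<String>(ancestors(a));
        common.add(a);
        Set<String> ancestorsOfB = new HashSet<String>(ancestors(b));
        ancestorsOfB.add(b);
        common.retainAll(ancestorsOfB);

        double micaIc = 0.0;
        for (String c : common) {
            micaIc = Math.max(micaIc, ic(c));
        }
        double denominator = ic(a) + ic(b);
        if (denominator == 0.0) {
            // Both vertices carry no information (e.g. the root compared with itself).
            return a.equals(b) ? 1.0 : 0.0;
        }
        return 2.0 * micaIc / denominator;
    }

    public static void main(String[] args) {
        TinyTaxonomy g = new TinyTaxonomy();
        g.addSubClassOf("mammal", "animal");
        g.addSubClassOf("bird", "animal");
        g.addSubClassOf("dog", "mammal");
        g.addSubClassOf("cat", "mammal");

        System.out.println("ancestors(dog)      = " + g.ancestors("dog"));
        System.out.println("descendants(animal) = " + g.descendants("animal"));
        System.out.println("lin(dog, cat)       = " + g.lin("dog", "cat"));
        System.out.println("lin(dog, bird)      = " + g.lin("dog", "bird"));
    }
}
```

In this toy taxonomy, dog and cat share the informative ancestor mammal and therefore obtain a positive similarity, whereas dog and bird only share the root, whose information content is zero.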

General Semantic Graphs

WordNet

WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. (source: Wikipedia).
The latest version of WordNet can be downloaded from the dedicated website.

Computation of the semantic similarity of pairs of nouns defined in WordNet.

YAGO

YAGO is a knowledge base developed at the Max-Planck-Institute Saarbrücken. It is automatically extracted from Wikipedia and other sources. (source: Wikipedia).
The latest version of YAGO2 can be downloaded from the dedicated website.

See source code example.

RDF(S) graphs

The RDF data model is similar to classic conceptual modeling approaches such as entity-relationship or class diagrams, as it is based upon the idea of making statements about resources (in particular Web resources) in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue". Therefore RDF swaps object for subject that would be used in the classical notation of an Entity–attribute–value model within Object oriented design; object (sky), attribute (color) and value (blue). RDF is an abstract model with several serialization formats (i.e., file formats), and so the particular way in which a resource or triple is encoded varies from format to format. (source: Wikipedia).
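
The "sky has the color blue" statement above can be sketched as a subject-predicate-object value and written out in N-Triples, one of the RDF serialization formats. This is a minimal illustration in plain Java: the Triple class and the http://example.org/ identifiers are hypothetical, all three terms are assumed to be IRIs, and no RDF library (nor the SML) is involved.

```java
// Hypothetical illustration class; not part of the SML or of any RDF library.
class Triple {
    private final String subject;
    private final String predicate;
    private final String object;

    public Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    // Serializes the triple as a single N-Triples line.
    // Assumption: subject, predicate and object are all IRIs (no literals, no blank nodes).
    public String toNTriples() {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    public static void main(String[] args) {
        Triple skyIsBlue = new Triple(
                "http://example.org/sky",       // subject: "the sky"
                "http://example.org/hasColor",  // predicate: "has the color"
                "http://example.org/blue");     // object: "blue"
        // Prints: <http://example.org/sky> <http://example.org/hasColor> <http://example.org/blue> .
        System.out.println(skyIsBlue.toNTriples());
    }
}
```

Other serializations such as RDF/XML or Turtle would encode the same triple differently, which is the sense in which RDF is an abstract model.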

See source code example.

OWL

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. The languages are characterised by formal semantics and RDF/XML-based serializations for the Semantic Web. OWL is endorsed by the World Wide Web Consortium (W3C) and has attracted academic, medical and commercial interest. (source: Wikipedia).

See source code example.

Biomedical Semantic Graphs

Gene Ontology

The Gene Ontology, or GO, is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species (source: Wikipedia).
The latest version of the Gene Ontology can be downloaded from the dedicated website.

MeSH

Medical Subject Headings (MeSH) is a comprehensive controlled vocabulary for the purpose of indexing journal articles and books in the life sciences; it can also serve as a thesaurus that facilitates searching. Created and updated by the United States National Library of Medicine (NLM), it is used by the MEDLINE/PubMed article database and by NLM's catalog of book holdings (source: Wikipedia).
The latest version of MeSH can be downloaded from the dedicated website.

See the source code example using MeSH XML (2014), or a prior release of MeSH (e.g. 2013).
See the dedicated documentation.

SNOMED-CT

SNOMED CT (SNOMED Clinical Terms) is a systematically organised computer processable collection of medical terms providing codes, terms, synonyms and definitions covering diseases, findings, procedures, microorganisms, substances, etc. It allows a consistent way to index, store, retrieve, and aggregate clinical data across specialties and sites of care. It also helps in organizing the content of medical records, reducing the variability in the way data is captured, encoded and used for clinical care of patients and research. (source: Wikipedia).
The latest version of SNOMED Clinical Terms can be downloaded from the dedicated website.

See source code example.

Disease Ontology

The mission of the Disease Ontology (DO) is to provide an open source ontology for the integration of biomedical data that is associated with human disease. (source).
The latest version of the Disease Ontology can be downloaded from the dedicated website.

Refer to the example dedicated to the Gene Ontology; see the dedicated repository for the latest updates.

OBO Ontologies

Open Biomedical Ontologies (abbreviated OBO; formerly Open Biological Ontologies) is an effort to create controlled vocabularies for shared use across different biological and medical domains. As of 2006, OBO forms part of the resources of the U.S. National Center for Biomedical Ontology, where it will form a central element of the NCBO's BioPortal. (source: Wikipedia).
Numerous ontologies in OBO format can be downloaded at the OBO Foundry and BioPortal.

Refer to the example dedicated to the Gene Ontology; see the dedicated repository for the latest updates.

How to contribute?

You can report a bug or propose enhancements via the bug tracking system or the mailing list.

If you'd like to contribute to the Semantic Measures Library, start by forking the repository on GitHub: https://github.com/sharispe/slib.
The best way to get your changes merged back into core is as follows:

  • Clone down your fork
  • Create a named topic branch to contain your change
  • Code ;)
  • Add tests and make sure everything still passes by running Maven (do not change the version number)
  • Push the branch up to GitHub
  • Send a pull request for your branch

Thanks

  • The SML relies on the great Sesame API for processing RDF data.
  • Thanks to the users for their important feedback, which helps us improve both the library and the toolkit.