Measuring gene similarity by means of the classification distance

Baralis, Elena; Bruno, Giulia; Fiori, Alessandro

doi:10.1007/s10115-010-0374-0

Abstract:

Microarray technology provides a simple way to collect huge amounts of data on the expression levels of thousands of genes. Detecting similarities among genes is a fundamental task, both to discover previously unknown gene functions and to focus the analysis on a limited set of genes rather than on thousands of them. Similarity between genes is usually evaluated by analyzing their expression values. However, when additional information is available (e.g., clinical information), it may be beneficial to exploit it. In this paper, we present a new similarity measure for genes, based on their classification power, i.e., on their capability to separate samples belonging to different classes. Our method exploits a new gene representation that measures the classification power of each gene and defines the classification distance as the distance between gene classification powers. The classification distance measure has been integrated into a hierarchical clustering algorithm, but it may also be adopted by other clustering algorithms. The results of experiments run on different microarray datasets support the intuition behind the proposed approach.
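As a rough illustration of the idea in the abstract, the sketch below represents each gene by a binary mask over the samples (1 where the gene alone assigns the sample to its correct class) and takes the distance between two genes as the fraction of samples on which their masks disagree. This is only a minimal sketch under stated assumptions: the nearest-class-mean rule, the mask representation, the function names, and the toy data are all hypothetical stand-ins chosen for illustration, and the paper's actual gene representation and distance definition may differ.

```python
from statistics import mean


def classification_mask(expression, labels):
    """Return a 0/1 mask: 1 where this gene alone puts the sample in its true class.

    Assumption: a nearest-class-mean rule stands in for the paper's own
    measure of classification power.
    """
    classes = sorted(set(labels))
    # Mean expression of this gene within each class.
    centers = {
        c: mean(v for v, lab in zip(expression, labels) if lab == c)
        for c in classes
    }
    mask = []
    for value, label in zip(expression, labels):
        predicted = min(classes, key=lambda c: abs(value - centers[c]))
        mask.append(1 if predicted == label else 0)
    return mask


def classification_distance(mask_a, mask_b):
    """Fraction of samples on which two genes' masks disagree (hypothetical definition)."""
    return sum(a != b for a, b in zip(mask_a, mask_b)) / len(mask_a)


# Toy data (invented for illustration): six samples, two classes.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
gene_a = [5.1, 4.8, 5.3, 1.0, 1.2, 0.9]
gene_b = [4.9, 5.2, 1.1, 1.3, 0.8, 1.0]

mask_a = classification_mask(gene_a, labels)
mask_b = classification_mask(gene_b, labels)
distance_ab = classification_distance(mask_a, mask_b)
```

A matrix of such pairwise distances could then be passed to any clustering routine that accepts precomputed distances, which is how a measure of this kind would slot into hierarchical clustering.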

Reference:

Measuring gene similarity by means of the classification distance (Elena Baralis, Giulia Bruno, Alessandro Fiori), In Knowledge and Information Systems, volume 29, number 1, pages 81-101, 2011.

Bibtex Entry:

@article{Baralis2011,
abstract = {Microarray technology provides a simple way to collect huge amounts of data on the expression levels of thousands of genes. Detecting similarities among genes is a fundamental task, both to discover previously unknown gene functions and to focus the analysis on a limited set of genes rather than on thousands of them. Similarity between genes is usually evaluated by analyzing their expression values. However, when additional information is available (e.g., clinical information), it may be beneficial to exploit it. In this paper, we present a new similarity measure for genes, based on their classification power, i.e., on their capability to separate samples belonging to different classes. Our method exploits a new gene representation that measures the classification power of each gene and defines the classification distance as the distance between gene classification powers. The classification distance measure has been integrated into a hierarchical clustering algorithm, but it may also be adopted by other clustering algorithms. The results of experiments run on different microarray datasets support the intuition behind the proposed approach.},
author = {Baralis, Elena and Bruno, Giulia and Fiori, Alessandro},
doi = {10.1007/s10115-010-0374-0},
issn = {0219-1377},
journal = {Knowledge and Information Systems},
keywords = {SML-LIB-BIBLIO,clustering,data mining,lang:ENG,microarray,similarity measure},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
month = jan,
number = {1},
pages = {81--101},
title = {{Measuring gene similarity by means of the classification distance}},
url = {http://www.springerlink.com/index/10.1007/s10115-010-0374-0},
volume = {29},
year = {2011}
}