Utilization of gene ontology in semi-supervised clustering

by Duong D. Doan, Yunli Wang, Youlian Pan
Abstract:
Semi-supervised clustering that incorporates biological relevance as prior knowledge has been favored over the past decade. However, selecting the prior knowledge has been a challenge. We generate prior knowledge from Gene Ontology (GO) terms at different levels of the GO hierarchy and study its impact on the performance of subsequent clustering of microarray data using MPCKMeans and GOFuzzy. We evaluate performance by the F-measure and by the number of specific GO terms and transcription factors. Clustering results obtained with prior knowledge generated from lower levels of the GO hierarchy have a higher F-measure and a larger number of specific GO terms and transcription factors. MPCKMeans with prior knowledge generated from multiple levels of the GO hierarchy outperforms GOFuzzy with prior knowledge from the first level of the GO hierarchy. A small amount (1-2%) of prior knowledge can substantially improve the semi-supervised clustering result, and more specific prior knowledge is generally more efficient in guiding the semi-supervised clustering process.
Reference:
Utilization of gene ontology in semi-supervised clustering (Duong D. Doan, Yunli Wang, Youlian Pan), In 2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), IEEE, 2011.
Bibtex Entry:
@inproceedings{Doan2011,
abstract = {Semi-supervised clustering that incorporates biological relevance as prior knowledge has been favored over the past decade. However, selecting the prior knowledge has been a challenge. We generate prior knowledge from Gene Ontology (GO) terms at different levels of the GO hierarchy and study its impact on the performance of subsequent clustering of microarray data using MPCKMeans and GOFuzzy. We evaluate performance by the F-measure and by the number of specific GO terms and transcription factors. Clustering results obtained with prior knowledge generated from lower levels of the GO hierarchy have a higher F-measure and a larger number of specific GO terms and transcription factors. MPCKMeans with prior knowledge generated from multiple levels of the GO hierarchy outperforms GOFuzzy with prior knowledge from the first level of the GO hierarchy. A small amount (1-2\%) of prior knowledge can substantially improve the semi-supervised clustering result, and more specific prior knowledge is generally more efficient in guiding the semi-supervised clustering process.},
author = {Doan, Duong D. and Wang, Yunli and Pan, Youlian},
booktitle = {2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)},
doi = {10.1109/CIBCB.2011.5948467},
isbn = {978-1-4244-9896-3},
keywords = {SML-LIB-BIBLIO,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
month = apr,
pages = {1--7},
publisher = {IEEE},
title = {{Utilization of gene ontology in semi-supervised clustering}},
url = {http://ieeexplore.ieee.org/xpl/freeabs\_all.jsp?arnumber=5948467},
year = {2011}
}