Defining an informativeness metric for clustering gene expression data

Jessica C. Mar, Christine A. Wells, John Quackenbush

Research output: Contribution to journal › Article

37 Scopus citations

Abstract

Motivation: Unsupervised 'cluster' analysis is an invaluable tool for exploratory microarray data analysis, as it organizes the data into groups of genes or samples in which the elements share common patterns. Once the data are clustered, finding the optimal number of informative subgroups within a dataset is a problem that, while important for understanding the underlying phenotypes, is one for which there is no robust, widely accepted solution.

Results: To address this problem we developed an 'informativeness metric' based on a simple analysis of variance statistic that identifies the number of clusters which best separate phenotypic groups. The performance of the informativeness metric has been tested on both experimental and simulated datasets, and we contrast these results with those obtained using alternative methods such as the gap statistic.
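The general idea of scoring candidate cluster numbers with an analysis-of-variance statistic can be sketched roughly as follows. This is an illustrative stand-in and not the metric defined in the article: the abstract does not give the formula, so the score used here (the mean one-way ANOVA F statistic of each gene cluster's average profile across phenotype groups) is an assumption, as are the function names `f_statistic`, `score_clustering` and `best_k` and the toy data. The gene clusterings are taken as given and may come from any clustering algorithm.

```python
from collections import defaultdict
from statistics import mean


def f_statistic(values, groups):
    """One-way ANOVA F statistic for `values` split by the labels in `groups`.

    Assumes at least two groups and more observations than groups.
    """
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    n, k = len(values), len(by_group)
    grand_mean = mean(values)
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2
                     for vs in by_group.values())
    ss_within = sum((v - mean(vs)) ** 2
                    for vs in by_group.values() for v in vs)
    if ss_within == 0:
        # Perfect separation (or constant data): avoid dividing by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def score_clustering(expr, gene_clusters, phenotypes):
    """Hypothetical score: mean F statistic over the cluster-average profiles."""
    members = defaultdict(list)
    for gene, cluster in gene_clusters.items():
        members[cluster].append(expr[gene])
    scores = []
    for profiles in members.values():
        # Average the member genes sample by sample to get one profile.
        centroid = [mean(column) for column in zip(*profiles)]
        scores.append(f_statistic(centroid, phenotypes))
    return mean(scores)


def best_k(expr, clusterings, phenotypes):
    """Return the candidate number of clusters with the highest score."""
    return max(clusterings,
               key=lambda k: score_clustering(expr, clusterings[k], phenotypes))


# Toy data (made up): four genes measured in four samples, two phenotypes.
phenotypes = ["A", "A", "B", "B"]
expr = {
    "g1": [1.0, 2.0, 5.0, 6.0],
    "g2": [1.5, 2.5, 5.5, 6.5],
    "g3": [5.0, 6.0, 1.0, 2.0],
    "g4": [5.5, 6.5, 1.5, 2.5],
}
clusterings = {
    1: {"g1": 0, "g2": 0, "g3": 0, "g4": 0},
    2: {"g1": 0, "g2": 0, "g3": 1, "g4": 1},
    3: {"g1": 0, "g3": 0, "g2": 1, "g4": 2},
}
print(best_k(expr, clusterings, phenotypes))
```

In this toy example, merging all genes into one cluster averages out the opposing up- and down-regulated patterns, so its score drops to zero, while the two-cluster solution keeps them apart. A real analysis would also need to handle ties between candidate values of k and compare against alternatives such as the gap statistic.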

Original language: English (US)
Article number: btr074
Pages (from-to): 1094-1100
Number of pages: 7
Journal: Bioinformatics
Volume: 27
Issue number: 8
DOIs
State: Published - Apr 2011

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
