Classification by ensembles from random partitions of high-dimensional data

Hongshik Ahn, Hojin Moon, Melissa J. Fazzari, Noha Lim, James J. Chen, Ralph L. Kodell

Research output: Contribution to journal › Article

67 Citations (Scopus)

Abstract

A robust classification procedure is developed based on ensembles of classifiers, with each classifier constructed from a different set of predictors determined by a random partition of the entire set of predictors. The proposed methods combine the results of multiple classifiers to achieve a substantially improved prediction compared to the optimal single classifier. This approach is designed specifically for high-dimensional data sets for which a classifier is sought. By combining classifiers built from each subspace of the predictors, the proposed methods achieve a computational advantage in tackling the growing problem of dimensionality. For each subspace of the predictors, we build a classification tree or logistic regression tree. Our study shows, using four real data sets from different areas, that our methods perform consistently well compared to widely used classification methods. For unbalanced data, our approach maintains the balance between sensitivity and specificity more adequately than many other classification methods considered in this study.
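The core idea in the abstract — randomly partition the full predictor set into disjoint subsets, fit one classifier per subset, and combine the classifiers' predictions by majority voting — can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the single-feature "stump" base learner stands in for their classification/logistic regression trees, and all function names (`random_partition`, `fit_stump`, `ensemble_predict`) are made up for this sketch.

```python
import random
import statistics

def random_partition(n_features, n_parts, rng):
    """Randomly split the feature indices into n_parts disjoint subsets
    whose union is the entire predictor set."""
    idx = list(range(n_features))
    rng.shuffle(idx)
    return [idx[i::n_parts] for i in range(n_parts)]

def fit_stump(X, y, features):
    """Toy base learner: among the features in this subset, pick the one
    whose mean-threshold split best separates the two classes."""
    best = None
    for f in features:
        thr = sum(row[f] for row in X) / len(X)
        preds = [1 if row[f] > thr else 0 for row in X]
        acc = sum(p == t for p, t in zip(preds, y)) / len(y)
        flip = acc < 0.5          # below 50%: use the stump with its vote inverted
        if flip:
            acc = 1 - acc
        if best is None or acc > best[0]:
            best = (acc, f, thr, flip)
    _, f, thr, flip = best
    return lambda row: (0 if flip else 1) if row[f] > thr else (1 if flip else 0)

def ensemble_predict(classifiers, row):
    """Majority vote across the subspace classifiers."""
    return statistics.mode(clf(row) for clf in classifiers)

# Toy usage: 6 predictors, only feature 2 carries signal.
rng = random.Random(0)
X = [[rng.random() for _ in range(6)] for _ in range(200)]
y = [1 if row[2] > 0.5 else 0 for row in X]
parts = random_partition(6, 3, rng)          # 3 disjoint predictor subsets
clfs = [fit_stump(X, y, p) for p in parts]   # one classifier per subspace
acc = sum(ensemble_predict(clfs, row) == t for row, t in zip(X, y)) / len(y)
```

Because the partitions are disjoint, every predictor contributes to exactly one classifier, which is what gives the approach its computational advantage over searching the full feature space at once.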

Original language: English (US)
Pages (from-to): 6166-6179
Number of pages: 14
Journal: Computational Statistics and Data Analysis
Volume: 51
Issue number: 12
DOI: 10.1016/j.csda.2006.12.043
State: Published - Aug 15 2007
Externally published: Yes

Keywords

  • Class prediction
  • Classification tree
  • Cross validation
  • Logistic regression
  • Majority voting
  • Risk profiling

ASJC Scopus subject areas

  • Statistics and Probability
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Applied Mathematics

Cite this

Ahn, Hongshik; Moon, Hojin; Fazzari, Melissa J.; Lim, Noha; Chen, James J.; Kodell, Ralph L. Classification by ensembles from random partitions of high-dimensional data. In: Computational Statistics and Data Analysis, Vol. 51, No. 12, 15.08.2007, pp. 6166-6179.
@article{b68fabe565994834a02acf1ff007b6dd,
title = "Classification by ensembles from random partitions of high-dimensional data",
abstract = "A robust classification procedure is developed based on ensembles of classifiers, with each classifier constructed from a different set of predictors determined by a random partition of the entire set of predictors. The proposed methods combine the results of multiple classifiers to achieve a substantially improved prediction compared to the optimal single classifier. This approach is designed specifically for high-dimensional data sets for which a classifier is sought. By combining classifiers built from each subspace of the predictors, the proposed methods achieve a computational advantage in tackling the growing problem of dimensionality. For each subspace of the predictors, we build a classification tree or logistic regression tree. Our study shows, using four real data sets from different areas, that our methods perform consistently well compared to widely used classification methods. For unbalanced data, our approach maintains the balance between sensitivity and specificity more adequately than many other classification methods considered in this study.",
keywords = "Class prediction, Classification tree, Cross validation, Logistic regression, Majority voting, Risk profiling",
author = "Hongshik Ahn and Hojin Moon and Fazzari, {Melissa J.} and Noha Lim and Chen, {James J.} and Kodell, {Ralph L.}",
year = "2007",
month = "8",
day = "15",
doi = "10.1016/j.csda.2006.12.043",
language = "English (US)",
volume = "51",
pages = "6166--6179",
journal = "Computational Statistics and Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",
number = "12",
}

TY - JOUR
T1 - Classification by ensembles from random partitions of high-dimensional data
AU - Ahn, Hongshik
AU - Moon, Hojin
AU - Fazzari, Melissa J.
AU - Lim, Noha
AU - Chen, James J.
AU - Kodell, Ralph L.
PY - 2007/8/15
Y1 - 2007/8/15
N2 - A robust classification procedure is developed based on ensembles of classifiers, with each classifier constructed from a different set of predictors determined by a random partition of the entire set of predictors. The proposed methods combine the results of multiple classifiers to achieve a substantially improved prediction compared to the optimal single classifier. This approach is designed specifically for high-dimensional data sets for which a classifier is sought. By combining classifiers built from each subspace of the predictors, the proposed methods achieve a computational advantage in tackling the growing problem of dimensionality. For each subspace of the predictors, we build a classification tree or logistic regression tree. Our study shows, using four real data sets from different areas, that our methods perform consistently well compared to widely used classification methods. For unbalanced data, our approach maintains the balance between sensitivity and specificity more adequately than many other classification methods considered in this study.
AB - A robust classification procedure is developed based on ensembles of classifiers, with each classifier constructed from a different set of predictors determined by a random partition of the entire set of predictors. The proposed methods combine the results of multiple classifiers to achieve a substantially improved prediction compared to the optimal single classifier. This approach is designed specifically for high-dimensional data sets for which a classifier is sought. By combining classifiers built from each subspace of the predictors, the proposed methods achieve a computational advantage in tackling the growing problem of dimensionality. For each subspace of the predictors, we build a classification tree or logistic regression tree. Our study shows, using four real data sets from different areas, that our methods perform consistently well compared to widely used classification methods. For unbalanced data, our approach maintains the balance between sensitivity and specificity more adequately than many other classification methods considered in this study.
KW - Class prediction
KW - Classification tree
KW - Cross validation
KW - Logistic regression
KW - Majority voting
KW - Risk profiling
UR - http://www.scopus.com/inward/record.url?scp=34547187749&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547187749&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2006.12.043
DO - 10.1016/j.csda.2006.12.043
M3 - Article
AN - SCOPUS:34547187749
VL - 51
SP - 6166
EP - 6179
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
SN - 0167-9473
IS - 12
ER -