Heavy-Tailed Noise Suppression and Derivative Wavelet Scalogram for Detecting DNA Copy Number Aberrations

Nha H. Nguyen, An Vo, Haibin Sun, Heng Huang

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Most existing array comparative genomic hybridization (array CGH) data processing methods and evaluation models assume that the probability density function of the noise in array CGH data is Gaussian. In practice, however, the noise distribution is peaky and heavy-tailed. A more accurate and sufficient model of the noise in array CGH data is therefore necessary and benefits the detection of DNA copy number variations. We analyze real array CGH data from different platforms and show that the noise distribution is fitted very well by a generalized Gaussian distribution (GGD). Based on this noise model, we propose a novel array CGH processing method that combines the advantages of smoothing and segmentation approaches. The method uses a generalized Gaussian bivariate shrinkage function and a one-directional derivative wavelet scalogram under generalized Gaussian noise. In the smoothing step, we use the generalized Gaussian noise model to derive a heavy-tailed noise suppression algorithm in the stationary wavelet domain. In the segmentation step, the 1D Gaussian derivative wavelet scalogram is employed to detect break points. Both real and simulated data are used in our experiments. The new method outperforms other state-of-the-art methods in terms of both root mean squared error and receiver operating characteristic curves.
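As a rough illustration of the noise model named in the abstract, the sketch below evaluates a generalized Gaussian density and estimates its shape parameter from samples. It is not the authors' fitting procedure: the estimator is a standard moment-matching approach swapped in for illustration, the function names (`ggd_pdf`, `moment_ratio`, `estimate_ggd`) are hypothetical, and the synthetic Laplace and Gaussian samples stand in for real array CGH noise. A shape parameter of 2 corresponds to Gaussian noise, while smaller values give the peaky, heavy-tailed behavior the abstract describes.

```python
import math
import random


def ggd_pdf(x, alpha, beta, mu=0.0):
    """Generalized Gaussian density with scale alpha and shape beta."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


def moment_ratio(beta):
    """E|X| / sqrt(E[X^2]) for a zero-mean GGD; increases with beta."""
    return math.gamma(2.0 / beta) / math.sqrt(
        math.gamma(1.0 / beta) * math.gamma(3.0 / beta)
    )


def estimate_ggd(samples, lo=0.1, hi=10.0, iters=60):
    """Moment-matching estimate of (mu, alpha, beta) from samples.

    Assumption: beta lies in [lo, hi]; the ratio equation is solved by bisection.
    """
    n = len(samples)
    mu = sum(samples) / n
    m1 = sum(abs(x - mu) for x in samples) / n      # mean absolute deviation
    m2 = sum((x - mu) ** 2 for x in samples) / n    # variance
    target = m1 / math.sqrt(m2)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # Variance of a GGD is alpha^2 * Gamma(3/beta) / Gamma(1/beta).
    alpha = math.sqrt(m2 * math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    return mu, alpha, beta


def demo(seed=0, n=20000):
    """Compare estimated shapes for synthetic heavy-tailed and Gaussian noise."""
    rng = random.Random(seed)
    laplace = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]
    gauss = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return estimate_ggd(laplace)[2], estimate_ggd(gauss)[2]


if __name__ == "__main__":
    beta_heavy, beta_gauss = demo()
    print("estimated shape, Laplace noise:", round(beta_heavy, 3))
    print("estimated shape, Gaussian noise:", round(beta_gauss, 3))
```

An estimated shape well below 2 on real residuals would be one simple indication that a Gaussian noise assumption is too light-tailed for the data at hand.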

Original language: English (US)
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
DOI: 10.1109/TCBB.2017.2723884
State: Accepted/In press - Jul 6 2017
Externally published: Yes

Fingerprint

Noise Suppression
Comparative Genomics
Comparative Genomic Hybridization
Aberrations
Aberration
Noise
Wavelets
DNA
Derivatives
Derivative
Gaussian distribution
Normal Distribution
Gaussian Noise
Smoothing
Segmentation
DNA Copy Number Variations
Data Processing Methods
Probability density function
Directional derivative
Evaluation Model

Keywords

  • aCGH
  • Arrays
  • Biological cells
  • Data models
  • DNA
  • DNA copy number variations
  • Gaussian distribution
  • Heavy-tailed noise
  • Probability density function
  • Probes
  • wavelet

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Cite this

@article{f86c7b98f2594d79967f06fa80496eac,
title = "Heavy-Tailed Noise Suppression and Derivative Wavelet Scalogram for Detecting DNA Copy Number Aberrations",
abstract = "Most existing array comparative genomic hybridization (array CGH) data processing methods and evaluation models assumed that the probability density function of noise in array CGH is a Gaussian distribution. However, in practice such noise distribution is peaky and heavy-tailed. A more accurate and sufficient model of noise in array CGH data is necessary and beneficial to the detection of DNA copy number variations. We analyze the real array CGH data from different platforms and show that the distribution of noise in array CGH data is fitted very well by generalized Gaussian distribution (GGD). Based on our new noise model, we propose a novel array CGH processing method combining the advantages of both smoothing and segmentation approaches. The new method uses generalized Gaussian bivariate shrinkage function and one-directional derivative wavelet scalogram in generalized Gaussian noise. In smoothing step, with the new generalized Gaussian noise model, we derive the heavy-tailed noise suppression algorithm in stationary wavelet domain. In segmentation step, the 1D Gaussian derivative wavelet scalogram is employed to detect break points. Both real and simulated data are used in our experiments. Our new method outperforms other state-of-the-art methods, in terms of both root mean squared errors and receiver operating characteristic curves.",
keywords = "aCGH, Arrays, Biological cells, Data models, DNA, DNA copy number variations, Gaussian distribution, Heavy-tailed noise, Probability density function, Probes, wavelet",
author = "Nguyen, {Nha H.} and An Vo and Haibin Sun and Heng Huang",
year = "2017",
month = "7",
day = "6",
doi = "10.1109/TCBB.2017.2723884",
language = "English (US)",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR

T1 - Heavy-Tailed Noise Suppression and Derivative Wavelet Scalogram for Detecting DNA Copy Number Aberrations

AU - Nguyen, Nha H.

AU - Vo, An

AU - Sun, Haibin

AU - Huang, Heng

PY - 2017/7/6

Y1 - 2017/7/6

N2 - Most existing array comparative genomic hybridization (array CGH) data processing methods and evaluation models assumed that the probability density function of noise in array CGH is a Gaussian distribution. However, in practice such noise distribution is peaky and heavy-tailed. A more accurate and sufficient model of noise in array CGH data is necessary and beneficial to the detection of DNA copy number variations. We analyze the real array CGH data from different platforms and show that the distribution of noise in array CGH data is fitted very well by generalized Gaussian distribution (GGD). Based on our new noise model, we propose a novel array CGH processing method combining the advantages of both smoothing and segmentation approaches. The new method uses generalized Gaussian bivariate shrinkage function and one-directional derivative wavelet scalogram in generalized Gaussian noise. In smoothing step, with the new generalized Gaussian noise model, we derive the heavy-tailed noise suppression algorithm in stationary wavelet domain. In segmentation step, the 1D Gaussian derivative wavelet scalogram is employed to detect break points. Both real and simulated data are used in our experiments. Our new method outperforms other state-of-the-art methods, in terms of both root mean squared errors and receiver operating characteristic curves.

KW - aCGH

KW - Arrays

KW - Biological cells

KW - Data models

KW - DNA

KW - DNA copy number variations

KW - Gaussian distribution

KW - Heavy-tailed noise

KW - Probability density function

KW - Probes

KW - wavelet

UR - http://www.scopus.com/inward/record.url?scp=85023192797&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85023192797&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2017.2723884

DO - 10.1109/TCBB.2017.2723884

M3 - Article

C2 - 28692986

AN - SCOPUS:85023192797

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

ER -