Mass spectrometry data processing using zero-crossing lines in multi-scale of Gaussian derivative wavelet

Nha H. Nguyen, Heng Huang, Soontorn Oraintara, An Vo

Research output: Contribution to journalConference article

Abstract

Motivation: Peaks are the key information in mass spectrometry (MS) which has been increasingly used to discover diseases-related proteomic patterns. Peak detection is an essential step for MS-based proteomic data analysis. Recently, several peak detection algorithms have been proposed. However, in these algorithms, there are three major deficiencies: (i) because the noise is often removed, the true signal could also be removed; (ii) baseline removal step may get rid of true peaks and create new false peaks; (iii) in peak quantification step, a threshold of signal-to-noise ratio (SNR) is usually used to remove false peaks; however, noise estimations in SNR calculation are often inaccurate in either time or wavelet domain. In this article, we propose new algorithms to solve these problems. First, we use bivariate shrinkage estimator in stationary wavelet domain to avoid removing true peaks in denoising step. Second, without baseline removal, zero-crossing lines in multi-scale of derivative Gaussian wavelets are investigated with mixture of Gaussian to estimate discriminative parameters of peaks. Third, in quantification step, the frequency, SD, height and rank of peaks are used to detect both high and small energy peaks with robustness to noise. Results: We propose a novel Gaussian Derivative Wavelet (GDWavelet) method to more accurately detect true peaks with a lower false discovery rate than existing methods. The proposed GDWavelet method has been performed on the real Surface- Enhanced Laser Desorption/Ionization Time-Of-Flight (SELDI-TOF) spectrum with known polypeptide positions and on two synthetic data with Gaussian and real noise. All experimental results demonstrate that our method outperforms other commonly used methods. The standard receiver operating characteristic (ROC) curves are used to evaluate the experimental results.

Original languageEnglish (US)
Pages (from-to)i659-i665
JournalBioinformatics
Volume27
Issue number13
DOIs
StatePublished - Jan 1 2011
Externally publishedYes
Event19th Annual International Conference on Intelligent Systems for Molecular Biology, Joint with the 10th European Conference on Computational Biology, ISMB/ECCB 2011 - Vienna, Austria
Duration: Jul 17 2011Jul 19 2011

Fingerprint

Zero-crossing
Mass Spectrometry
Mass spectrometry
Wavelets
Noise
Derivatives
Derivative
Line
Signal to noise ratio
Proteomics
Signal-To-Noise Ratio
Polypeptides
Quantification
Baseline
Ionization
Desorption
Noise Estimation
Shrinkage Estimator
Time-of-flight
Receiver Operating Characteristic Curve

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine(all)
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Mass spectrometry data processing using zero-crossing lines in multi-scale of Gaussian derivative wavelet. / Nguyen, Nha H.; Huang, Heng; Oraintara, Soontorn; Vo, An.

In: Bioinformatics, Vol. 27, No. 13, 01.01.2011, p. i659-i665.

Research output: Contribution to journalConference article

Nguyen, Nha H. ; Huang, Heng ; Oraintara, Soontorn ; Vo, An. / Mass spectrometry data processing using zero-crossing lines in multi-scale of Gaussian derivative wavelet. In: Bioinformatics. 2011 ; Vol. 27, No. 13. pp. i659-i665.
@article{adda7d7e305f4cdf90824f44a458623c,
title = "Mass spectrometry data processing using zero-crossing lines in multi-scale of Gaussian derivative wavelet",
abstract = "Motivation: Peaks are the key information in mass spectrometry (MS) which has been increasingly used to discover diseases-related proteomic patterns. Peak detection is an essential step for MS-based proteomic data analysis. Recently, several peak detection algorithms have been proposed. However, in these algorithms, there are three major deficiencies: (i) because the noise is often removed, the true signal could also be removed; (ii) baseline removal step may get rid of true peaks and create new false peaks; (iii) in peak quantification step, a threshold of signal-to-noise ratio (SNR) is usually used to remove false peaks; however, noise estimations in SNR calculation are often inaccurate in either time or wavelet domain. In this article, we propose new algorithms to solve these problems. First, we use bivariate shrinkage estimator in stationary wavelet domain to avoid removing true peaks in denoising step. Second, without baseline removal, zero-crossing lines in multi-scale of derivative Gaussian wavelets are investigated with mixture of Gaussian to estimate discriminative parameters of peaks. Third, in quantification step, the frequency, SD, height and rank of peaks are used to detect both high and small energy peaks with robustness to noise. Results: We propose a novel Gaussian Derivative Wavelet (GDWavelet) method to more accurately detect true peaks with a lower false discovery rate than existing methods. The proposed GDWavelet method has been performed on the real Surface- Enhanced Laser Desorption/Ionization Time-Of-Flight (SELDI-TOF) spectrum with known polypeptide positions and on two synthetic data with Gaussian and real noise. All experimental results demonstrate that our method outperforms other commonly used methods. The standard receiver operating characteristic (ROC) curves are used to evaluate the experimental results.",
author = "Nguyen, {Nha H.} and Heng Huang and Soontorn Oraintara and An Vo",
year = "2011",
month = "1",
day = "1",
doi = "10.1093/bioinformatics/btq397",
language = "English (US)",
volume = "27",
pages = "i659--i665",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "13",

}

TY - JOUR

T1 - Mass spectrometry data processing using zero-crossing lines in multi-scale of Gaussian derivative wavelet

AU - Nguyen, Nha H.

AU - Huang, Heng

AU - Oraintara, Soontorn

AU - Vo, An

PY - 2011/1/1

Y1 - 2011/1/1

N2 - Motivation: Peaks are the key information in mass spectrometry (MS) which has been increasingly used to discover diseases-related proteomic patterns. Peak detection is an essential step for MS-based proteomic data analysis. Recently, several peak detection algorithms have been proposed. However, in these algorithms, there are three major deficiencies: (i) because the noise is often removed, the true signal could also be removed; (ii) baseline removal step may get rid of true peaks and create new false peaks; (iii) in peak quantification step, a threshold of signal-to-noise ratio (SNR) is usually used to remove false peaks; however, noise estimations in SNR calculation are often inaccurate in either time or wavelet domain. In this article, we propose new algorithms to solve these problems. First, we use bivariate shrinkage estimator in stationary wavelet domain to avoid removing true peaks in denoising step. Second, without baseline removal, zero-crossing lines in multi-scale of derivative Gaussian wavelets are investigated with mixture of Gaussian to estimate discriminative parameters of peaks. Third, in quantification step, the frequency, SD, height and rank of peaks are used to detect both high and small energy peaks with robustness to noise. Results: We propose a novel Gaussian Derivative Wavelet (GDWavelet) method to more accurately detect true peaks with a lower false discovery rate than existing methods. The proposed GDWavelet method has been performed on the real Surface- Enhanced Laser Desorption/Ionization Time-Of-Flight (SELDI-TOF) spectrum with known polypeptide positions and on two synthetic data with Gaussian and real noise. All experimental results demonstrate that our method outperforms other commonly used methods. The standard receiver operating characteristic (ROC) curves are used to evaluate the experimental results.

AB - Motivation: Peaks are the key information in mass spectrometry (MS) which has been increasingly used to discover diseases-related proteomic patterns. Peak detection is an essential step for MS-based proteomic data analysis. Recently, several peak detection algorithms have been proposed. However, in these algorithms, there are three major deficiencies: (i) because the noise is often removed, the true signal could also be removed; (ii) baseline removal step may get rid of true peaks and create new false peaks; (iii) in peak quantification step, a threshold of signal-to-noise ratio (SNR) is usually used to remove false peaks; however, noise estimations in SNR calculation are often inaccurate in either time or wavelet domain. In this article, we propose new algorithms to solve these problems. First, we use bivariate shrinkage estimator in stationary wavelet domain to avoid removing true peaks in denoising step. Second, without baseline removal, zero-crossing lines in multi-scale of derivative Gaussian wavelets are investigated with mixture of Gaussian to estimate discriminative parameters of peaks. Third, in quantification step, the frequency, SD, height and rank of peaks are used to detect both high and small energy peaks with robustness to noise. Results: We propose a novel Gaussian Derivative Wavelet (GDWavelet) method to more accurately detect true peaks with a lower false discovery rate than existing methods. The proposed GDWavelet method has been performed on the real Surface- Enhanced Laser Desorption/Ionization Time-Of-Flight (SELDI-TOF) spectrum with known polypeptide positions and on two synthetic data with Gaussian and real noise. All experimental results demonstrate that our method outperforms other commonly used methods. The standard receiver operating characteristic (ROC) curves are used to evaluate the experimental results.

UR - http://www.scopus.com/inward/record.url?scp=84983126687&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84983126687&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btq397

DO - 10.1093/bioinformatics/btq397

M3 - Conference article

AN - SCOPUS:84983126687

VL - 27

SP - i659-i665

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 13

ER -