TY - JOUR
T1 - Reducing system noise in copy number data using principal components of self-self hybridizations
AU - Lee, Yoon Ha
AU - Ronemus, Michael
AU - Kendall, Jude
AU - Lakshmi, B.
AU - Leotta, Anthony
AU - Levy, Dan
AU - Esposito, Diane
AU - Grubor, Vladimir
AU - Ye, Kenny
AU - Wigler, Michael
AU - Yamrom, Boris
PY - 2012/1/17
Y1 - 2012/1/17
N2 - Genomic copy number variation underlies genetic disorders such as autism, schizophrenia, and congenital heart disease. Copy number variations are commonly detected by array-based comparative genomic hybridization of sample to reference DNAs, but probe and operational variables combine to create correlated system noise that degrades detection of genetic events. To correct for this, we have explored hybridizations in which no genetic signal is expected, namely "self-self" hybridizations (SSH) comparing DNAs from the same genome. We show that SSH trap a variety of correlated system noise present also in sample-reference (test) data. Through singular value decomposition of SSH, we are able to determine the principal components (PCs) of this noise. The PCs themselves offer deep insights into the sources of noise, and facilitate detection of artifacts. We present evidence that linear and piece-wise linear correction of test data with the PCs does not introduce detectable spurious signal, yet improves signal-to-noise metrics, reduces false positives, and facilitates copy number determination.
AB - Genomic copy number variation underlies genetic disorders such as autism, schizophrenia, and congenital heart disease. Copy number variations are commonly detected by array-based comparative genomic hybridization of sample to reference DNAs, but probe and operational variables combine to create correlated system noise that degrades detection of genetic events. To correct for this, we have explored hybridizations in which no genetic signal is expected, namely "self-self" hybridizations (SSH) comparing DNAs from the same genome. We show that SSH trap a variety of correlated system noise present also in sample-reference (test) data. Through singular value decomposition of SSH, we are able to determine the principal components (PCs) of this noise. The PCs themselves offer deep insights into the sources of noise, and facilitate detection of artifacts. We present evidence that linear and piece-wise linear correction of test data with the PCs does not introduce detectable spurious signal, yet improves signal-to-noise metrics, reduces false positives, and facilitates copy number determination.
KW - Comparative genomic hybridization
KW - Copy number variation
KW - Principal component analysis
KW - Singular value decomposition
UR - http://www.scopus.com/inward/record.url?scp=84863029822&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863029822&partnerID=8YFLogxK
U2 - 10.1073/pnas.1106233109
DO - 10.1073/pnas.1106233109
M3 - Article
C2 - 22207624
AN - SCOPUS:84863029822
SN - 0027-8424
VL - 109
SP - E103-E110
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 3
ER -