Copy number variation analysis based on AluScan sequences
- Jian-Feng Yang†1,
- Xiao-Fan Ding†1,
- Lei Chen2,
- Wai-Kin Mat1,
- Michelle Zhi Xu3,
- Jin-Fei Chen3,
- Jian-Min Wang4,
- Lin Xu5,
- Wai-Sang Poon6,
- Ava Kwong7,
- Gilberto Ka-Kit Leung7,
- Tze-Ching Tan8,
- Chi-Hung Yu8,
- Yue-Bin Ke9,
- Xin-Yun Xu9,
- Xiao-Yan Ke10,
- Ronald CW Ma11,
- Juliana CN Chan11,
- Wei-Qing Wan12,
- Li-Wei Zhang12,
- Yogesh Kumar1,
- Shui-Ying Tsang1,
- Shao Li13,
- Hong-Yang Wang2,14 and
- Hong Xue1
© Yang et al.; licensee BioMed Central. 2014
Received: 3 September 2014
Accepted: 12 November 2014
Published: 5 December 2014
AluScan combines inter-Alu PCR using multiple Alu-based primers with opposite orientations and next-generation sequencing to capture a huge number of Alu-proximal genomic sequences for investigation. Its requirement of only sub-microgram quantities of DNA facilitates the examination of large numbers of samples. However, the special features of AluScan data make it difficult to call copy number variation (CNV) directly using the calling algorithms designed for whole genome sequencing (WGS) or exome sequencing.
In this study, an AluScanCNV package has been assembled for efficient CNV calling from AluScan sequencing data employing a Geary-Hinkley transformation (GHT) of read-depth ratios between either paired test-control samples, or between test samples and a reference template constructed from reference samples, to call the localized CNVs, followed by use of a GISTIC-like algorithm to identify recurrent CNVs and circular binary segmentation (CBS) to reveal large extended CNVs. To evaluate the utility of CNVs called from AluScan data, the AluScans from 23 non-cancer and 38 cancer genomes were analyzed in this study. The glioma samples analyzed yielded the familiar extended copy-number losses on chromosomes 1p and 9. Also, the recurrent somatic CNVs identified from liver cancer samples were similar to those reported for liver cancer WGS with respect to a striking enrichment of copy-number gains in chromosomes 1q and 8q. When localized or recurrent CNV-features capable of distinguishing between liver and non-liver cancer samples were selected by correlation-based machine learning, a highly accurate separation of the liver and non-liver cancer classes was attained.
The results obtained from non-cancer and cancerous tissues indicated that the AluScanCNV package can be employed to call localized, recurrent and extended CNVs from AluScan sequences. Moreover, both the localized and recurrent CNVs identified by this method could be subjected to machine-learning selection to yield distinguishing CNV-features that were capable of separating between liver cancers and other types of cancers. Since the method is applicable to any human DNA sample with or without the availability of a paired control, it can also be employed to analyze the constitutional CNVs of individuals.
Keywords: AluScan sequencing, CNV calling, Cancer classification, Machine learning
The use of microarray platforms to perform copy number variation (CNV) calling is a valuable technique in genomic analysis. However, next-generation sequencing is fast becoming an attractive alternative platform for this purpose. Compared to microarrays, next-generation sequencing can make possible a higher resolution, multiple simultaneous analyses on the same sample, and at least comparable detection efficiency in CNV calling. Moreover, while CNV calling from microarrays requires the establishment of a relationship between copy number and the observed intensity for any site-specific probe, the read-depth of any fragment in an output of next-generation sequencing can be correlated to the copy number either linearly or based on a simple Poisson model.
A variety of algorithms have been designed for CNV calling from sequencing data obtained for both paired and unpaired samples. In general, data from whole genome sequencing (WGS) are continuous and more evenly distributed so that they are readily fitted to simple statistical distributions following straightforward GC-normalization. On the other hand, CNV calling based on target-capture sequencing such as exome sequencing and AluScan is more complex. As a method for genome-wide capture of the sequences amplified by inter-Alu PCR using multiple Alu-based primers with opposite ‘head type’ and ‘tail type’ orientations for next-generation sequencing, AluScan is not only expeditious in both experimental and informatics analysis, but also requires less DNA compared to WGS or exome sequencing. However, the sequences analyzed by both exome sequencing and AluScan are discontinuous. Moreover, while exome sequencing usually involves basically the same set of fixed target regions in every experiment, such that CNV calling on an unpaired sample can be performed without any control, the inter-Alu sequences analyzed by AluScan depend on the Alu-based PCR primers employed. As a result, CNV-calling algorithms developed for WGS or exome sequencing are not readily applicable to AluScans. Moreover, it is possible that Alu sequences could be one of the factors that induce CNVs, because the high similarity of neighboring Alu elements could cause homologous recombination that may result in changes in copy number.
In current cancer research, CNV is regarded as an important source of tumorigenesis besides single nucleotide substitution and large structural variation. Ovarian cancer, breast carcinoma and lung cell carcinoma, for example, are categorized as C-class (C stands for CNV) tumors, and a variety of cancers are associated with CNVs in tumor suppressor genes and oncogenes such as TP53 and RET.
Rare constitutional CNVs are well known to be associated with individual cancers, but recurrent constitutional CNVs are usually found to be only low to modest in penetrance, suggesting that they could become significant factors in the aggregate. In our earlier study, recurrent constitutional CNV-features selected by machine learning were found to be capable of distinguishing between genomes with higher predispositions to cancer and those with lower predispositions, and thereby provide a basis for the prediction of generalized cancer predisposition. In the present study, the generality of this approach has been expanded by machine-learning selection of localized as well as recurrent somatic CNV-features with the capability of distinguishing between different types of cancer such as liver versus non-liver cancers.
DNA samples and AluScan sequencing
Inter-Alu PCR amplifications were performed on 0.1 μg of each of the DNA samples in Additional file 1: Table S1 using, except where otherwise indicated, the four Alu-based PCR primers AluY278T18 (5’-GAGCGAGACTCCGTCTCA-3’), AluY66H21 (5’-TGGTCTCGATCTCCTGACCTC-3’), R12A/267 (5’-AGCGAGACTCCG-3’) and L12A/8 (5’-TGAGCCACCGCG-3’) (0.075 μM each), followed by sequencing of the amplicons with the Illumina-Solexa platform and mapping as described. The AluScan sequences of the blood samples from 23 non-cancer subjects (column 3 of Additional file 1: Table S1) were pooled together for the construction of a “23-sample reference template” for unpaired analysis (Figure 1). Descriptions of the various samples are given in Additional file 2: Table S2.
Correlation of read-depth
The genome in each DNA sample was divided into contiguous windows 5 kb in size. The read-depth for each window was calculated using the genomeCoverageBed program in BEDtools. Read-depths in the highest 5% were set to the 95% quantile value of the read-depth distribution for that sample. Read-depths of larger window sizes (100 kb, 300 kb and 500 kb) were generated by merging the read-depth values of 5 kb windows.
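The windowing steps described here can be illustrated with a minimal Python sketch. It is not the AluScanCNV source code: the function names are hypothetical, "merging" is assumed to mean summing consecutive 5 kb read-depths, the highest read-depths are assumed to be capped at the 95% quantile, and chromosome boundaries are ignored for brevity.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def cap_at_quantile(depths, q=0.95):
    """Replace read-depths above the q-quantile by the q-quantile value (assumed reading of the text)."""
    cap = quantile(depths, q)
    return [min(d, cap) for d in depths]


def merge_windows(depths_5kb, factor):
    """Sum each run of `factor` consecutive 5 kb windows (assumption: merging = summing).

    factor=20 gives 100 kb windows, 60 gives 300 kb and 100 gives 500 kb.
    """
    return [sum(depths_5kb[i:i + factor]) for i in range(0, len(depths_5kb), factor)]


# Toy example: 40 consecutive 5 kb windows with read-depths 1..40.
depths = list(range(1, 41))
capped = cap_at_quantile(depths)          # the two largest values are pulled down to the cap
windows_100kb = merge_windows(depths, 20)  # two 100 kb windows
```

In a real analysis the merge would be applied per chromosome so that a window never spans two chromosomes.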
Calling of GHT-based localized CNVs
In the AluScanCNV procedure, detection of a copy-number gain or loss in a test sample relies on comparison of the read-depth of a sequence window in the test sample with that in either a paired control sample in the case of ‘paired analysis’, or a reference template constructed from pooled reference samples in the case of ‘unpaired analysis’, yielding in either case the read-depth ratio for that particular window (Figure 1). The source code for the AluScanCNV procedure, including read-depth calculation, is given in Additional file 3: Source code of AluScanCNV.
Hence a reference template can be constructed by grouping together a series of reference samples for calculating the read-depth ratio of a corresponding window on an unpaired test sample.
where λ_c represents the mean read-depth value of all the windows analyzed in a control sample in the case of paired analysis, or in a reference template in the case of unpaired analysis. With either unpaired or paired analysis, only those windows that display a finite read-depth in the test sample as well as a finite read-depth in the reference template or paired control are analyzed.
Copy-number gain is called for a window when p < 0.05 and r > 1, and copy-number loss is called for a window when p < 0.05 and r < 1. No CNV is called for a window if p ≥ 0.05 or r = 1. Φ(t) in Eqn. 11 and Eqn. 12 is replaced by Φ() when Eqn. 10 is used instead of Eqn. 9.
Since the GHT represents a key step in CNV calling using Eqns. 11 and 12, a CNV called using these equations may be referred to as a GHT-based localized CNV in distinction from CNVs that are called by other means.
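As an illustration of the calling rule, the following Python sketch applies the standard textbook form of the Geary-Hinkley transformation to a single window. It is a sketch under stated assumptions rather than the package's Eqns. 9 to 12, which are not reproduced in this text: read-depths are assumed to be already rescaled so that test and control share the same mean depth, the variance is assumed to equal the mean (Poisson model), the correlation term is set to zero, the control read-depth is used as the expected depth under the null hypothesis, and the p-value is taken as two-sided. All function names are hypothetical.

```python
import math


def ght(r, mu_x, mu_y, var_x, var_y, rho=0.0):
    """Geary-Hinkley transformation of a ratio r = X/Y of two normal variables.

    Returns a value that is approximately standard normal when mu_y is
    large relative to its standard deviation.
    """
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    denom = math.sqrt(var_y * r * r - 2.0 * rho * sd_x * sd_y * r + var_x)
    return (mu_y * r - mu_x) / denom


def normal_cdf(t):
    """Standard normal cumulative distribution function, Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def call_window(test_depth, control_depth, alpha=0.05):
    """Return 'gain', 'loss' or 'none' for one window, or None if the window is skipped."""
    if test_depth <= 0 or control_depth <= 0:
        return None  # only windows with read-depth in both samples are analyzed
    r = test_depth / control_depth
    mu = float(control_depth)           # assumed expected depth under the null
    t = ght(r, mu, mu, mu, mu)          # Poisson assumption: variance = mean
    p = 2.0 * (1.0 - normal_cdf(abs(t)))  # assumed two-sided p-value
    if p >= alpha or r == 1.0:
        return "none"
    return "gain" if r > 1.0 else "loss"


calls = [call_window(x, 100) for x in (150, 105, 100, 60, 0)]
```

Under these assumptions a window needs both a small p-value and a ratio different from 1 to be called, which mirrors the rule stated above.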
According to Chiang et al., the theoretical minimum window size for CNV detection is determined by the required power, sequencing amount, coverage size and reference genome size. In the present study, AluScans with ~30 M reads covering ~150 M unique sequences (Additional file 2: Table S2) were aligned to the ~3 Gb human genome. On this basis, 50 kb would be the theoretical minimum window size for power >0.99, which however has to be increased for higher accuracy in CNV calling.
Identification of recurrent CNVs
Thus M is an “m × n” matrix with m candidate windows (rows) and n samples (columns). Each element in M takes on a binary value of 0 or 1, with 1 representing ‘CNV identified’ and 0 representing ‘no CNV identified’. M_ij therefore describes the CNV status of the ith window in the jth sample. M_i∙ stands for the CNV status at window i across all samples; and M_∙j stands for the CNV status at all the windows in sample j.
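The structure of M can be made concrete with a short Python sketch. The function names and the toy input are hypothetical, and the sketch only builds the binary matrix and extracts its rows and columns; it does not implement the GISTIC-like significance scoring used to decide which windows are recurrent.

```python
def build_cnv_matrix(calls_per_sample, n_windows):
    """Build M with one row per window and one column per sample.

    calls_per_sample holds, for each sample, the set of window indices
    in which a CNV was called.
    """
    return [[1 if i in calls else 0 for calls in calls_per_sample]
            for i in range(n_windows)]


def window_status(M, i):
    """M_i.: CNV status of window i across all samples."""
    return M[i]


def sample_status(M, j):
    """M_.j: CNV status of all windows in sample j."""
    return [row[j] for row in M]


def recurrence_counts(M):
    """Number of samples carrying a CNV call in each window."""
    return [sum(row) for row in M]


# Three samples, four candidate windows.
M = build_cnv_matrix([{0, 2}, {2}, {1, 2}], n_windows=4)
counts = recurrence_counts(M)
```

Here window 2 carries a call in all three samples, so it would be the strongest candidate for a recurrent CNV in this toy data.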
Calling of CBS-based extended CNVs
Machine-learning selection of CNV-features to classify different types of cancers
where TP, FP and FN represent true positives, false positives and false negatives respectively.
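For reference, an F-score defined from these three counts can be computed as in the Python sketch below. It assumes the common balanced form (the harmonic mean of precision and recall), which may differ from the exact expression used in the package; the function name and the example counts are hypothetical.

```python
def f_score(tp, fp, fn):
    """Balanced F-score from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts: 18 true positives, 2 false positives, 3 false negatives.
example = f_score(18, 2, 3)
```

The same value can be written directly as 2TP / (2TP + FP + FN).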
Clustering of samples was performed with the Euclidean distance method and ward.D cluster method of the ‘pvclust’ package in R.
Results and discussion
The AluScanCNV package depends on two important prerequisites for CNV calling from AluScan sequences. First, there must be a close approximation of the GHT-derived t-distribution to a normal distribution in order to call localized CNVs and recurrent CNVs. Secondly, there should be a close correlation between the read-depths in the test sample and paired control or reference template in using CBS to call extended CNVs: while this is not essential for the application of CBS, it provides important extra assurance for the appropriateness and accuracy of such application. While close correlation between test sample and its paired control in this regard might be expected, it needs to be verified that a close correlation exists between test sample and a reference template constructed from reference samples.
In Figure 2, where the AluScans for blood sample GL2B and the 23 non-cancer reference samples that gave rise to the reference template were all performed with four Alu-based PCR primers as described in Methods, the t-values derived from read-depth ratios through the GHT conformed closely to a normal distribution either with or without GC normalization, thereby confirming the applicability of the GHT to AluScan sequence data. Since the t-distribution was well represented by a normal curve even without GC normalization in this example, the contribution made by GC normalization was not manifest. However, the advantage of GC normalization has been pointed out by other workers. Moreover, in Additional file 4: Figure S1, where a mismatch was introduced such that the AluScan for the test sample was conducted using only three Alu-based primers, whereas the reference-sample AluScans were carried out using four Alu-based primers, the deviation of the t-distribution from a normal curve was pronounced without GC normalization, but substantially improved with GC normalization, indicating that GC normalization enhanced the robustness of GHT-based CNV calling.
The Q-Q plots in Figure 4A and 4B show the high correlation between the read-depths of the test sample GL2T and those of its paired control GL2B (4A: Pearson’s coefficient = 0.999), and the high correlation between the read-depths of GL2T and those of the reference template (4B: Pearson’s coefficient = 0.994). The results therefore confirmed that a close correlation was obtained in both cases, and that the use of the CBS algorithm to call extended CNVs from AluScans is valid when AluScan sequencing is performed employing the experimental conditions described in the Methods section.
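The correlation check described above amounts to computing Pearson's coefficient between two vectors of per-window read-depths. A minimal Python sketch is given below; the function name and the toy read-depths are hypothetical and are not taken from the GL2T data.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy per-window read-depths for a test sample and a reference template.
test_depths = [10, 20, 30, 40, 55]
template_depths = [12, 19, 33, 41, 52]
r = pearson(test_depths, template_depths)
```

In practice the two vectors would contain only the windows that have read-depth in both the test sample and the control or template.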
Calling of GHT-based localized and recurrent CNVs
When recurrent somatic CNVs were called from the AluScans of liver cancers, the CNV gains and losses, indicated by red peaks in Figure 7, were unevenly distributed among different chromosomes, with a particularly high concentration of CNV gains in chromosomes 1q and 8q, in accord with the CNVs identified from WGS data of liver cancers, which are represented by orange columns in the figure. This accord between the recurrent CNVs called from AluScans and WGS data provided useful validation for CNV calling from AluScans by means of AluScanCNV.
Identification of CBS-based extended CNVs
Application of Eqn. 16 to call CBS-based extended CNVs from the AluScan of glioma GL2T yielded Z scores based on a comparison between the test sample and either a paired control (Figure 9A) or the reference template (Figure 9B). Each dot in the plot, colored green and black on alternate autosomal chromosomes 1 to 22, represents the Z score for a window. The CBS-based extended CNVs, revealed as red horizontal bars joining up neighboring windows with the same Z score, were similar in Figure 9A and 9B, both of which exhibited large extended copy-number losses on chromosomes 1p and 9, and a large copy-number gain on chromosome 1q. The agreement between Figure 9A and 9B confirmed that either a paired control or a reference template can be employed for CNV analysis as indicated in Figure 1. That the extended copy-number losses on chromosomes 1p and 9 were both frequently observed in gliomas pointed to the usefulness of AluScanCNV for calling extended CNVs from AluScan sequences.
A comparison between the extended CNV profile of the primary glioma GL1T (Additional file 5: Figure S2) and that of its recurrent cancer GL2T (Figure 9) showed that the two profiles were extensively similar in both paired and unpaired analysis. Therefore cancer recurrence in this instance was not accompanied by any alteration in extended CNVs.
Cancer classification using machine learning-selected CNV-features
Previously we found that machine learning can be employed to select, from microarray-based recurrent CNVs, features that are capable of distinguishing between constitutional genomes with a high generalized predisposition to cancer and those with a low predisposition. When this machine learning procedure was applied to the localized or recurrent somatic CNVs called from the AluScans of 21 liver cancers and 16 non-liver cancers shown in Additional file 2: Table S2, 43 localized CNV-features were selected (shown in Additional file 6: Table S3A) for their capability of distinguishing between these two classes of cancers with AUC = 1.000 and F-score = 1.000 in 1,000 iterations of two-fold cross validation based on the Naïve Bayes algorithm; as shown in the dendrogram in Figure 8A, these localized CNV-features enabled the hierarchical clustering of the 37 cancer samples into the liver and non-liver classes with 100% accuracy. On the other hand, only 12 recurrent CNV-features were selected (shown in Additional file 6: Table S3B) with AUC = 0.982 and F-score = 0.889 in 1,000 iterations of two-fold cross validation based on the Naïve Bayes algorithm; and these recurrent CNV-features enabled the hierarchical clustering of the 37 cancer samples into the liver and non-liver classes with 34/37, viz. 91.9%, internal accuracy, with three incorrect entries as shown in the dendrogram in Figure 8B. It might be noted in this regard that, because the total of 37 cancer samples employed bordered on the minimum for recurrent CNV calling, there is a possibility that the 91.9% internal accuracy attained with the recurrent CNV-features might improve with a larger sample size. The demonstrated internal accuracy clearly showed that the selected CNV-features called by AluScanCNV are highly correlated with cancer type, and therefore merit in-depth investigation to elucidate the mechanistic basis of such cancer-type correlation.
In any event, the findings in Figures 8A and 8B pointed to the utility of CNV calling from AluScan sequences, and the distinguishing power of the machine-selected localized and recurrent CNV-features strongly suggests that such CNV-features are endowed with correlations with cancer types that could lead to valuable insight into type-specific factors underlying the oncogenesis and propagation of different types of cancers.
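The evaluation step described in this section, two-fold cross validation of a Naïve Bayes classifier on binary CNV-features, can be sketched in Python as follows. This is an illustration only: it is not the software used in the study, it omits the correlation-based feature selection, it uses a Bernoulli Naïve Bayes model with Laplace smoothing and a single fixed even/odd split instead of 1,000 random iterations, and the toy feature vectors and all function names are hypothetical.

```python
import math


def train_bernoulli_nb(X, y):
    """Estimate log-priors and P(feature = 1 | class) with Laplace smoothing."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        probs = [(sum(col) + 1.0) / (n + 2.0) for col in zip(*rows)]
        model[label] = (math.log(n / len(y)), probs)
    return model


def predict(model, x):
    """Return the class with the highest posterior log-score for feature vector x."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for value, p in zip(x, probs):
            score += math.log(p if value else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def two_fold_accuracy(X, y):
    """Train on one half and test on the other, in both directions (fixed even/odd split)."""
    folds = [list(range(0, len(X), 2)), list(range(1, len(X), 2))]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train_bernoulli_nb([X[i] for i in train_idx],
                                   [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Toy data: each row is one tumor, each column one binary CNV-feature.
X = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0],
     [0, 0, 1], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
y = ["liver"] * 4 + ["non-liver"] * 4
accuracy = two_fold_accuracy(X, y)
```

Repeating the split with random partitions and averaging the resulting scores would approximate the repeated two-fold cross validation reported above.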
Performance on external dataset
The AluScan platform, comprising the usage of inter-Alu PCR with multiple Alu-based PCR primers to generate a huge range of amplicons for next-generation sequencing, enables the facile capture of Alu-proximal sequences that are widespread throughout the human genome. It makes possible a rapid scan of mutations and alterations in diverse genomic regions including exons, introns and other non-coding regions employing only ~0.1 μg DNA samples.
The results in Figures 2 and 4 showed that the distribution of t-values obtained from AluScan sequences conformed closely to a normal distribution, and the read-depths of a test AluScan sample were closely correlated with those of a paired control AluScan or a reference template constructed from the AluScans of reference samples. These findings established the validity of the AluScanCNV package for calling CNVs from AluScan sequences, which was further confirmed by the properties of the AluScan-derived CNVs identified in various cancer samples.
In Figure 9 and Additional file 5: Figure S2, the large extended copy-number losses identified on chromosomes 1p and 9 in the recurrent GL2T and primary GL1T tumors were entirely consistent with the frequent occurrence of copy-number losses at these locations among gliomas. Moreover, the localized CNVs of GL2T shown in both panels of Figure 6 clearly pointed to the concentration of localized CNV losses on chromosomes 1p and 9, and concentration of localized CNV gains on chromosome 1q, in complete agreement with the occurrence of extended CNV gains and losses on these chromosomes in Figure 9, even though the calling of localized CNVs and the calling of extended CNVs depend on different approximations: the former requires a close conformation of t-values to a normal distribution, whereas the latter requires a close correlation between the read-depths of a test sample and the read-depths of a reference template or paired control.
As well, in Figure 7 the distribution of recurrent somatic CNVs called from AluScans revealed a striking enrichment of CNV gains in chromosomes 1q and 8q compared to other chromosomes. Such enrichment in 1q and 8q likewise represented the most outstanding property of CNVs called from a WGS study; therefore there was excellent agreement in this regard between the CNVs called from AluScans and the CNVs called from WGS. Given the small DNA sample requirement and much lighter data-processing task of AluScan relative to WGS, the AluScan platform would provide an expedited means for characterizing the CNV profiles of normal and diseased human genomes even with small amounts of biopsied tissues. Moreover, because the AluScan method amplifies DNA sequences only from the Alu element-rich human genome but not from microbial genomes, it is applicable to the analysis of esophageal, stomach, intestinal, pulmonary and wound samplings etc. with little interference from the presence of microbial DNAs.
When the localized or recurrent CNVs obtained from liver and non-liver cancers derived from AluScans were subjected to machine learning-selection, distinguishing localized or recurrent CNV-features could be selected that enabled a highly accurate classification between liver cancers and non-liver cancers (Figure 8). These results corroborated and expanded our earlier finding that recurrent constitutional CNV-features provided a valuable basis for the classification and prediction of high versus low constitutional predisposition to cancer. In so doing, they have substantiated the usefulness of machine-learning selected CNV-features, both recurrent and localized ones, for identifying CNVs in the germ-line or cancer genomes that are correlated with the attributes of predisposition to cancer and cancer typing. An extension of this CNV-feature based approach to identify the role of CNVs important to other cancer attributes such as cancer staging and susceptibility or resistance to different treatment modalities, as well as the CNVs important to other diseases besides cancers, likewise merits in-depth investigation.
Availability of supporting data
The AluScan sequencing data of the 63 samples listed in Additional file 1: Table S1 are available upon request.
CNV: Copy Number Variation
SNP: Single Nucleotide Polymorphism
PCR: Polymerase Chain Reaction
WGS: Whole Genome Sequencing
CBS: Circular Binary Segmentation
The study was supported by grants to H. Xue from University Grants Council of Hong Kong SAR (VPRDO09/10.SC08, VPRDO14SC01, DG14SC02, SRFI11SC06 and SRFI11SC06PG) and to L. Zhang from 863 Program, Ministry of Science and Technology, China (2012AA02A201), as well as grants to S. Li from National Science Foundation of China (91229201 and 81225025).
- Hayes JL, Tzika A, Thygesen H, Berri S, Wood HM, Hewitt S, Pendlebury M, Coates A, Willoughby L, Watson CM, Rabbitts P, Roberts P, Taylor GR: Diagnosis of copy number variation by Illumina next generation sequencing is comparable in performance to oligonucleotide array comparative genomic hybridisation. Genomics. 2013, 102: 174-181. doi:10.1016/j.ygeno.2013.04.006.
- Pinkel D, Albertson DG: Array comparative genomic hybridization and its applications in cancer. Nat Genet. 2005, 37 (Suppl): S11-S17. doi:10.1038/ng1569.
- Sathirapongsasuti JF, Lee H, Horst BA, Brunner G, Cochran AJ, Binder S, Quackenbush J, Nelson SF: Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics. 2011, 27: 2648-2654. doi:10.1093/bioinformatics/btr462.
- Xie C, Tammi MT: CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics. 2009, 10: 80. doi:10.1186/1471-2105-10-80.
- Duan J, Zhang JG, Deng HW, Wang YP: Comparative studies of copy number variation detection methods for next-generation sequencing technologies. PLoS One. 2013, 8: e59128. doi:10.1371/journal.pone.0059128.
- Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z: A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform. 2014, 15: 256-278. doi:10.1093/bib/bbs086.
- Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E: Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics. 2012, 28: 423-425. doi:10.1093/bioinformatics/btr670.
- Plagnol V, Curtis J, Epstein M, Mok KY, Stebbings E, Grigoriadou S, Wood NW, Hambleton S, Burns SO, Thrasher AJ, Kumararatne D, Doffinger R, Nejentsev S: A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics. 2012, 28: 2747-2754. doi:10.1093/bioinformatics/bts526.
- Abyzov A, Urban AE, Snyder M, Gerstein M: CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011, 21: 974-984. doi:10.1101/gr.114876.110.
- Yoon S, Xuan Z, Makarov V, Ye K, Sebat J: Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res. 2009, 19: 1586-1592. doi:10.1101/gr.092981.109.
- Mei L, Ding X, Tsang SY, Pun FW, Ng SK, Yang J, Zhao C, Li D, Wan W, Yu CH, Tan TC, Poon WS, Leung GK, Ng HK, Zhang L, Xue H: AluScan: a method for genome-wide scanning of sequence and structure variations in the human genome. BMC Genomics. 2011, 12: 564. doi:10.1186/1471-2164-12-564.
- Hastings PJ, Lupski JR, Rosenberg SM, Ira G: Mechanisms of change in gene copy number. Nat Rev Genet. 2009, 10: 551-564. doi:10.1038/nrg2593.
- Cook GW, Konkel MK, Walker JA, Bourgeois MG, Fullerton ML, Fussell JT, Herbold HD, Batzer MA: A Comparison of 100 Human Genes Using an Alu Element-Based Instability Model. PLoS One. 2013, 8: e65188. doi:10.1371/journal.pone.0065188.
- Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G: GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011, 12: R41. doi:10.1186/gb-2011-12-4-r41.
- Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004, 5: 557-572. doi:10.1093/biostatistics/kxh008.
- Volik S, Raphael BJ, Huang G, Stratton MR, Bignel G, Murnane J, Brebner JH, Bajsarowicz K, Paris PL, Tao Q, Kowbel D, Lapuk A, Shagin DA, Shagina IA, Gray JW, Cheng JF, de Jong PJ, Pevzner P, Collins C: Decoding the fine-scale structure of a breast cancer genome and transcriptome. Genome Res. 2006, 16: 394-404. doi:10.1101/gr.4247306.
- Frank B, Bermejo JL, Hemminki K, Sutter C, Wappenschmidt B, Meindl A, Kiechle-Bahat M, Bugert P, Schmutzler RK, Bartram CR, Burwinkel B: Copy number variant in the candidate tumor suppressor gene MTUS1 and familial breast cancer risk. Carcinogenesis. 2007, 28: 1442-1445. doi:10.1093/carcin/bgm033.
- Ciriello G, Miller ML, Aksoy BA, Senbabaoglu Y, Schultz N, Sander C: Emerging landscape of oncogenic signatures across human cancers. Nat Genet. 2013, 45: 1127-1133. doi:10.1038/ng.2762.
- Shlien A, Malkin D: Copy number variations and cancer. Genome Med. 2009, 1: 62. doi:10.1186/gm62.
- Diskin SJ, Hou C, Glessner JT, Attiyeh EF, Laudenslager M, Bosse K, Cole K, Mosse YP, Wood A, Lynch JE, Pecor K, Diamond M, Winter C, Wang K, Kim C, Geiger EA, McGrady PW, Blakemore AI, London WB, Shaikh TH, Bradfield J, Grant SF, Li H, Devoto M, Rappaport ER, Hakonarson H, Maris JM: Copy number variation at 1q21.1 associated with neuroblastoma. Nature. 2009, 459: 987-991. doi:10.1038/nature08035.
- Liu W, Sun J, Li G, Zhu Y, Zhang S, Kim ST, Sun J, Wiklund F, Wiley K, Isaacs SD, Stattin P, Xu J, Duggan D, Carpten JD, Isaacs WB, Gronberg H, Zheng SL, Chang BL: Association of a germ-line copy number variation at 2p24.3 and risk for aggressive prostate cancer. Cancer Res. 2009, 69: 2176-2179. doi:10.1158/0008-5472.CAN-08-3151.
- Beroukhim R, Mermel CH, Porter D, Wei G, Raychaudhuri S, Donovan J, Barretina J, Boehm JS, Dobson J, Urashima M, Mc Henry KT, Pinchback RM, Ligon AH, Cho YJ, Haery L, Greulich H, Reich M, Winckler W, Lawrence MS, Weir BA, Tanaka KE, Chiang DY, Bass AJ, Loo A, Hoffman C, Prensner J, Liefeld T, Gao Q, Yecies D, Signoretti S: The landscape of somatic copy-number alteration across human cancers. Nature. 2010, 463: 899-905. doi:10.1038/nature08822.
- Krepischi AC, Pearson PL, Rosenberg C: Germline copy number variations and cancer predisposition. Future Oncol. 2012, 8: 441-450. doi:10.2217/fon.12.34.
- Ding X, Tsang S-Y, Ng S-K, Xue H: Application of Machine Learning to Development of Copy Number Variation-based Prediction of Cancer Risk. Genomics Insights. 2014, 7: 11.
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26: 841-842. doi:10.1093/bioinformatics/btq033.
- Hinkley D: On the ratio of two correlated normal random variables. Biometrika. 1969, 56: 635-639. doi:10.1093/biomet/56.3.635.
- Chiang DY, Getz G, Jaffe DB, O'Kelly MJ, Zhao X, Carter SL, Russ C, Nusbaum C, Meyerson M, Lander ES: High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods. 2009, 6: 99-103. doi:10.1038/nmeth.1276.
- Wang YH: On the Number of Successes in Independent Trials. Stat Sinica. 1993, 3: 295-312.
- Al-Khalidi HR, Hong Y, Fleming TR, Therneau TM: Insights on the robust variance estimator under recurrent-events model. Biometrics. 2011, 67: 1564-1572. doi:10.1111/j.1541-0420.2011.01589.x.
- Teo SM, Pawitan Y, Ku CS, Chia KS, Salim A: Statistical challenges associated with detecting copy number variations with next-generation sequencing. Bioinformatics. 2012, 28: 2711-2718. doi:10.1093/bioinformatics/bts535.
- Venkatraman E, Olshen AB: A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics. 2007, 23: 657-663. doi:10.1093/bioinformatics/btl646.
- Hall MA, Smith LA: Feature Subset Selection: A Correlation based Filter Approach. International Conference on Neural Information Processing and Intelligent Information Systems. 1997, Springer, Berlin, 855-858.
- Dagliyan O, Uney-Yuksektepe F, Kavakli IH, Turkay M: Optimization based tumor classification from microarray gene expression data. PLoS One. 2011, 6: e14579. doi:10.1371/journal.pone.0014579.
- Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter. 2009, 11: 10-18. doi:10.1145/1656274.1656278.
- Suzuki R, Shimodaira H: Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics. 2006, 22: 1540-1542. doi:10.1093/bioinformatics/btl117.
- Kan Z, Zheng H, Liu X, Li S, Barber TD, Gong Z, Gao H, Hao K, Willard MD, Xu J, Hauptschein R, Rejto PA, Fernandez J, Wang G, Zhang Q, Wang B, Chen R, Wang J, Lee NP, Zhou W, Lin Z, Peng Z, Yi K, Chen S, Li L, Fan X, Yang J, Ye R, Ju J, Wang K: Whole-genome sequencing identifies recurrent mutations in hepatocellular carcinoma. Genome Res. 2013, 23: 1422-1433. doi:10.1101/gr.154492.113.
- Reyes-Botero G, Dehais C, Idbaih A, Martin-Duverneuil N, Lahutte M, Carpentier C, Letouzé E, Chinot O, Loiseau H, Honnorat J: Contrast enhancement in 1p/19q-codeleted anaplastic oligodendrogliomas is associated with 9p loss, genomic instability, and angiogenic gene expression. Neuro Oncol. 2014, 16: 662-670. doi:10.1093/neuonc/not235.
- Boots-Sprenger SH, Sijben A, Rijntjes J, Tops BB, Idema AJ, Rivera AL, Bleeker FE, Gijtenbeek AM, Diefes K, Heathcock L: Significance of complete 1p/19q co-deletion, IDH1 mutation and MGMT promoter methylation in gliomas: use with caution. Mod Pathol. 2013, 26: 922-929. doi:10.1038/modpathol.2012.166.
- Coco S, Valdora F, Bonassi S, Scaruffi P, Stigliani S, Oberthuer A, Berthold F, Andolfo I, Servidei T, Riccardi R, Basso E, Iolascon A, Tonini GP: Chromosome 9q and 16q loss identified by genome-wide pooled-analysis are associated with tumor aggressiveness in patients with classic medulloblastoma. OMICS. 2011, 15: 273-280. doi:10.1089/omi.2010.0103.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.