© The Author 2016. Published by Oxford University Press. All rights reserved.
Somatic copy number alterations (SCNAs) play an important role in carcinogenesis. However, the impact of genomic architecture on the global patterns of SCNAs in cancer genomes remains elusive. In this work, we conducted multiple linear regression (MLR) analyses of the pooled SCNA data from The Cancer Genome Atlas (TCGA) Pan-Cancer project. We also performed MLR analyses for 11 individual cancer types and for three different kinds of SCNAs: amplifications and deletions, telomere-bound and interstitial SCNAs, and local SCNAs. Our MLR model explains >30% of the pooled SCNA breakpoint variation, with the explanatory power ranging from 13 to 32% for different cancer types and SCNA types. In addition to confirming previously identified features [e.g. long interspersed element-1 (L1) and short interspersed nuclear elements], we identified several novel informative features, including distance to telomere, distance to centromere and low-complexity repeats. The results of the MLR analyses were additionally confirmed on an independent SCNA data set obtained from the Catalogue of Somatic Mutations in Cancer (COSMIC) database. Using a rare-event logistic regression model and an extremely randomized tree classifier, we showed that genomic features are informative for defining common SCNA breakpoint hotspots. Our findings shed light on the molecular mechanisms of SCNA generation in cancer.
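As an illustration of what "explanatory power" means for an MLR model of this kind, the following minimal sketch fits an ordinary least-squares regression of per-bin breakpoint counts on a few genomic features and reports the fraction of variance explained (R^2). It is not the authors' pipeline: the data are synthetic, and the bin count, feature columns, coefficients and the helper name `fit_mlr` are all assumptions made for the example.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares fit with an intercept.

    Returns (beta, r2): beta[0] is the intercept, beta[1:] the feature
    coefficients, and r2 the fraction of variance in y explained.
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot


# Synthetic stand-in for genomic bins (assumption, not TCGA data).
rng = np.random.default_rng(0)
n_bins = 500
# Hypothetical standardized features per bin, e.g. L1 density,
# distance to telomere, low-complexity repeat density.
features = rng.normal(size=(n_bins, 3))
true_beta = np.array([0.5, -0.3, 0.2])  # made-up effect sizes
breakpoints = 1.0 + features @ true_beta + rng.normal(scale=1.0, size=n_bins)

beta, r2 = fit_mlr(features, breakpoints)
print("coefficients:", np.round(beta, 3))
print("R^2:", round(r2, 3))
```

In a real analysis the rows would be genomic bins with observed breakpoint counts, and the R^2 value would correspond to the percentage of breakpoint variation explained that is quoted above.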