Comprehensive characterization of CRC with germline mutations reveals a distinct somatic mutational landscape and elevated cancer risk in the Chinese population
================================================================================================================================================================

* Jianfei Yao
* Yunhuan Zhen
* Jing Fan
* Yuan Gong
* Yumeng Ye
* Shaohua Guo
* Hongyi Liu
* Xiaoyun Li
* Guosheng Li
* Pan Yang
* Xiaohui Wang
* Danni Liu
* Tanxiao Huang
* Huiya Cao
* Peisu Suo
* Yuemin Li
* Jingbo Yu
* Lele Song

## Abstract

**Objective:** Hereditary colorectal cancer (CRC) accounts for approximately 5%–10% of all CRC cases. The full profile of CRC-related germline mutations and the corresponding somatic mutational profile have not been fully determined in the Chinese population.

**Methods:** We performed the first population study investigating the germline mutation status in more than 1,000 (*n* = 1,923) Chinese patients with CRC and examined their relationship with the somatic mutational landscape. Germline alterations were examined with a 58-gene next-generation sequencing panel, and somatic alterations were examined with a 605-gene panel.

**Results:** A total of 92 pathogenic (P) mutations were identified in 85 patients, and 81 likely pathogenic (LP) germline mutations were identified in 62 patients, accounting for 7.6% (147/1,923) of all patients. MSH2 and APC were the most frequently mutated genes in the Lynch syndrome and non-Lynch syndrome groups, respectively. Patients with P/LP mutations had significantly higher rates of high microsatellite instability, deficient mismatch repair, and family history of CRC, and were significantly younger. The somatic mutational landscape revealed a significantly higher mutational frequency in the P group and a trend toward higher copy number variations in the non-P group. The Lynch syndrome group had a significantly higher mutational frequency and tumor mutational burden than the non-Lynch syndrome group.
Clustering analysis revealed that the Notch signaling pathway was uniquely clustered in the Lynch syndrome group, and the MAPK and cAMP signaling pathways were uniquely clustered in the non-Lynch syndrome group. Population risk analysis indicated that the overall odds ratio was 11.13 (95% CI: 8.289–15.44) for the P group and 20.68 (95% CI: 12.89–33.18) for the LP group. **Conclusions:** Distinct features were revealed in Chinese patients with CRC with germline mutations. The Notch signaling pathway was uniquely clustered in the Lynch syndrome group, and the MAPK and cAMP signaling pathways were uniquely clustered in the non-Lynch syndrome group. Patients with P/LP germline mutations exhibited higher CRC risk. * Colorectal cancer * germline * Lynch syndrome * hereditary cancer * next-generation sequencing * Notch signaling pathway * TMB * MSI * MMR ## Introduction Colorectal cancer (CRC) is the third and second most common cancer in men and women worldwide, respectively1, and the fifth most common cancer in China2. Although most cases of CRC are sporadic, inherited factors are known to contribute to approximately 30%–35% of CRC cases3. Approximately 5%–10% of patients with CRC carry high-risk germline mutations that are associated with known hereditary CRC syndromes, including Lynch syndrome (also known as hereditary non-polyposis CRC), familial adenomatous polyposis (FAP), MUTYH-associated polyposis, Peutz-Jeghers syndrome, juvenile polyposis syndrome, PTEN hamartoma tumor syndrome, and serrated polyposis syndrome4–6. The germline mutations associated with these syndromes have been extensively investigated at both the genomic and individual gene levels, and the heritability of many of these mutations has been confirmed in population and/or family studies. New germline mutations with suspected heritability have also been reported in recent years7,8. 
Many hotspot mutations have been identified in hereditary CRC syndromes, primarily involving APC, MLH1, MSH2, MSH6, and PMS27,8. Therefore, hereditary CRC syndromes are associated with both hotspot and non-hotspot germline mutations. Previous research has shown that pathogenic germline mutations increase the risk of cancers, including not only CRC7 but also hereditary breast and ovarian cancer syndrome9 and lung cancer10. However, this risk remains to be clearly defined for Chinese patients with CRC. Furthermore, the somatic mutational landscape of hereditary CRC syndromes has yet to be characterized and compared with that of sporadic CRC. This comparison may aid in understanding the mechanisms underlying hereditary CRC syndromes. In this study, we recruited a large cohort of 1,923 unselected patients with CRC, investigated both the germline and somatic mutational landscapes, and performed extensive comparisons between patients with and without pathogenic germline mutations. More importantly, by comparing the incidence of individual mutations in our cohort with that in the general population, we clarified the risk associated with the identified germline mutations. This study provides important information regarding the mutational landscape, cancer risk, and potential carcinogenic mechanisms of CRC-related germline mutations in the Chinese population. Our findings may help establish preventive and therapeutic strategies for patients with CRC with suspected heritability. ## Materials and methods ### Ethics approval All experimental plans and protocols for the study were submitted to the ethics/licensing committees of the indicated participating hospitals for review and approval before the start of the clinical study, and were approved by the corresponding committees of the participating hospitals (Approval No. S2015-032-02). 
Because the study had a retrospective design and used retrospective samples collected by the participating hospitals, informed consent was not required. Patients with pathogenic (P) or likely pathogenic (LP) germline mutations were informed of the test results. All experiments, methods, procedures, and personnel training were carried out in accordance with the relevant guidelines and regulations of the participating hospitals and laboratories. ### Study design The study was designed and implemented in 7 Chinese hospitals, and both cancer tissue and blood samples were collected retrospectively. The study was designed to include as many patients with CRC as possible, provided that the tissue or blood samples were available for next-generation sequencing (NGS). Samples collected between January 2016 and August 2020 from 1,923 patients with CRC were obtained according to the availability of samples for NGS testing in the participating hospitals. The details of patient demographic information, pathological information, family history, and microsatellite instability (MSI)/mismatch repair (MMR) information are summarized in **Table 1**. Family history was defined as confirmed CRC patients with at least one immediate family member (first degree relative) with a history of CRC diagnosis. The immediate family members included parents, siblings, and children. The collected samples comprised tissue samples [formalin-fixed paraffin-embedded (FFPE) samples or frozen samples from surgery] and blood samples obtained at the time of CRC diagnosis confirmation. Diagnosis was confirmed with imaging examinations and subsequent pathological examinations. No participants received chemotherapy, radiotherapy, targeted therapy, or immunotherapy before the tissue and blood samples were collected. The somatic sequencing data presented in this study were from FFPE samples or frozen tissue samples. Germline sequencing data were obtained from the corresponding genomic DNA of white blood cells. 
View this table: [Table 1](http://www.cancerbiomed.org/content/19/5/707/T1) Table 1 Demographic information and MSI/MMR status for recruited patients ### Sample preparation, targeted NGS, and data processing For the FFPE samples, ten 5 μm tumor slices were used for DNA extraction with a QIAamp DNA FFPE Kit (QIAGEN, Valencia, CA, USA) according to the manufacturer’s instructions. For tissue samples, a minimum of 50 mg tissue was used for DNA extraction with a QIAamp DNA Mini Kit (QIAGEN, Valencia, CA, USA). For blood samples, 2 mL of blood was collected in tubes containing EDTA and centrifuged at 1,600 × g for 10 min at 4 °C within 2 h of collection. The peripheral blood lymphocyte (PBL) debris was stored at −20 °C until further use. DNA from PBLs was extracted with a RelaxGene Blood DNA system (Tiangen Biotech Co., Ltd., Beijing, China) according to the manufacturer’s instructions. Both cancer tissue and white blood cell genomic DNA were quantified with a Qubit 2.0 fluorometer and Qubit dsDNA HS assay kit (Thermo Fisher Scientific, Inc., Waltham, MA, USA) according to the manufacturer’s instructions. Fragmented genomic DNA underwent end-repair, A-tailing, and ligation with indexed adapters sequentially, followed by size selection with Agencourt AMPure XP beads (Beckman Coulter Inc., Brea, CA, USA). DNA fragments were used for library construction with a KAPA Library Preparation kit (Kapa Biosystems, Inc., Wilmington, MA, USA) according to the manufacturer’s protocol. Hybridization-based target enrichment was performed with a HaploX germline gene panel (58 known hereditary cancer-related genes, HaploX Biotechnology; gene list in **Supplementary Table S1**) for white blood cell genomic DNA or a HaploX pan-cancer gene panel (605 cancer-relevant genes, HaploX Biotechnology; gene list in **Supplementary Table S2**) for cancer tissue sequencing. 
Depending on the amount of DNA used, 7 to 8 polymerase chain reaction cycles were performed with pre-capture ligation-mediated polymerase chain reaction oligonucleotides (Kapa Biosystems, Inc.) in 50 μL reactions. DNA sequencing was then performed on an Illumina NovaSeq 6000 system according to the manufacturer’s recommendations at an average depth of 2,200× for tissue and FFPE samples. Data meeting the following criteria were included in subsequent analysis: proportion of raw data remaining after fastq filtering ≥85%; proportion of Q30 bases ≥85%; proportion of reads mapped to the reference genome ≥85%; target region coverage ≥98%; and average sequencing depth in tissues ≥2,200×. The called somatic variants were required to meet the following criteria: read depth at a position ≥20×; variant allele fraction (VAF) ≥2% for tissue and PBL genomic DNA; somatic-*P* value ≤0.01; strand filter ≥1. VAF values were calculated for Q30 bases. Copy number variations (CNVs) were detected with CNVkit version 0.9.3 ([https://github.com/etal/cnvkit](https://github.com/etal/cnvkit)). Further analyses of genomic alterations were also performed, including single nucleotide variants (SNVs), insertions/deletions (indels), and CNVs.

### Interpretation of pathogenicity of germline mutations and calculation of somatic TMB

The pathogenicity of germline mutations was defined and predicted according to the 5-grade classification system of the American College of Medical Genetics and Genomics guidelines for the interpretation of sequence variants. All germline mutations were categorized into P, LP, or non-pathogenic (non-P) groups. Variants of uncertain significance (VUS) and benign and likely benign mutations were defined as the non-P group in this study. Tumor mutational burden (TMB) was calculated by dividing the total number of non-synonymous SNP and indel variations in tissue (VAF > 2%) by the full length of the exome region of the 605-gene NGS panel (**Supplementary Table S2**).
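
The somatic variant criteria and the TMB calculation described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The `Variant` record, the function names, and the example panel length are hypothetical, and reporting TMB per megabase is an assumption, because the text states only that the variant count is divided by the panel's exonic length.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Variant:
    """Hypothetical record for one called somatic variant."""
    gene: str
    depth: int            # read depth at the position
    vaf: float            # variant allele fraction, as a fraction (0.02 = 2%)
    somatic_p: float      # somatic P value reported by the caller
    strand_filter: int    # strand filter value reported by the caller
    nonsynonymous: bool   # True for non-synonymous SNVs and indels


def passes_filters(v: Variant) -> bool:
    """Apply the thresholds quoted in the text to one variant."""
    return (
        v.depth >= 20
        and v.vaf >= 0.02
        and v.somatic_p <= 0.01
        and v.strand_filter >= 1
    )


def tmb_per_mb(variants: Iterable[Variant], panel_bp: int) -> float:
    """Count filtered non-synonymous variants with VAF > 2% per megabase.

    panel_bp is the exonic length of the panel in base pairs; the value
    used below is a placeholder, not the real length of the 605-gene panel.
    """
    if panel_bp <= 0:
        raise ValueError("panel_bp must be positive")
    counted = [
        v for v in variants
        if passes_filters(v) and v.nonsynonymous and v.vaf > 0.02
    ]
    return len(counted) / (panel_bp / 1_000_000)


if __name__ == "__main__":
    calls = [
        Variant("TP53", 100, 0.10, 0.001, 1, True),
        Variant("KRAS", 15, 0.20, 0.001, 1, True),   # fails the depth threshold
        Variant("APC", 300, 0.30, 0.005, 2, False),  # synonymous, not counted
    ]
    print(tmb_per_mb(calls, panel_bp=2_000_000))
```

Keeping the call-level filter separate from the TMB count makes the two slightly different VAF thresholds in the text (≥2% for calling, >2% for TMB) explicit.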
The genomic sequence from the DNA of PBLs was used for genomic alignment when calling the somatic mutations.

### Statistical analysis

Statistical analyses were performed and figures were plotted in GraphPad Prism 5.0 software (GraphPad Software, Inc, La Jolla, CA, USA). Student’s t-test was performed when 2 groups were compared, and analysis of variance and post hoc tests were performed when 3 or more groups were compared. The chi-square test and Fisher’s exact test were performed when rates or percentages were compared for significance. Figures for the mutation spectrum were produced with R software ([https://www.r-project.org/](https://www.r-project.org/)). Data for pathway enrichment analysis were analyzed with the method described by DAVID Bioinformatics Resources 6.8 ([https://david.ncifcrf.gov/](https://david.ncifcrf.gov/)) and were visualized with corresponding packages for R software. The protein-protein interaction network was analyzed with the STRING database, and the hub genes were determined with Cytoscape software (cytoscape.org); the Degree method was used to rank the genes. The odds ratio (OR) was calculated on the basis of the frequency of a certain germline mutation from the Genome Aggregation Database (gnomAD) in the general population and the corresponding mutation frequency obtained from this study. The OR and 95% confidence interval (CI) for each germline mutation were calculated in SPSS 17.0 software (IBM China Company Limited, Beijing, China). \**P* < 0.05; \*\**P* < 0.01; and \*\*\**P* < 0.001.

## Results

### The panorama of germline mutations in Chinese patients with CRC

First, we investigated the genetic landscape of germline alterations in all 1,923 recruited patients with CRC, among whom we identified 92 P germline mutations in 85 patients and 81 LP germline mutations in 62 patients (**Figure 1A**). The remaining 1,776 patients carried VUS, benign, or likely benign germline alterations (non-P).
The proportion of patients with P or LP germline mutations was 7.6% (147/1,923). The highest number of P mutations was seen in APC and MSH2 (*n* = 14 each), followed by BRCA1 (*n* = 8), MLH1 (*n* = 7), and RAD50 (*n* = 7). MLH1 and MSH2 exhibited the highest number of LP mutations (*n* = 10 each), followed by MSH6 (*n* = 7), NTRK1 (*n* = 7), and ATM (*n* = 6). Further analysis indicated that 27 of 92 P mutations were detected in patients who had been diagnosed with Lynch syndrome (**Figure 1B, left panel**). MSH2 was the gene associated with the most mutations in Lynch syndrome (*n* = 14) and was followed by MLH1 (*n* = 7), MSH6 (*n* = 4), and PMS2 (*n* = 2) (**Figure 1B, middle panel**). For patients without Lynch syndrome, APC was identified as the gene associated with the most mutations (*n* = 14) and was followed by BRCA1 (*n* = 8), RAD50 (*n* = 7), MUTYH (*n* = 5), ATM (*n* = 5), and BRCA2 (*n* = 4) (**Figure 1B, right panel**).

Figure 1 Category and distribution of germline mutations in the Chinese population. A. The number of mutations in highly mutated genes in the pathogenic (P) and likely pathogenic (LP) groups. B. Details of mutated genes and their numbers in the Lynch syndrome (LS) and non-Lynch syndrome (non-LS) groups.

Interestingly, we observed a significantly higher proportion of patients with MSI-H or dMMR in the P and LP groups than in the non-P group (**Table 1**). We also identified a significantly higher proportion of patients with a family history in the P and LP groups than in the non-P group. Patients with P or LP mutations were significantly younger than those in the non-P group (**Table 1**).
A significant difference in stage distribution was observed between the LP and the non-P group, possibly because of the low number of patients in the LP group in stages I and III. We observed no significant differences in P and LP germline mutations between males and females (**Table 1**).

Next, we identified the specific types of mutations related to the P and LP alterations. Most were frameshift (deletion or insertion), nonsense, nonsynonymous (single-nucleotide), or splicing mutations (**Figure 2A**). These mutations may cause large fragment changes or key amino acid alterations in proteins and therefore substantially influence gene function and potentially lead to high susceptibility to CRC. Mutations in APC, MSH2, and MLH1, the 3 genes with the highest number of P and LP mutations, might lead to familial adenomatous polyposis (APC) and Lynch syndrome (MSH2 and MLH1).

Figure 2 Types and distribution of mutations in highly mutated genes. A. Types and numbers of germline mutations in the P and LP groups. B. Distribution of P (red) and LP (blue) mutations in highly mutated genes, including APC, ATM, MLH1, MSH2, MSH6, and PMS2. Blue bars indicate key functional domains.

The distribution of germline mutations in the highly mutated genes is shown in **Figure 2B**. Both P (red) and LP (blue) mutations of APC, ATM, MLH1, MSH2, MSH6, and PMS2 are plotted on individual gene schemes. Most germline mutations were located in key functional domains (blue bars). This effect was most prominent for APC, in which several mutations were distributed in the suppressor APC, APC_u9, and PTZ00449 superfamily domains.
This observation suggested that P/LP germline mutations within key functional domains are more likely to be pathogenic than other mutations.

We identified several novel germline mutations that had not previously been reported in the dbSNP, gnomAD, or ClinVar databases (**Table 2**). These mutations included frameshift, nonsense, and splicing mutations potentially causing large fragment alterations in genes. All were classified as LP mutations, owing to their deleterious properties and undetermined clinical significance. Interestingly, patients with mismatch repair-related gene mutations (MSH2 and MSH6) and NTRK1 germline mutations exhibited very high levels of somatic TMB and a high ratio of MSI-H, thus suggesting that these mutations might behave in the same manner as known P mutations, although further clinical evidence is needed to validate this hypothesis.

Table 2 Novel mutations identified in this study

### Correlations among characteristic somatic mutational landscapes, functional alterations, and germline mutations in CRC

The somatic mutational features of CRC with germline mutations, and how this condition relates to sporadic CRC, remain to be investigated in detail. Here we studied the somatic mutational features of CRC with or without P/LP germline mutations (**Supplementary Figure S1A**), focusing specifically on the differences among the P, LP, and non-P groups in terms of individual gene mutational frequency (**Figure 3A–D**), TMB (**Figure 3E**), and mutations significantly affecting pathways or functions (**Figure 4**).

Figure 3 Comparison of somatic mutational frequency of highly mutated genes among the P, LP, and non-P groups. A. Comparison of somatic SNV/indel frequency among groups. B.
Comparison of somatic CNV frequency among groups. C. Comparison of somatic SNV/indel frequency between patients with and without Lynch-related P germline mutations. D. Comparison of somatic CNV frequency between patients with and without Lynch-related P germline mutations. E. Comparison of TMB among the P, LP, and non-P groups. \**P* < 0.05; \*\**P* < 0.01; \*\*\**P* < 0.001.

Figure 4 Representative highly significant somatic pathway clustering for the P, LP, and non-P groups. A. GO (biological process, BP) and KEGG somatic pathway clustering results for the groups. B. GO (BP) and KEGG somatic pathway clustering results for patients with or without Lynch P germline mutations.

We identified substantial differences in the SNV/indel mutational frequency of highly mutated genes (**Figure 3A**). For many genes, including TP53, SYNE1, and KMT2D, a significantly higher mutational frequency was identified in the P group than in the non-P group. Similarly, a higher mutational frequency was found in the LP group than in the non-P group in several genes, including ZFHX3 and KMT2D.
Interestingly, the mutational frequency of APC and KRAS did not differ among the 3 groups. In contrast, most CNV alterations did not differ significantly across the 3 groups, except for NCOA3 (*P* < 0.05), although we did observe a trend toward higher CNV alterations in the non-P group (**Figure 3B**). The overall CNV rate of the P group was significantly lower than that of the non-P group (*P* < 0.001).

Next, we investigated the difference between the Lynch syndrome and non-Lynch syndrome groups with P mutations (**Supplementary Figure S1B**). Patients with Lynch syndrome exhibited a significantly higher mutational frequency than those who did not have Lynch syndrome (**Figure 3C**); this was the case for most genes, except APC, TP53, and PIK3CA, whose mutational frequency did not significantly differ. In contrast, patients without Lynch syndrome exhibited a trend toward a higher frequency of CNV alterations than those with Lynch syndrome, although this association was not significant (**Figure 3D**).

Next, we examined and compared the TMB for the P (including both patients with and without Lynch syndrome), LP, and non-P groups. Patients with Lynch syndrome and P mutations exhibited a much higher TMB than patients without Lynch syndrome with P mutations, and patients from the LP and non-P groups (**Figure 3E**).

To further investigate the similarities and differences in somatic mutations among the P, LP, and non-P groups, and to study the mechanistic discrepancies between patients with CRC with and without Lynch syndrome, we performed gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) clustering analysis and compared the results from each group. **Figure 4A** shows the most significant clustering in the GO (upper row) and KEGG (lower row) analysis for the P, LP, and non-P groups. Some common biological processes, functions, and pathways were observed among the groups, together with several substantial differences.
The common clustering for GO and KEGG findings across the 3 groups is summarized in **Supplementary Table S3**. Although the 3 groups of patients had distinct hereditary backgrounds, they shared several common aberrant pathways, thus potentially indicating common carcinogenic mechanisms, including the Wnt signaling pathway, calcium signaling pathway, MAPK signaling pathway, cAMP signaling pathway, and human papillomavirus infection. In contrast, we observed distinct differences between the P/LP groups and the non-P group in terms of biological processes, functions, and pathways, as shown in **Supplementary Table S4** (GO clustering) and **Supplementary Table S5** (KEGG clustering). Notably, the Notch signaling pathway was clustered in the P/LP groups but not the non-P group (**Supplementary Table S5**).

Similarities and differences were also compared between the Lynch syndrome and non-Lynch syndrome groups with regard to P germline mutations. **Figure 4B** shows the most significant clustering in GO (upper row) and KEGG (lower row) analysis for the Lynch syndrome and non-Lynch syndrome groups. Common clustering is shown in **Supplementary Table S6**. The most common pathways were the Wnt signaling pathway, the calcium signaling pathway, and human papillomavirus infection. Differences in the biological processes in terms of GO clustering are listed in **Supplementary Table S7**; interestingly, a large amount of Lynch-unique clustering was observed. Differences in KEGG clustering are shown in **Supplementary Table S8**. Notably, the Notch signaling pathway was clustered in the Lynch syndrome group but not the non-Lynch syndrome group, whereas the MAPK signaling pathway and cAMP signaling pathway were clustered in the non-Lynch syndrome group but not the Lynch syndrome group. Information related to the genes enriched in each GO and KEGG category in **Figure 4** is provided in **Supplementary Table S9** (GO enrichment) and **Supplementary Table S10** (KEGG enrichment).
Next, we used the STRING database to analyze the protein interaction network for each subgroup. The top 20 genes in terms of protein interaction are listed in **Supplementary Table S11**. Each subgroup was compared with the P group, and the same genes are labeled with identical colors. In all groups, TP53 was the most common interacting gene. However, EGFR and SRC genes were found in the LP, non-P, and non-Lynch syndrome groups, but not in the P group, thus suggesting substantial differences in the protein interaction network. NOTCH1 was found only in the P and P-Lynch syndrome groups but not in the other groups, thus verifying the results of the pathway enrichment analysis. These findings strongly suggest that the mechanism of carcinogenesis in patients with P germline mutations is distinct from that in patients with no P germline mutations. ### Germline mutations increase the risk of CRC in the Chinese population P or LP germline mutations may increase cancer susceptibility and risk. To quantify the risk of CRC in individuals carrying P or LP germline mutations, we calculated the ORs for individual germline mutations and all mutations as a whole. The prevalence of all germline mutations in the general population was determined by gnomAD screening. By comparing the prevalence in the general population and the mutation frequency identified in this study, we calculated the OR for each mutation site, or all mutations as a whole, as an indicator of CRC risk. **Table 3** shows the detailed demographic information, gene names, variation sites, allele counts, allele frequencies in the general population, and ORs for each P germline mutation detected in this study. The overall OR for all P mutations was 11.13 (95% CI:8.289–15.44). Similarly, **Table 4** shows demographic and mutational information, along with the calculated OR of all LP mutations, with an overall OR of 20.68 (95% CI: 12.89–33.18). 
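
The odds ratio comparison described above can be sketched in Python for a single variant. This is a hedged illustration. The ORs in this study were computed in SPSS, and the text does not state which interval method or which counting unit (alleles or carriers) was used. The Woolf (log-scale Wald) interval, the function name, and the example counts below are assumptions made for illustration only.

```python
import math
from typing import Tuple


def odds_ratio_ci(
    case_alt: int,
    case_ref: int,
    pop_alt: int,
    pop_ref: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale Wald) confidence interval.

    case_alt / case_ref: variant and non-variant counts in the patient cohort.
    pop_alt / pop_ref: variant and non-variant counts in the reference
    population (for example, counts taken from a gnomAD record).
    z = 1.96 gives an approximate 95% interval.
    """
    cells = (case_alt, case_ref, pop_alt, pop_ref)
    if min(cells) <= 0:
        # The Woolf interval is undefined when any cell is zero.
        raise ValueError("all four counts must be positive")
    odds_ratio = (case_alt * pop_ref) / (case_ref * pop_alt)
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from Table 3 or Table 4.
    or_value, lo, hi = odds_ratio_ci(case_alt=2, case_ref=3844,
                                     pop_alt=10, pop_ref=199990)
    print(f"OR = {or_value:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

For variants that are absent from the reference population, one cell is zero and this interval cannot be computed; a continuity correction or an exact method would be needed in that case.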
These results indicated strong enrichment in P or LP mutations in the studied population of patients with CRC, thus indicating a significantly higher risk of CRC in patients carrying these germline mutations. View this table: [Table 3](http://www.cancerbiomed.org/content/19/5/707/T3) Table 3 Pathogenic germline mutations identified in this study View this table: [Table 4](http://www.cancerbiomed.org/content/19/5/707/T4) Table 4 Likely pathogenic germline mutations identified in this study Some patients with CRC recruited for this study lacked prognostic data. Consequently, we were unable to perform prognostic analysis. However, prognostic data were successfully obtained from a previous report11; the patient prognosis was then compared between those with and without germline mutations. As shown in **Supplementary Figure S2**, patients with germline mutations exhibited significantly poorer overall survival than those without germline mutations (*P* = 0.0087). The median survival time for the germline group was 1,323 days, whereas the median survival for the non-germline group had not been reached. ## Discussion Previous research has identified correlations between P germline mutations and hereditary CRC, including MLH1/MSH2/MSH6/PMS2 mutations with Lynch syndrome (also known as hereditary non-polyposis CRC), APC mutations with FAP, MUTYH mutations with MUTYH-associated polyposis, STK11 mutations with Peutz-Jeghers syndrome, SMAD4/BMPR1A mutations with juvenile polyposis syndrome, PTEN mutations with PTEN hamartoma tumor syndrome, and RNF43 mutations with serrated polyposis syndrome4–7. Although the relationships among these diseases and mutations are known, the frequency, location, and distribution of germline mutations in the Chinese population, and their quantitative relationships with CRC risk have yet to be elucidated. The distribution of rare germline mutations and their roles in the pathogenesis of CRC are also worthy of exploration. 
In addition, no systematic studies have investigated the similarities and differences in the somatic mutational landscape between patients with and without P/LP germline mutations.

In this study, we recruited a large cohort of 1,923 cases and systematically investigated germline mutations and corresponding somatic mutational alterations in a Chinese population. As expected, a significantly higher proportion of patients with P or LP mutations had a family history of CRC than did non-P patients, thus suggesting that these germline mutations increased the risk of CRC in affected families. Because of the high proportion of affected MMR genes in P and LP mutations, the proportion of patients with dMMR and MSI-H was significantly higher in these groups; therefore, these patients may respond well to immunotherapy. Our results also confirmed the early onset of CRC in patients with P or LP mutations, thereby indicating a similar trend to those of FAP and Lynch syndrome. Although some novel mutations were not determined to be pathogenic, their overall influence appeared to be similar to that of confirmed hereditary CRC.

We found that 7.6% of patients (147/1,923) carried P/LP mutations, and 1.4% of patients (27/1,923) had Lynch syndrome; these findings are similar to the proportions previously published for both Chinese and Western populations7,12,13. However, because of a lack of sufficient evidence for LP germline mutations, many mutations in MLH1, MSH2, MSH6, and PMS2 could not be confirmed as Lynch syndrome mutations. Therefore, the incidence of Lynch syndrome might have been underestimated, and the actual incidence could have exceeded 2%, as described in previous reports7,12,13. The APC gene had the highest number of P germline mutations, thus indicating that FAP is the most common form of hereditary CRC in the Chinese population, followed by Lynch syndrome. In addition, ATM gene germline mutations have been detected in other malignant tumors14.
Because ATM is an important candidate member of the DNA damage and repair (DDR) pathway, germline mutations may directly lead to abnormal DNA repair. The present evidence suggests that ATM germline mutations are not cancer type-specific, because they have been reported in many cancers and have been suggested to potentially increase the risk of some cancers14. In the present study, the OR of P ATM mutations varied from 6.4 to 63.87, thus suggesting an increased risk of CRC in carriers of these mutations.

We also identified several BRCA1 and BRCA2 germline mutations in this study. The BRCA1/2 genes, whose products participate in the DDR and homologous recombination repair (HRR) pathways, are confirmed causes of hereditary breast and ovarian cancer syndrome. BRCA1/2 germline mutations have also been reported in CRC15. All BRCA1/2 P germline mutations reported herein are associated with CRC, on the basis of clear clinical evidence. Our previous studies have also confirmed that BRCA2 germline mutations increase the risk of lung cancer10. Because no hotspot mutations have been reported in BRCA1/2 in the Chinese population, many mutations were categorized as LP or VUS. Additional clinical evidence is necessary to confirm their pathogenicity in cancer.

We compared the ratio and distribution of germline mutations between Chinese and Western populations by using the data from the present study and data reported by Hahnen et al.11 in 2017. We found that PALB2 was ranked as the top P mutation in the Western population but had a much lower ranking in the Chinese population (**Supplementary Figure S3A**). In contrast, APC was ranked as the top P mutation in the Chinese population but was not detected in the Western population. Moreover, ATR was ranked as the top LP mutation in the Western population but was not detected in the Chinese population. Differences between these populations were also reflected in the proportion of patients with Lynch syndrome.
The proportion of patients with Lynch syndrome with P mutations in the Chinese population was 29.3% (27/92), compared with 15.0% (3/20) in the Western population (**Supplementary Figure S3B**). These comparisons indicate a potential differential germline mutational landscape in CRC. Frameshift and nonsense mutations were the 2 most common types of mutations detected in the study, followed by missense and splicing mutations. Frameshift and nonsense mutations lead to the partial or complete loss of function of corresponding proteins, thus increasing the risk of cancer in mutation carriers. Missense mutations in key amino acids can also induce substantial changes in protein function, whereas splicing mutations can influence transcription and subsequent translation. We found that most mutations in highly mutated genes were located in known functional domains, thus reflecting the roles of these domains in maintaining normal protein function. Indeed, because all mutations identified in this study were heterozygous, a partial loss of function might be compensated for by the other normal allele. These heterozygous mutations might not be lethal but could increase the risk of aberrant cellular transformation and carcinogenesis. In this study, we conducted the first comparative study of somatic mutational landscapes on the basis of the pathogenicity classification of germline mutations. We found that the mutational frequency of most of the highly mutated genes in the P group was higher than that in the non-P group; the LP group also showed a similar trend toward a higher mutational frequency, possibly because the mutations in the P group affected the MMR, DDR, and homologous recombination repair pathways, thus leading to abnormal DNA repair and a large number of somatic mutations16.
The patients with and without Lynch syndrome in the P group showed a similar trend, and the mutational frequency in patients with Lynch syndrome was much higher than that in patients without Lynch syndrome. This finding was also confirmed by TMB statistics: the TMB of patients with Lynch syndrome was significantly higher than that of the other 3 groups. TMB has been suggested to be an effective indicator for patient prognosis stratification in immunotherapy17. Our data provided strong evidence supporting the use of immunotherapy in patients with Lynch syndrome. Interestingly, we observed no difference in the frequency of APC and KRAS mutations across the 3 groups, thus suggesting that major driver gene mutations may be common driving factors for CRC, regardless of P germline mutations. In addition, our data showed that the CNV variation in the non-P group was higher than that in the P group, and that CNV variation in the patients without Lynch syndrome was also higher than that in patients with Lynch syndrome, thus indicating a seesaw effect. That is, a higher proportion of SNV/indel mutations corresponded to a lower proportion of CNV alterations, whereas a lower proportion of SNV/indel mutations corresponded to a higher proportion of CNV alterations. This observation suggests that CRC is a highly heterogeneous cancer in which pathogenesis is diverse and depends on different types of genetic alterations. The co-existence and balance of mutations and CNVs may be related to both genetic and environmental backgrounds. Similar observations of the seesaw effect have also been reported in other studies10,18,19. Our detailed clustering analysis led to interesting discoveries. We found the first reported evidence that the Notch pathway is clustered only in patients with Lynch syndrome with P germline mutations, and not in patients without Lynch syndrome.
Furthermore, we observed that the MAPK and cAMP signaling pathways were clustered in patients without Lynch syndrome but not in patients with Lynch syndrome. In contrast, the Wnt and calcium signaling pathways, along with the human papillomavirus infection pathway, were all clustered in CRC. This finding suggests that the Notch pathway is specific to patients with Lynch syndrome, whereas the MAPK and cAMP signaling pathways are specific to patients without Lynch syndrome. The Wnt and calcium signaling pathways, along with human papillomavirus infection, may be common pathogenic factors for CRC, regardless of germline mutations. The Notch pathway plays an important role in embryonic development, cell proliferation, and differentiation. Furthermore, the role of the Notch pathway has been investigated for many different types of tumors20, including CRC21. However, the role of the Notch pathway in Lynch syndrome has not been studied previously. Our identification of Lynch-specific Notch pathway activity demonstrated the existence of distinct pathogenic mechanisms in patients with CRC with and without Lynch syndrome; therefore, our research provides key information that may facilitate molecular typing. In this study, we report the first quantification of the risk of CRC associated with P and LP germline mutations. We also calculated the overall OR for the P and LP groups. The frequency of mutations identified by gnomAD screening represents the frequency of a certain alteration in the general population. Because most P or LP germline mutations exhibited very low incidence, their observed frequencies in the general population and in patients with cancer may exhibit a certain degree of randomness and may not accurately represent the true frequencies. Thus, the overall OR for the P or LP group as a whole may have greater relevance and significance for the population.
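The ORs discussed above compare the frequency of mutation carriers among patients with that in a reference population. The following is a minimal sketch of this type of calculation, assuming a standard 2×2 table and a Woolf (log-scale) 95% confidence interval. The function name, the continuity correction, and the counts are hypothetical and are not taken from this study; the interval method used by the authors is not stated in this section.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table with a Woolf log-scale CI.

    Hypothetical helper for illustration only. z=1.96 gives an
    approximate 95% interval.
    """
    cells = [case_carriers, case_noncarriers,
             control_carriers, control_noncarriers]
    # Assumption: apply the Haldane-Anscombe correction (add 0.5 to every
    # cell) when any cell is zero, which is common for very rare variants.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells

    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 carriers among 100 patients,
# 10 carriers among 100 reference individuals.
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_value:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```

With very small carrier counts, the standard error term is dominated by the reciprocals of those counts and the interval widens sharply, which is consistent with the point that ORs for individual rare mutations are less stable than an overall OR for a pooled group.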
For some relatively common germline mutations, such as those from APC and the 4 MMR genes, the risk associated with individual genes can be calculated; for the less frequent gene mutations, larger population studies and familial evidence are urgently needed. In this study, the overall OR of both the P and LP groups exceeded 10, thus suggesting that patients with such germline mutations had a significantly greater risk of CRC than the average-risk population. Previous studies of other cancers also support this method for evaluating the risk of germline mutations from population data10,22,23. From the perspective of treatment, personalized therapeutic strategies should be given to patients with such mutations, and more frequent and detailed examinations should be performed on their unaffected family members carrying these mutations. This practice would enable detection of tumors as early as possible and support early intervention.

## Conclusions

In this study, we fully characterized germline and somatic mutations in Chinese patients with CRC. We found that 7.6% of our study cohort carried germline variants linked to greater susceptibility to CRC. Patients with P or LP mutations had a higher proportion of MSI-H, dMMR, family history of CRC, and significantly lower age. The somatic mutations in Chinese patients with CRC were fully characterized and found to exhibit distinct features. The Notch signaling pathway was uniquely clustered in patients with Lynch syndrome, whereas the MAPK and cAMP signaling pathways were uniquely clustered in patients with CRC who did not have Lynch syndrome. Our findings provide important information for potential molecular typing and therapy for patients with CRC with germline mutations.

## Supporting Information

cbm-19-707-s001.pdf

## Grant support

This study was supported by the Special Funds for Strategic Emerging Industry Development of Shenzhen (Grant No. 20170922151538732), the Science and Technology Project of Shenzhen (Grant No. JSGG20180703164202084), the Natural Science Foundation Project of China (Grant No. 71573022), the National Natural Science Foundation Regional Projects (Grant No. 82060440), and the special health research projects of 2019 funded by the Chinese PLA General Hospital (Grant No. NLBJ-2019003). No funders participated in the study design, study implementation, data collection, data analysis, data interpretation, or manuscript writing for the study.

## Conflict of interest statement

Jianfei Yao, Pan Yang, Xiaohui Wang, Danni Liu, Tanxiao Huang, Huiya Cao, Peisu Suo, and Lele Song are employees of HaploX Biotechnology, and performed the NGS sequencing in this study. The other authors declare no conflicts of interest.

## Author contributions

Conceived and designed the analysis: Lele Song, Jingbo Yu, and Jianfei Yao. Collected the data: Jianfei Yao, Yunhuan Zhen, Jing Fan, Yuan Gong, Yumeng Ye, Shaohua Guo, Hongyi Liu, Xiaoyun Li, Guosheng Li, Yuemin Li, Jingbo Yu, and Lele Song. Contributed data or analysis tools: Jianfei Yao, Yunhuan Zhen, Jing Fan, Yuan Gong, Yumeng Ye, Shaohua Guo, Hongyi Liu, Xiaoyun Li, Guosheng Li, Yuemin Li, Huiya Cao, Peisu Suo, Pan Yang, Xiaohui Wang, Danni Liu, Tanxiao Huang, and Lele Song. Performed the analysis: Jianfei Yao, Yunhuan Zhen, Jing Fan, Yuan Gong, Lele Song, Huiya Cao, Peisu Suo, Pan Yang, Xiaohui Wang, Danni Liu, and Tanxiao Huang. Wrote the paper: Jianfei Yao, Yunhuan Zhen, Jing Fan, Yuan Gong, Jingbo Yu, and Lele Song.

## Acknowledgements

We thank all patients and their relatives for supporting the study, and thank all technicians in this study for processing the samples and providing technical support.

## Footnotes

* *These authors contributed equally to this work.
* Received March 24, 2021.
* Accepted July 16, 2021.
* Copyright: © 2022, Cancer Biology & Medicine [https://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/) This is an open access article distributed under the terms of the [Creative Commons Attribution License (CC BY) 4.0](https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

## References

1. Schreuders EH, Ruco A, Rabeneck L, Schoen RE, Sung JJ, Young GP, et al. Colorectal cancer screening: a global overview of existing programmes. Gut. 2015; 64: 1637–49.
2. Chen W, Zheng R, Baade PD, Zhang S, Zeng H, Bray F, et al. Cancer statistics in China, 2015. CA Cancer J Clin. 2016; 66: 115–32.
3. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, et al. Environmental and heritable factors in the causation of cancer – analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med. 2000; 343: 78–85.
4. Valle L. Recent discoveries in the genetics of familial colorectal cancer and polyposis. Clin Gastroenterol Hepatol. 2017; 15: 809–19.
5. Lorans M, Dow E, Macrae FA, Winship IM, Buchanan DD. Update on hereditary colorectal cancer: improving the clinical utility of multigene panel testing. Clin Colorectal Cancer. 2018; 17: e293–305.
6. Stoffel EM, Mangu PB, Gruber SB, Hamilton SR, Kalady MF, Lau MW, et al. Hereditary colorectal cancer syndromes: American Society of Clinical Oncology Clinical Practice Guideline endorsement of the familial risk-colorectal cancer: European Society for Medical Oncology Clinical Practice Guidelines. J Clin Oncol. 2015; 33: 209–17.
7. Ma H, Brosens LAA, Offerhaus GJA, Giardiello FM, de Leng WWJ, Montgomery EA. Pathology and genetics of hereditary colorectal cancer. Pathology. 2018; 50: 49–59.
8. Liu Q, Tan YQ. Advances in identification of susceptibility gene defects of hereditary colorectal cancer. J Cancer. 2019; 10: 643–53.
9. Gao X, Nan X, Liu Y, Liu R, Zang W, Shan G, et al. Comprehensive profiling of BRCA1 and BRCA2 variants in breast and ovarian cancer in Chinese patients. Hum Mutat. 2020; 41: 696–708.
10. Liu M, Liu X, Suo P, Gong Y, Qu B, Peng X, et al. The contribution of hereditary cancer-related germline mutations to lung cancer susceptibility. Transl Lung Cancer Res. 2020; 9: 646–58.
11. Hahnen E, Lederer B, Hauke J, Loibl S, Kröber S, Schneeweiss A, et al. Germline mutation status, pathological complete response, and disease-free survival in triple-negative breast cancer: secondary analysis of the GeparSixto randomized clinical trial. JAMA Oncol. 2017; 3: 1378–85.
12. Gong R, He Y, Liu XY, Wang HY, Sun LY, Yang XH, et al. Mutation spectrum of germline cancer susceptibility genes among unselected Chinese colorectal cancer patients. Cancer Manag Res. 2019; 11: 3721–39.
13. Jiang W, Cai MY, Li SY, Bei JX, Wang F, Hampel H, et al. Universal screening for Lynch syndrome in a large consecutive cohort of Chinese colorectal cancer patients: High prevalence and unique molecular features. Int J Cancer. 2019; 144: 2161–8.
14. Choi M, Kipps T, Kurzrock R. ATM mutations in cancer: therapeutic implications. Mol Cancer Ther. 2016; 15: 1781–91.
15. Pearlman R, Frankel WL, Swanson B, Zhao W, Yilmaz A, Miller K, et al. Prevalence and spectrum of germline cancer susceptibility gene mutations among patients with early-onset colorectal cancer. JAMA Oncol. 2017; 3: 464–71.
16. Li Z, Pearlman AH, Hsieh P. DNA mismatch repair and the DNA damage response. DNA Repair (Amst). 2016; 38: 94–101.
17. Yarchoan M, Hopkins A, Jaffee EM. Tumor mutational burden and response rate to PD-1 inhibition. N Engl J Med. 2017; 377: 2500–1.
18. Rachiglio AM, Lambiase M, Fenizia F, Roma C, Cardone C, Iannaccone A, et al. Genomic profiling of KRAS/NRAS/BRAF/PIK3CA wild-type metastatic colorectal cancer patients reveals novel mutations in genes potentially associated with resistance to anti-EGFR agents. Cancers (Basel). 2019; 11: 859.
19. Ren Y, Huang S, Dai C, Xie D, Zheng L, Xie H, et al. Germline predisposition and copy number alteration in pre-stage lung adenocarcinomas presenting as ground-glass nodules. Front Oncol. 2019; 9: 288.
20. Aster JC, Pear WS, Blacklow SC. The varied roles of Notch in cancer. Annu Rev Pathol. 2017; 12: 245–75.
21. Vinson KE, George DC, Fender AW, Bertrand FE, Sigounas G. The Notch pathway in colorectal cancer. Int J Cancer. 2016; 138: 1835–42.
22. Parry EM, Gable DL, Stanley SE, Khalil SE, Antonescu V, Florea L, et al. Germline mutations in DNA repair genes in lung adenocarcinoma. J Thorac Oncol. 2017; 12: 1673–8.
23. Wang Y, McKay JD, Rafnar T, Wang Z, Timofeeva MN, Broderick P, et al. Rare variants of large effect in BRCA2 and CHEK2 affect risk of lung cancer [published correction appears in Nat Genet. 2017 Mar 30;49(4):651]. Nat Genet. 2014; 46: 736–41.