Lynch syndrome pre-screening and comprehensive characterization in a multi-center large cohort of Chinese patients with colorectal cancer

Objective: Lynch syndrome (LS) pre-screening methods remain under-investigated in colorectal cancers (CRCs) in Asia. Here, we aimed to systematically investigate LS pre-screening and comprehensively characterize LS CRCs. Methods: Microsatellite instability (MSI) and germline variants of DNA mismatch repair (MMR) genes were examined in 406 deficient MMR (dMMR) and 250 proficient MMR CRCs. The genetic differences between LS and sporadic CRCs were studied with whole exome sequencing analysis. Results: The incidence of dMMR in Chinese patients with CRCs was 13.8%. Consistency analysis between MMR immunohistochemistry (IHC) and MSI testing showed the kappa value was 0.758. With next-generation sequencing (NGS), germline variants were detected in 154 CRCs. Finally, 88 patients with CRC were identified as having LS by Sanger sequencing. Among them, we discovered 21 previously unreported pathogenic germline variants of MMR genes. Chinese patients with LS, compared with sporadic CRCs, tended to be early-onset, right-sided, early-stage and mucinous. Overall, the performance of MMR IHC and MSI testing for LS pre-screening was comparable: the area under the ROC curve for dMMR, MSI-H, and MSI-H/L was 0.725, 0.750, and 0.745, respectively. dMMR_MSI-H LS and sporadic CRCs showed substantial differences in somatic genetic characteristics, including different variant frequencies of APC, CREBBP, and KRAS, as well as different enriched pathways of VEGF, Notch, TGFβR, mTOR, ErbB, and Rac protein signal transduction. Conclusions: MMR IHC and MSI testing were effective methods for LS pre-screening. The revealed clinical and somatic genetic characteristics in LS CRCs may have the potential to improve the performance of LS pre-screening in combination with dMMR/MSI.


Introduction
Lynch syndrome (LS) is the most frequent hereditary colorectal cancer (CRC) syndrome 1 . It originates from germline defects in DNA mismatch repair (MMR) genes (MLH1, MSH2, MSH6, PMS2, and EPCAM) 2 . It is clinically characterized by an elevated risk of diverse cancers, which may occur synchronously or metachronously, with relatively early onset, in people with family members with LS 3 .
Patients with LS and their high-risk relatives can benefit from intensive cancer surveillance, chemoprevention 4 , and risk-reducing surgeries 5 , particularly when they are identified sufficiently early.
Universal molecular screening for LS in newly diagnosed CRCs is routinely recommended by the NCCN Guidelines 2 . Paired tumor/germline or tumor multi-gene panel next-generation sequencing (NGS) has recently been proposed as a method for LS screening 6,7 providing an alternative to the traditional complex screening strategy. Given the 1%-5% incidence of LS in CRCs 8 , pre-screening with immunohistochemistry for mismatch-repair proteins (MMR IHC) or microsatellite instability (MSI) testing before multi-gene panel NGS can significantly decrease the economic burden, particularly in underdeveloped regions 9 . However, only 20%-30% of deficient MMR (dMMR) or MSI-high (MSI-H) CRCs are LS 10,11 . In addition, several molecular characteristics have been suggested to differ between sporadic and LS CRCs 12 , such as the mutation frequency of BRAF V600E 13 . To improve the efficiency of MMR IHC or MSI testing in LS pre-screening, we aimed to explore the molecular characteristics that distinguish LS from sporadic CRCs.
Several studies have been conducted to compare the performance of MMR IHC and MSI testing in LS pre-screening 10,14 , but no consensus has been reached 15,16 . Moreover, the MMR gene variant profile of LS in the Chinese population remains under-investigated.
Chinese patients with CRC were enrolled in a large multi-center cohort to investigate the consistency of MMR IHC and MSI testing, and to compare their performance in LS pre-screening. We also aimed to identify clinical and molecular characteristics distinguishing dMMR and MSI-H (dMMR_MSI-H) LS from dMMR_MSI-H sporadic CRCs. Whole exome sequencing (WES) was performed on patients with dMMR_MSI-H LS and dMMR_MSI-H sporadic CRCs. The clinical and molecular characteristics in each group, the MMR gene variant profile of LS, and the performance of MMR IHC and MSI testing in LS pre-screening in the Chinese population were then explored. Our investigation reveals clinical and molecular characteristics that may be used, in combination with dMMR/MSI, to improve the performance of LS pre-screening. In addition, the results provide insight into the potential mechanisms of carcinogenesis of LS and sporadic CRCs, and may aid in the establishment of therapeutic strategies for patients with CRC.

Patients and samples
A total of 2,950 patients with CRC were reviewed who were treated from January 2014 to December 2016 in 5
MSI testing was performed with capillary electrophoresis with an NCI MSI panel kit (Tongshu BioTech., Shanghai, China). This panel contains 2 mononucleotide loci (BAT25 and BAT26), 3 dinucleotide loci (D2S123, D5S346, and D17S250), and one pentanucleotide repeat marker (Penta C) as the internal control. GeneScan Analysis and Genotyper Software packages (Applied Biosystems, CA, USA) were used to determine the predominant allele size for each locus. Finally, the MSI phenotype was determined according to the number of allelic bases and the internal control index.
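As an illustration of how per-locus calls from the five-marker panel above can be turned into an MSI phenotype, the following minimal sketch applies the widely used Bethesda convention (instability at 2 or more of the 5 markers is MSI-H, at 1 marker is MSI-L, at none is MSS). This convention is an assumption here; the exact rule applied by the kit, including how the Penta C internal control index is used, is not stated in the text. The function name `classify_msi` and the boolean input format are hypothetical.

```python
from typing import Dict

# The five loci named in the text; Penta C serves only as an internal control
# and is therefore not counted toward the phenotype in this sketch.
PANEL = ("BAT25", "BAT26", "D2S123", "D5S346", "D17S250")


def classify_msi(unstable: Dict[str, bool]) -> str:
    """Return 'MSI-H', 'MSI-L', or 'MSS' from per-locus instability calls.

    Assumes the Bethesda convention: >=2 unstable loci -> MSI-H,
    exactly 1 -> MSI-L, 0 -> MSS.
    """
    missing = [locus for locus in PANEL if locus not in unstable]
    if missing:
        raise ValueError(f"missing loci: {missing}")
    n_unstable = sum(bool(unstable[locus]) for locus in PANEL)
    if n_unstable >= 2:
        return "MSI-H"
    if n_unstable == 1:
        return "MSI-L"
    return "MSS"


# Hypothetical tumor with instability at both mononucleotide loci.
example = {"BAT25": True, "BAT26": True, "D2S123": False,
           "D5S346": False, "D17S250": False}
print(classify_msi(example))
```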

Sanger sequencing and identification of LS
To confirm the germline variants in MMR genes and validate the NGS results, we performed Sanger sequencing on 130 cases with MMR gene variants detected by NGS. Briefly, DNA was amplified, and PCR products were subjected to electrophoresis. The sequencing was conducted on an ABI 3730 Genetic Analyzer (Applied Biosystems, CA, USA) according to the manufacturer's protocols. The detected variants were annotated with ClinVar. Variants not annotated in ClinVar were classified according to the American College of Medical Genetics and Genomics guidelines.

Whole exome sequencing
DNA was extracted from the normal and tumor tissue samples with a DNA Extraction Kit (FD-50, Changzhou Tongshu Biotechnology Co., Ltd, China). We created targeted capture pulldown and exon-wide libraries from native DNA with an xGen® Exome Research Panel (Integrated DNA Technologies, Inc., IL, USA) and TruePrep DNA Library Prep Kit V2 for Illumina (#TD501, Vazyme, Nanjing, China), and generated paired-end sequence data with Illumina HiSeq machines at an average sequencing depth of 170× for controls and 240× for tumors. The sequence data were aligned to the human reference genome (NCBI build 37) with BWA and sorted, and PCR duplicates were removed with GATK 4.0. Single nucleotide variants, insertions, and deletions were detected with Strelka2 with default parameters. Variants and polymorphisms were annotated with the Ensembl Variant Effect Predictor. Somatic copy number variations (CNVs) were analyzed with FACETS, and the resulting CNVs were used in further analyses.

Statistical analysis
Chi-square tests or Fisher's exact tests were used to compare frequency data between 2 groups. Kappa consistency testing was applied to examine the agreement between MMR IHC and MSI testing results. Receiver operating characteristic (ROC) curves were used to evaluate the performance of LS pre-screening methods. Statistical significance was defined as a two-tailed P value < 0.05.
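The kappa statistic mentioned above can be sketched as follows, assuming Cohen's kappa on a cross-tabulation of the two tests (the text does not name the kappa variant or the software used). The function name `cohen_kappa` and the counts in the example table are hypothetical and are not the study data.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of cases assigned category i by method A
    (e.g., MMR IHC) and category j by method B (e.g., MSI testing).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement expected from the marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical 2x2 table: rows = dMMR / pMMR, columns = MSI-H / non-MSI-H.
hypothetical = [[40, 10],
                [5, 45]]
print(round(cohen_kappa(hypothetical), 3))  # 0.7
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why it is lower than the 85% raw agreement in the hypothetical table.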

The consistency of MSI testing and MMR IHC
A total of 2,950 patients with CRC from the 5 medical centers were pre-screened with MMR IHC; 406 (13.8%) cases were determined to be dMMR tumors (Figure 2A). A total of 250 sex-matched cases among the patients with pMMR were used as a control cohort for further analysis (Figure 1) (Figure 2C).

Identification of LS by detection of germline variants of MMR genes
Multi-gene panel NGS was performed on 656 cases, including 406 dMMR and 250 pMMR cases. Consequently, 154 cases were found to have at least one variant in MMR genes (MLH1, MSH2, MSH6, PMS2, or EPCAM) annotated as pathogenic or likely pathogenic, including 123 cases with 1 mutated MMR gene, 25 cases with 2 mutated MMR genes, and 6 cases with 3 mutated MMR genes (Supplementary Table S2).
Sanger sequencing was then performed on patients with MMR germline variants detected by NGS. Finally, 88 cases were confirmed as LS, and 38 cases were confirmed to be without germline variants according to Sanger sequencing; the other 28 cases were unclassified, owing to unidentified pathogenic or likely pathogenic variants. Among the 88 LS cases, 85 cases had 1 mutated MMR gene, and 3 cases had 2 mutated MMR genes (Supplementary Table S2). Among the 96 pathogenic or likely pathogenic variants identified in 88 patients with LS, the variant frequency of MLH1, MSH2, MSH6, and PMS2 was 37.5%, 44.3%, 14.8%, and 6.8%, respectively, in agreement with results reported in the literature 11 . A total of 46 types of alterations were annotated as pathogenic/likely pathogenic in the ClinVar database. The c.1699A>T variant in MSH2 was the most frequent variant observed in Chinese patients with LS, occurring 16 times (Figure 3A). Meanwhile, 21 types of alterations were predicted to be pathogenic but were not recorded in the ClinVar database (Table 1).
These 21 variants were predicted to be pathogenic according to the American College of Medical Genetics and Genomics guidelines for variant interpretation. These variants were mainly stop-gain and frameshift variants. Most of the novel variants can lead to premature translation-termination codons, which trigger nonsense-mediated mRNA decay and loss of MMR gene expression. Among them, 9 variants occurred in MSH2 exons 2, 3, 9, 12, 14, and 15; 7 variants occurred in MLH1 exons 3, 9, 14, 16, 17, and 19; 5 variants occurred in MSH6 exons 4 and 7; and the MLH1 p.K618delK variant occurred in 2 patients (Figure 3B). Meanwhile, all 21 patients with these novel variants confirmed by Sanger sequencing presented dMMR and MSI-H in tumors. These results indicated that the 21 novel MMR germline variants can also be used to identify patients with LS. Finally, the patients were classified into 88 LS and 540 sporadic CRC cases according to variants detected by NGS and Sanger sequencing. In addition, 28 patients were unclassified, owing to unidentified pathogenic or likely pathogenic variants determined by Sanger sequencing.

Clinicopathologic features of LS in comparison with sporadic CRC
The comparisons of demographic and clinicopathologic features between LS (n = 88) and sporadic CRCs (n = 540) are shown in Table 2. The median age of patients with LS and sporadic CRCs was 53 (22-71) and 60 (23-86), respectively. Most patients with LS (60.2%) were younger than 55 years of age, whereas most patients with sporadic CRCs (63.5%) were older than 55 years of age (P < 0.001). Compared with sporadic CRCs, LS CRCs were at less advanced stages (P = 0.001) and occurred more frequently on the right side (67.5% vs. 39.4%, P < 0.001), i.e., from the cecum to the splenic flexure. Mucinous adenocarcinoma was much more common in LS (66.7%) than in sporadic CRCs (29.3%) (P < 0.001).

Genetic differences between patients with dMMR_MSI-H LS and dMMR_MSI-H sporadic CRCs
To enhance the positive predictive value (PPV) of MMR and MSI testing for LS pre-screening, we attempted to determine the genetic differences between LS CRCs with dMMR_MSI-H and sporadic CRCs with dMMR_MSI-H. We performed WES on 14 cases of LS with dMMR_MSI-H and 15 sporadic CRCs with dMMR_MSI-H. A total of 15,066 mutated genes were detected in all samples. LS and sporadic CRCs had similar variant distributions, in which missense and intronic variants were predominant (Figure 5A). Some cancer associated genes showed significantly different variant frequencies between LS and sporadic CRCs, including driver genes and genes in cancer related pathways (Figure 5B). As shown, most of these genes had significantly higher variant frequencies in LS than in sporadic CRCs. Notably, the variant frequencies of APC and CREBBP exceeded 60%, whereas variants of FLNA, SRGAP3, TLE4, CDH11, TET1, TLL1, SALL4, FAM46C, FZD2,   (Figure 5C). Thus, these LS and sporadic CRC associated genes, including highly mutated APC, CREBBP, and KRAS, may have the potential to distinguish dMMR_MSI-H LS from sporadic CRCs. Other genetic characteristics were also compared between groups. The median CNV burden and wGII of patients with LS (0.29 and 0.14, respectively) were higher than those of patients with sporadic CRCs (0.01 and 0.06, respectively) (Figure 5D and 5E). However, the CNV burdens of both groups were much lower than those for other types of tumors, including renal cancer and prostate cancer 18,19 . Compared with patients with sporadic CRCs, patients with LS had slightly lower intratumoral heterogeneity (ITH), as illustrated by the median ITH index (4 vs. 6) and the median Shannon diversity index (SDI) (1.0 vs. 1.1) (Figure 5F and 5G). The differences in CNV burden, wGII, ITH index, SDI, base substitutions, and mutational signatures between LS and sporadic CRCs were not significant.
In both groups, C>T transitions were dominant, followed by T>C transitions and C>A transversions (Figure 5H). Mutational signature 1, which is associated with age at cancer diagnosis, and signature 6, which is associated with defective DNA mismatch repair, were prevalent in both patients with LS and patients with sporadic CRCs. Moreover, signature 15, which is associated with defective DNA mismatch repair and high numbers of small insertions and deletions at mono/polynucleotide repeats, was more often observed in patients with LS than in patients with sporadic CRCs, although the difference was not significant (Figure 5I).
Genetic characteristics were also compared among groups with other different clinical characteristics. Several cancer related genes showed different variant frequencies among these groups. As shown in Supplementary Figure S1, the variant frequencies of FN1, BAP1, NOTCH2, ARHGEF12, ARID1B, USP6, and ZNF780A were significantly lower in patients with CRC who were <55 years of age. The variant frequency of CREBBP was significantly higher in patients with early-stage CRC, whereas those of AFF3 and BRAF were significantly lower. The variant frequencies of NRG1, FLNA, PTCH1, SPEN, and TET1 were significantly higher in patients with left-sided CRC. The variant frequencies of CTNND1, FOXA2, and CNOT3 were significantly higher in patients with poorly differentiated CRC. In addition, the TMB, CNV burden, wGII, ITH index, and SDI showed no significant differences according to age, stage, location, or degree of differentiation. No significantly different mutational signatures were observed among groups with different ages, stages, or locations, whereas signatures 6 and 15 were more prevalent in the moderately and highly differentiated groups (Supplementary Figure S2). These results indicated that several somatically mutated genes and mutational signatures were associated with clinical features.
We further explored the differences between patients with LS and sporadic CRCs by using GO and KEGG clustering analysis. Although several common biological processes and pathways were found, many unique enriched pathways were also observed. The LS-unique significantly enriched biological processes and pathways are shown in Figure 6A and 6C, including the VEGF signaling pathway, Notch signaling pathway, and transforming growth factor beta receptor (TGFβR) signaling pathway. The sporadic CRC-unique significantly enriched biological processes and pathways are shown in Figure 6B and 6D, including the mTOR signaling pathway, ErbB signaling pathway, and Rac protein signal transduction.
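The per-gene variant frequency comparisons between the 14 LS and 15 sporadic dMMR_MSI-H cases involve small counts, for which Fisher's exact test (named in the Statistical analysis section) is the natural choice. The following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table, written with the standard library only. Whether this exact test was applied to each gene is an assumption, the function name is hypothetical, and the counts in the example are invented for illustration and are not taken from Figure 5.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows could be LS / sporadic, columns mutated / wild-type for one gene.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of every table that is no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Hypothetical gene: mutated in 9 of 14 LS tumors and 3 of 15 sporadic tumors.
p_value = fisher_exact_two_sided(9, 5, 3, 12)
print(f"P = {p_value:.4f}")
```

With many genes tested at once, a multiple-testing correction would normally be considered; the text does not state whether one was applied.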

Discussion
In this large multi-center cohort study, we evaluated the consistency of MMR IHC and MSI testing, and compared the performance of MMR IHC and MSI testing in LS pre-screening.  multi-gene panel NGS, and also including MMR IHC and MSI testing for pre-screening, and comparison of systematic somatic genetic characteristics between LS and sporadic CRCs with dMMR_MSI-H. A single-center study of patients with CRC from Southeast China has identified 2.8% of CRCs as LS by MMR IHC plus BRAF screening and sequential germline sequencing 22 .
The performance of MMR IHC and MSI testing in LS pre-screening was comparable, with high sensitivity and high NPV but low PPV (<30%). Moreover, ROC analysis showed that the AUC values for MSI-H and MSI-H/L were slightly higher than that for dMMR, i.e., 0.750 and 0.745 vs. 0.725. These results indicated that MSI testing may have a technical advantage over MMR IHC; this finding warrants further attention and exploration in LS pre-screening. Although the PPV values for MSI-H and MSI-H/L were slightly higher than that for dMMR, i.e., 26.6% and 25.9% vs. 23.0%, the PPV values were all low. To enhance the performance of MSI and dMMR testing for LS pre-screening, combination with other clinical and somatic genetic characteristics is necessary.
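The metrics discussed above (sensitivity, NPV, PPV, and AUC) can all be derived from a single 2x2 table of pre-screening result against confirmed LS status. The sketch below shows the arithmetic. It assumes that, for a binary marker such as dMMR, the ROC curve has a single operating point, so the trapezoidal AUC reduces to the mean of sensitivity and specificity; whether the reported AUC values were computed this way is not stated in the text. The function name and the counts are hypothetical and chosen only to show how high sensitivity and NPV can coexist with a low PPV when LS is uncommon among test-positive cases.

```python
def screening_metrics(tp, fp, fn, tn):
    """Diagnostic metrics for a binary pre-screening test.

    tp: test-positive LS cases        fp: test-positive sporadic cases
    fn: test-negative LS cases        tn: test-negative sporadic cases
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),   # probability of LS given a positive test
        "npv": tn / (tn + fn),   # probability of no LS given a negative test
        # Single-threshold ROC: trapezoidal area through (0,0), the one
        # operating point, and (1,1).
        "auc": (sensitivity + specificity) / 2,
    }


# Hypothetical counts, not the study data.
for name, value in screening_metrics(tp=90, fp=300, fn=10, tn=300).items():
    print(f"{name}: {value:.3f}")
```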
MSI, dMMR, age, clinical stage, tumor location, and mucinous type showed significant differences between LS and sporadic CRCs, thus indicating that these factors had significant associations with LS. These relationships are supported by previous reports [23][24][25] . Family history has also been reported to be associated with LS CRCs in previous studies 17,23 . The combination of these LS related clinical features with MSI and dMMR may further improve the performance of LS pre-screening.
Using WES analysis, we observed substantial differences in somatic genetic characteristics between patients with dMMR_MSI-H LS and dMMR_MSI-H sporadic CRCs. Highly mutated APC and KRAS showed significantly different mutational frequencies between patients with dMMR_MSI-H LS and dMMR_MSI-H sporadic CRCs. Both genes have also been reported to be highly mutated and to play key roles in CRCs 26,27 . The LS unique mutated genes included TLL1, SALL4, FAM46C, FZD2, ARID2, INPP4B, and WAS. Among them, SALL4 has been reported to be associated with the progression and metastasis of CRCs 28 , and ARID2 is frequently mutated in microsatellite unstable CRC 29 . Although genetic characteristics of CRCs have been widely reported [30][31][32][33] , studies performing systematic comparison of somatic genetic characteristics between LS and sporadic CRCs with dMMR_MSI-H have been very limited. Some research has shown that APC mutations commonly occur after the onset of MMR deficiency, in support of the different variant frequencies of APC in this study 34 . The relationship between APC mutation frequency and LS has received substantial attention 35,36 . To our knowledge, the present work is the first investigation revealing that the VEGF and Notch pathways are uniquely enriched in LS CRCs but not in sporadic CRCs with dMMR_MSI-H. In contrast, the mTOR and ErbB signaling pathways were uniquely enriched in sporadic CRCs. The ErbB pathway difference was supported by a study indicating that the addition of anti-EGFR to chemotherapy is associated with different effects on progression-free survival in familial and sporadic MSI CRCs 37 . The higher median TMB of patients with LS in this study was consistent with findings from previous studies 18,19 . The lack of statistical significance for some differences might have been due to the limited number of samples analyzed by WES.
These differences may contribute to distinguishing LS from sporadic CRCs, and the combination of these LS related genetic features with MSI and dMMR may further improve the performance of LS pre-screening.
Our study has several limitations. First, the efficacy, particularly the specificity, of MMR IHC and MSI testing for LS pre-screening of tumor samples may be improved by excluding patients with dMMR/MSI-H caused by methylation of the MLH1 promoter or double somatic MMR variants 13,38 . We assessed the efficacy of only dMMR and MSI-H as pre-screening approaches, because MLH1 promoter methylation was not tested, and the samples with somatic variants were limited in this study. Second, some LS cases might have been missed by the germline MMR gene panel NGS, which cannot detect large deletions in MMR genes. For patients with dMMR/MSI-H but without germline MMR variants detected by targeted NGS, further genetic screening is recommended if strong hereditary clinical criteria are met. Importantly, clinicians must interpret the results of germline MMR gene panel NGS in the context of family history. Third, family history and personal tumor history were not collected in this study, but might have further supported the definition of LS and improved the LS pre-screening. Additional studies with more patients and more comprehensive information are needed to validate and improve on these results.

Conclusions
In conclusion, MMR IHC and MSI testing are comparable and effective methods for LS pre-screening. A total of 21 novel pathogenic variants in MMR genes were found in Chinese patients with CRCs. Compared with patients with sporadic CRCs, patients with LS were younger. Tumors of patients with LS frequently occurred on the right side, were found in early stages, and were mucinous. Substantial differences in somatic genetic characteristics were observed between patients with dMMR_MSI-H LS and dMMR_MSI-H sporadic CRCs. Clinical and somatic genetic characteristics may have the potential to distinguish dMMR_MSI-H LS from dMMR_MSI-H sporadic CRCs. In addition, the somatic genetic characteristics of dMMR_MSI-H LS and sporadic CRCs provide important information for potential therapy for patients with CRC.

Ethics approval and consent to participate
This study was approved by the ethics committees of the corresponding hospitals, and was performed in compliance with the Declaration of Helsinki. Written informed consent was obtained from all study participants.

Grant support
This work was supported by the National Natural Science Foundation of China (Grant No. 81572269).