Colorectal cancer (CRC), the third most common cancer and the second leading cause of cancer-related death globally, is responsible for more than 1.9 million new cases and 0.9 million deaths reported annually1. Randomized controlled trials and cohort studies have demonstrated that screening can significantly decrease CRC incidence and mortality2. Current screening guidelines primarily recommend CRC screening for individuals at average risk, typically starting at predefined ages2. However, because of individual variations in background CRC risk, a uniform screening approach might have limited effectiveness at the population level. Tailored screening strategies aimed at providing screening recommendations based on individual CRC risk have therefore been proposed to decrease the load on colonoscopy services and improve the benefit to harm ratio3,4. Several CRC risk prediction models have been developed to identify individuals at high risk of colorectal neoplasia. However, substantial gaps remain in the evidence, including model heterogeneity, bias, and a lack of external validation5–8. Few studies have simultaneously validated multiple risk prediction models within the same screening population, yet such validation is crucial for evaluating their performance and optimizing screening strategies.
Overall study design and analytical framework
To address this gap, we conducted a comprehensive head-to-head comparison of 17 previously published CRC risk prediction models within the large, real-world Zhejiang Provincial CRC Screening Program, which has been described in prior studies (Supplementary material)9,10. The screening program was approved by the Ethics Committee and Institutional Review Board of Zhejiang Cancer Hospital (approval No. IRB2023-464), and all participants provided written informed consent to participate. In the present study, all analyses were restricted to individuals with positive fecal immunochemical test (FIT) results who subsequently underwent diagnostic colonoscopy. Risk prediction models were applied to further stratify risk within this population. The outcome was advanced colorectal neoplasia (ACN), defined as CRC, advanced adenomas, or advanced serrated lesions. The model performance metrics, including the area under the receiver operating characteristic curve (AUC) and net reclassification improvement, were estimated. Detailed descriptions of variable handling, statistical tests, and uncertainty estimation are provided in the Supplementary material.
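As an illustration of the discrimination metric named above, the AUC can be read as the probability that a randomly chosen individual with ACN receives a higher risk score than a randomly chosen individual without ACN. The following is a minimal sketch under that definition, not the authors' analysis code; the function name `auc_rank` and the toy data are hypothetical.

```python
def auc_rank(scores, labels):
    """AUC as the share of case/non-case pairs ranked correctly (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above non-case
            elif c == k:
                wins += 0.5   # tie contributes half a win
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: six individuals, three with ACN (label 1).
toy_scores = [1, 2, 3, 4, 5, 6]
toy_labels = [0, 0, 1, 0, 1, 1]
toy_auc = auc_rank(toy_scores, toy_labels)  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in cohort size and is meant only to make the definition concrete; a rank-based implementation would be preferable for a cohort of this scale.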
Detailed inclusion and exclusion criteria are provided in Figure 1A, and the basic characteristics of the validation cohort are presented in Table S1. Among the participants, 49.43% were men (n = 84,621), and approximately 2.42% reported a family history of CRC (n = 4,143). Except for vegetable and bean intake, which showed no significant differences (P > 0.05), all other factors demonstrated statistically significant differences between those with and without ACN (P < 0.05). A comparison of baseline characteristics between the present cohort and the general population revealed no significant differences, indicating that the cohort was representative of the broader population (Tables S2 and S3).
Figure 1. (A) Flow diagram of the Zhejiang Colorectal Cancer Screening Program. (B) Flow diagram of the systematic review. (C) Discrimination of the prediction models in development and external validation. Orange bars represent the C statistics of each model in the development phase, and blue bars represent the C statistics in the validation phase. Circles indicate point estimates, whereas horizontal lines denote 95% confidence intervals. (D) Relative risks of advanced colorectal neoplasia by quartile for each risk model. Circles indicate point estimates, whereas horizontal lines denote 95% confidence intervals. (E) Enrichment efficiency of risk prediction models. FIT, fecal immunochemical test; BMI, body mass index; Q, quartile; ACN, advanced colorectal neoplasia; PPV, positive predictive value.
Identification and characteristics of colorectal cancer risk prediction models
Risk models were identified via synthesis of 4 previous systematic reviews5–8 and a supplemental PubMed search covering January 2023 to October 2024 (Figure 1B). Briefly, we included models developed in screening populations that incorporated sociodemographic and environmental risk factors assessed by questionnaires, and predicted colorectal cancer or advanced adenomas confirmed by colonoscopy and pathology; detailed inclusion and exclusion criteria are provided in the Supplementary material. Basic characteristics and model information are presented in Tables S4–S6. Among the 17 models, approximately one-third were developed in Chinese populations, whereas the remainder originated predominantly from Europe and East Asia. Most models were developed in case–control settings, although 2 were derived from cohort studies. Across models, the number of predictor variables ranged from 3 to 9, and the predictors typically included age, sex, body mass index (BMI), smoking, alcohol consumption, and family history. The main characteristics of the validation cohort are shown in Table S1. The missing rates of questionnaire-based covariates were very low (0.0%–3.29%). Therefore, missing values were not imputed; each covariate was treated as an exposure, and analyses were conducted on available data. Standardized mean difference analyses showed no statistically significant differences in demographic characteristics or lifestyle factors between FIT-positive individuals and the overall screening population (Table S2), or between colonoscopy attendees and non-attendees among the FIT-positive participants (Table S3), indicating that the validation cohort remained representative. Development AUCs ranged from 0.60 to 0.74, but validation outside the derivation populations was limited; together with heterogeneity in predictor definitions and model structure, this underscored the need for comparative validation using a standardized population and outcome definition.
Discrimination and calibration performance of risk prediction models in external validation
After exclusion of 2 models that did not report an original AUC, the 15 models included in our study showed original AUCs ranging from 0.60 to 0.74 in development, whereas the AUCs in validation ranged from 0.60 to 0.65, indicating moderate discrimination between cases and non-cases. Notably, the AUCs were generally lower in validation than in development (Figure 1C, Table S7). Across models, recalibration slopes were generally <1, indicating attenuation of risk gradients in the external screening population, whereas recalibration intercepts reflected differences in baseline risk between the derivation and validation cohorts. This attenuation might have reflected differences in validation context, including the restriction of the study population to FIT-positive individuals and variations in population characteristics and outcome definitions with respect to the original model development settings. Table S8 provides pairwise comparisons of net reclassification indices across the 17 models. Overall, most models showed no statistically significant differences in the net reclassification index, with values ranging from −0.16 to 0.29. Net reclassification improvement estimates were generally small, and many pairwise comparisons had confidence intervals including 0 (Table S8). These findings indicated limited evidence of meaningful improvement in individual risk reclassification. When stratified by lesion subtype (Table S7), models originally developed to predict CRC demonstrated higher discrimination for CRC outcomes than for advanced adenomas, whereas models designed for advanced adenoma showed more balanced performance across lesion types. These findings were consistent with the original design intent of the respective models. In sex-stratified analyses (Table S9), discrimination was consistently higher among women than men across most models, although the relative ranking of models was largely preserved between sexes. No individual model demonstrated clear superiority in one sex over the other.
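To make the pairwise reclassification comparisons above more concrete, the sketch below computes a two-category net reclassification improvement for one model against another. It assumes that each model assigns individuals to a high-risk or low-risk group; the text does not state which NRI variant or which cutoffs were used, so the function name `categorical_nri` and the toy data are hypothetical.

```python
def categorical_nri(old_high, new_high, outcomes):
    """Two-category NRI of a new model relative to an old one.

    old_high / new_high: truthy if the model places the person in the high-risk group.
    outcomes: 1 for an event (e.g., ACN), 0 for a non-event.
    """
    up_e = down_e = n_e = 0   # movements among events
    up_n = down_n = n_n = 0   # movements among non-events
    for old, new, y in zip(old_high, new_high, outcomes):
        moved_up = (not old) and bool(new)
        moved_down = bool(old) and (not new)
        if y == 1:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_n += 1
            up_n += moved_up
            down_n += moved_down
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    # Events should move up; non-events should move down.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Hypothetical toy data: four events followed by four non-events.
toy_outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
toy_old_high = [1, 0, 0, 1, 1, 0, 1, 0]
toy_new_high = [1, 1, 1, 1, 0, 0, 1, 0]
toy_nri = categorical_nri(toy_old_high, toy_new_high, toy_outcomes)
```

In this toy example, two of four events move up and one of four non-events moves down, so the NRI is 0.50 + 0.25 = 0.75. Values near 0, as reported for most model pairs, would indicate that neither model reclassifies individuals appreciably better than the other.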
Screening yield and risk stratification performance for ACN
Stratification of the risk scores showed a clear separation between the highest and lowest risk groups across models (Figure 1D). The positive predictive values (PPVs) of ACN were 11.36%–15.79% for the Q2 group, 16.11%–21.56% for the Q3 group, and 21.82%–26.68% for the Q4 group, corresponding to relative risks (RRs) of 1.38–1.73, 1.89–2.63, and 2.30–3.57, respectively, relative to the lowest risk group (Q1). The PPVs and RRs for ACN and its subgroups by quartiles are presented in Tables S10 and S11.
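The quartile-based PPVs and RRs described above can be sketched as follows. This is a minimal illustration that assumes quartiles are formed by ranking individuals on the model score and splitting them into four equal-sized groups, with the lowest quartile as the reference; the exact quartile construction used in the study is not described here, and the function name `quartile_ppv_rr` and the toy data are hypothetical.

```python
def quartile_ppv_rr(scores, outcomes):
    """Return (label, PPV, RR versus Q1) for each risk-score quartile."""
    n = len(scores)
    if n < 4 or n != len(outcomes):
        raise ValueError("need at least 4 paired scores and outcomes")
    # Rank individuals from lowest to highest score; ties keep input order.
    order = sorted(range(n), key=lambda i: scores[i])
    ppvs = []
    for q in range(4):
        members = order[q * n // 4:(q + 1) * n // 4]
        cases = sum(outcomes[i] for i in members)
        ppvs.append(cases / len(members))  # share of the quartile with ACN
    if ppvs[0] == 0:
        raise ValueError("RR undefined: no cases in the reference quartile")
    return [(f"Q{q + 1}", ppv, ppv / ppvs[0]) for q, ppv in enumerate(ppvs)]


# Hypothetical toy cohort: 20 individuals, 5 per quartile,
# with 1, 2, 3, and 4 cases in Q1 to Q4, respectively.
toy_scores = list(range(1, 21))
toy_outcomes = [1, 0, 0, 0, 0,
                1, 1, 0, 0, 0,
                1, 1, 1, 0, 0,
                1, 1, 1, 1, 0]
toy_table = quartile_ppv_rr(toy_scores, toy_outcomes)
```

In the toy cohort, the PPVs rise from 20% in Q1 to 80% in Q4, giving RRs of 1, 2, 3, and 4. Because the RR is a ratio of PPVs, a low PPV in the reference quartile inflates the RR even when the absolute difference is modest.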
Among all FIT-positive individuals undergoing colonoscopy, the overall PPV of ACN was 14.76%. When model-specific cutoffs were applied, the integration of risk stratification into FIT-based screening resulted in absolute increases of 1.33–11.69 percentage points in the PPV of ACN, with relative improvements of 8.99%–79.22%. This approach also improved the enrichment efficiency by 9%–79%, indicating substantial practical gains in targeting high-risk individuals and optimizing colonoscopy allocation (Figure 1E). For example, with the risk model developed by Imperiale et al.11, the PPV for ACN among FIT-positive individuals increased from 14.76% under the FIT-only strategy to 26.44% in the model-defined high-risk group, corresponding to an absolute increase of 11.69 percentage points and a relative improvement of 79.22%.
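The arithmetic behind the absolute and relative PPV gains can be sketched as follows, using the two rounded PPVs quoted above for the Imperiale et al. model. The function name `ppv_gain` is hypothetical and the sketch is not the authors' code.

```python
def ppv_gain(ppv_fit_only, ppv_high_risk):
    """Absolute (percentage points) and relative (%) gain in PPV."""
    if ppv_fit_only <= 0:
        raise ValueError("baseline PPV must be positive")
    absolute = ppv_high_risk - ppv_fit_only      # percentage points
    relative = absolute / ppv_fit_only * 100.0   # percent of the baseline PPV
    return absolute, relative


# PPVs (in %) as quoted in the text: FIT only vs. model-defined high-risk group.
abs_gain, rel_gain = ppv_gain(14.76, 26.44)
```

Recomputed from the rounded PPVs, the gains are about 11.68 percentage points and 79.1%. The small differences from the reported figures (11.69 and 79.22%) presumably reflect rounding of the published PPVs, although that is an assumption rather than something stated in the text.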
In conclusion, this study comprehensively evaluated the screening performance of various risk prediction models within a large-scale colorectal cancer screening program. The risk prediction models based on environmental risk factors demonstrated moderate effectiveness in identifying ACN. Incorporating these models into the FIT-positive population improved both the PPV and screening efficiency. The prominent roles of age and sex across models underscored their fundamental importance in colorectal cancer risk assessment. Therefore, a risk-stratified strategy, relative to the traditional age-based screening strategy, has the potential to substantially enhance screening efficiency. The moderate discrimination observed might reflect the limited information captured by environmental risk factors alone, and the future integration of biomarkers or genomic variables might further improve risk stratification performance. Although existing risk prediction models may discriminate better in women than in men, their relative comparative performance remains broadly consistent between sexes. From a public health perspective, prioritizing risk models that yield favorable enrichment efficiency with a limited number of easily obtainable variables might offer a promising pragmatic balance between effectiveness and scalability in population-based screening programs.
Conflict of interest statement
For authors identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article, which do not necessarily represent the decisions, policy, or views of the International Agency for Research on Cancer/World Health Organization.
Author contributions
Concept and design: Lingbin Du, Partha Basu, Yingying Mao, Le Wang, Weimiao Wu, Bin Liu.
Acquisition, analysis, or interpretation of data: Lingbin Du, Le Wang, Weimiao Wu, Chen Zhu, Tingting Pan, Huizhang Li.
Statistical analysis: Bin Liu, Yumeng Ding, Yi Zhou, Weiwei Chen, Lijuan Dai, Xueni Chen, Yuefan Shen.
Drafting of the manuscript: Lingbin Du, Partha Basu, Yingying Mao, Le Wang, Weimiao Wu, Bin Liu, Xiaohui Sun, Dong Hang.
Supervision: Lingbin Du, Partha Basu.
All authors confirm that they reviewed and critically revised the manuscript for important intellectual content, and approved the final draft for submission.
Data availability statement
The data generated in this study are available on request from the corresponding author.
- Received November 28, 2025.
- Accepted February 26, 2026.
- Copyright: © 2026, The Authors
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.











