Abstract
Gastric cancer (GC), the fifth most common cancer globally, remains a leading cause of cancer deaths worldwide. Inflammation-induced tumorigenesis is the predominant process in GC development; therefore, systematic research in this area should improve understanding of the biological mechanisms that initiate GC development and promote cancer hallmarks. Here, we summarize biological knowledge regarding gastric inflammation-induced tumorigenesis, and characterize the multi-omics data and systems biology methods for investigating GC development. Of note, we highlight pioneering studies in multi-omics data and state-of-the-art network-based algorithms used for dissecting the features of gastric inflammation-induced tumorigenesis, and we propose translational applications in early GC warning biomarkers and precise treatment strategies. This review offers integrative insights for GC research, with the goal of paving the way to novel paradigms for GC precision oncology and prevention.
Keywords
- Gastric cancer
- inflammation-induced tumorigenesis
- multi-omics
- artificial intelligence
- network-based methods
Introduction
Currently, gastric cancer (GC) is the fifth most common malignancy and the third leading cause of cancer mortality worldwide, contributing to approximately 723,000 deaths annually1. Eastern Asia, particularly China, has a substantial GC burden2,3. In 2020, 44.0% of global GC incidence and 48.6% of global GC-related deaths occurred in China. Notably, the 5-year survival outcomes for GC are strongly dependent on clinical stage, because early detection is associated with a 95% survival rate4. However, the rate of early diagnosis of GC is low: only 20% of GC cases in Europe5 are diagnosed in an early stage, and the rate is even lower in China6. Late-stage GC has a median survival of approximately 10 months and a 5-year survival rate below 30%7. Therefore, an innovative paradigm for early GC detection and prevention is required for precision oncology, and for decreasing GC incidence and mortality.
A key limitation in early GC detection and diagnosis is the insufficient knowledge regarding the malignant progression of premalignant GC lesions. According to epidemiological observations, intestinal-type GC, the most common histological subtype, develops from normal gastric epithelium through an inflammation-induced tumorigenesis cascade. Disease progression involves premalignant lesions, including chronic atrophic gastritis (CAG), intestinal metaplasia (IM), and dysplasia, which ultimately develop into GC8. Gastric inflammation-induced tumorigenesis is an evolutionary process involving multiple changes at the phenotypic, cellular, and molecular levels; the dynamic disease progression often lasts 10–30 years (Figure 1). The risk of developing GC increases during this evolutionary process. One study has indicated that 1/50 of patients with CAG, 1/39 of patients with IM, and 1/19 of patients with dysplasia develop GC within 20 years of follow-up9. In a study of 92,250 patients in Western populations, the annual incidence of GC has been found to be 0.1% for patients with CAG, 0.25% for patients with IM, 0.6% for patients with low-grade dysplasia (LGD), and 6% for patients with high-grade dysplasia (HGD) within 5 years after histopathological diagnosis10. Exactly when tumorigenesis occurs during this progression remains unclear, thus hindering the early diagnosis and prevention of GC. In recent years, progress in multi-omics technologies accompanied by mathematical modeling methods, including network analysis11, has swiftly advanced the field. These methods have enabled systemic identification of key points of tumorigenesis onset, exploration of early GC biomarkers, and new strategies for GC prevention, thus providing substantial scientific and practical benefits in combating GC.
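To give a sense of scale for the annual incidences quoted above, the following minimal sketch converts an annual incidence into a cumulative risk over a follow-up window. It assumes a constant annual incidence and no competing risks, which is a simplification introduced here for illustration and not a model used in the cited studies; the function name `cumulative_risk` is hypothetical.

```python
def cumulative_risk(annual_incidence: float, years: int) -> float:
    """Cumulative risk after `years`, assuming a constant annual incidence.

    Assumption (not from the cited studies): each year carries the same,
    independent probability of progression, and competing risks are ignored.
    """
    if not 0.0 <= annual_incidence <= 1.0:
        raise ValueError("annual_incidence must lie in [0, 1]")
    if years < 0:
        raise ValueError("years must be non-negative")
    return 1.0 - (1.0 - annual_incidence) ** years


# Annual GC incidences by lesion type, as quoted in the text.
ANNUAL_INCIDENCE = {"CAG": 0.001, "IM": 0.0025, "LGD": 0.006, "HGD": 0.06}

if __name__ == "__main__":
    for lesion, p in ANNUAL_INCIDENCE.items():
        print(f"{lesion}: 5-year cumulative risk = {cumulative_risk(p, 5):.2%}")
```

Under these assumptions, the ordering of risks across lesion types is preserved, while the gap between HGD and the earlier lesions widens in absolute terms over longer follow-up.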
In this review, we discuss challenges in early GC diagnosis and intervention, focusing on the multi-level and dynamic characteristics of gastric inflammation-induced tumorigenesis from the perspective of omics-based approaches. We focus on the potential of multi-level biological networks based on artificial intelligence (AI) in early GC detection and intervention, and in providing a novel paradigm for the precise prevention and management of cancers.
Multi-omics data characterizing gastric inflammation-induced tumorigenesis
The rapid advancement of omics technology has enabled data-driven insights into GC tumorigenesis mechanisms, thus facilitating a holistic understanding of this dynamic multi-level process from both macroscopic and microscopic perspectives (Figure 2).
Macroscopically, phenomics, which involves multidisciplinary phenotypic data at the organismal level12, has gained growing attention for enabling the relationships of genotypes and phenotypes with GC incidence to be traced both clinically and morphologically. The clinical features of phenomics commonly refer to clinical manifestations, such as signs and symptoms, which are external features resulting from internal factors, such as molecules and environmental influences during disease development. Common clinical GC indicators include age; unhealthful lifestyle habits, such as smoking and drinking; and a family history of GC8. Symptoms may comprise indigestion, anorexia (loss of appetite), weight loss, and abdominal pain13. Dysphagia or reflux can occur in proximal GC or tumors located at the gastroesophageal junction. Some patients with GC may exhibit bleeding symptoms14. However, common clinical symptoms lack pathological stage specificity, and several symptoms appear only when GC has developed to an inoperable advanced stage. Extensive clinical experience in traditional Chinese medicine (TCM) has also been gained in treating and inhibiting gastric tumorigenesis, and in identifying characteristic phenotype information associated with malignant progression15. Wu et al.16 have constructed a comprehensive database integrating TCM symptom mapping, thereby improving phenomic data formats and aiding in a deep understanding of GC incidence. Hou et al.17 have summarized the pathogenesis of GC premalignant lesions (GPLs) as internal deficiencies, such as spleen qi deficiency and stomach yin deficiency, and external excess, such as qi stagnation, damp heat, and blood stasis. Li et al.18,19 have analyzed patients with CAG with cold syndrome and hot syndrome by using a network balance model to evaluate the imbalanced network underlying TCM syndromes, thus revealing the potential associations between symptoms and molecular changes in gastric premalignant lesions.
Additionally, in TCM, tongue coating is associated with gastric diseases, and tongue information, such as color and coating thickness, is associated with malignant progression18,20. Integration of tongue images and TCM symptoms by using AI methods has been demonstrated to be effective in identifying GC precancerous lesions and predicting risk21,22.
Gastroscopy examination remains the gold standard for identifying gastric morphological features of malignant progression, and endoscopic and histopathological images directly reflect the pathological state23. The pathological states of precancerous GC lesions vary. CAG is defined by a decrease in parietal cells and chief cells24, whereas IM is characterized by the emergence of enterocytes and goblet cells25. Dysplasia is characterized by cellular atypia26. In early GC, endoscopy shows a mild mucosal uplift or depression, accompanied by mild redness; because the images lack typical features, early cancer interpretation is highly dependent on the endoscopists’ experience. Moreover, predicting progression risk according to pathological information regarding GPLs is difficult27. With the gathering of extensive gastroscopy image data, AI methods have emerged as a promising avenue in GC research, owing to their efficient computational and learning capabilities28. In particular, the application of machine learning algorithms to process gastroscopy images has garnered substantial interest, because it allows for automatic annotation and extraction of lesion conditions in images; facilitates analysis of the pathological features of gastric mucosal lesions; and enables prediction of their progression trends. Huang et al.29 have performed pioneering research on Helicobacter pylori (HP) infection by training a neural network on endoscopic images from a 30-patient cohort; the network identified HP infection with a sensitivity of 85.4%. In 2018, Hirasawa et al.30 reported an automatic GC monitoring system based on convolutional neural networks under routine endoscopy, which had an overall sensitivity of 92.2% for tumor recognition in 2,296 test images. Wu et al.31 have used a deep convolutional neural network to develop an intelligent recognition method for early GC endoscopic images; the recognition accuracy rate of 92.5% indicated better performance than that of endoscopists.
Luo et al.32 have conducted a multicenter case–control trial involving collection of a vast corpus of 1,036,469 GC gastroscopy images from 84,424 patients. Subsequently, they developed a deep learning framework that predicted early GC with an internal validation set accuracy rate of 95.5%, which was comparable to the performances of endoscopists. In general, accumulating phenomic knowledge regarding gastric inflammation-induced tumorigenesis has provided fundamental perspectives for identifying potential biological connections between phenotypes and genotypes, thus supporting translational application for the early diagnosis and treatment monitoring of GC.
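The studies above report performance as sensitivity or accuracy, which are not interchangeable. As a minimal sketch of how these quantities relate, the following computes sensitivity, specificity, and accuracy from binary labels. The function name `diagnostic_metrics` and the toy labels are hypothetical and do not reproduce any cited dataset.

```python
import math


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = lesion)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # lesions correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # lesions missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    def ratio(num, den):
        # Undefined ratios (no cases in the denominator) are reported as NaN.
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


if __name__ == "__main__":
    # Hypothetical image-level labels: 1 = early GC present, 0 = absent.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(diagnostic_metrics(truth, predicted))
```

Because early GC is rare in screening populations, accuracy alone can look high even when sensitivity is poor, which is one reason to compare studies on the same metric.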
Multi-omics at the microscopic level primarily involves the examination of cellular and molecular characteristics during the tumorigenesis process. Regarding cellular features, significant changes in cell states can reflect phenotypic transformations, such as morphologic diversity during disease progression. Corresponding molecular alterations occur, because cells are responsible for biological functions in organisms. Therefore, cellular features may serve as a crucial link between macroscopic phenotypic knowledge and microscopic molecular knowledge; consequently, reliable resolution of changes in cell states is necessary. Single-cell transcriptomics, a high-resolution technique capable of resolving gene expression differences in individual cells, can be used to study the molecular characteristics and heterogeneity of individual cells in GC precancerous lesions. This method aids in systematic understanding of the changes in cell associations during gastric inflammation-induced tumorigenesis. Zhang et al.33 have conducted the first single-cell transcriptomic studies on GC precancerous and cancer lesions and have successfully captured more than 50,000 cells from patients with gastritis and GC. On the basis of these data, they have established the first single-cell atlas of GC tissue, thus revealing the gene expression changes occurring during the progression from precancerous lesions to early GC and identifying unique molecular features and specific marker genes, which may aid in the early diagnosis of GC. This research has provided a reliable molecular basis for studying GC mucosal cell heterogeneity and different types of precancerous lesions, thus aiding in the identification of cancer prevention biomarkers that could potentially be used to identify individuals with high-risk lesions expected to progress to invasive carcinoma.
Sathe et al.34 have analyzed approximately 55,000 cells from biopsy samples of IM and GC, and generated a receptor–ligand network associated with different components of the GC immune microenvironment. Single-cell transcriptomic sequencing has also been widely applied in GC heterogeneity research. Kumar et al.35 have constructed a large-scale GC single-cell atlas from 31 patients (more than 200,000 cells); deeply analyzed intratumor and intertumor heterogeneity; discovered new features of the tumor microenvironment in diffuse GC; and identified and validated the role of INHBA in specific subtypes of cancer-associated fibroblasts. Wang et al.36 have comprehensively analyzed a single-cell atlas constructed from 45,000 cells from patients with malignant ascites, and have found that specific cancer cell subpopulations of GC origin lead to diminished patient survival rates, possibly through activating carcinogenic pathways such as cell cycle regulation, DNA repair, and metabolic reprogramming during the metastatic process. These findings have revealed the high developmental plasticity of GC cells during migration. In summary, single-cell transcriptomics has broad application prospects in gastric inflammation-induced tumorigenesis research. The exact subclonal composition of a sequenced cancer cell population has emerging roles in improving understanding of the biological mechanisms of GC development, and providing effective methods for early GC diagnosis and treatment.
At the molecular level, genomic, epigenomic, and transcriptomic technologies are used to analyze the molecular associations underlying GC tumorigenesis across various omics levels, thus providing data for understanding biological mechanisms. From genomic data, 2 broad categories of driver genes have been identified in GC: genes frequently mutated in various tumors, such as TP53, ARID1A, ERBB2, and FGFR237, and genes exhibiting tissue and lineage specificity, such as CDH1 and RHOA38. TCGA defines specific molecular subtypes of GC at the genomic level: chromosomal instability, microsatellite instability, genomic stability, and Epstein-Barr virus (EBV) positivity39. In noncoding regions, mutations in CTCF binding sites involving AT>CG and AT>GC substitutions and "enhancer hijacking" events have been identified in GC. Common mutation features of GC include T>G substitutions, which may help determine the origin of GC according to tissue specificity. In the dysplasia stage of GPLs, genomic changes such as chromosomal instability40, telomere shortening, and copy number changes have been detected; consequently, the loss of chromosomal integrity regulation might be an essential feature of GC tumorigenesis. However, existing research on single types of omics faces difficulties in identifying functional associations among different data levels; moreover, the prioritization of samples with high tumor proportions in research may shift focus away from the role of the microenvironment. In recent years, studies have indicated that epigenetic changes promote carcinogenesis, thus providing new insights into the critical molecular features of GC development. Tumor epigenetic changes include primarily modifications to DNA, histones, and RNA. Changes in CpG island DNA methylation have been widely studied, and may be associated with exogenous stimuli such as HP and EBV.
Chronic inflammation induced by HP has been shown to lead to widespread DNA hypermethylation and hypomethylation in the GC epithelium, such as CDH family methylation, which is irreversible even after HP eradication, thereby suggesting a possible risk marker for GC development41,42. In the study of gastric malignant progression, research on histone modifications43, such as changes in H3K27ac and H3K4me3 signals marking enhancers and promoters, is attracting attention. Alternative promoter selection is a common epigenetic feature of GC, and the use of alternative promoters can help newly formed tumors evade the host immune system and achieve immune programming, thus potentially representing an intervention target and direction for GC research44. Previous epigenetics research45 on RNA modifications has focused primarily on miRNAs and lncRNAs, such as the oncogenic lncRNA ZFAS1, which may promote the division of GC cells, and miR-584-3p, which may inhibit GC progression. Transcriptomic studies have also identified other events, such as tumor-associated alternative splicing events and A-to-I base changes caused by RNA editing46. However, RNA-level changes themselves are not heritable, and their roles as GC-driving events have not yet been fully determined. Additional quantitative characterization of gastric inflammation-induced tumorigenesis has been increasingly provided by a variety of emerging omics technologies, including proteomics, metabolomics, lipidomics, microbiomics, and radiomics.
In summary, substantial omics data have been collected on the multi-layered and dynamic processes involved in gastric inflammation-induced tumorigenesis, thus facilitating understanding of the complex biological mechanisms underpinning this process, and highlighting the need for using robust analytical methods to uncover potential biological associations between patient characteristics and disease risk by using extensive, multi-level omics data (Table 1).
AI-based methods for systematically resolving multi-omics data
The dynamic characteristics of gastric inflammation-induced tumorigenesis involve associations of multi-level information, such as phenotypic features, including TCM symptoms, and cellular and molecular features. Achieving comprehensive and holistic characterization from single-level information is difficult, given that distinct levels of omics information present unique data structures while simultaneously containing deeply embedded correlations. Identifying key components associated with disease progression amid the massive accumulation of multi-level omics data is a critical methodological challenge in current research.
For multi-level omics integration, existing coupling methods can be roughly divided into 3 categories (Table 2). The first is based on similarity measurement methods, which calculate similarity information for each omics level and then use various fusion methods to process similarity features for unified analysis. These methods are ideally suited for applications in which the number of features exceeds the number of samples, thereby enabling effective integration of diverse data types. Rappoport et al.51 have constructed similarity-based multi-omics clustering methods to exploit similarity relationships among multi-omics levels. These methods have achieved reinforcement and supplementation of information among different levels of networks and identification of key data features. Notably, a key intrinsic shortcoming of algorithms in this category is the lack of feature importance at each omics level. Consequently, further computational methods are necessary to calculate the feature importance values among omics features, thus largely restricting the implementation of these algorithms. The second category comprises methods that output feature importance parameters from omics data, thus facilitating downstream analysis based on dimensionality reduction methods. This category of methods is based on an assumption that omics data have an inherently low-dimensional representation, and each level of omics data can be considered as a projection from this low-dimensional representation to high-dimensional space. Matrix decomposition methods can effectively identify significant hierarchical structures. For example, nonnegative matrix factorization (NMF) methods use an intrinsic low-rank representation of data and map it onto a high-dimensional transformation matrix that is also nonnegative. These methods enable the depiction of relationships among various omics features52. 
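The low-rank idea behind NMF can be sketched as follows. This is a minimal illustration using the standard multiplicative update rules for the Frobenius objective, written in plain Python for readability; it is not the implementation used in any cited study. The toy matrix, in which columns stand for features from two hypothetical omics layers concatenated per sample, and the names `nmf` and `frobenius_error` are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (samples x features) as V ~ W H.

    W (samples x rank) holds the shared low-dimensional sample representation;
    H (rank x features) maps it back onto the observed features.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Hypothetical data: 4 samples; columns 0-2 from one omics layer, 3-5 from another.
    V = [[5, 4, 5, 0, 0, 1],
         [4, 5, 4, 1, 0, 0],
         [0, 1, 0, 5, 4, 5],
         [1, 0, 0, 4, 5, 4]]
    W, H = nmf(V, rank=2)
    print("residual:", round(frobenius_error(V, W, H), 3))
    for row in W:
        print([round(x, 2) for x in row])
```

Because every entry of W and H stays nonnegative, each latent factor can be read as an additive pattern across features, and the rows of W indicate how strongly each sample expresses each pattern.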
Mo et al.53 have proposed a flexible matrix decomposition structure that uses the expectation-maximization (EM) algorithm to analyze the regularized clustering structures among structural data and the intrinsic connections in multi-level data. Notably, the NMF method is often used in single-cell transcriptome sequencing analysis to analyze the relationships between cellular-level and molecular-level data, and to identify molecular features associated with different cell states54,55. As described above, Kumar et al.35 have constructed a GC single-cell atlas by using NMF methods to identify highly variable genes for cell clusters, thus supporting the identification and clinical validation of the gene signatures of cancer-associated fibroblasts. Although these methods efficiently annotate the dominant features of omics data at multiple levels and have high inference accuracy in identifying potential connections, their lack of biological interpretability currently limits their practical implementation, because the original features of omics data are projected into a hidden feature space53. The third category encompasses network structure analysis methods, which use principles of network science to abstractly represent multi-level information. These methods use a node-edge model, wherein nodes represent distinct basic units within the system, and edges describe the interaction relationships among these units. Thus, network structure analysis can link conventionally disordered data samples and facilitate the preservation of biological interpretability. AI algorithms have been widely applied to the analysis of multi-layer network structures and topologies. Han et al.56 have used a machine learning model to construct a large-scale multi-omics network, which has been applied to detect associated structures within the network.
Wang et al.57 have constructed a network-based machine learning model called similarity network fusion, which was initially developed for patient stratification and survival analysis, and iteratively updates individual omics similarity networks. In bioinformatics research, methods such as deep learning strategies58, module-based optimization algorithms, and spectral clustering similarity network fusion57 have been widely used in biological gene-associated networks for effectively processing unweighted network structures; however, their calculations require extensive time and memory resources, and the large number of model parameters may lead to overfitting. Wu et al.59 first revealed the existence of hierarchical modularization in macro-micro biological networks. Modularization primarily indicates that the biological elements within each module are closely related, whereas their connections with adjacent modules are relatively weak. Furthermore, correlations have been observed among different levels of modules at macro- and microscales, such that stronger modular associations of disease genes or drug targets in the network are observed when the corresponding phenotypes are more similar. A “multi-level modular relationship” law can thus be uncovered and summarized among biological elements at various hierarchical levels. Predictive algorithms have been established for disease-causing genes and drug targets with higher accuracy than popular methods, and used to systematically analyze disease network regulation mechanisms under specific tissue or cell conditions, thus achieving systematic integration of multi-level information such as phenotype-cell-molecule modules60,61.
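The notion that elements within a module are densely connected while links between modules are sparse can be quantified with Newman's modularity score. The sketch below is a generic illustration for an undirected, unweighted network without self-loops or duplicate edges; it is not the algorithm of the cited studies, and the node labels and the function name `modularity` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ].

    edges:      list of (u, v) pairs, one entry per undirected edge
    membership: dict mapping each node to its module label
    L_c is the number of edges inside module c, d_c the summed degree of
    its nodes, and m the total number of edges.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("the network must contain at least one edge")
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    degree_sum = defaultdict(int)
    for node, k in degree.items():
        degree_sum[membership[node]] += k
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Hypothetical network: two triangles joined by a single bridging edge.
    edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"),
             ("g4", "g5"), ("g5", "g6"), ("g4", "g6"),
             ("g3", "g4")]
    two_modules = {"g1": "A", "g2": "A", "g3": "A", "g4": "B", "g5": "B", "g6": "B"}
    one_module = {node: "A" for node in two_modules}
    print("two modules:", modularity(edges, two_modules))
    print("one module: ", modularity(edges, one_module))
```

A partition that matches the true dense groups scores higher than one that lumps all nodes together, which is the property that module detection methods exploit.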
GC malignant progression involves intricate multi-level information that corresponds to dynamic evolution at various pathological stages, such as CAG, IM, and LGD. Given the multifaceted characteristics of this complex process, mathematical modeling methods require the analysis and fitting of dynamics at multiple time points. Recently, increasing attention has been paid to methods for dynamically fitting multi-level information. Because of the added dimension of time, structural features become increasingly complex, thus making deeper structural patterns in evolving networks difficult to describe through traditional statistical methods. Although such statistical methods have successfully extracted dynamic network features, they may lead to error accumulation over time. To address this issue, machine learning methods have been widely applied in dynamic network feature extraction. Network representation learning is an important method for analyzing such networks and mining information from them; the core of this method involves embedding these unstructured data into a low-dimensional space, in which vectors characterize nodes, edges, or even entire networks. Jiao et al.62 have proposed a temporal network embedding framework that uses a variational autoencoder to generate low-dimensional embedding vectors for network nodes while preserving the dynamic nonlinear features of network substructures. Additionally, Cui et al.63 have used graph convolutional networks to achieve low-dimensional representations while updating node representations on the basis of unified representations. When the network state changes, new representations from neighboring nodes relevant to the change are automatically aggregated along the graph. Notably, dynamic network analysis methods have also been widely applied to specific biological problems.
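The neighbor aggregation step underlying graph convolutional networks can be illustrated with a single layer. The sketch below applies the widely used symmetric-normalized propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), to a static toy graph. It is a generic illustration, not the dynamic method of the cited work; the graph, features, fixed weights, and function name `gcn_layer` are assumptions (in practice the weights are learned).

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate each node's neighborhood, then project.

    adj:      n x n symmetric 0/1 adjacency matrix (no self-loops)
    features: n x f node feature matrix
    weights:  f x d projection matrix (fixed here; learned in practice)
    """
    n = len(adj)
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization prevents high-degree nodes from dominating.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    projected = matmul(matmul(norm, features), weights)
    return [[max(0.0, x) for x in row] for row in projected]  # ReLU


if __name__ == "__main__":
    # Hypothetical 3-node path graph: node 1 links nodes 0 and 2.
    adj = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
    features = [[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 0.0]]
    weights = [[1.0, 0.0],
               [0.0, 1.0]]
    for row in gcn_layer(adj, features, weights):
        print([round(x, 3) for x in row])
```

When an edge is added or removed, only the rows of the normalized adjacency matrix that involve the affected nodes change, so only their neighborhoods need to be re-aggregated; this locality is what makes the approach attractive for evolving networks.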
By combining biological networks with multi-omic sequencing data and exploiting multiscale information integration methods, Greene et al.64 have predicted disease-causing genes and resolved tissue- and cell type-specific regulatory mechanisms. Chen et al.65,66 have developed dynamic signaling pathway recognition methods that use individual patient data to identify biomarkers indicative of distorted physiologies. Their approach leverages complex biomedical processes that operate across multiple scales and are influenced by metastable equilibria phenomena. In this context, critical molecular interactions between transduction scaffold complexes have facilitated the identification of regulatory pathways underlying targeted region-of-interest conditions that extend beyond single-cell resolution under perturbations, stressors, and other conditions. Specifically, this method has enabled the identification of key features associated with system/network boundaries, thereby providing valuable insights into the mechanisms driving cancer onset and progression. In a study targeting inflammation-induced tumorigenesis in the digestive system from the perspective of the phenotype-cell-molecule network, Guo et al.67 have established a dynamic mathematical model to interpret the interactions between inflammatory environments and cell functions across multiple scales according to relationship analysis. Through function relation methods, they have analyzed the dynamic evolutionary trends in key multi-level network modules. By fitting the long-term dynamics of inflammation-induced tumorigenesis and identifying metabolic-immune balance states playing critical roles in tumor transformation, they have defined the molecular pathways driving genetic mutations that are responsible for cancer onset, and have conducted etiological analyses.
Their findings have established regulatory networks and risk assessment models for inflammation-induced tumorigenesis, and have made valuable contributions to the understanding of tumorigenesis and its underlying mechanisms.
Collectively, multi-level dynamic biological network analysis is poised to enable reliable characterization of multi-omics data in malignant progression. Consequently, understanding and measuring the underlying evolutionary process in gastric inflammation-induced tumorigenesis is an innovative approach for prognosticating the progression to cancer. Through systematic analysis of key network modules exhibiting dynamic multi-level features, trustworthy early warning biomarkers may be identified and used to stratify patients into those at truly high risk of progression who require enhanced endoscopic observation, as well as to shed light on new strategies for early recognition and diagnosis of GC (Figure 3).
Identifying biomarkers of gastric inflammation-induced tumorigenesis according to dynamic multi-level omics features
Biomarkers are objective measures used to evaluate complex diseases. In GC, biomarkers can serve as indicators of pathogenesis. Although several GC biomarkers are clinically applied, their effectiveness in improving the diagnosis rate of early-stage GC is suboptimal; therefore, effective diagnostic biomarkers must be explored. The systematic dissection of dynamic multi-level biological networks may reveal network modules that may serve as potential biomarkers of inflammation-induced tumorigenesis from a holistic perspective, thus substantially advancing the early diagnosis and precise treatment of GC.
GC biomarkers in clinical applications can be generally divided into 2 categories: serum biomarkers and liquid biopsies. Serum biomarkers, such as CEA, CA19-9, AFP, CA72-4, and CA125, have limited ability in early GC detection68. CEA is a widely used tumor marker in clinical practice, but its expression levels may also increase in other conditions, such as inflammatory bowel disease and liver disease. Additionally, CEA levels may increase only in advanced stages rather than in early stages of GC69. Similarly, high CA19-9 expression is found in many other types of cancer, including pancreatic cancer69. AFP-positive GC is also observed mainly in advanced stages70. Other conventional clinical biomarkers, including CA72-4 and CA125, generally exhibit high sensitivity and accuracy, yet little research has examined their ability to detect early GC71. In recent years, liquid biopsies have shown promise in early GC diagnosis; cell-free DNA (cfDNA) and circulating tumor DNA (ctDNA) are the most widely used. However, the translational practice of liquid biopsies remains challenging, because nearly all studies have focused on monitoring tumor signals in detectable conditions but have ignored the unique characteristics of gastric precancerous lesions. Notably, one study has found that cfDNA in the precancerous stage is not significantly elevated beyond that in healthy controls, thus limiting the potential value of cfDNA as an early diagnostic biomarker. Therefore, further studies are needed to establish a reliable set of biomarkers that can predict malignant progression and enable personalized treatment of early GC.
Because GC develops through stepwise progression, the most effective strategy for the early diagnosis of GC is identifying patients with premalignant lesions at high risk of progression. Extensive research has identified biomarkers in gastric premalignant lesions or early stages of GC from multi-omics perspectives (Table 3). Among genomic analyses, Fassan et al.72 have indicated that HGD and early GC (EGC) share similar molecular signatures, and that TP53 might play an important role in the progression to invasive GC. Similarly, Rokutan et al.73 have described the somatic mutational landscape of LGD and emphasized the importance of TP53 mutation, which precedes other mutations in the development of GC. These results suggest that somatic TP53 mutation might serve as a potential marker for the high progression risk of LGD and thus contribute to the early diagnosis of GC. Among transcriptomic analyses, Lee et al.74 have performed microarray analysis on IM glands by using laser capture microdissection and have suggested that CDH17 might serve as a promising biomarker for early-stage GC. Dynamic changes in cell types play crucial roles in gastric tumorigenesis. Among single-cell transcriptomics studies, we have constructed the first dynamic cellular network across distinct premalignant lesions; this network has revealed the expression signature of exceedingly early cells of gastric cancer (EEGC) and characterized biomarkers of EEGC, including KLK1033,75,76. On the basis of EEGC, potential biomarkers for detecting GC and providing early warning at curable stages could be determined, thus improving understanding of the associated etiology and pathogenicity, while informing new therapies and prevention targets. Among microRNA-omics studies, researchers77,78 have found that miR-30, miR-194, and miR-143-3p might contribute to gastric tumorigenesis.
Among proteomics studies, Li et al.79 have found distinct differences in proteomic features between gastric premalignant lesions (GPLs) and GC, thus identifying several proteins associated with increased risk of gastric lesion progression. Among metabolomic studies, Huang et al.80 have used an untargeted plasma metabolomic assay and identified 6 metabolites associated with a decreased risk of early GC, 3 of which were associated with the progression of IM. Among lipidomics studies, Liu et al.81 have investigated the association between lipidomic signatures and the risk of progression to GC. The study has identified 11 plasma lipids inversely associated with gastric lesion progression and GC occurrence. These lipids were organized into 5 clusters, thus improving the ability to predict the progression potential and risk of early GC. Among microbiome analyses, Cui et al.20 have performed metagenomic sequencing and have found that Campylobacter concisus is associated with the development of GPLs. Campylobacter concisus has been detected in both the tongue coating and gastric fluid of patients with gastritis, and thus may serve as a potential noninvasive biomarker for long-term monitoring of the disease. Because single-layer omics data might not be sufficient to decipher the multi-level biological mechanisms underlying the progression of gastric tumorigenesis, multi-omics investigations have been performed to uncover these mechanisms. By integrating the genomic and epigenomic levels, Huang et al.50 have found that IM exhibits specific genomic and epigenomic features, including low mutational burden; recurrent mutations in certain tumor suppressors, such as FBXW7; chromosome 8q amplification; and shortened telomeres. In patients with IM, shortened telomeres and chromosomal alterations are associated with subsequent LGD or GC.
Several IMs exhibit hypermethylation at DNA methylation valleys but generally lack intragenic hypomethylation signatures of advanced malignancy. Min et al.82 have analyzed the genetic and transcriptomic characteristics of adenomas with LGD, HGD, and EGC. The study has demonstrated that RNF43 mutations and downregulation are key events in the progression from LGD to HGD, and eventually to EGC. The findings suggest that tumors with RNF43 mutations may be responsive to Wnt-targeted agents, thus highlighting the diagnostic value and potential therapeutic strategy for intestinal-type GC with RNF43 mutations.
In general, increasing the sensitivity and specificity of biomarkers for early GC diagnosis remains a key research priority, which requires in-depth investigation of the multi-level biological mechanisms underlying gastric tumorigenesis. Although most studies have focused on cross-sectional samples, recognizing the dynamic features of this process is crucial; thus, prospective clinical trials are needed to determine the effectiveness of early diagnosis biomarkers on the basis of longitudinal samples. By leveraging AI methods to identify crucial features in this process, robust biomarkers may be identified that enable early detection of GC and improve patient outcomes.
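The two performance measures named above can be made concrete with a minimal Python sketch that computes sensitivity and specificity for a single biomarker at a chosen cutoff. The function name, the cutoff, and the toy values are hypothetical illustrations and are not drawn from any study cited here.

```python
def sensitivity_specificity(values, labels, cutoff):
    """Return (sensitivity, specificity) when values >= cutoff are called positive.

    values: biomarker measurements; labels: 1 = early GC, 0 = control.
    """
    pairs = list(zip(values, labels))
    tp = sum(1 for v, y in pairs if y == 1 and v >= cutoff)  # cases detected
    fn = sum(1 for v, y in pairs if y == 1 and v < cutoff)   # cases missed
    tn = sum(1 for v, y in pairs if y == 0 and v < cutoff)   # controls cleared
    fp = sum(1 for v, y in pairs if y == 0 and v >= cutoff)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, for illustration only.
toy_values = [1.2, 3.4, 2.8, 0.9, 4.1, 1.1, 0.7, 3.0]
toy_labels = [0, 1, 1, 0, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(toy_values, toy_labels, cutoff=2.5)
```

Lowering the cutoff generally raises sensitivity at the cost of specificity, which is why a marker that rises only in advanced disease performs poorly for early detection at any single threshold.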
Network pharmacology and AI-based TCM in the prevention and treatment of gastric inflammation-induced tumorigenesis
Although current strategies for preventing gastric cancer have focused on addressing common risk factors, they have often been unable to effectively prevent GC at the precancerous stage. Therefore, new preventive strategies that target the molecular mechanisms underlying gastric tumorigenesis are needed. Because the core output of the biological network modules represents the multi-level and dynamic characteristics of gastric inflammation-induced tumorigenesis, systematically screening drugs that target network modules by using AI methods may inform updated strategies for GC prevention and treatment and ultimately improve patient outcomes.
Current strategies for preventing GC primarily target common risk factors but cannot reliably prevent GPLs from progressing to cancer. HP infection is among the most important risk factors for GC occurrence. The most widely used clinical treatment for preventing GC is eradication of HP, but the effectiveness of HP eradication in reversing GPLs remains controversial, particularly in cases of IM and relatively severe lesions. For example, Hwang et al.83 have found that HP eradication contributes to the reversal of CAG and IM in a 10-year follow-up clinical study. In another 16-year clinical follow-up study84, researchers have found that HP eradication ameliorates CAG that has not progressed to IM. However, some studies85 have shown that HP eradication may be ineffective in patients with IM, thus suggesting that eradicating HP may not be sufficient to reverse GPLs. The efficacy of HP eradication in decreasing the incidence of GC also has limitations. Although a long-term follow-up study has indicated that patients who received HP eradication had a lower incidence of GC, the benefit was not observed until 26.5 years later and was difficult to achieve in the short term86. These studies have indicated that, although HP eradication may be effective in interventions to prevent the progression of GPLs, additional interventions are required.
Several risk factors for gastric cancer have been identified, including hereditary factors, smoking and alcohol consumption, and EBV infection. Hereditary factors are responsible for 1%–3% of GCs87. The tumorigenesis of CDH1 mutation-associated diffuse-type gastric cancer does not strictly follow the Correa cascade model, and the underlying genetic causes of intestinal-type gastric cancer remain incompletely understood88. Lifestyle factors, such as smoking and alcohol use, increase the risk of various types of tumors, including GC. Smoking is also associated with a greater increase in the risk of EBV-positive GC than EBV-negative GC89. EBV is known to remodel host chromatin topology and promote activation of oncogenes90. It plays a critical role in activating the PI3K-Akt and Wnt signaling pathways91, thus leading to altered cell signaling in malignant cells. Currently, EBV-associated GC treatments include chemotherapy alone or in combination with specific inhibitors, such as PD-L1 inhibitors and PI3K inhibitors92. However, EBV has been reported to be associated with only 8%–10% of GCs92, and HP eradication remains the best-studied therapy strategy. By focusing on these risk factors, more effective prevention, intervention, and personalized treatment strategies can be developed to improve patient outcomes.
Given the abnormally elevated oxidative phosphorylation during gastric tumorigenesis, vitamin supplements with antioxidant properties have been used as an adjunct to HP eradication, but they have shown no significant benefit in the short term. A randomized, double-blind study of 1,980 patients receiving vitamin C, vitamin E, and beta-carotene has indicated no significant differences in the pathological progression rate and regression rate between the treatment and placebo groups93. Another follow-up study in 3,365 patients has shown that HP eradication and continuous use of various vitamins for as long as 7 years decreases the incidence of GC94. In summary, additional vitamin supplementation does not significantly enhance the effect of HP eradication in blocking gastric tumorigenesis in the short term, and additional drug intervention is needed.
Although some Western medicines target GPLs, including celecoxib, rebamipide, and aspirin, the strength of evidence supporting them is low, and they are only weakly recommended in clinical practice guidelines95. For example, Sheu et al.96 have found that the COX2 inhibitor celecoxib promotes IM reversion and delays progression in patients who have undergone HP eradication. However, this finding contradicts the conclusion of another study by Wong et al.97, who have found no significant improvement after celecoxib intervention for 24 months after HP eradication in HP-positive patients. Therefore, the effectiveness of celecoxib for GPL intervention remains unsettled. Other studies of Western medicines have examined agents beyond celecoxib. For example, a meta-analysis by Huang et al.98 has indicated that aspirin decreases the incidence of GC in HP-positive patients but has no significant effect on HP-negative patients. Some researchers99 have found that rebamipide promotes the reversal of IM and LGD in patients; however, in another multicenter clinical trial, its effectiveness in improving IM was not found to be significant, and its efficacy requires further validation100. Overall, the effects of Western medicine in GPL interventions have also been unsatisfactory. One possible reason is that gastric tumorigenesis is a long-term process, and the mechanisms involved are complex and must be extensively investigated from a systematic and comprehensive perspective. TCM provides a trove of treatments waiting to be explored, and its multi-component, systemic regulatory effects are well suited to treating the complex process of gastric tumorigenesis. Currently, the TCM Moluodan has been included in clinical consensus opinions and is considered to have potential value in treating GPLs95.
The most reliable evidence supporting this treatment has come from a prospective, randomized, double-blind, placebo-controlled trial, in which Tang et al.101 have found that, compared with folic acid combined with vitamin E, Moluodan effectively ameliorates gastric mucosal CAG and IM, and notably reverses LGD. Intervention strategies for gastric tumorigenesis remain largely unsatisfactory, and TCM may provide novel candidate strategies for the prevention of precancerous lesions.
Multicomponent TCM is characterized by its holistic perspective; thus, a holistic TCM research approach is needed. Consequently, the concept of network pharmacology, with a holistic network target at its core, has been proposed102. Representative algorithms include CIPHER59, which enables genome-wide pathogenic gene prediction and is based on multi-level biological networks, and drugCIPHER103, a genome-wide target prediction algorithm for TCM ingredients. TCM network pharmacology provides a promising approach for understanding the molecular network features of complex disease processes and the intervention mechanisms of multicomponent TCM, including elucidation of the overall mechanism of action of Moluodan on CAG104. Recent TCM intervention studies on gastric tumorigenesis based on TCM network pharmacology are listed in Table 4. However, few of them have integrated omics or dynamic data. Of note, with the progress in single-cell RNA-seq technology and the accumulation of single-cell data, studies have integrated single-cell RNA-seq data with network pharmacology, an emerging approach to conducting drug intervention research at the cellular level105,106. However, such studies have not yet been conducted on interventions for gastric tumorigenesis, and further in-depth exploration is urgently needed. Systematic research integrating multi-omics data, AI algorithms, and TCM network pharmacology is expected to clarify the intervention mechanisms for gastric tumorigenesis and to identify potentially effective TCM intervention drugs. This approach has shown promise for precision intervention with Weifuchun capsules in patients with CAG107.
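The general idea of network-based gene prioritization can be illustrated with a small Python sketch. It ranks candidate genes by the correlation between (i) the similarity of a query phenotype to other phenotypes and (ii) the network closeness of each candidate to the known genes of those phenotypes. This is a simplified illustration and not the published CIPHER or drugCIPHER implementation: the closeness kernel exp(-d^2), the toy network, and all gene, phenotype, and function names are assumptions made for this example.

```python
from collections import deque
from math import exp, sqrt


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_candidates(graph, phenotype_genes, similarity_to_query, candidates):
    """Rank candidates by concordance of network closeness with phenotype similarity."""
    phenotypes = sorted(phenotype_genes)
    sim = [similarity_to_query[p] for p in phenotypes]
    scores = {}
    for gene in candidates:
        dist = shortest_path_lengths(graph, gene)
        # Closeness of the candidate to each phenotype's known genes (assumed kernel).
        closeness = [
            sum(exp(-(dist[g] ** 2)) for g in phenotype_genes[p] if g in dist)
            for p in phenotypes
        ]
        scores[gene] = pearson(sim, closeness)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy network: a chain G1 - G2 - G3 - G4 - G5.
toy_graph = {
    "G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2", "G4"],
    "G4": ["G3", "G5"], "G5": ["G4"],
}
toy_phenotype_genes = {"P1": ["G1"], "P2": ["G3"], "P3": ["G5"]}
toy_similarity = {"P1": 0.9, "P2": 0.5, "P3": 0.1}  # similarity to the query phenotype
ranking = rank_candidates(toy_graph, toy_phenotype_genes, toy_similarity, ["G2", "G4"])
```

In this toy case G2 ranks above G4 because it lies closer to the genes of the phenotypes most similar to the query. The same scoring logic extends in principle to drug target prediction if phenotype similarity is replaced by drug similarity, although that substitution is likewise an assumption of this sketch.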
Future perspectives
With the pioneering accumulation of multi-omics data and machine learning methods, recent years have seen an explosion in gastric inflammation-induced tumorigenesis research, which has provided new biological understanding of GC oncology and prevention. On the basis of the newly determined biological mechanism, promising biomarkers and potential targets that characterize key state changes initiating tumorigenesis during inflammation-induced tumorigenesis may be reliably identified and validated, to enable better early GC risk stratification as well as personalized prevention strategies.
Despite major advances in this field, several challenges remain unsolved, thus severely limiting further understanding of the point of GC onset during gastric inflammation-induced tumorigenesis. Among multi-omics data, large-scale individual longitudinal data are lacking, and multi-omics data obtained from the same patients at different time points are needed to avoid crucial bias in feature identification. Moreover, prolonged surveillance of patient samples over the course of years could also help accrue sufficient parameters for simulating evolutionary models statistically while offering an exemplary opportunity to study lesion evolution over time and in space during progression. Greater attention should be paid to new omics in gastric inflammation-induced tumorigenesis research. For example, radiomics is increasingly becoming a powerful tool for mining quantitative medical image features, which may substantially broaden multi-level omics insights into inflammation-induced tumorigenesis. Radiomics has shown high potential in early tumor diagnosis for breast cancer and lung cancer114,115. These methods transform medical images into quantifiable features for mining through lesion image segmentation, radiomic feature extraction, and intelligent model construction, and use machine learning methods to combine image features and other clinical information to assist in the diagnosis and treatment of diseases116. The performance of radiomics in some tumor clinical tasks is similar to or better than that of clinical physicians. Recent advances in spatial transcriptomics have been systematically used to generate biological insights into cancer contexts by providing transcriptomic profiles with crucial spatial information within biological tissues at subcellular levels117. The typical repertoire of operations of spatial transcriptomics, accompanied by single-cell transcriptomics, has been systematically demonstrated in liver diseases and cancer118.
Multi-omics at spatial resolution has also led to integration of analysis methods119. Spatial omics may be inherently amenable to integration with other modalities, and adding time series samples could ultimately broaden biological understanding by enabling parallel insights to be gained.
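The middle step of the radiomics workflow described above (segmentation, feature extraction, model construction) can be sketched in a few lines of Python. The example computes several first-order intensity features over the pixels inside a binary lesion mask. The image, the mask, and the function name are hypothetical; real pipelines rely on dedicated segmentation and feature-extraction tools and on far larger feature sets.

```python
from collections import Counter
from math import log2, sqrt


def first_order_features(image, mask):
    """Compute simple intensity statistics over pixels where mask is nonzero."""
    pixels = [
        value
        for image_row, mask_row in zip(image, mask)
        for value, inside in zip(image_row, mask_row)
        if inside
    ]
    if not pixels:
        raise ValueError("mask selects no pixels")
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Shannon entropy of the raw intensity histogram (no binning, for simplicity).
    counts = Counter(pixels)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return {
        "n_pixels": n,
        "mean": mean,
        "std": sqrt(variance),
        "min": min(pixels),
        "max": max(pixels),
        "entropy": entropy,
    }


# Hypothetical 4 x 4 grayscale image with a 2 x 2 "lesion" in the center.
toy_image = [
    [10, 10, 12, 11],
    [10, 40, 42, 11],
    [10, 40, 42, 12],
    [11, 10, 12, 11],
]
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
features = first_order_features(toy_image, toy_mask)
```

A feature dictionary of this kind, computed per lesion, is what would then be combined with clinical variables in the model-construction step.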
With AI methods, the essential role of integrating multi-level information from various time points underscores the need for adequate robustness and repeatability. Machine learning approaches are used to satisfy the need to appropriately incorporate biological knowledge hidden at different levels, such as gene regulation mechanisms, into models; in contrast, statistical methods tend to ignore the details of biological relationships in attempts to explain most variations by using only a few surrogate parameters. By incorporating AI methods, network analysis of gastric inflammation-induced tumorigenesis can enable the elucidation of intricate relationships among various factors at multiple levels, including genes, cells, pathways, and phenotypes59–61,120–122. In summary, integrating information across levels and developing more sophisticated models will be key to advancing understanding of the complex processes underlying malignant progression.
On the basis of multi-omics data, series of biomarkers at different omics levels have been identified. However, room remains for further in-depth research, as summarized in the following 2 points. First, given that tumorigenesis is a dynamic process, the effectiveness of early diagnosis biomarkers should be tested on more longitudinal samples, which have greater credibility than cross-sectional samples. Second, given that the current effectiveness of early diagnosis biomarkers remains suboptimal, researchers have attempted to improve performance by using combinations of multiple biomarkers. However, most of these combinations have remained at the single-omics level; therefore, further research on biomarker combinations at the multi-omics level is needed. From intervention perspectives, some therapeutic drugs for GC are available, including blocking antibodies123–125, tyrosine kinase inhibitors126,127, and novel agents such as ATR inhibitors128 and FAK inhibitors129. However, the current intervention efficacy for GPLs remains unsatisfactory. Given the complexity of intervention mechanisms, drugs that exert holistic regulation are needed. The research strategy of combining multi-omics data, AI algorithms, and TCM network pharmacology provides a promising method to systematically predict intervention drugs.
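One way to check whether a combination of biomarkers outperforms each marker alone is to compare areas under the ROC curve (AUC). The Python sketch below computes AUC by pairwise comparison of cases and controls and scores an equal-weight sum of two standardized markers. The marker values, the labels, and the equal-weight rule are assumptions for illustration only; an actual panel would be fitted and then validated on independent, ideally longitudinal, samples.

```python
from statistics import mean, pstdev


def auc(scores, labels):
    """AUC as the probability that a case scores higher than a control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant marker")
    return [(v - m) / sd for v in values]


# Hypothetical toy data: two markers (for example, from different omics layers).
toy_labels = [1, 1, 1, 0, 0, 0]
marker_a = [3, 0, 2, 1, 1, 0]
marker_b = [0, 3, 2, 1, 0, 1]

combined = [a + b for a, b in zip(zscore(marker_a), zscore(marker_b))]
auc_a = auc(marker_a, toy_labels)
auc_b = auc(marker_b, toy_labels)
auc_combined = auc(combined, toy_labels)
```

In this toy case each marker misses one case that the other detects, so the combined score separates cases from controls better than either marker alone. Because the combination is evaluated on the same samples used to define it, the result is optimistic and serves only to show the calculation.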
In summary, multi-omics data and AI-based methods are critical tools for systematically deciphering the biological mechanisms of gastric inflammation-induced tumorigenesis. Reliable experimental designs for omics and clinical application can inform more realistic mathematical models, whereas quantitative AI models can generate testable predictions and specific intervention strategies from a network strategy perspective130. Although current research has not yielded treatment guidelines for individual patients, a comprehensive framework including computational, experimental, and clinical strategies may facilitate more anticipatory, precise, and adaptive approaches to GC oncology.
Conflict of interest statement
No potential conflicts of interest are disclosed.
Author contributions
Conceived and designed the analysis: Shao Li.
Collected the data: Bowen Wu and Xiaosen Wei.
Prepared the figures: Qian Zhang and Mingran Yang.
Wrote the paper: Qian Zhang and Mingran Yang.
Writing-review & editing: Peng Zhang and Shao Li.
Footnotes
*These authors contributed equally to this work.
- Received April 17, 2023.
- Accepted June 26, 2023.
- Copyright: © 2024, The Authors
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.