Abstract
Artificial intelligence (AI) is significantly advancing precision medicine, particularly in the fields of immunogenomics, radiomics, and pathomics. In immunogenomics, AI can process vast amounts of genomic and multi-omic data to identify biomarkers associated with immunotherapy responses and disease prognosis, thus providing strong support for personalized treatments. In radiomics, AI can analyze high-dimensional features from computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography/computed tomography (PET/CT) images to discover imaging biomarkers associated with tumor heterogeneity, treatment response, and disease progression, thereby enabling non-invasive, real-time assessments for personalized therapy. Pathomics leverages AI for deep analysis of digital pathology images, and can uncover subtle changes in tissue microenvironments, cellular characteristics, and morphological features, and offer unique insights into immunotherapy response prediction and biomarker discovery. These AI-driven technologies not only enhance the speed, accuracy, and robustness of biomarker discovery but also significantly improve the precision, personalization, and effectiveness of clinical treatments, and are driving a shift from empirical to precision medicine. Despite challenges such as data quality, model interpretability, integration of multi-modal data, and privacy protection, the ongoing advancements in AI, coupled with interdisciplinary collaboration, are poised to further enhance AI’s roles in biomarker discovery and immunotherapy response prediction. These improvements are expected to lead to more accurate, personalized treatment strategies and ultimately better patient outcomes, marking a significant step forward in the evolution of precision medicine.
Introduction
Definition and composition of the tumor immune microenvironment
The tumor immune microenvironment (TIME) comprises complex interactions among tumor cells, immune cells, stromal cells, and various signaling factors within the extracellular matrix. These interactions can either facilitate or impede immune function and ultimately influence tumor progression. The extensive signaling crosstalk between the cellular and molecular components of the TIME plays crucial roles in cancer initiation, progression, response to treatment, and prognosis (Figure 1).
Figure 1. Schematic diagram of the immune microenvironment in tumor progression and metastasis. At the primary tumor site, malignant cells proliferate rapidly and consequently trigger recognition by the immune system. Various immune cells, such as T cells and macrophages, are recruited to mount a defense. With tumor progression, neovascularization occurs, thus enabling the tumor to invade surrounding normal tissues and infiltrate blood vessels for dissemination. At distant metastatic sites, the tumor establishes a supportive microenvironment that promotes its growth and further invasion. The tumor microenvironment undergoes significant alterations relative to normal tissue. For instance, T cells become functionally exhausted, as characterized by high expression of programmed cell death protein 1 (PD-1), whereas macrophages polarize into M1 or M2 phenotypes, which contribute to anti-tumor or pro-tumor activities, respectively.
The TIME can be divided into 3 categories: infiltrated-excluded (I-E) TIMEs, infiltrated-inflamed (I-I) TIMEs, and tertiary lymphoid structure (TLS)-TIMEs1. In I-E TIMEs, frequently referred to as “cold” tumors, cytotoxic T lymphocytes (CTLs) are localized at the invasive margin of the tumor mass or “trapped” within fibrotic nests. I-E TIMEs are generally not sensitive to immune checkpoint inhibitor (ICI) treatment. I-I TIMEs are considered immunologically “hot” tumors and are characterized by high infiltration of CTLs expressing programmed cell death protein 1 (PD-1), tumor cells expressing PD-1 ligand 1 (PD-L1), elevated IFN-γ signaling, and an association with favorable ICI efficacy2. TLS-TIMEs contain tertiary lymphoid structures, whose composition resembles that of lymph nodes and includes T cells, regulatory T cells, B cells, and dendritic cells. The overall effects of a TLS depend on its cell composition and location, but TLSs primarily support the antitumor response.
With cancer occurrence, the TIME undergoes significant changes relative to the normal cellular environment, such as immune cell enrichment; the recruitment of immunosuppressive cells into the microenvironment; the depletion of CD8+ T cells; and the redirection of immune cell differentiation or polarization toward tumor-related cell subtypes such as tumor-associated macrophages and tumor-associated neutrophils3. Carlos et al.4 have reported that, compared with normal breast tissue, ductal carcinoma in situ and invasive ductal carcinoma contain more white blood cells and have a lower CD8+/CD4+ T cell ratio. These changes in the TIME are usually associated with poor prognosis and poor treatment response. Patients with cholangiocarcinoma whose overall survival exceeds 3 years have relatively high densities of CD8+ T cells and B cells, but relatively low densities of regulatory T cells (Tregs) and M2 macrophages, a pattern that indicates the prognostic value of the immune microenvironment5. Observing the composition and status of immune cells can therefore help predict patient prognosis.
Definition of artificial intelligence (AI) in medicine
With progress in fields such as biotechnology and computational science, AI has become an effective tool for tumor diagnosis and has made breakthrough progress. AI is also playing increasingly important roles in research on the TIME. AI encompasses a diverse array of technologies with the common objective of simulating, augmenting, or surpassing specific facets of human intelligence through computational methods. AI uses a variety of technologies and disciplines, such as machine learning (ML), deep learning (DL), and natural language processing. ML is a subset of AI that focuses on allowing computer systems to learn patterns or characteristics from data to make predictions via algorithms and statistical models. ML’s many applications in medical research include cancer classification, subtyping, novel biomarker discovery, and drug discovery6–9.
DL is a subset of ML. The concept of DL originates from the study of artificial neural networks and comprises a multilayer perceptron structure with multiple hidden layers. Compared with other ML methods, such as logistic regression, DL has advantages in solving complex computing problems, such as large-scale image classification, natural language processing, and speech recognition and translation10–12. The advantages include the following aspects:
DL can autonomously identify and extract salient features from data with diminished reliance on manual labeling, thereby minimizing dependence on domain-specific expertise. This functionality is particularly beneficial for handling complex data modalities, including imaging, acoustic, and textual datasets13,14.
DL neural networks have many layers and substantial width. In theory, a sufficiently large neural network can approximate any continuous function, which allows DL to address many complex problems. Furthermore, DL can effectively process high-dimensional data, and learn complex structures and patterns in data15,16.
DL can achieve end-to-end learning, that is, learning directly from raw data to final results, thus simplifying the data processing pipeline and increasing efficiency10,17.
DL is highly dependent on data. The larger the amount of data, the better the performance and the stronger the generalization ability. In some tasks, such as image recognition, facial recognition, and natural language processing, DL has even surpassed human performance. Moreover, its performance potential can be further improved by adjusting parameters17,18.
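To make the idea of a multilayer perceptron with multiple hidden layers concrete, the following is a minimal sketch of a forward pass through a small network. The layer sizes, the random weights, and the helper names (`relu`, `dense`, `init_layer`, `mlp_forward`) are illustrative assumptions and are not taken from any model cited in this review; no training is performed.

```python
import math
import random


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def sigmoid(value):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(features, layers):
    """Apply every hidden layer with ReLU, then a single sigmoid output unit."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    out_weights, out_biases = layers[-1]
    return sigmoid(dense(activations, out_weights, out_biases)[0])


rng = random.Random(0)
sizes = [4, 8, 8, 1]  # 4 input features, two hidden layers of 8 units, 1 output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probability = mlp_forward([0.2, 1.5, -0.3, 0.7], layers)
print(f"predicted probability: {probability:.3f}")
```

Stacking further entries in `sizes` deepens the network; in practice the weights would be learned from data rather than drawn at random.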
This article reviews the current research status on the TIME from the perspective of AI in terms of 3 main aspects—immunogenomics (genomics/transcriptomics), radiomics, and digital pathology—and further highlights some of its limitations, prospects, and future directions. Examples of the application of AI to the TIME are provided in Table 1. Figure 2 presents the 4 main steps involving immunogenomics, radiomics, and pathomics applications of AI to the TIME.
Clinical significance of artificial intelligence in analysis of the TIME
Figure 2. Four main steps in applying artificial intelligence to analysis of immunogenomics, radiomics, and pathomics data regarding the tumor immune microenvironment. Step 1, data collection: immunogenomics (genomics/transcriptomics), radiomics, and digital pathology data from the real world are appropriately collected and stored. Step 2, data processing: data from various sources undergo several processing steps, including data cleaning to remove inconsistencies, data normalization to standardize values, data augmentation to enhance dataset diversity, and data splitting to create training and testing sets, thus ensuring quality and consistency for analysis and model development. Step 3, feature extraction and analysis: deep learning and machine learning algorithms are used to identify, quantify, and analyze relevant patterns, characteristics, and relationships within datasets for predictive modeling. Step 4, integration and application: extracted features are combined with clinical data to build predictive models and comprehensive systems that enhance diagnosis, treatment planning, and personalized patient care through advanced analysis.
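The data processing step described above (cleaning, normalization, and splitting) can be sketched as follows. This is a minimal illustration on a made-up cohort, not a pipeline from any cited study: the record layout, the function names, and the 25% test fraction are assumptions, and data augmentation is omitted. The split is performed before normalization here so that the scaling statistics come from the training set only, which is a common precaution against information leaking from the test set.

```python
import random
import statistics


def clean(records):
    """Data cleaning: drop records with any missing (None) feature value."""
    return [r for r in records if all(v is not None for v in r["features"])]


def split(records, test_fraction=0.25, seed=0):
    """Data splitting: reproducible shuffle, then a train/test partition."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(records):
    """Per-feature mean and standard deviation, estimated on the given records."""
    columns = list(zip(*(r["features"] for r in records)))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]  # constant column -> 1.0
    return means, stdevs


def apply_scaler(records, means, stdevs):
    """Data normalization: z-score every feature with the supplied statistics."""
    return [
        {**r, "features": [(v - m) / s for v, m, s in zip(r["features"], means, stdevs)]}
        for r in records
    ]


# Toy cohort: two made-up features per patient and a binary response label.
cohort = [
    {"id": "P1", "features": [1.0, 200.0], "label": 1},
    {"id": "P2", "features": [2.0, None], "label": 0},  # removed by clean()
    {"id": "P3", "features": [3.0, 180.0], "label": 0},
    {"id": "P4", "features": [5.0, 260.0], "label": 1},
    {"id": "P5", "features": [4.0, 220.0], "label": 1},
]
train, test = split(clean(cohort))
means, stdevs = fit_scaler(train)
train = apply_scaler(train, means, stdevs)
test = apply_scaler(test, means, stdevs)
print(len(train), "training records,", len(test), "testing record(s)")
```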
AI in genomics and transcriptomics analysis of the TIME
Immunogenomics
Immunogenomics is an emerging field that spans the disciplines of immunology and genomics. Cancer immunogenomics was initially based on the hypothesis that cancer mutations produce novel peptides, which were viewed as “nonself” by the immune system. The combination of new sequencing technologies, specialized algorithmic analyses, and HLA binding predictions has facilitated the search for these “new antigens”31,32. High-throughput genomic and transcriptomic data can be used not only to assess heterogeneous cell changes in the TIME but also identify genomic changes that might serve as potential targets for immunotherapy33. The establishment of large-scale collaborative genomic experiments, combined with the development of new single-cell transcriptomics technologies, computational methods, and ML algorithms, has enabled characterization of the mutational and transcriptional profiles of many types of cancer, extraction of clinically useful information from sequencing data, and exploration of tumors and their microenvironments34.
With advancements in medical research, traditional sequencing tools have become inadequate for meeting the evolving demands of modern research. Gene sequencing technology produces vast amounts of high-dimensional, sparse, and complex data, and AI algorithms are frequently used to analyze and process these data35. Xiong et al.36 have developed Single-Cell ATAC-seq Analysis via Latent feature Extraction (SCALEX), an AI algorithm based on a variational autoencoder DL framework. SCALEX can project heterogeneous datasets into a unified cellular embedding space and consequently achieve online integration of single-cell sequencing data. The Human Cell Atlas Project has led to an increase in sequencing data annotated with cell types. Duan et al.37 have proposed scLearn, a new AI-based method for single-cell type identification, along with a pretrained complete reference dataset. These resources provide effective tools for identifying cell types from massive amounts of single-cell sequencing data. Mosaic integration and knowledge transfer (MIDAS), an approach for integrating genomics, spatial transcriptomics, and other multimodal omics data, uses a deep probabilistic framework to achieve flexible and accurate integration of multi-omics data38.
AI in genomics and transcriptomics analysis for predictive biomarker discovery
AI plays critical roles in immunogenomics by accelerating the discovery of predictive biomarkers that are essential for understanding immune responses and disease mechanisms. In immunogenomics, AI can identify patterns and correlations in complex data that indicate specific immune responses or disease states, by analyzing the genome and transcriptome associated with the immune system.
Next-generation sequencing and AI tools for analyzing DNA and RNA sequences, particularly the latter, have been crucial in advancing understanding of the TIME and in providing personalized therapeutic strategies39,40. Most studies have obtained raw gene data from large public databases, such as The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO), and then processed these data via techniques such as quantile normalization and background correction, according to the research objectives. This approach is aimed at characterizing the TIME or evaluating gene scores associated with the TIME. In TIME research, various bioinformatics algorithms are used, including cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT), estimation of stromal and immune cells in malignant tumors via expression data (ESTIMATE), tumor immune estimation resource (TIMER), xCell, and microenvironment cell populations-counter (MCP-counter)41–45. Among these algorithms, CIBERSORT and ESTIMATE are the most frequently used. CIBERSORT, a tool based on linear support vector regression, deconvolves human immune cell subtypes from expression matrices41. Compared with other methods, CIBERSORT performs better in handling noise, unknown mixture contents, and closely related cell types, while also being relatively simple to operate. Additionally, CIBERSORT supports various visualizations, such as box plots, column charts, and heatmaps, and is therefore widely applied in immune infiltration analysis. Among sequencing methods, single-cell RNA sequencing (scRNA-Seq) is currently relatively mature and widely applied. Compared with traditional bulk sequencing, scRNA-Seq offers significant advantages in revealing the heterogeneity of cell populations hidden within bulk analyses, and in exploring rare cell types associated with tumor occurrence and metastasis46–50.
Chen et al.51 have used scRNA-Seq to analyze dynamic changes in tumor microenvironment components during the malignant progression of pancreatic ductal adenocarcinoma in situ. They have defined new characteristic genes for Tregs and exhausted T cells, including DUSP4, FANK1, and LAIR2, and identified a new subgroup of cancer-associated fibroblasts, termed complement-secreting cancer-associated fibroblasts. However, the instability of RNA molecules affects the accuracy of the results, whereas the stability and high specificity of DNA methylation make it a potential alternative for TIME analysis52. Deconvolution of genome-wide DNA methylation data with MethylCIBERSORT can accurately estimate tumor purity and cell composition, and can identify immunologically hot and cold tumors across various cancer types in TCGA53. DNA methylation-based analysis of the tumor microenvironment with MethylCIBERSORT has identified 2 prognostically relevant clusters (IC1 and IC2) with distinct cellular compositions and mutational profiles; this method offers novel molecular insights and potential diagnostic applications for blastic plasmacytoid dendritic cell neoplasm54.
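The deconvolution principle shared by CIBERSORT and MethylCIBERSORT, in which a bulk profile is modeled as a weighted mixture of reference cell-type profiles, can be illustrated with a simplified sketch. Note the substitution: CIBERSORT itself uses support vector regression, whereas the code below uses non-negative least squares solved by projected gradient descent as a simpler stand-in. The signature matrix, the cell-type labels, and the function name `deconvolve` are made-up assumptions for illustration.

```python
def deconvolve(mixture, signature, iterations=2000):
    """Estimate non-negative cell-type fractions f with signature * f ~ mixture.

    signature has one row per gene and one column per cell type.
    Solved by projected gradient descent on the squared error
    (a stand-in for the support vector regression used by CIBERSORT).
    """
    n_genes, n_types = len(signature), len(signature[0])
    # A step of 1 / ||S||_F^2 never exceeds 1 / lambda_max(S^T S), so it is stable.
    step = 1.0 / sum(v * v for row in signature for v in row)
    fractions = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [sum(s * f for s, f in zip(row, fractions)) - m
                    for row, m in zip(signature, mixture)]
        gradient = [sum(signature[g][t] * residual[g] for g in range(n_genes))
                    for t in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant.
        fractions = [max(0.0, f - step * d) for f, d in zip(fractions, gradient)]
    total = sum(fractions)
    return [f / total for f in fractions] if total > 0 else fractions


# Hypothetical reference profiles: 4 genes (rows) x 3 cell types (columns).
cell_types = ["T cell", "B cell", "Macrophage"]
signature = [
    [9.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 7.0],
    [3.0, 3.0, 3.0],
]
true_fractions = [0.6, 0.3, 0.1]
# Simulate a noise-free bulk sample as the weighted sum of the reference profiles.
mixture = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]

estimated = deconvolve(mixture, signature)
for name, value in zip(cell_types, estimated):
    print(f"{name}: {value:.3f}")
```

With real data the mixture is noisy and the reference profiles are imperfect, which is one reason robust regression methods are preferred over plain least squares.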
In recent years, several studies have leveraged AI technologies to identify novel biomarkers and develop predictive models, thus significantly enhancing the ability to forecast disease prognosis. Sun et al.55 have identified critical macrophage subpopulations associated with metastatic samples, which significantly influence the TIME. Furthermore, a prognostic model based on macrophage-associated genes has the potential to predict the prognosis of patients with uveal melanoma. Han et al.56 have proposed a transformer-based method for identifying esophageal cancer-associated lncRNAs that achieves superior performance, with an area under the ROC curve (AUC) of 0.87 and area under the precision-recall curve of 0.83. The identified lncRNAs and their target genes are closely associated with pathways involved in the development, progression, and prognosis of esophageal cancer, particularly within the immune microenvironment.
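Because AUC values are reported throughout this review, a brief sketch of what the metric measures may help. The area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores below are invented for illustration and are unrelated to the cited results.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs that are ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example: 3 positives and 3 negatives with made-up model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 8 of the 9 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples; it is used here for clarity, whereas rank-based implementations are preferable for large cohorts.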
AI in genomics and transcriptomics analysis for predicting immunotherapy responses
Immunotherapy, a revolutionary treatment approach, has demonstrated tremendous potential. However, because of the significant variations in individual patient responses to immunotherapy, accurately predicting which patients are likely to benefit from the treatment has become crucial. AI, particularly predictive models based on ML and DL, is increasingly becoming an effective tool to address this challenge. By integrating patients’ immunogenomic data, including tumor mutational burden (TMB), immune cell infiltration, and gene expression profiles, AI can develop complex predictive models to assess the potential response of individual patients to immunotherapy. TIME phenotype is the main factor influencing the effectiveness of immunotherapy. Studies are increasingly showing that immunotherapy can reshape the immune microenvironment. Understanding the TIME phenotype of individual patients might facilitate screening for tumors likely to respond to immunotherapy.
ESTIMATE, a method for inferring the proportions of stromal cells and immune cells in tumor samples42, enables pancancer immune infiltration analysis via data from public databases such as TCGA. First, 2 signatures are filtered from these datasets: the stromal signature (genes associated with the stroma) and the immune signature (genes associated with immunity). The stromal score and immune score are subsequently calculated through single-sample gene set enrichment analysis to predict the degree of infiltration57. Finally, this information is used to analyze tumor tissue purity. Chen et al.58 have used the ESTIMATE algorithm to analyze gene expression information for 2,459 patients with gastric cancer, obtained from databases such as GEO and TCGA. The resulting stromal and purity estimates for the gastric cancer samples were used to evaluate tumor recurrence and prognosis, and to predict the patients’ response to chemotherapy and their immune reactions. Patients with tumor recurrence presented elevated levels of stromal cell infiltration and diminished levels of tumor-infiltrating lymphocytes, thereby showing high stromal scores and low immune scores. One study has applied CIBERSORT and the ESTIMATE algorithm to develop a stable and robust immunogenic cell death-related profile for assessing prognosis and predicting immunotherapy benefits; this tool may be valuable for guiding treatment decisions and monitoring in patients with melanoma59.
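The idea of scoring each sample against a stromal and an immune gene signature can be sketched in simplified form. ESTIMATE computes its scores with single-sample gene set enrichment analysis; the code below replaces that with a plain mean of per-gene z-scores, which is only a rough stand-in. The gene names are placeholders rather than members of the real ESTIMATE signatures, and the expression values are invented.

```python
import statistics


def zscore_by_gene(expression):
    """expression maps gene -> list of values (one per sample); z-score each gene."""
    scaled = {}
    for gene, values in expression.items():
        mean = statistics.fmean(values)
        stdev = statistics.pstdev(values) or 1.0  # constant gene -> avoid division by 0
        scaled[gene] = [(v - mean) / stdev for v in values]
    return scaled


def signature_score(scaled, signature_genes):
    """Mean z-score of the signature genes, computed separately for each sample."""
    present = [g for g in signature_genes if g in scaled]
    n_samples = len(scaled[present[0]])
    return [statistics.fmean([scaled[g][i] for g in present]) for i in range(n_samples)]


# Placeholder genes and made-up expression values for 3 samples.
expression = {
    "STROMA_A": [9.0, 5.0, 1.0],
    "STROMA_B": [8.0, 4.0, 3.0],
    "IMMUNE_A": [1.0, 4.0, 9.0],
    "IMMUNE_B": [2.0, 5.0, 8.0],
}
scaled = zscore_by_gene(expression)
stromal = signature_score(scaled, ["STROMA_A", "STROMA_B"])
immune = signature_score(scaled, ["IMMUNE_A", "IMMUNE_B"])
for i, (s, m) in enumerate(zip(stromal, immune)):
    print(f"sample {i}: stromal score {s:+.2f}, immune score {m:+.2f}")
```

In this toy cohort, sample 0 resembles the recurrence pattern described above (high stromal score, low immune score), whereas sample 2 shows the opposite pattern.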
The use of ICIs in immunotherapy is becoming increasingly common across various tumor types, but accurately predicting patient responses to ICIs remains a major clinical challenge. One study has developed TME-NET, a neural network model that accurately predicts patient responses to ICIs by integrating tumor microenvironment components; this model surpasses established models in performance and provides key insights into the roles of Th1 cells and M2 macrophages in modulating immune responses60. Wang et al.61 have developed a DL model integrating multidimensional features, including single-cell sequencing, PD-L1 (CD274) expression, TMB/mismatch repair, and somatic copy number alterations, thus demonstrating its potential to predict ICI outcomes across multiple cancer types.
The application of AI in immunogenomics is rapidly advancing the discovery of biomarkers and the prediction of immunotherapy responses. AI, by processing large-scale genomics and multi-omics data, can identify complex patterns and potential predictive biomarkers associated with diseases, thereby driving the development of precision medicine. These tools not only enhance the speed and accuracy of biomarker discovery, but also provide strong support for personalized immunotherapy strategies.
AI in radiomics analysis of the tumor immune microenvironment
Radiomics
Radiomics is a field of medical study that leverages the power of advanced imaging techniques and data analysis to extract many features from medical images, such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography/computed tomography (PET/CT)62–65. These features, which might not be visible to the human eye, encompass various image attributes, including intensity, texture, and morphology. These high-dimensional data provide insights into the spatial distribution of immune cells, tumor architecture, and interactions between tumor cells and the immune system. Such detailed characterization can identify biomarkers predictive of treatment response and patient prognosis, and consequently facilitate personalized medicine. Specifically, radiomics has the potential to predict immunotherapy efficacy by revealing patterns associated with immune infiltration and the presence of an immunosuppressive environment. Consequently, radiomics not only enhances understanding of the TIME but also aids in tailoring immunotherapy treatments, improving patient selection, and ultimately contributing to better clinical outcomes in cancer care25,66,67.
The radiomics workflow consists of the following steps:
Data collection and preprocessing
Data collection: Medical imaging data, such as CT, MRI, and PET/CT data, are collected from patients.
Image preprocessing: Steps such as image reconstruction, denoising, standardization, and image intensity correction are used to improve data quality and consistency.
Selection and segmentation of region of interest (ROI)
ROI selection: Software tools are used to segment the ROI, that is, the image region associated with the disease, either automatically or semiautomatically. The accuracy of segmentation in this step directly affects the quality of subsequent feature extraction and analysis.
Feature selection and dimension reduction
Feature selection: Quantitative shape, intensity, texture, and other features are extracted from the segmented ROI, and statistical analysis and ML techniques are then used to select the features most relevant to the research objectives.
Dimensionality reduction: Principal component analysis, least absolute shrinkage and selection operator (LASSO) regression, and other methods are used to reduce the dimensionality of features.
Model construction and verification
Model building: Selected features are used to construct models that predict disease prognosis, treatment response, etc.
Model validation: The accuracy and generalizability of the model are verified through methods such as cross-validation and internal or external independent dataset testing.
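Two steps of the workflow above, feature extraction from a segmented ROI and cross-validation splitting, can be sketched as follows. This is a toy illustration under stated assumptions: the image is a small 2D grid of integer intensities, the mask is supplied by hand rather than by a segmentation tool, and the function names are hypothetical. Feature selection, dimensionality reduction, and model fitting are omitted; in practice, dedicated packages such as PyRadiomics and scikit-learn are commonly used for these steps.

```python
import math
import statistics


def first_order_features(image, mask):
    """Intensity statistics over the pixels where mask == 1 (the ROI)."""
    roi = [px for img_row, mask_row in zip(image, mask)
           for px, keep in zip(img_row, mask_row) if keep]
    counts = {}
    for px in roi:
        counts[px] = counts.get(px, 0) + 1
    entropy = -sum((c / len(roi)) * math.log2(c / len(roi)) for c in counts.values())
    return {
        "mean": statistics.fmean(roi),
        "std": statistics.pstdev(roi),
        "min": min(roi),
        "max": max(roi),
        "entropy": entropy,  # Shannon entropy of the discrete intensity histogram
        "area": len(roi),    # simple shape descriptor: ROI size in pixels
    }


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Toy 4 x 4 image with a hand-drawn 2 x 2 ROI in the center.
image = [
    [0, 0, 0, 0],
    [0, 3, 5, 0],
    [0, 5, 7, 0],
    [0, 0, 0, 0],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
features = first_order_features(image, mask)
print(features)

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each patient would contribute one such feature dictionary, and the fold indices determine which patients are held out in each validation round.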
AI in radiomics analysis for predictive biomarker discovery
AI-driven radiomics can uncover novel biomarkers that are not visible to the human eye, thereby offering insights into tumor heterogeneity, treatment resistance, and disease progression. By extracting many quantitative features from conventional medical images, radiomics enables noninvasive, real-time assessment of the characteristics of tumors and their microenvironment, including immune cell infiltration and the expression of molecular markers. This technique can quantify changes associated with the microenvironment, and can assess aspects of tumor heterogeneity including cellularity, extracellular matrix deposition, angiogenesis, necrosis, and fibrosis68. Luan et al.69 have used radiomics to predict the extent of immune cell infiltration in patients with glioblastoma, and explored the associations between these characteristics and patient clinical outcomes. A previous study has revealed that specific radiomics signatures are significantly associated with T cell infiltration levels and that these signatures might be used to predict disease prognosis70. Wang et al.71 used preoperative contrast-enhanced ultrasound, along with immune scores derived from immunohistochemistry and digital pathology, in an independent cohort of patients with hepatocellular carcinoma to verify the correlation between the model’s predictive value and T-cell infiltration. The average AUC was 0.905, indicating a high level of accuracy that supports the model’s potential for application in clinical practice. Ma et al.72 have used MRI-based radiomics technology to estimate the composition of the immune microenvironment and provided the first report indicating that the marginal status of breast-conserving surgery is associated with infiltration of immune cells in the microenvironment and the epithelial-mesenchymal transition status of breast tumor cells.
A retrospective study has built a DL grading signature with the potential to predict the histologic grade and personalize surgical treatments for clinical stage I invasive lung adenocarcinoma73. Zhang et al.74 have used the ResNet3D-18 model to extract radiological features, and have constructed a prognostic model for glioblastoma overall survival based on MRI Gd-T1WI images and DNA methylation sequences. These radiogenomics signatures are associated with biological pathways related to cellular immunity.
AI in radiomics analysis for predicting immunotherapy responses
Because traditional TIME assessment usually involves the acquisition of postoperative tissue biopsies, a noninvasive detection method is urgently needed. Several studies have demonstrated that radiomics signatures, such as ICI signatures, can be used to predict the tumor response to immunotherapy. Currently, PD-1, PD-L1, and cytotoxic T lymphocyte-associated antigen 4 (CTLA-4) are the primary immune checkpoint molecules targeted in immunotherapy. PD-L1 expression and TMB were the first clinically assessed biomarkers. PD-L1 is a protein that is found on the surfaces of cells and plays crucial roles in regulating the immune system. After binding its receptor, PD-1, PD-L1 decreases immune activity, thus helping maintain tolerance to self-cells and preventing the immune system from attacking normal cells75. However, many cancer cells also express PD-L1, and use this mechanism to suppress the immune system and evade immune surveillance. PD-L1 expression is the sole U.S. Food and Drug Administration-approved biomarker for ICI use in individuals with lung adenocarcinoma76. Similarly, Li et al.77 have reported that the PD-L1 P146R variant serves as a prognostic marker and a negative predictor of the response to immunotherapy in patients with gastric cancer. Therefore, targeting PD-L1 has become a significant focus in cancer treatment, particularly in the field of cancer immunotherapy78. Blocking the interaction between PD-L1 and PD-1 can activate the immune system to attack cancer cells, thereby providing an effective strategy for cancer therapy79. TMB is the total number of nonsynonymous somatic mutations per megabase in the coding region of the tumor genome, which contains a wide range of mutations. A high TMB is positively associated with more tumor-associated neoantigens and greater immunotherapy effectiveness80,81. Zwanenburg et al.82 first applied an Image Biomarker Standardization Initiative-compatible algorithm on raw images and filtered them with LLL and HHH coif1 wavelets.
Building on this approach, Monaco et al.83 established a method to extract metabolic parameters from 3 models of PET/CT scanners (Discovery 600, Discovery IQ, and Discovery MI). Radiomics features were subsequently calculated from the images, metabolic parameter features were extracted from the PET/CT images, and a tri-variate linear discriminant model was established. In the test set, the model achieved a sensitivity of 81% and a specificity of 82%. Han et al.84 have developed a radiomics signature based on ML; used statistical and ML techniques to screen features highly correlated with TIME phenotypes and immunotherapy responses; trained a classification model to identify different TIME phenotypes; and developed predictive models to estimate the probability of patient response to anti-PD-1/PD-L1 therapy. The authors further validated the model’s performance on internal and external datasets and explored the possibility of integrating radiomics signatures into clinical decision support systems, to help physicians predict the effectiveness of immunotherapy, and improve personalized decision-making for breast cancer immunotherapy and therapeutic effects. Sun et al.66 have developed a radiomics signature for CD8 cells to predict the clinical outcomes of patients treated with immunotherapy, then validated it in further prospective randomized trials. These studies suggest links between the immunotherapy response and radiomics characteristics. Wang et al.85 have proposed a multimodal DL radiomics approach that uses clinical data and CT images within a semisupervised framework to predict the immunotherapy response in patients with advanced gastric cancer, and has achieved excellent performance.
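The sensitivity and specificity figures reported for such classifiers are derived from the confusion matrix of predicted versus observed responses. The following is a minimal sketch of that arithmetic; the labels and predictions are invented and do not reproduce the 81% and 82% values cited above.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)  # fraction of true responders that are detected
    specificity = tn / (tn + fp)  # fraction of non-responders correctly ruled out
    return sensitivity, specificity


# Toy test set: 4 responders (1) and 5 non-responders (0).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sensitivity, specificity = sensitivity_specificity(labels, predictions)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Here 3 of 4 responders and 4 of 5 non-responders are classified correctly, giving a sensitivity of 0.75 and a specificity of 0.80.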
Radiomics combined with AI offers a transformative approach in radiology to extract high-dimensional quantitative data from medical images and provide valuable insights into the TIME. This integration enhances the accuracy of tumor assessment, improves prognostic and predictive models, and supports the development of personalized theranostic strategies.
AI in pathomics analysis of the tumor immune microenvironment
Pathomics
Technological advancements and the growing focus on precision medicine have paved the way for quantitative pathology assessment methods based on digital pathology techniques, which enable researchers to explore and extract information beyond human visual perception86. Digital pathology encompasses the digitization of pathology slides and the computational analysis of digitized whole-slide images87,88. Pathomics is a research field that uses advanced computing technology to extract and analyze quantitative features from pathological images. This field leverages the high-resolution image data provided by digital pathology, using DL, image processing, and other ML technologies to analyze subtle structures in images, such as cell morphology, cell arrangement, and changes in tissue structure. This approach is aimed at predicting treatment responses and providing personalized treatment recommendations89. As early as 1965, Prewitt and Mendelsohn90 designed the Cytologic Diagnosis by Computer (CYDAC) computer algorithm, which uses the frequency distribution of image optical densities to perform preliminary tissue quantification and evaluate numerous potential image feature parameters within a decision-theoretic framework, thereby preliminarily exploring the extent to which mechanized perception might complement human perception in the field of microscopic diagnosis. For whole-slide images, AI has demonstrated robustness and reproducibility, and has consequently overcome the limitations of subjective visual assessment, while integrating vast amounts of data to capture the complexity of tissue architecture; AI methods have shown significant promise in enabling comprehensive understanding of the highly heterogeneous tumor microenvironment91.
AI in pathomics analysis for predictive biomarker discovery
By drawing correlations between genomic features and pathological image features, the inherent heterogeneity of tumors can be captured from the appearance of pathological images92. This approach provides extensive understanding of tumor biology and can identify specific imaging biomarkers that combine genotypic and phenotypic metrics. Şenbabaoğlu et al.93 have developed Multi-Omic translation of whole slide images for Spatial Biomarker discoverY (MOSBY). Using “colocalization” analysis between tile-level predictions of 2 omics features, the researchers identified the colocalization of T effector cells with cysteine as a spatial biomarker that is associated with poor survival, and that also shows significant tumor enrichment in breast cancer, squamous lung cancer, and ovarian cancer. MOSBY enables multi-omics inference and spatial biomarker discovery from whole slide images. Nicolas et al.94 have trained a deep convolutional neural network (inception v3) that automatically classifies pathology images into lung adenocarcinoma, squamous cell carcinoma, or normal lung tissue, and predicts the commonly mutated genes (STK11, EGFR, FAT1, SETBP1, KRAS, and TP53) in lung adenocarcinoma from pathology images. The ability to quickly and inexpensively achieve biomarker discovery from histopathology images might aid in the treatment of patients with cancer. AI has made substantial progress in improving the accuracy of pathomics analysis for diagnostic, prognostic, and genomics prediction. Nevertheless, lack of interpretability continues to pose a major obstacle. Diao et al.95 have presented an approach using human-interpretable image features (HIFs) to predict clinically relevant molecular phenotypes from whole-slide histopathology images. HIFs have been integrated by using DL models, to quantify specific and biologically relevant features across 5 cancer types.
These HIFs are associated with known biomarkers of the TIME, and can predict different molecular features including expression of 4 immune checkpoint proteins and defects in homologous recombination (AUC 0.601–0.864). These findings suggest that AI can assist pathologists in biomarker discovery.
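The notion of colocalization between tile-level predictions of 2 features, mentioned earlier in this section, can be sketched as follows. The statistic actually used by MOSBY is not described here, so this sketch substitutes a plain Pearson correlation across the tiles of each slide; the data layout, slide identifiers, and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant map carries no colocalization signal
    return cov / math.sqrt(var_x * var_y)


def colocalization_per_slide(tile_predictions):
    """Map each slide ID to the correlation of its two tile-level predictions.

    `tile_predictions` maps a slide ID to a list of (feature_a, feature_b)
    tuples, one tuple per tile (an assumed layout, for illustration only).
    """
    return {
        slide: pearson([t[0] for t in tiles], [t[1] for t in tiles])
        for slide, tiles in tile_predictions.items()
    }


# Hypothetical predictions: the features co-occur in slide_A, not in slide_B.
toy_predictions = {
    "slide_A": [(0.1, 0.2), (0.5, 0.6), (0.9, 1.0)],
    "slide_B": [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)],
}
scores = colocalization_per_slide(toy_predictions)
```

A per-slide score of this kind could then be tested for association with survival, which is the role a spatial biomarker plays in the analysis described above.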
AI in pathomics analysis for predicting immunotherapy responses
Studies are increasingly using simple, macroscopic image information to characterize the composition of the TIME and thereby predict how tumors in different patients will respond to treatment. Mining pathological images, which directly depict tumor tissue, and linking the extracted information to the TIME are crucial for predicting treatment response in patients with tumors. Zhao et al.96 have introduced single-cell morphological and topological profiling (sc-MTOP), which characterizes the tumor ecosystem by extracting features of individual cell nuclear morphology and intercellular spatial relationships; the authors have further explored the correlation between a breast cancer microenvironment with localized inflammatory infiltration and favorable immunotherapy responses. Klimov et al.97 have developed a novel ML pipeline that enables pathologists to apply manually trained classifiers to digital slides. The pipeline annotated regions of stroma, normal/benign ducts, cancer ducts, dense lymphocyte areas, and vascular areas, and the authors then trained a recurrence risk classifier on 8 selected architectural and spatial organizational tissue features from the annotated regions to predict breast cancer recurrence risk. Several studies have shown that multiplex immunohistochemistry and immunofluorescence data significantly improve the depiction of TIME heterogeneity. Väyrynen et al.98 have assessed the prognostic role of macrophage polarization in the colorectal cancer microenvironment via multiplex immunofluorescence with CD68, CD86, IRF5, MAF, MRC1 (CD206), and KRT (cytokeratins), combined with digital image analysis and ML. Tumor-infiltrating lymphocyte (TIL) density and spatial structure were enriched across tumor types, immune subtypes, and tumor molecular subtypes, thereby implying that the spatial infiltration status might reflect specific tumor cell aberration states99. 
Wan et al.30 have used data augmentation to process histopathological images from the TCGA-UVM cohort; analyzed information such as the total infiltrating immune scores, stromal scores, tumor purity, and proportions of different immune cell types within the tumor tissues; and developed a DL model to predict the survival status of patients with uveal melanoma.
Furthermore, pathomics seeks to automatically extract quantitative pathological features, such as mitotic count and lymphocyte proportion, from histopathological images through in-depth analysis of the information within these images100,101. Specifically, pathology informatics uses AI algorithms for feature extraction to transform pathological data into mineable feature data, which are then integrated with other omics features and clinical information for comprehensive analysis. By analyzing the correlation between multi-omics feature data and study outcomes, objectives such as diagnosis of tissue structures, assessment of the degree of disease invasion, and prognosis evaluation can ultimately be achieved26,102.
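The intercellular spatial relationships and lymphocyte-related features discussed in this section can be illustrated with a small Python sketch that counts lymphocyte nuclei near each tumor nucleus. It is not the sc-MTOP implementation or any other cited method: the input format (nucleus centroids with a cell-type label, as might come from an upstream segmentation step), the 30-pixel radius, and the function names are assumptions for illustration.

```python
import math


def lymphocyte_neighbors(nuclei, radius=30.0):
    """Count, for each tumor nucleus, the lymphocyte nuclei within `radius`.

    `nuclei` is a list of (x, y, cell_type) tuples with centroid coordinates
    in pixels (an assumed format, for illustration only).
    """
    tumor = [(x, y) for x, y, kind in nuclei if kind == "tumor"]
    lymph = [(x, y) for x, y, kind in nuclei if kind == "lymphocyte"]
    return [
        sum(1 for lx, ly in lymph if math.hypot(tx - lx, ty - ly) <= radius)
        for tx, ty in tumor
    ]


def infiltration_summary(nuclei, radius=30.0):
    """Summarize local lymphocyte infiltration around tumor nuclei."""
    counts = lymphocyte_neighbors(nuclei, radius)
    if not counts:
        return {"mean_neighbors": 0.0, "fraction_infiltrated": 0.0}
    return {
        "mean_neighbors": sum(counts) / len(counts),
        "fraction_infiltrated": sum(1 for c in counts if c > 0) / len(counts),
    }


# Hypothetical centroids: one tumor nucleus has two nearby lymphocytes,
# the other has none within the radius.
toy_nuclei = [
    (0, 0, "tumor"), (100, 0, "tumor"),
    (10, 0, "lymphocyte"), (0, 20, "lymphocyte"), (300, 300, "lymphocyte"),
]
summary = infiltration_summary(toy_nuclei)
```

Features of this kind are human-readable, which is one reason spatial descriptors are attractive as inputs to downstream response or prognosis models.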
Limitations
Although AI has shown great potential for studying the TIME, several limitations must be addressed to fully exploit its applications regarding cancer progression, immune escape, and therapeutic efficacy. The first limitation is data heterogeneity and the need to integrate different data sources. Because data in immunology and oncology research are complex and diverse, spanning fields such as genomics, transcriptomics, and imaging, advanced AI algorithms must be designed that can handle heterogeneous data and extract useful information. In addition, establishing unified data format standards and rich metadata can help improve compatibility between data sources. Second, most studies have been single-center retrospective studies with small sample sizes that have evaluated associations between tumor microenvironment features and genomics or imaging data. The stratification of the training, validation, and test datasets is often inadequate, and the trained models may consequently generalize poorly and carry a high risk of overfitting. Third, the complexity of the immune microenvironment, including the cell types, signaling molecules, gene expression patterns, and other factors involved, makes highly accurate models particularly important. However, these models are often considered “black boxes” because of their complex decision-making processes. Improving the interpretability of these models in studies of the immune microenvironment will be important for enabling understanding of the models, optimizing treatment strategies, and ultimately improving patient outcomes. Finally, AI models raise a series of ethical issues in immune microenvironment research and applications, including data privacy, informed consent, algorithm transparency, and bias. These issues arise when patient data, such as genomics data and medical imaging data, are collected and analyzed. 
Moreover, the security and privacy of personal information must be ensured to prevent data leakage or misuse. Training datasets must span diverse patient populations to avoid algorithmic bias and ensure that models perform well across different groups.
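The concern about inadequate stratification of training, validation, and test datasets can be made concrete with a short Python sketch of a label-stratified split performed at the patient level. This is one possible approach under stated assumptions, not a procedure taken from the cited studies: the 60/20/20 fractions, the cohort, the outcome labels, and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_patient_split(patient_labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split patient IDs into train/validation/test, stratified by label.

    Splitting by patient (not by tile or slide) keeps all images from one
    patient in a single partition, which avoids leakage between partitions.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for patient_id, label in sorted(patient_labels.items()):
        by_label[label].append(patient_id)

    splits = {"train": [], "validation": [], "test": []}
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        n_train = round(fractions[0] * len(ids))
        n_val = round(fractions[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["validation"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder goes to test
    return splits


# Hypothetical cohort: 10 responders and 10 non-responders.
cohort = {
    f"P{i:02d}": ("responder" if i < 10 else "non-responder")
    for i in range(20)
}
splits = stratified_patient_split(cohort)
```

Because each outcome class is divided separately, every partition preserves the class balance of the cohort, and because the unit of splitting is the patient, no patient contributes data to more than one partition. External validation on an independent cohort would still be needed to assess generalizability across centers.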
Conclusions
This article reviewed the application of AI in the analysis of the TIME and summarized research in immunogenomics (genomics/transcriptomics), radiomics, and pathomics. By integrating these data modalities, advances in AI have produced many promising results in research on the immune microenvironment. AI plays a major role in disease diagnosis, personalized treatment, prognosis prediction, and efficacy evaluation. It is expected to transform tumor treatment, offer more precise treatment management for patients, and enhance treatment outcomes and patients’ quality of life.
Conflict of interest statement
No potential conflicts of interest are disclosed.
Author contributions
Conceived and designed the analysis: Luchen Chang, Jiamei Liu, Xi Wei.
Collected the data: Jialin Zhu, Shuyue Guo, Yao Wang, Zhiwei Zhou.
Contributed data or analysis tools: Luchen Chang, Jiamei Liu, Jialin Zhu.
Performed the analysis: Luchen Chang, Jiamei Liu.
Wrote the paper: Luchen Chang, Jiamei Liu.
Footnotes
*These authors contributed equally to this work.
- Received September 18, 2024.
- Accepted November 27, 2024.
- Copyright: © 2025, The Authors
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.