Abstract
Objective: Large-scale CRISPR screens have identified essential genes across cancer cell lines, but links between tumor functional properties and specific dependencies require investigation to reveal the mechanisms underlying dependencies and broaden understanding of targeted therapy.
Methods: We selected 47 breast cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) with multi-omics data including gene dependency; somatic mutations; copy number alterations; and transcriptomic, proteomic, metabolomic, and methylation data. We established a dependency marker association (DMA) analytic pipeline by using linear regression modeling to assess associations between 3,874 representative gene dependencies and multi-omics markers. Additionally, we conducted non-negative matrix factorization clustering, to stratify breast cancer cell lines according to gene dependency features, and investigated cluster-specific DMAs.
Results: We interpreted valuable DMAs according to two primary aspects. First, dependencies associated with gain-of-function alterations revealed addiction to lactate transporter SLC16A3, thus suggesting a promising therapeutic target. Second, dependencies associated with loss-of-function alterations included synthetic lethality (SL), collateral SL, and prioritized metabolic SL, encompassing paralog SL (e.g., IMPDH1 and IMPDH2), single pathway SL (e.g., GFPT1 and UAP1), and alternative pathway SL (e.g., GPI and PGD). DMA analysis of the two clusters with divergent dependency signatures demonstrated that cluster1 cell lines exhibited extensive metabolism with mitochondrial protein dependencies, whereas cluster2 displays enhanced cell signaling, and reliance on DNA replication and membrane organelle regulators.
Conclusions: We established a DMA analysis pipeline linking the gene dependencies of breast cancer cell lines to multi-omics characteristics, thus elucidating the underpinnings of tumor dependencies and offering a valuable resource for developing novel precision treatment strategies incorporating relevant markers.
keywords
Introduction
Cell lines serve as advantageous models for extensive therapeutic target screening1–4. The Cancer Dependency Map (DepMap) has conducted large-scale CRISPR/Cas9 knockout screens in hundreds of cancer cell lines to infer the probability of a given gene representing a depletion phonotype5,6. Genomics of Drug Sensitivity in Cancer (GDSC) datasets and the Broad Institute’s Profiling Relative Inhibition Simultaneously in Mixtures (PRISM) drug repurposing resources indicate the sensitivity of cancer cell lines to various drugs, to aid in accelerating the development of new oncology therapeutics7–10. The objective of these initiatives is to personalize treatment for patients according to the distinct molecular characteristics of their tumors.
Although genetic and drug perturbation screening provides an atlas of potential therapeutic targets in tumors, the associations between various molecular characteristics and corresponding vulnerabilities remain to be further investigated and elucidated11,12. A recent study has conducted a systematic analysis of clinically relevant molecular markers associated with cancer dependencies in pan-cancer cell lines13. However, because of the extensive diversity among pan-cancer cell lines, detailed functional interpretation of associations was limited, and analyses focusing specifically on particular tumor subtypes were lacking.
In this study, we established an analytical pipeline for systematic dependency marker association (DMA) analysis of an integrated breast cancer cell line cohort for which multi-omics data have been assembled. With this pipeline, we interpreted mechanisms through the dual lenses of gain-of-function addiction and loss-of-function-induced synthetic lethality (SL), thus enabling systematic exploration of precision therapeutic strategies. Furthermore, given the heterogeneity of breast cancer, we stratified the cell lines into two distinct clusters exhibiting divergent dependency signatures, then performed cluster-specific DMA analysis to elucidate the molecular mechanisms underlying the cluster-selective dependencies (Study flowchart).
Part I integrates a multi-omics cohort of breast cancer cell lines and establishes a dependency-marker association analysis pipeline. Part II deconvolutes statistically derived DMAs from dual biologically interpretable perspectives to pinpoint therapeutically relevant targets. Part III further delineates two core dependency-based clusters, characterizes molecular mechanisms underlying cluster-specific dependency signatures, and identifies DMAs within each cluster, to inform precision therapeutic strategies. This study highlights how DMA analysis can help elucidate fundamental cell-biological mechanisms and uncover clinically actionable therapeutic targets.
Materials and methods
Integrated breast cancer cell line cohorts
Our study incorporated multi-omics data for breast cancer cell lines obtained from the Cancer Cell Line Encyclopedia (CCLE). Gene dependency data derived from CRISPR screens, along with somatic mutation and gene expression data, were downloaded from the Broad DepMap Public 22Q2 (https://depmap.org/portal/download/). Gene dependency indicates the likelihood that knocking out a specific gene will result in cell growth inhibition or cell death5. Somatic mutation data were supplied as a binary mutation matrix in CCLE_mutations.csv. We curated oncogenic mutation data for a total of 670 genes, on the basis of a previously published cancer-associated gene list14. The gene expression data comprised TPM values inferred from RNA-seq data with the RSEM tool and were reported after log2 transformation with a pseudo-count of 1. A total of 18,846 protein-coding genes annotated in GRCh38 (https://ftp.ensembl.org/pub/release-111/gtf/homo_sapiens/) were further selected.
Copy number, methylation15, proteomics16, and metabolomics17 data were obtained from the CCLE project. Segmented copy-number profiles served as input for GISTIC2.0 (v2.0.23)18 to investigate the recurrence of gene level copy number alterations (CNAs) in breast cancer cell lines. GISTIC2.0 was run with parameters modified from the default settings (-genegistic 1 -smallmem 1 -broad 1 -qvt 0.05 -brlen 0.98 -conf 0.95 -armpeel 1 -savegene 1 -ta 0.25 -td 0.25 -rx 0 -cap 3.5). Additionally, we focused on clinically interpretable CNAs, as previously reported14. In the gene methylation and proteomics data, features with < 30% missing values across cell lines were retained.
Dependency marker association analysis
For the DMA analysis, we independently tested each omics layer with linear regression to assess correlations with gene dependencies and incorporated intrinsic subtype (luminal, HER2amp, and basal) as covariates in the regression model. The analysis included dependencies with the top 2,000 standard deviation values, excluding pan-cancer core essential genes13, thus resulting in a total of 1,839 gene dependencies. To ensure comprehensive assessment of addiction and SL, we supplemented the 1,839 dependencies with known cancer-associated genes14 and genes involved in oncogenic pathways19. Additionally, we incorporated metabolic genes20 to interpret metabolic dependencies. This process yielded a final set of 3,874 gene dependencies for correlation analysis with molecular characteristics across multiple layers.
Cell line dependency clustering and signature extraction
To identify high-level patterns or latent factors underlying the dependency profiles, we performed non-negative matrix factorization (NMF), and used cophenetic correlation and consensus silhouette scores for parameter selection. The genes with the top 2,000 standard deviation values were used for clustering after removal of essential pan-cancer genes, thereby resulting in a total of 1,839 genes.
We selected two latent factors for each of the two NMF signatures. Genes included in the signature were defined as those with weights in the top 70th percentile in only one cluster. Gene set enrichment analysis (GSEA) of Gene Ontology and Reactome gene sets downloaded from MSigDB (http://www.gsea-msigdb.org/gsea/downloads.jsp) within the latent factors was performed with the gost function of the gprofiler2 R package21.
Cluster co-dependency networks
The cluster dependency signatures were used to construct a co-dependency network. Genes were represented as vertices, and the absolute value of the correlation between the dependencies of two genes defined the similarity of the gene pair. The correlation was calculated across all samples with Pearson’s correlation.
Co-dependency networks were visualized in Gephi 0.9.3. Edge thickness represented correlation strength, and node color and size represented gene dependency. The ForceAtlas algorithm was used to lay out the network structures for both networks. For a more concise representation, we trimmed the networks by filtering out the correlation with |cor| > 0.4 and FDR < 0.05, and removed the outlier nodes (KCore < 3). Ultimately, 264 cluster1-specific dependencies and 141 cluster2-specific dependencies were included in the network illustration.
Gene set enrichment analysis
GSEA was performed to identify the enriched pathways and interpret transcriptomic data. Pathways were defined by the gene set file c5.go.bp.v2023.2.Hs.symbols.gmt from GSEA-MSigDB (https://docs.gsea-msigdb.org/). GSEA was performed with the gene set size limited to 15–500, and 1,000 permutations were performed. The results of the pathway analysis were visualized with Enrichment Map App (v.3.4) in Cytoscape (v.3.10.0). Network maps were generated for nodes with P < 0.05, and nodes that shared gene overlap with an overlap coefficient > 0.3 were connected by blue lines (edges). Clusters of related pathways were identified and annotated with a Cytoscape app with a Markov cluster algorithm connecting pathways according to shared keywords in their descriptions (AutoAnnotate, v.1.5.1). The resulting groups of pathways are designated as major pathways in a circular layout.
Single-sample GSEA (ssGSEA) scores were calculated for each cell line with the gsva function in the GSVA R package22. Metabolic pathway gene sets were curated from Kyoto Encyclopedia of Genes and Genomes20, whereas oncogenic pathway gene sets were obtained from a previous study19. PROGENy enrichment analysis was performed with the progeny R package (v.1.22.0) to quantify signaling pathway target gene enrichment23.
Statistical analysis
Two-sided Mann–Whitney Wilcoxon tests were used to analyze continuous variables, whereas Fisher’s exact test was used for comparison of categorical variables. Multivariate logistic regression was used to adjust for covariates in the correlation analyses. P values were adjusted for the false discovery rate with the Benjamini-Hochberg procedure in multiple comparisons. All analyses were performed in R version 4.3.1 (https://cran.r-project.org/).
Results
Overview of integrated breast cancer cell line cohorts and study design
CRISPR-Cas9 screens of cell lines can provide initial insights into potential therapeutic targets for tumors. To investigate the associations between dependencies and molecular features in breast cancer, we integrated multi-omics data for breast cancer cell lines from CCLE, including gene dependency; somatic mutations; CNAs; and transcriptomic, proteomic, metabolomic, and methylation data 5,6,15–17. We selected a total of 47 cell lines with available gene dependency data for analysis, 21 of which had all omics data available (Figure 1A). On the basis of the CCLE datasets, we filtered protein-coding genes to obtain the mRNA expression profiles of 18,846 genes and protein expression profiles of 7,852 genes, and we selected mutations of 670 cancer-associated genes. The final scale of each omics layer is presented in Figure 1B. A total of 372 genes had available data for all six omics (Figure 1C).
Overview of integrated breast cancer cell line cohorts and study design. (A) Forty-seven breast cancer cell lines with mRNA subtypes and omics annotations. Different mRNA subtypes are distinguished by colors: blue represents the luminal subtype, purple represents the HER2amp subtype, and red represents the basal subtype. Cell lines in the omics data are marked in black. (B) The number of genes or metabolites in each omics dataset used for analysis. (C) The number of genes covered by various quantitative omics. (D) Schematic overview of the dependency-marker association analysis. A total of 3,874 dependencies for analysis were determined by integration of high-variance distinct dependencies, cancer-associated genes, oncogenic genes, and metabolic genes. These dependencies were subjected to linear regression analysis with each omics marker, with a focus on dependencies associated with gain-of-function and loss-of-function alterations, thereby suggesting gene addiction and synthetic lethality. Validation was determined through correlations between dependencies and oncogenic/metabolic characteristics.
We developed a DMA analysis pipeline to decode the molecular features underlying tumor dependencies. This study, rooted in mechanistic interpretability, focused on two aspects: (1) addiction to gain-of-function alterations, particularly oncogenic activations, and (2) SL caused by loss-of-function alterations, such as the absence of tumor-suppressor genes (TSGs) or compensatory regulatory mechanisms24,25. Within the latter paradigm, we specifically dissected dependencies associated with metabolic reprogramming, given its crucial role in supporting tumor growth and potential SL after the loss of alternative metabolic pathways26,27. A total of 3,874 gene dependencies were subjected to DMA analysis, including 1,839 distinct dependencies deemed functionally important and characteristic of breast cancer cell lines, in addition to cancer-associated genes and metabolic genes. DMA analysis was performed with linear regression to calculate correlations between gene dependencies and each omics marker. Intrinsic subtype was incorporated as a regression covariate. For the identified significant DMAs, we further validated whether the molecular markers were also associated with the efficacy of inhibitors targeting these dependencies in the drug sensitivity dataset (Figure 1D)7–10.
Dependencies associated with gain-of-function alterations
We first investigated whether gene dependencies were associated with gene expression levels. To minimize spurious statistical correlations, we selected gene dependencies exhibiting significant associations with expression in at least two dimensions, at least one of which was either the RNA or protein level (Figures 2A and S1A). Because of statistical noise and the complexity of gene expression regulation, such as through protein modification and degradation, only 20.7% of self-expression correlated dependencies were validated at both the RNA and protein levels. Among these dependencies, functional proteins represented a larger proportion than that in the dependencies that were not consistently validated (Figure S1B).
Dependencies associated with gain-of-function alterations. (A) The self-correlation of each gene dependency in the figure was verified to be significant in at least two dimensions, indicated by point colors corresponding to the specific dimensions, as shown in the legend. Correlations of gene dependencies with their mRNA (if RNA contained) or protein (if RNA not contained) expression. (B) The percentage of expression-correlated gene dependencies across each Pharos functional family97. (C) Positive correlation of SLC16A3 dependency with SLC16A3 mRNA expression. (D) Positive correlation of SLC16A3 mRNA expression with glycolysis activity. (E) Positive correlation of SLC16A3 dependency with glycolysis activity. (F) Negative correlation of SLC16A3 dependency with SLC16A1 mRNA expression. (G) Positive correlation of PRISM viability data for the SLC16A1 inhibitor AR-C155858 with SLC16A3 mRNA expression. (H) Significant correlations of oncogene dependencies with their corresponding omics levels, as indicated by the legend. The oncogenes in the figure are correlated with their expression in at least two omics dimensions. (I) Positive correlation of FGFR1 dependency with FGFR1 mRNA expression. (J) Negative correlations of PRISM viability data for the FGFR1 inhibitors NVP-BGJ398 and nintedanib with FGFR1 mRNA expression. The R values and P values were from linear regression analysis with intrinsic subtype as regression covariates (A, H). The R values and P values were from Pearson correlation analysis (C–G, I, J).
We found that some dependencies were negatively correlated with their expression, including those of previously reported housekeeping genes such as MED14 and IPO7, which are characterized by high dependency at low expression levels. (Figure 2A)28,29. Notably, tumors might depend on certain weakly expressed TSGs, such as PALB2 and INTS10, thus suggesting a reliance on their encoded proteins to regulate essential cellular functions (Figure S1C)30,31. Interestingly, although spermine synthase (SMS) has been reported to be upregulated in some tumors32,33, its dependency was significantly negatively correlated with its expression in breast cancer cell lines (Figure S1D). This finding suggests that tumors are more dependent on SMS for its foundational metabolic functions than on its potential oncogenic function when highly expressed.
In contrast, more gene dependencies exhibited positive correlations with their expression levels, thus suggesting gene addiction. These genes were significantly enriched in transcription factors and epigenetic modification genes, e.g., SOX9, TFAP2A, and KAT6A, and therefore might serve as key driver genes in carcinogenesis (Figures 2A, B and S1E)34–36. Notably, we observed significant addiction to the lactate transporter SLC16A3: cell lines with higher SLC16A3 expression demonstrated higher sensitivity to SLC16A3 knockdown (Figure 2C). Cell lines exhibiting higher SLC16A3 expression also had higher glycolytic activity and consequently greater reliance on timely lactate efflux (Figure 2D, E)37. Moreover, the expression of SLC16A1, which transports lactate into cells, was downregulated in cell lines with high SLC16A3 expression and dependency, and insensitivity to SLC16A1 inhibitors (Figure 2F, G).
To identify crucial addictive oncogenes, we included mutations of cancer-associated genes to investigate their self-addiction characteristics. Breast cancer cell lines showed significant addiction to CCND1, CDH17, FGFR1/2/4, RRAS, and MECOM (Figure 2H). Drug sensitivity data further validated that cell lines with higher FGFR1 expression were more sensitive to FGFR inhibitors (Figure 2I, J). In contrast, oncogenic activation might drive extensive metabolic reprogramming in tumors, thus leading to increased dependency. Therefore, we analyzed the correlations between oncogene expression and dependencies across pathways. The finding that CDK4, FAT2, and RICTOR were associated with the most dependencies suggested that these genes have broad effects as core carcinogenic factors (Figure S2). Additionally, our findings suggested that activation of the cell cycle pathway might be associated with high fatty acid β-oxidation dependency, whereas MDM2 activation significantly correlated with high glycerolipid metabolism dependency (Figure S2). The DMA analyses identified potential therapeutic targets that are markedly upregulated in tumors and play critical oncogenic roles. Consequently, inhibition of these targets might effectively suppress tumor progression.
Dependencies associated with loss-of-function alterations
We next analyzed the dependencies associated with loss-of-function alterations, representing synthetic lethality of one gene upon loss of another. We first analyzed SL associated with deletions of TSGs. We selected the dependencies that significantly correlated with TSG deletion in at least two expression dimensions, thus yielding a total of 4,943 DMAs associated with 347 TSGs. We identified an average of 14 DMAs per TSG and a maximum of 126 DMAs associated with PTEN loss (Figure S3A). To focus on valuable synthetic lethal interactions for clinical application, we examined the drug-targetable dependencies accompanying the deletion of TSGs residing at Genomic Identification of Significant Targets in Cancer (GISTIC) peaks. The synthetic lethal effect of PARP1 inhibition in BRCA1-deficient cancers has been shown to be effective in clinical practice38,39, and CDK6 has been identified as a synthetic lethal gene in cancer cells bearing CDKN2A/B deletion40,41. In addition, targeting SLC4A7 might have synthetic lethal effects in cancers with SMAD4 deletion (Figure S3B).
Collateral lethality is a novel synthetic lethal mechanism explored in recent studies42,43. TSG deletions can be accompanied by deletion of passenger genes at adjacent chromosomal loci, thus leading to additional SL in tumors (Figure 3A)44. CDKN2A/B deletion at 9p21.3 and concomitant deletion of MTAP have been reported to produce SL of PRMT5 inhibition45,46. Targeting ALG1 and PSMG2 might have a collateral lethality effect with 9p21.3 loss, according to our analysis. Another notable instance of collateral lethality is ME3 inhibition in cancers with SMAD4 deletion and concurrent ME2 deficiency47. Although ME3 was not identified in our study, EXTL2, GSTM3, and SLC7A1 dependencies were significantly associated with the deletion of genes localized at 18q21.2 (e.g., SMAD4 and ME2). In addition, we observed deletions of many genes at the GISTIC peak 19p13.3, which might generate corresponding synthetic lethal targets, such as SLC3A2 (Figure S3C and Table S1).
Dependencies associated with loss-of-function alterations. (A) Diagram of collateral synthetic lethality and three types of metabolic synthetic lethality. (B) Significant correlations of dependencies with the mRNA expression of their paralogs. Points are labeled with gene dependencies. The R values and P values were from linear regression analysis that adjusted for intrinsic subtypes. (C) Gene dependencies with explicit metabolic pathway annotation were selected from all significant paralog synthetic lethality. Details of selected paralog synthetic lethality and corresponding metabolic annotations are listed. (D) Metabolic pathway map involving metabolic synthetic lethal gene pairs. Gene pairs are highlighted with a colored background, and dependencies are marked with red letters.
Beyond the SL associated with TSG deletion, a lack of other compensatory metabolic pathways might also lead to SL. Therefore, we further expanded our analysis to explore potential SL between metabolic genes. On the basis of the localization of synthetic lethal gene pairs in metabolic pathways, we classified metabolic SL into three categories: paralog SL, single pathway SL, and alternative pathway SL (Figure 3A). Paralog SL describes the dependencies associated with the lack of a paralog partner that performs similar functions48. We compiled a list of paralog pairs including previously reported paralogs and the isoenzymes annotated as catalyzing the same reaction in the ENZYME nomenclature database13,49. We then calculated the synthetic lethal correlations between each pair of paralog genes (Figure 3B and Table S2). Eight paralog SLs with clear metabolic pathway annotation were identified: CTPS1, IMPDH2, and ADSS2 in nucleotide metabolism; CDS2 and CEPT1 in glycerophospholipid metabolism; FASN catalyzing fatty acid synthesis; UAP1, a key enzyme for the synthesis of the N-glycosylation substrate N-acetylglucosamine; and IDH3A, an important enzyme in the tricarboxylic acid cycle (Figure 3C). The synthetic lethal effects of inhibition of IMPDH and FASN were also validated in the PRISM database (Figure S3D).
Figure 3D illustrates the synthetic lethal gene pairs in the relevant metabolic pathways. To enhance the credibility of the identified synthetic lethal targets, we verified the correlations of dependencies with the lack of partner genes at both the RNA and protein levels (Figure S3E–J). The lack of UAP1 was associated with GFPT1 dependency, thus indicating that a deficiency in a downstream enzyme increases the compensatory need for upstream metabolic substrates (Figure S3F). In glycerophospholipid metabolism, deficiency in 1-acylglycerol-3-phosphate O-acyltransferases, which catalyze diacylglycerol production, led to sensitivity to the inhibition of phosphocholine synthetase, CHKA, because these two enzymes collectively mediate the biosynthesis of phosphatidylcholine (Figure S3G). Another manifestation of metabolic SL is the combined inhibition of the alternative metabolic pathway when one specific metabolic pathway is suppressed. For example, when PGD, a key enzyme in the pentose phosphate pathway, is deficient, concurrent inhibition of GPI might impede tumor growth by blocking glucose metabolism (Figure S3H). Under insufficient ACAT1 and thus acetyl-CoA supply, cells depend on pyruvate dehydrogenase (PDH) to produce acetyl-CoA in the mitochondria. Additionally, significant synthetic lethal interactions were observed between lactate dehydrogenase (LDH) and PDH, both responsible for pyruvate degradation; SCD and FADS2, both catalyzing unsaturated fatty acid synthesis (Figure S3I); and ASS1 and GLUD2, both involved in anaplerosis of the tricarboxylic acid cycle (Figure S3J).
Classification of breast cancer cell line dependency patterns
Given the heterogeneity of breast cancers, we conducted a cluster analysis of breast cancer cell lines according to their dependency profiles, to decipher distinct dependency patterns and associated molecular features. We selected the top 2,000 most divergent dependencies, removed pan-cancer core essential genes with consistently high dependency, and obtained a total of 1,839 dependencies for NMF clustering (Figure S4A, B)13. Cophenetic correlation and consensus silhouette scores supported the robust classification of the breast cancer cell lines into two distinct clusters. (Figure S4C–E). As compared with the intrinsic molecular subtypes, cluster2 was significantly enriched in basal subtype cell lines, whereas cluster1 comprised predominantly luminal and HER2+ subtype cell lines (Figure 4A). These findings suggested that the most significant core differences among breast cancer cell lines were between basal and non-basal subtypes. Therefore, we further extracted cluster-specific dependencies from each NMF cluster to investigate the core dependency signatures of these two clusters. A total of 277 cluster1-specific dependencies and 228 cluster2-specific dependencies were obtained, most of which significantly differed between clusters (Figure 4B, C and Table S3). The top essential dependencies in cluster1 included FOXA1, SPDEF, and PIK3CA, factors known to promote the proliferation and differentiation of ER+ cells50–53, and TFAP2C and ERBB2, which support HER2+ cell proliferation54,55. The essential dependencies in cluster2 encompassed WWTR1, ILK, and MRE11, regulators of cellular proliferation and migration56–60.
Classification of breast cancer dependency patterns. (A) Dependency-based clustering results. Heatmap illustrates the cluster-specific dependency differences between clusters. Cell lines are annotated on top by dependency cluster and intrinsic subtype. (B) Weights of dependencies in two clusters. Dependencies with weights in the top 70% in only one cluster are considered cluster-specific dependencies. (C) Volcano plot illustrating the significant differential dependencies between clusters. FDRs were from two-sided Wilcoxon test and were adjusted with the Benjamini-Hochberg procedure. (D) Cluster specific dependency correlation network based on 277 cluster1-specific dependencies and 228 cluster2-specific dependencies, with Pearson correlation > 0.4, FDR < 0.05, and KCore > 3 cutoff. Node’s size and color indicate the average dependency for the respective cluster, with color progression from white (lowest dependency) to red (highest dependency), as shown in the legend. Edge thickness represents the pairwise correlation. (E) Gene set enrichment analysis results of cluster-specific dependencies.
We further deciphered the core dependency signatures of clusters through co-dependency networks and GSEA. We built cluster co-dependency networks by calculating the correlation of every two dependencies in cell lines of a cluster, where each node represented the average dependency for that cluster, and each edge represented the pairwise correlation (Figure 4D). Neighboring dependencies indicated similar essentiality among cell lines, thereby suggesting potential shared functional roles. Cluster1-specific dependencies were significantly partitioned into two communities, one of which was significantly enriched in mitochondrially associated proteins, including components of respiratory chain complexes and mitochondrial ribosomal proteins. GSEA also revealed that cluster1 cell lines are significantly dependent on mitochondrial gene expression, and core functions of aerobic respiration and ATP synthesis (Figure 4E). Cluster2 cell lines had a relatively dispersed co-dependency network, suggesting a broad range of connections among essential dependencies and possible mutual regulatory interactions. GSEA suggested that these dependencies consisted primarily of functional regulatory proteins affecting DNA replication, the mitotic cell cycle, and membrane organelle organization (Figure 4E), in agreement with the enhanced malignant proliferation capability and metastatic potential observed in basal subtype cell lines61.
Molecular characterization analysis of dependency clusters of breast cancer cell lines
Cell dependencies are influenced by their growth and metabolic properties and are underlain by corresponding cellular molecular features62.62. Therefore, we conducted a comprehensive analysis of the molecular characteristics of the two dependency clusters of cell lines.
In terms of somatic genomic alterations, we screened the somatic mutations of 670 tumor-associated genes. TP53 and PIK3CA mutations exhibited the highest frequencies, a finding consistent with observations in breast cancer patient cohorts (Figure 5A)63. As previously reported, cluster2, comprising basal subtype cell lines, contained more TP53 mutations64, and greater occurrence of RANBP2, PTPRT, and BRCA1 mutations, whereas cluster1 cell lines showed higher mutation frequencies in KMT2C and MLLT10 (Figure 5A). Regarding CNAs, we performed GISTIC analysis in the two clusters separately and focused on clinically interpretable CNAs (Tables S4 and 5). The overall level of gene loss/deletion in the breast cancer cell lines was relatively low, and no significant gene deletions were observed. However, the gene gain/amplification frequency was elevated, particularly in cluster1 cells (Figure S5A). MDM4 and MCL1 amplifications, and SETD2 deletion, were significantly enriched in cluster1, whereas RICTOR amplification was significantly enriched in cluster2 (Figure 5B).
Comprehensive analysis of dependency clusters of breast cancer. (A) Somatic mutations of the top mutated genes. Genes are ordered by the total mutation frequencies of the cell lines. P values were from Fisher’s exact test. * P < 0.05 (B) Copy number alterations in cancer-associated genes, ordered by their amplification (upper) or deletion (lower) rates in all cell lines. Amplification, gain, and loss were defined as GISTIC +2, +1, and −1, respectively. P values were from Fisher’s exact test. * P < 0.05 (C) Heatmap illustrating the pathway activity characteristics of differential pathways between clusters. Pathway scores were obtained from PROGENy. (D) Gene set enrichment analysis results of Gene Ontology BP gene sets within the cluster1 (blue background) and cluster2 (yellow background) cell lines.
Furthermore, we depicted the molecular functional characteristics of different clusters according to gene expression profiles. Analysis of differential mRNA and protein levels between clusters revealed substantial consistency between the transcriptomics and proteomics data (Figure S5B, C). The activity of representative signaling pathways and metabolic pathways in each cell line was assessed with PROGENy and ssGSEA, respectively23 (Figures 5C and S5D). Cluster1 showed heightened activity across multiple metabolic pathways, particularly in fatty acid metabolism, oxidative phosphorylation, and amino acid metabolism (Figure 5D), thus explaining its greater dependence on mitochondrially associated proteins (Figure 4D). The metabolomic data also exhibited accumulation of TAG, carnitine, glycerophosphate, and nucleotides in cluster1 cells (Figure S5E), in agreement with robust mitochondrial metabolism. In contrast, cluster2 showed high galactose and glycosaminoglycan metabolism, which were potentially associated with increased extracellular matrix synthesis (Figures 5D and S5D). Moreover, the activation of multiple oncogenic signaling pathways in cluster2 cell lines suggested that the cells might be regulated by extracellular signals, and have strong microenvironmental adaptability and metastatic potential (Figure 5C). Cluster2 cells were also enriched in cholesteryl esters (Figure S5E), possibly as a result of cholesterol accumulation, which in turn contributes to the immunosuppressive microenvironment65,66.
Cluster-specific dependency marker associations
Given that the dependency clusters exhibited significantly differing molecular features, we further investigated the correlations between the core dependency signatures and the corresponding underlying molecular attributes within each cluster. Initially, we verified that the cluster-specific core dependencies were associated with the differential molecular features of each cluster. The dependency on mitochondrial proteins in cluster1 was significantly associated with enhanced phosphatidylinositol metabolism, branched-chain amino acid catabolism, and unsaturated fatty acid synthesis, and was particularly associated with high expression of INPPL1, PLCB3, and BCKDHB. In cluster 2, the upregulated glycosaminoglycan biosynthesis and activation of the JAK-STAT pathway (represented by TRIM27 and GAS6) might have influenced its dependencies. (Figures 6A and S6A).
Cluster-specific dependency marker associations. (A) Top marker genes associated with cluster-specific dependency in each cluster. The DMA number is the number of cluster-specific dependencies associated with the marker gene. Each DMA was validated for at least two expression levels of the marker gene. The R values were from linear regression analysis, and the average R value of all DMA for one marker gene is presented. Marker genes involved in the metabolic pathway are highlighted. The expression levels of genes labeled with red letters positively correlated with dependencies, whereas the expression levels of genes labeled with blue letters negatively correlated with dependencies. (B) Significant correlations of dependencies with their corresponding omics levels, as indicated in the legend, for each cluster. The dependencies with correlations with at least two omics dimensions are labeled. (C) Difference between two clusters of cluster1-specific dependencies and the RNA expression of corresponding paralogs. P values were from two-sided Wilcoxon test. (D) The correlations of cluster2-specific dependencies with the expression of their paralogs were verified significant in at least two dimensions in cluster2 cell lines. Paralog synthetic lethal gene pairs and dimensions were labeled.
Within each cluster, we first investigated addiction to core oncogenic mechanisms. Cluster1 cell lines were significantly addicted to FOXA1 and TFAP2C, in agreement with previous reports that FOXA1 and TFAP2C are luminal and HER2+ subtype-specific essential transcription factors64, and both significantly correlated with ESR1 expression (Figures 6B and S6B). The addiction to RICTOR potentially suggested that mTORC2 plays an important role in maintaining cell proliferation in cluster1 cell lines. In contrast, the finding that cluster2 cell lines were significantly addicted to GOT2 and PRKRA suggested therapeutic potential (Figure 6B). We next analyzed the SL associated with genomic or metabolic alterations within each cluster. Cluster1 cells frequently showed Chr17p13.3 loss, and consequently YWHAE and INPP5K loss, both of which were significantly associated with AK4 dependency. Losses of NXN and SRR at the same chromosome location were associated with several cluster specific dependencies, such as FOXA1 and SPDEF. In contrast, the cluster2-specific dependency on ING3 was significantly associated with cluster-specific Chr16p11.2 loss (Figure S6C). Regarding metabolic SL, we identified cluster-specific paralog SL. The dependencies of NMT1 and SAMD4B were significantly elevated in cluster1 cell lines, probably because of the low expression of NMT2 and SAMD4A (Figure 6C). In cluster2 cell lines, the dependencies on IDH3A, CTPS1, and SMS correlated with losses of IDH2, CTPS2, and SRM, respectively, although no significant differences were observed between clusters (Figures 6D and S6D).
These analyses of cluster-specific dependencies alongside molecular signatures might potentially explain the potential molecular mechanisms underlying the generation of dependencies, while also suggesting possible cluster-specific therapeutic targets.
Discussion
Large-scale high-throughput genetic screens have mapped gene dependencies across cancer cell lines1,5,6. However, the mechanistic drivers of these dependencies remain elusive. Pacini et al. have identified gene dependencies functionally associated with fundamental molecular markers in pan-cancer cell lines, thus providing a novel analytical framework13. In breast cancer, although extensive studies have characterized multi-omics features and core dependencies in cell lines14,63,64, few investigations have bridged these molecular profiles to dependency landscapes. To address this gap, we constructed a systematic DMA analysis pipeline to delineate the multi-omics features underlying dependencies in a breast cancer cell line cohort. We then mechanistically interpreted these DMAs through two perspectives: addiction to gain-of-function alterations and synthetic lethal associations, both of which might serve as promising therapeutic targets. Furthermore, we discovered that breast cancer cell lines can be stratified into two clusters exhibiting divergent dependency signatures, and we deciphered the molecular underpinnings of the representative dependency patterns in each cluster, thus suggesting subtype-selective therapeutic targets.
With our DMA analytical framework, we parsed the statistically derived DMAs through two primary aspects to screen for valuable DMAs as potential therapeutic targeting candidates. First, focusing on addiction to gain-of-function alterations, we identified breast cancer cells addicted to SLC16A3, a key transporter mediating lactic acid efflux. The vigorous glycolysis observed in cancer cells results in substantial lactic acid production, thereby necessitating a reliance on high SLC16A3 expression for efficient excretion of lactic acid to maintain metabolic homeostasis. This metabolic rewiring suggests a shift from lactate uptake to efflux dependency in glycolytically active cells, which is likely to be modulated by intracellular acidification (low pH) or directly modulated by lactate and chaperones such as Basigin37,67–70. Blocking SLC16A3 might induce intracellular lactic acid accumulation and elicit cytotoxic effects71, thus suggesting a potentially actionable therapeutic target. Furthermore, our findings revealed that the SMS dependency was inversely correlated with SMS expression, thus implying that not all highly expressed oncogenic proteins can be considered viable therapeutic targets. This finding underscores the importance of using CRISPR screens and DMA analysis to identify effective treatment strategies.
Next, we delineated DMAs indicative of SL interactions72,73. Various classifications of SL have been proposed, such as synthetic dose lethality, collateral SL, and pathway SL1,74. Herein, for analytical convenience, we divided the SL into three major categories: SL associated with TSG deletion, collateral SL, and metabolic SL. SL between PARP1 and BRCA1 was a representative of the first class. Additionally, we identified that SLC4A7 shared a synthetic lethal interaction with SMAD4 deletion. Collateral SL specifically refers to SL associated with the co-deletion of TSGs and passenger genes. According to previous studies, MTAP is frequently co-deleted with CDKN2A/B45,46. Beyond the previously reported synthetic lethal target PRMT575, ALG1 and PSMG2 were additional potential targets suggested by our analysis. Moreover, we identified EXTL2, GSTM3, and SLC7A1 as potential candidates sharing synthetic lethal interactions with the co-deletion of SMAD4 and ME2 located at 18q21.247. Furthermore, we investigated metabolic SL, which we further classified into three categories—paralog SL, single pathway SL, and alternative pathway SL—on the basis of the localization of synthetic lethal gene pairs in the metabolic network. We particularly focused on paralog pairs catalyzing identical metabolic reactions, and we identified eight paralog SLs. Building on evidence that CTPS1, IMPDH2, and FASN are prioritized cancer targets76–78, we observed that their paralog deficiencies enhance SL vulnerability. Clinically actionable inhibitors against IMPDH2 and FASN further enhance the feasibility of targeted therapies79–82. Regarding the single pathway SL, the lack of key downstream enzymes involved in UDP-GlcNAc synthesis renders cancer cells increasingly reliant on the upstream metabolic enzyme GFPT183–85. In glycerophospholipid metabolism, phosphatidylcholine is synthesized from diglycerol and CDP-choline. Notably, the deletion of AGPAT, a key enzyme for diglycerol synthesis, heightens cancer dependence on CHKA, an enzyme crucial for CDP-choline production86–88. Blocked metabolic pathways force cancer cell dependency on alternative routes, inducing synthetic lethal interactions. Examples include the synthetic lethal interactions between SCD and FADS2 in the synthesis of unsaturated fatty acids89; LDH and PDH in pyruvate degradation90; PDH and ACAT1 in mitochondrial acetyl-CoA synthesis91,92; ASS1 and GLUD2 in anaplerosis of the tricarboxylic acid cycle93; and GPI and PGD, key enzymes in two distinct glucose degradation pathways94,95. These newly discovered metabolic SL targets might provide new possibilities for therapies targeting tumor metabolism.
Given the heterogeneity of breast cancer and the intricate linkage between cancer dependencies and tumor molecular characteristics, we performed NMF clustering on the gene dependency data. Optimal clustering was achieved with two clusters corresponding to non-basal and basal subtype cell lines. This finding underscored significant differences between their dependency portraits, which align with their pronounced differences in functional properties. We identified core dependency genes specific to each subtype. In agreement with earlier studies, non-basal cell lines exhibited a marked dependency on FOXA1, SPDEF, and PIK3CA, whereas basal cell lines displayed a notable dependency on WWTR1 and ILK64,96. Subsequently, we delineated the molecular features of the two clusters underpinning cluster-specific dependencies. Cluster1 cell lines exhibited mitochondrial hypermetabolism, including fatty acid metabolism and oxidative phosphorylation, in agreement with their reliance on mitochondrial proteins. This finding indicated that non-basal cell lines depend on upregulated essential metabolism to sustain tumor proliferation. In contrast, cluster 2 displayed close cell communication with active signal transduction and extracellular matrix synthesis, corresponding to dependency on functions associated with DNA replication, mitosis, and membrane organelle organization.
Furthermore, both clusters demonstrated specific addicted genes and synthetic lethal interactions. Activating mutations in the PI3K pathway present a therapeutic challenge in luminal breast cancer, whereas the addiction to RICTOR of cluster1 might offer a new treatment strategy97. Notably, the finding that cluster2 cell lines were highly addicted to GOT2 and PRKRA suggests therapeutic potential. A prior study has suggested that BRCA1 deletion might lead to high GOT2 expression in triple-negative breast cancer, thereby promoting cancer progression98. Our analysis further suggested that GOT2 might support the proliferation of cluster2 cells by maintaining intracellular aspartic acid levels. PRKRA, protein activator of interferon induced protein kinase EIF2AK2, was highly expressed in the cluster2 cells. PRKRA might participate in NF-κB pathway activation in the inflammatory response99 and might serve as another potential therapeutic target. Among the synthetic lethal gene pairs, NMT1 and SAMD4B were cluster1-specific paralog SL targets, whereas IDH3A, CTPS1, and SMS might share synthetic lethal interactions with their paralog in cluster2, thereby providing novel insights into precision treatment strategy in the future.
Our study has several limitations to be addressed in future research. First, the identified dependency targets, such as gene addiction and SL interactions, originated solely from CRISPR screening data in cell line models, and patient-level verification was lacking. Because most targets lack relevant targeted drugs, further therapeutic evaluation is necessary. Second, because of the constraint of limited cell line diversity, our analysis delineated only two predominant dependency patterns. Therefore, other potential clinically relevant subtype-specific dependencies remain uncharacterized. Moreover, statistical constraints might have yielded both false-negative DMAs (biologically relevant but statistically insignificant associations) and false-positive discoveries because of multiple testing.
Conclusions
In conclusion, our study established a DMA analytical framework to dissect the molecular mechanisms underlying gene dependencies in a multi-omics breast cancer cell line cohort. The cohort was stratified into two clusters exhibiting distinct dependency signatures and cluster-specific DMA profiles. Our findings decipher the molecular underpinnings of tumor dependencies and provide a precision oncology resource for targeting breast cancer heterogeneity through prioritized DMA signatures.
Conflict of interest statement
No potential conflicts of interest are disclosed.
Author contributions
Conceived and designed the analysis: Tianjian Yu, Zhiming Shao, Rui Ding.
Collected the data: Rui Ding.
Contributed data or analysis tools: Rui Ding, Tianjian Yu.
Performed the analysis: Rui Ding.
Wrote the paper: Rui Ding, Tianjian Yu.
Data availability statement
The authors confirm that the data supporting the findings of this study are available within the article.
- Received June 5, 2025.
- Accepted August 12, 2025.
- Copyright: © 2025, The Authors
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
References
- 1.↵
- 2.
- 3.
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.
- 9.
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.
- 52.
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.
- 58.
- 59.
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.
- 69.
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.
- 78.↵
- 79.↵
- 80.
- 81.
- 82.↵
- 83.↵
- 84.
- 85.↵
- 86.↵
- 87.
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.↵
- 99.↵

















