Abstract
Objective: The presence of complex components in Chinese herbal medicine (CHM) hinders identification of the primary active substances and understanding of pharmacological principles. This study was aimed at developing a big-data-based, knowledge-driven in silico algorithm for predicting central components in complex CHM formulas.
Methods: Network pharmacology (TCMSP) and clinical (GEO) databases were searched to retrieve gene targets corresponding to the formula ingredients, herbal components, and specific disease being treated. Intersections were determined to obtain disease-specific core targets, which underwent further GO and KEGG enrichment analyses to generate non-redundant biological processes and molecular targets for the formula and each component. The ratios of the numbers of biological and molecular events associated with a component were calculated with a formula, and entropy weighting was performed to obtain a fitting score to facilitate ranking and improve identification of the key components. The established method was tested on the traditional CHM formula Danggui Sini Decoction (DSD) for gastric cancer. Finally, the effects of the predicted critical component were experimentally validated in gastric cancer cells.
Results: An algorithm called Chinese Herb Medicine-Formula vs. Ingredients Efficacy Fitting & Prediction (CHM-FIEFP) was developed. Ferulic acid was identified as having the highest fitting score among all tested DSD components. The pharmacological effects of ferulic acid alone were similar to those of DSD.
Conclusions: CHM-FIEFP is a promising in silico method for identifying pharmacological components of CHM formulas with activity against specific diseases. This approach may also be practical for solving other similarly complex problems. The algorithm is available at http://chm-fiefp.net/.
keywords
Introduction
Identifying the functional components in Chinese herbal medicine (CHM) is critical for optimizing formulations and understanding their mechanisms of action, but is often challenging, because most CHM formulas contain numerous substances from multiple medicinal plants. Network pharmacology, a tool based on large-scale drug-target-disease co-module screening, enables analysis of the intricate interactions between drugs and their targets at the molecular network level1,2. A variety of network pharmacology methods have been proposed for predicting key drug components in CHM and their corresponding target spectrums, and to analyze multi-target pharmacodynamic interactions3–10.
In this study, we developed an improved Python algorithm incorporating multiple approaches, including network pharmacology, bioinformatics, enrichment analysis, and an entropy weight method, to rank and identify the best-fit components that might show high pharmacological efficacy equivalency to a CHM formula (Figure 1). This algorithm is called the Chinese Herb Medicine-Formula vs. Ingredients Efficacy Fitting & Prediction (CHM-FIEFP).
The performance of CHM-FIEFP was tested on Danggui Sini Decoction (DSD), a formula extensively used to treat a variety of conditions involving “cold congealing in the meridians” in traditional Chinese medicine (TCM), including diabetic neuropathy, atherosclerosis, occlusive vasculitis, Raynaud’s disease, and dysmenorrhea11–16. We have recently described a novel application of DSD in cancer treatment9. A series of in vitro and in vivo assays demonstrated a direct inhibitory effect of DSD on the growth of gastric cancer (GC) cells by targeting multiple signaling pathways9. An overview of the components in DSD active against GC, as identified with the CHM-FIEFP algorithm, is illustrated in Figure 2.
Materials and methods
Development of CHM-FIEFP
The CHM-FIEFP algorithm incorporates three attributes—term, gene, and P-value—derived from enrichment analyses through network pharmacology and bioinformatics approaches. This in silico model was used to list and rank active ingredient components that might possess pharmacological activities and biological functions equivalent to the overall effects of CHM formulas. The modeling was based on the Python programming language.
Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP) database
The TCMSP online database (https://old.tcmsp-e.com/tcmsp.php) was used to predict absorption, distribution, metabolism, and excretion characteristics, including oral bioavailability, drug similarity, and Caco-2 cell permeability. Oral bioavailability ≥30% and drug similarity ≥0.18 were considered to indicate potentially reliable pharmacological activity. The TCMSP database was also used to predict potential component-related gene targets.
Gene Expression Omnibus (GEO) database
GEO datasets (https://www.ncbi.nlm.nih.gov/geo/) for various samples were searched to obtain mRNA expression profiles of GC and non-cancerous adjacent tissue. The cutoff criteria were P-value <0.01 and |log2FC| > 1. Overlapping differentially expressed targets among datasets were considered “GC-related” and subjected to further analysis.
Cytoscape software and plugins
Cytoscape open-source software can be used to construct, analyze, and visualize complex biological networks. We constructed an interaction network for the putative component-related gene targets and known GC-related gene targets, on the basis of data obtained from the Cytoscape (Version 3.2.1) plugin Bisogenet (Version 3.0.0). The topological properties17–19 of each target in each interaction network were assessed according to degree centrality, calculated with the Cytoscape plugin CytoNCA.
Database for Annotation, Visualization, and Integrated Discovery (DAVID)
The gene targets with high degree centrality values were uploaded to the DAVID 6.8 (https://david.ncifcrf.gov/) to determine functional and pathway enrichment.
Cell cultures and reagents
Human GC cell lines (SGC-7901 and AGS) were acquired from the China Infrastructure of Cell Line Resources (China) and maintained in RPMI-1640 and Ham’s F-12K media with 10% (v/v) fetal bovine serum and 100 U/mL penicillin/streptomycin at 37°C and 5% CO2. Reference standards of ferulic acid (ID: F103701-5g) and cinnamic acid (ID: C108495-5g) were purchased from Aladdin Corporation (China), and dissolved in dimethyl sulfoxide (≤ 0.5% volume of the working solution).
Verification of cellular pharmacodynamics
Cell viability and colony formation ability were examined as previously described9. Briefly, cells were exposed to ferulic acid or cinnamic acid at 0.025, 0.05, 0.1, 0.2, or 0.3 mg/mL for 24 or 48 h before determination of cell viability with CCK-8 assays (Dojindo, Japan). For colony formation assays, cells were cultured for 2 weeks before counting of colonies with 0.1% crystal violet. For cell motility assays, cells were cultured for 24 h; the trajectory and cumulative distance were monitored with an Operetta CLS High Content Analysis System and analyzed in Harmony software (PerkinElmer, Waltham, MA). The time-lapse imaging process used a ×20 long wide distance objective in digital phase contrast mode.
Statistical analyses
Control vs. low-dosage or high-dosage groups were analyzed with Student’s t-test in GraphPad Prism 6.0. P < 0.05 was considered statistically significant.
Results
In silico screening of pathway and gene targets
Identifying active components in DSD
On the basis of data from our previous study9, 66 candidate components from 7 herbs were identified. The numbers of candidate components from each of the 7 herbs were 11 from Dang Gui, 10 from Gui Zhi, 12 from Shao Yao, 8 from Xi Xin, 3 from Tong Cao, 11 from Da Zao, and 11 from Gan Cao (Supplementary Table S1). Subsequent UPLC-MS/MS analysis identified the following 7 candidate components from the Monarch herbs (Dang Gui and Gui Zhi), which deliver the major pharmacological effects of DSD, according to CHM theory: senkyunolide A, senkyunolide G, senkyunolide I, ferulic acid, ferulaldehyde, cinnamic acid, and O-methoxycinnamaldehyde.
Gene target collection for the DSD and the identified 7 components, according to the TCMSP database
We used the TCMSP database to obtain the potential drug-gene targets for each component (detailed information in Supplementary Table S2). Our previous study identified 243 potential gene targets from 66 components in the 7 herbs in DSD9. In this study, the numbers of gene targets corresponding to the 7 obtained candidate components from the Monarch herbs were 8 for C1 (senkyunolide A), 7 for C2 (senkyunolide G), 11 for C3 (senkyunolide I), 16 for C4 (ferulic acid), 11 for C5 (ferulaldehyde), 6 for C6 (cinnamic acid), and 0 for C7 (O-methoxycinnamaldehyde).
GC-related gene targets, according to the GEO database
Next, we obtained 3 gene expression datasets (GSE33335, GSE54129, and GSE79973) derived from human GC tissue and adjacent normal tissue from the GEO database. Analysis of overlap among the 3 datasets yielded a total of 189 differentially expressed GC-related gene targets9 (detailed information in Supplementary Table S2).
Gene target network building and analysis
Subsequently, we used the Bisogenet plugin in Cytoscape to perform gene target network building and analysis on the obtained potential gene targets for the DSD formula and its 7 components. The DSD-associated gene target network comprised 5,625 nodes (gene targets), whereas the gene target networks for C1–C7 consisted of 138, 79, 335, 1,368, 392, 49, and 0 nodes, respectively. Network building and analysis of the 189 GC-related gene targets with the same plugin yielded 3,237 nodes (detailed information in Supplementary Table S3).
Network intersection processing and topological analysis
Intersection of the DSD-related and GC-related gene targets yielded 2,098 common targets. A total of 70 common GC-related target genes were found for C1, 36 were found for C2, 138 were found for C3, 592 were found for C4, 205 were found for C5, and 29 were found for C6 (detailed information in Supplementary Table S4).
Further topological screening identified 1,062 formula core targets for DSD, and 35 component core targets (CCTs) for C1, 17 CCTs for C2, 69 CCTs for C3, 302 CCTs for C4, 106 CCTs for C5, and 16 CCTs for C6 (Supplementary Table S5).
Biological process and KEGG signaling pathway enrichment analyses
Three attributes from the original biological process (BP) and KEGG enrichment analysis data—term, gene, and P-value—were extracted for further analysis. The resultant BP and KEGG analysis data for DSD are denoted the enriched results and presented in Supplementary Table S6 (sheets FA and FB). Similarly, the BP and KEGG analysis results for components C1–C6 are denoted the sub-enriched results and listed in sheets A (1–6) and B (1–6), respectively.
Development of the CHM-FIEFP algorithm
The CHM-FIEFP algorithm was developed with the performance score assignment approach, which evaluates the degree of fitness between each term and target in the sub-enriched results with respect to the enriched results (BP and KEGG). Subsequently, the entropy weighting method was used to calculate the score for each component (Figure 1B, C). The specific calculation process was conducted as follows.
Update the enriched results and the sub-enriched results by removing the terms from BP and KEGG analysis results with P-value >0.05.
Calculate the term number ratio (Ri) for each sub-enriched result, according to the following formula:
where Ni is the total number of terms derived from a specific ingredient of the CHM formula, and Nf is the total number of terms derived from the formula. is the number of terms overlapping between the ingredient and formula (annotation as shown in Figure 1).
Update the enriched results and the sub-enriched results again by removing discordant terms.
Calculate the target number ratio (ri) for each sub-enriched result, according to the following formula:
where ni(Ti) is the number of targets corresponding to the term derived from the ingredient; n(Ti) is the number of targets corresponding to a specific term (the default selection is the “pop hits” list); is the number of targets overlapping between the term derived from the ingredient and term (the “pop hits” list); and nf(Ti) is the count of targets derived from the CHM formula.
Calculate the contribution weights of the term number ratio (WRi) and target number ratio (Wri) in the analysis of sub-enriched results with the entropy weight method. These can be determined as follows:
Normalize the Ri and ri obtained for each sub-enriched result. The Ri and ri obtained for each sub-enriched result are assumed to form a vector as follows:
Normalization against Ri and ri yields and , which are determined as follows:
Calculate the information entropy (E) of and as follows:
Calculate the contribution weights (W) of and as follows:
The fitting score (F) for each sub-enriched result is finally obtained to quantify the degree of fit between the sub-enriched results (ingredient) and the enriched results (entire formula). This score is calculated as follows:
Use of CHM-FIEFP to identify key DSD ingredients with anti-tumor effects
After having obtained table FA and table FB (sheet FA and sheet FB in Supplementary Table S6) for DSD, as well as the sub-tables for each component (sheets 1A–6A and 1B–6B in Supplementary Table S6), we imported the data from these tables and sub-tables into the CHM-FIEFP for calculation. Among all tested components, ferulic acid (C4) had the highest fit (F) value (Table 1). Subsequent CCK-8 assays revealed dose- and time-dependent cytotoxic effects of C4 in both GC cells (Figure 3A). With 24 h exposure, the IC50 of C4 was 0.127 ± 0.005 and 0.192 ± 0.001 mg/mL in SGC-7901 cells and AGS cells, respectively. The growth inhibitory effect of C4 was time-dependent within 72 h (Figure 3B). A dose-dependent decrease in clonal formation ability was observed when cells were exposed to C4 for 48 h (Figure 3C). C4 exposure also inhibited cell motility (Figure 3D). These findings collectively demonstrated the inhibitory effects of C4 on the growth and migratory ability of GC cells; these effects were comparable to those observed with DSD9. Importantly, these phenotypic changes closely resembled those induced by DSD treatment but were scarcely evident in cells treated with cinnamic acid (C6), which exhibited the lowest fit (F) value among all tested components (data not shown), thereby suggesting a crucial role of C4 in mediating the anti-GC effect of the formula by modulating similar biological functions and cell signaling pathways to those modulated by DSD.
Discussion
TCM has been used to treat many diseases but currently faces many challenges. With advances in bioinformatics and network pharmacology, TCM has achieved important breakthroughs20–22. Despite these advances, identifying key components of CHM formulas active against specific diseases remains difficult.
The algorithm developed in the current study was based on the entropy weighting method, and integrated various computational techniques and statistical analyses. Application of this algorithm to the DSD formula identified the most potent component (ferulic acid) against GC9. This prediction was validated through a series of in vitro assays in 2 cell lines. The identification of ferulic acid as an effective inhibitor of GC cell growth highlights the potential benefits of integrating traditional medicine with modern technology (Figure 4).
Ferulic acid is a member of the phenolic acid family. Previous studies have indicated that ferulic acid has various effects, including antioxidant, antibacterial, and anti-inflammatory properties, as well as an ability to prevent blood clot formation, protect against radiation damage, and treat or prevent Alzheimer’s disease23. Ferulic acid elicits anti-tumor effects via several mechanisms. Ferulic acid has been shown to induce cell cycle arrest and inhibit autophagy in cervical cancer cells24; inhibit metastasis and local by blocking the MET pathway in breast cancer cells25; and induce apoptosis by inhibiting the PI3K/Akt signaling pathway in osteosarcoma cells26. However, studies on the potential anti-tumor effects of ferulic acid in GC are relatively limited. The specific molecular mechanisms underlying its actions require further investigation. Importantly, ferulic acid has shown low toxicity in in vivo studies, thus suggesting favorable safety potentially supporting clinical use27.
In comparison to traditional methods such as orthogonal design, the CHM-FIEFP algorithm has several advantages. The algorithm prioritizes the biological processes and signaling pathways whose effects are most similar to the overall effect of the formula by using the fitting scores of various components, thus avoiding random testing of the components. This targeted strategy could conceivably accelerate the discovery process and also enable the discovery of new, previously overlooked therapeutic applications.
However, the CHM-FIEFP algorithm is capable only of ranking individual components within a CHM formula. To interpret the compatibility of CHM formulas, it is necessary to separately evaluate the fit (F) values of different components within each herb, referred to as Monarch, Minister, Assistant and Envoy herbs. This evaluation would help identify the specific components that best represent different herbs from a pharmacodynamics perspective. Selected components could then be reconstituted to generate “molecular recipes” with enhanced efficacy and favorable safety profiles.
In summary, the CHM-FIEFP algorithm offers an efficient means to determine the active components in CHM formulas against specific diseases.
Supporting Information
Conflict of interest statement
No potential conflicts of interest are disclosed.
Author contributions
Designed and supervised the study, and finalized the manuscript: Liren Liu, Shao Li, and Guoguang Ying. Contributed to the study design, performed the bioinformatics analysis and biological experiments, and drafted the manuscript: Boyu Pan, Han Zhu, Jiaqi Yang, Liangjiao Wang, Zizhen Chen, Jian Ma, Bo Zhang, and Zhanyu Pan.
Data availability statement
The data generated in this study are available upon request from the corresponding author.
Footnotes
↵*These authors contributed equally to this work.
- Received November 14, 2023.
- Accepted January 22, 2024.
- Copyright: © 2024, The Authors
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.