Elsevier

The Lancet Oncology

Volume 20, Issue 2, February 2019, Pages 193-201

Articles
Diagnosis of thyroid cancer using deep convolutional neural network models applied to sonographic images: a retrospective, multicohort, diagnostic study

https://doi.org/10.1016/S1470-2045(18)30762-9

Summary

Background

The incidence of thyroid cancer is rising steadily because of overdiagnosis and overtreatment resulting from the widespread use of sensitive imaging techniques for screening. This growth is driven especially by increased diagnosis of indolent, well-differentiated papillary and other early-stage thyroid cancers, whereas the incidence of advanced-stage thyroid cancer has increased only marginally. Thyroid ultrasound is frequently used to diagnose thyroid cancer. The aim of this study was to use deep convolutional neural network (DCNN) models to improve the accuracy of thyroid cancer diagnosis by analysing sonographic imaging data from clinical ultrasound examinations.

Methods

We did a retrospective, multicohort, diagnostic study using ultrasound image sets from three hospitals in China. We developed and trained the DCNN model on a training set from the thyroid imaging database at Tianjin Cancer Hospital, comprising 131 731 ultrasound images from 17 627 patients with thyroid cancer and 180 668 images from 25 325 controls. Clinical diagnoses for the training set were made by 16 radiologists from Tianjin Cancer Hospital. Images from anatomical sites judged not to have cancer were excluded from the training set, and only individuals with suspected thyroid cancer underwent pathological examination to confirm the diagnosis. The model's diagnostic performance was validated in an internal validation set from Tianjin Cancer Hospital (8606 images from 1118 patients) and two external validation sets in China (the Integrated Traditional Chinese and Western Medicine Hospital, Jilin: 741 images from 154 patients; and the Weihai Municipal Hospital, Shandong: 11 039 images from 1420 patients). In the validation sets, all individuals with suspected thyroid cancer after clinical examination underwent pathological examination. We also compared the sensitivity and specificity of the DCNN model with those of six skilled thyroid ultrasound radiologists on the three validation sets.

Findings

Ultrasound images for the four study cohorts were obtained between Jan 1, 2012, and March 28, 2018. The model achieved high performance in identifying patients with thyroid cancer in all three validation sets, with area under the curve (AUC) values of 0·947 (95% CI 0·935–0·959) for the Tianjin internal validation set, 0·912 (0·865–0·958) for the Jilin external validation set, and 0·908 (0·891–0·925) for the Weihai external validation set. Compared with skilled radiologists, the DCNN model had higher specificity in all three validation sets; the values below are given as DCNN model versus radiologists. For the Tianjin internal validation set, sensitivity was 93·4% (95% CI 89·6–96·1) versus 96·9% (93·9–98·6; p=0·003) and specificity was 86·1% (81·1–90·2) versus 59·4% (53·0–65·6; p<0·0001). For the Jilin external validation set, sensitivity was 84·3% (95% CI 73·6–91·9) versus 92·9% (84·1–97·6; p=0·048) and specificity was 86·9% (77·8–93·3) versus 57·1% (45·9–67·9; p<0·0001). For the Weihai external validation set, sensitivity was 84·7% (95% CI 77·0–90·7) versus 89·0% (81·9–94·0; p=0·25) and specificity was 87·8% (81·6–92·5) versus 68·6% (60·7–75·8; p<0·0001).
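The Findings report AUC, sensitivity, and specificity with 95% CIs. This section does not state how the CIs were computed. The sketch below shows one standard way to obtain each quantity from per-patient model scores and binary calls. Assumptions: the AUC is computed as the Mann–Whitney probability, where a random cancer case scores higher than a random control and ties count one half. The interval uses the Wilson score method. The function names are hypothetical, and the paper may have used a different CI method, such as exact Clopper–Pearson.

```python
import math


def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the probability that a random cancer case outscores a random control.

    Ties count as half a win. O(n*m), adequate for a sketch.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def wilson_interval(successes, total, z=1.959964):
    """Approximate 95% Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(y_true, y_pred):
    """y_true: 1 = pathology-confirmed cancer, 0 = control; y_pred: 1 = called cancer."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp), (tp, fn, tn, fp)


if __name__ == "__main__":
    # Toy scores, not study data: 8 of the 9 case/control pairs are ranked correctly.
    print(auc_mann_whitney([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))  # 8/9
    sens, spec, (tp, fn, tn, fp) = sensitivity_specificity([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
    print(sens, wilson_interval(tp, tp + fn))
    print(spec, wilson_interval(tn, tn + fp))
```

For sensitivity, the Wilson interval takes the true positives out of all cancer patients. For specificity, it takes the true negatives out of all controls.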

Interpretation

The DCNN model showed similar sensitivity and improved specificity in identifying patients with thyroid cancer compared with a group of skilled radiologists. The improved technical performance of the DCNN model warrants further investigation as part of randomised clinical trials.
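The comparison with radiologists is paired: the model and the radiologists classify the same patients. This section does not name the test behind the reported p values. One common choice for paired binary calls is the exact McNemar test. This test uses only the discordant pairs, meaning the patients whom one reader calls correctly and the other does not. The sketch below is an illustration under that assumption, and the function names are hypothetical. For sensitivity, the calls would be restricted to patients with cancer. For specificity, they would be restricted to controls.

```python
import math


def discordant_counts(reader_a, reader_b):
    """Count discordant paired calls on the same patients.

    b: reader A positive, reader B negative; c: reader A negative, reader B positive.
    """
    b = sum(1 for a, r in zip(reader_a, reader_b) if a == 1 and r == 0)
    c = sum(1 for a, r in zip(reader_a, reader_b) if a == 0 and r == 1)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value: binomial test of b vs c with p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy calls on six cancer patients (1 = called cancer), not study data.
    model = [1, 1, 1, 0, 1, 1]
    radiologists = [1, 1, 1, 1, 1, 1]
    b, c = discordant_counts(model, radiologists)
    print(b, c, mcnemar_exact(b, c))
```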

Funding

The Program for Changjiang Scholars and Innovative Research Team in University in China, and National Natural Science Foundation of China.

Introduction

The incidence of thyroid cancer has been increasing worldwide over the past two decades, including in the USA, where a decrease in the incidence of many other cancer types has been reported.1 Thyroid cancer is three times more prevalent in women than in men1 and is the most frequently diagnosed cancer in women younger than 30 years in China.2 Patients suspected of having thyroid disease undergo ultrasound imaging, and a radiologist interprets the images for clinical diagnosis. A key aspect of this interpretation is recognition of malignant thyroid nodules according to the Thyroid Imaging, Reporting and Data System (TI-RADS) guidelines. The American College of Radiology (ACR) TI-RADS,3 European TI-RADS,4 and American Thyroid Association guidelines5 propose multiple criteria for interpreting sonographic images. Among these criteria, solid composition, hypoechogenicity, taller-than-wide shape, irregular margin, extrathyroidal extension, calcification, and punctate echogenic foci are clinically relevant features associated with suspicion of malignant disease.3, 4, 5, 6, 7, 8 Patients with suspected thyroid cancer undergo fine-needle aspiration biopsy or surgical resection, and the sample is assessed by pathological examination (the gold standard for diagnosis). Diagnosis of thyroid cancer is therefore a time-consuming and often subjective process that requires substantial radiological experience and expertise.
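The feature-based criteria above can be made concrete as a points system. The sketch below follows the ACR TI-RADS approach: each feature category contributes points, echogenic foci are additive, and the total maps to a risk level from TR1 to TR5. It is a simplified illustration, not a clinical tool. The dictionary keys and function names are hypothetical. The special rule that stops scoring for spongiform nodules is omitted. A total of 1 point, which the published level table does not list, is mapped to TR1 here as an assumption.

```python
# Simplified ACR TI-RADS-style scoring (illustrative; not for clinical use).
COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                "hypoechoic": 2, "very_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth": 0, "ill_defined": 0, "lobulated_irregular": 2,
          "extrathyroidal_extension": 3}
ECHOGENIC_FOCI = {"none": 0, "comet_tail": 0, "macrocalcification": 1,
                  "rim_calcification": 2, "punctate": 3}


def tirads_points(composition, echogenicity, shape, margin, foci):
    """Sum category points; echogenic foci are additive (all that apply)."""
    return (COMPOSITION[composition] + ECHOGENICITY[echogenicity]
            + SHAPE[shape] + MARGIN[margin]
            + sum(ECHOGENIC_FOCI[f] for f in foci))


def tirads_level(points):
    """Map a point total to a TR level."""
    if points <= 1:  # assumption: 1 point treated as TR1
        return "TR1"
    if points == 2:
        return "TR2"
    if points == 3:
        return "TR3"
    if points <= 6:
        return "TR4"
    return "TR5"


if __name__ == "__main__":
    # A solid, hypoechoic, taller-than-wide nodule with an irregular margin
    # and punctate echogenic foci: 2 + 2 + 3 + 2 + 3 = 12 points.
    pts = tirads_points("solid", "hypoechoic", "taller_than_wide",
                        "lobulated_irregular", ["punctate"])
    print(pts, tirads_level(pts))  # 12 TR5
```

This explicit feature scoring contrasts with the deep learning approach, which learns its features directly from image pixels.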

Research in context

Evidence before this study

We searched PubMed on Aug 26, 2018, for research articles that contained the terms “deep learning” OR “convolutional neural network” AND “large scale thyroid imaging data”, without date or language restrictions. We found no studies that used deep learning to improve the diagnostic accuracy of thyroid cancer by analysing large-scale sonographic imaging datasets. A second PubMed search with the terms “deep learning” OR “convolutional neural network” AND “thyroid cancer” identified seven studies. These used either deep learning or conventional feature extraction-based machine-learning algorithms to characterise malignancy of thyroid nodules from ultrasonographic images. However, none of these studies included a large training dataset (all used <100 000 images) or an external validation set. The best diagnostic classification method reported so far was trained with 15 000 images and was not externally validated. We speculate that a dataset of this size cannot fully capture the heterogeneity of thyroid nodules, and the generalisability of that method remains unknown.

Added value of this study

The high performance of the deep learning model developed in this study was validated in several cohorts. The improved accuracy and specificity of this model could reduce unnecessary invasive fine-needle aspiration biopsies, as well as overdiagnosis and overtreatment of thyroid cancer. Furthermore, the model has the potential to lower barriers and provide equal access to diagnostic tools for thyroid cancer in regions and countries where medical resources are scarce.

Implications of all the available evidence

The results of our study could improve the accuracy, efficiency, and reproducibility of thyroid cancer diagnosis. The proposed artificial intelligence approach could be particularly valuable in community hospitals where expertise in interpreting radiological images is limited. A website running this deep learning framework is under construction and will be freely available online.

There are four main subtypes of thyroid cancer: papillary, follicular, medullary, and anaplastic.7 The 5-year relative survival of patients with thyroid cancer is 99·7%,1 but survival varies substantially by subtype when stratified by stage. It is near 100% for stage I and II papillary, follicular, and medullary carcinoma. At stage III, it is 71% for follicular, 81% for medullary, and 93% for papillary carcinoma. At stage IV, it is 7% for anaplastic, 28% for medullary, 50% for follicular, and 51% for papillary carcinoma.9 All anaplastic thyroid cancers are considered stage IV.9 Given the good prognosis of early-stage thyroid cancer, a high-performing artificial intelligence algorithm for analysing thyroid ultrasound data could help stratify patients by risk. It could also help avoid unnecessary fine-needle aspiration biopsy or thyroidectomy in lower-risk patients, particularly those with papillary carcinomas.

The widespread use of sensitive imaging methods for screening has led to a steady increase in incidence of thyroid cancer, causing overdiagnosis and overtreatment in this setting.10, 11 Indolent and well-differentiated papillary carcinomas and other early-stage thyroid cancers are the main reasons for the growth in incidence, since the incidence of advanced-stage thyroid cancer is rising only marginally. Mortality from thyroid cancer has decreased slightly during the past decade.10 The frequency of estimated age-standardised thyroidectomy has risen annually by threefold to fourfold in both sexes over the same period.10 Therefore, development of an artificial intelligence framework based on a precise algorithm with high sensitivity and specificity could maintain a high recall rate for patients with thyroid cancer and identify individuals at low risk for developing advanced disease, thus avoiding unnecessary fine-needle aspiration biopsy. Recently, deep convolutional neural network (DCNN) models have been shown to achieve dermatologist-level classification accuracy in skin cancer diagnosis.12 Deep learning models have also shown improved performance compared with human experts in detection of diabetic retinopathy and eye-related diseases from raw input pixels of retinal fundus photographs.13, 14, 15

A traditional machine-learning algorithm for diagnosis of thyroid cancer has been developed previously,16 but its input features had been explicitly identified by human experts. Unlike traditional machine learning, deep learning does not require features engineered by human experts. Instead, it takes raw image pixels and the corresponding class labels from medical imaging data as inputs and automatically learns feature representations in a general manner.17 The learned representations can be used for classification and object detection. In this study, we aimed to ascertain the capability of deep learning models for automated diagnosis of thyroid cancer using real-world sonographic data from clinical thyroid ultrasound examinations. We compared the results with pathological examination reports (the diagnostic gold standard). The study encompassed model development with a cohort of more than 300 000 images and validation of the model in three validation sets.
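A convolutional network turns raw pixels into features using small kernels that slide over the image. The sketch below shows a single forward step: a valid (unpadded) 2D convolution, a ReLU, and global max pooling. This section does not describe the paper's DCNN architecture, and nothing here represents it. The kernel is hand-set to respond to a vertical intensity edge, so the example is easy to follow. In a trained DCNN, many such kernels are learned from labelled images rather than specified by hand. The function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding).

    As in CNN layers, this is cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = acc
    return out


def relu(fmap):
    """Elementwise max(0, x): keeps only positive responses."""
    return [[max(0.0, v) for v in row] for row in fmap]


def global_max_pool(fmap):
    """Strongest response anywhere in the feature map."""
    return max(max(row) for row in fmap)


if __name__ == "__main__":
    # 4x4 toy "image": dark left half, bright right half.
    image = [[0, 0, 1, 1] for _ in range(4)]
    # Hand-set vertical-edge kernel (a learned kernel would replace this).
    kernel = [[-1, 1], [-1, 1]]
    fmap = relu(conv2d_valid(image, kernel))
    print(fmap)                   # edge response peaks in the middle column
    print(global_max_pool(fmap))  # 2.0
```

Real DCNNs stack many such layers with learned kernels and train them end to end against class labels, which in this study were cancer versus control.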

Section snippets

Study design and participants

We did a retrospective, multicohort, diagnostic study using ultrasound image sets from three hospitals in China. We obtained ultrasound images for the training set (312 399 images from 42 952 patients) from the thyroid imaging database at Tianjin Cancer Hospital, Tianjin, China. We obtained images for the validation sets from thyroid imaging databases at Tianjin Cancer Hospital (internal validation set, 8606 images from 1118 patients), the Integrated Traditional Chinese and Western Medicine

Results

Between Jan 1, 2012, and Dec 15, 2017, 396 998 ultrasound images were obtained for the training set from the thyroid imaging database at Tianjin Cancer Hospital. After quality control, 84 599 (21%) images whose anatomical locations did not match the pathological reports were removed. The final training set consisted of 312 399 images from 42 952 individuals: 17 627 patients with thyroid cancer (131 731 images) and 25 325 controls (180 668 images).
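The training-set counts above are internally consistent. The short check below uses only the numbers reported in that paragraph. The images removed in quality control account for the gap between the raw and final image totals, and the per-group counts sum to the overall totals.

```python
# Counts as reported in the Results paragraph.
raw_images = 396_998
removed_images = 84_599
cancer_images, control_images = 131_731, 180_668
cancer_patients, control_patients = 17_627, 25_325

training_images = raw_images - removed_images
assert training_images == cancer_images + control_images == 312_399
assert cancer_patients + control_patients == 42_952

removed_share = removed_images / raw_images
print(f"{training_images} images retained; {removed_share:.1%} removed")
# prints: 312399 images retained; 21.3% removed
```

The unrounded share, 21.3%, is consistent with the 21% reported.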

Between

Discussion

The findings of our retrospective study show that our DCNN model tested in three validation sets can achieve high accuracy, sensitivity, and specificity in automated thyroid cancer diagnosis in a real-world setting. The developed artificial intelligence system had significantly higher accuracy and specificity in classifying thyroid cancer patients compared with a group of skilled radiologists. The thyroid ultrasound images used in our study were produced by several different types of ultrasound

References (27)

  • Thyroid cancer survival rates, by type and stage.
  • S Jegerlehner et al. Overdiagnosis and overtreatment of thyroid cancer: a population-based temporal trend study. PLoS One (2017).
  • S Park et al. Association between screening and the thyroid cancer “epidemic” in South Korea: evidence from a nationwide study. BMJ (2016).
*Contributed equally and are joint first authors.

Contributed equally and are joint senior authors.