The incidence of thyroid cancer has been increasing worldwide over the past two decades, including in the USA, where the incidence of many other cancer types has been decreasing.1 Thyroid cancer is three times more prevalent in women than in men1 and is the most frequently diagnosed cancer in women younger than 30 years of age in China.2 Patients with suspected thyroid disease undergo ultrasound imaging, the results of which are interpreted by a radiologist for clinical diagnosis. A key aspect of this interpretation is recognition of malignant thyroid nodules according to the Thyroid Imaging, Reporting and Data System (TI-RADS) guidelines. The American College of Radiology (ACR) TI-RADS,3 European TI-RADS,4 and American Thyroid Association guidelines5 propose multiple criteria for interpreting sonographic images. Among these criteria, solid aspect, hypoechogenicity, taller-than-wide shape, irregular margin, extrathyroidal extension, calcification, and punctate echogenic foci are clinically relevant features associated with suspicion of malignant disease.3, 4, 5, 6, 7, 8 Patients with suspected thyroid cancer undergo fine-needle aspiration biopsy or surgical resection, the specimens from which are assessed by pathological examination (the gold standard for diagnosis). Diagnosis of thyroid cancer is therefore a time-consuming and often subjective process that requires substantial radiological experience and expertise.
Research in context
Evidence before this study
We searched PubMed on Aug 26, 2018, for research articles containing the terms “deep learning” OR “convolutional neural network” AND “large scale thyroid imaging data”, without date or language restrictions. We found no studies that examined the use of deep learning to improve the diagnostic accuracy of thyroid cancer by analysing large-scale sonographic imaging datasets. When we searched PubMed with the terms “deep learning” OR “convolutional neural network” AND “thyroid cancer”, we found seven studies that used either deep learning or conventional feature-extraction-based machine-learning algorithms to characterise the malignancy of thyroid nodules from ultrasonographic images. However, these studies did not include large training datasets (<100 000 images) or external validation sets. The best-performing diagnostic classification method to date was trained with 15 000 images and was not externally validated. We speculate that the heterogeneity of thyroid nodules cannot be fully characterised with such a limited dataset, and the generalisability of these models remains unknown.
Added value of this study
The high performance of the deep learning model we developed in this study was validated in several cohorts. The improvement in accuracy and specificity seen with this model could lead to a reduction in unnecessary invasive fine-needle aspiration biopsy procedures and overdiagnosis and overtreatment of thyroid cancer. Furthermore, it has the potential to reduce barriers and provide equal access to diagnostic tools for thyroid cancer in regions and countries where medical resources are scarce.
Implications of all the available evidence
The results of our study could improve the accuracy, efficiency, and reproducibility of thyroid cancer diagnosis. The proposed artificial intelligence approach could be particularly valuable in community hospitals where expertise in radiological image interpretation is insufficient. A website running this deep learning framework is under construction and will be freely available online.
There are four main subtypes of thyroid cancer: papillary, follicular, medullary, and anaplastic.7 The 5-year relative survival of patients with thyroid cancer is 99·7%,1 but this value varies substantially across subtypes when stratified by stage: near 100% for stage I and II papillary, follicular, and medullary carcinoma; 71% for stage III follicular carcinoma, 81% for stage III medullary carcinoma, and 93% for stage III papillary carcinoma; and 7% for anaplastic, 28% for medullary, 50% for follicular, and 51% for papillary carcinoma at stage IV.9 All anaplastic thyroid cancers are considered stage IV.9 In view of the good prognosis of early-stage thyroid cancer, analysis of thyroid ultrasound imaging data by a high-performance artificial intelligence algorithm could help stratify patients by risk and avoid unnecessary fine-needle aspiration biopsy or thyroidectomy in those at lower risk, particularly patients with papillary carcinomas.
The widespread use of sensitive imaging methods for screening has led to a steady increase in the incidence of thyroid cancer, causing overdiagnosis and overtreatment in this setting.10, 11 Indolent, well-differentiated papillary carcinomas and other early-stage thyroid cancers account for most of this growth in incidence, since the incidence of advanced-stage thyroid cancer is rising only marginally. Mortality from thyroid cancer has decreased slightly during the past decade.10 Over the same period, the estimated age-standardised frequency of thyroidectomy has risen threefold to fourfold annually in both sexes.10 Therefore, an artificial intelligence framework based on a precise algorithm with high sensitivity and specificity could maintain a high recall rate for patients with thyroid cancer while identifying individuals at low risk of developing advanced disease, thus avoiding unnecessary fine-needle aspiration biopsy. Recently, deep convolutional neural network (DCNN) models have been shown to achieve dermatologist-level classification accuracy in skin cancer diagnosis.12 Deep learning models have also outperformed human experts in the detection of diabetic retinopathy and other eye diseases from raw input pixels of retinal fundus photographs.13, 14, 15
A traditional machine-learning algorithm for the diagnosis of thyroid cancer has been developed previously,16 but it used as inputs features identified explicitly by human experts. Unlike traditional machine learning, deep learning does not require engineered features designed by human experts. Rather, deep learning takes raw image pixels and corresponding class labels from medical imaging data as inputs and automatically learns feature representations in a general manner.17 The learned representations can then be used for classification and object detection. In this study, we aimed to ascertain the capability of deep learning models for automated diagnosis of thyroid cancer using real-world sonographic data from clinical thyroid ultrasound examinations. We compared results with pathological examination reports (the diagnostic gold standard). The study encompassed model development with a cohort of more than 300 000 images and validation of the model in three validation datasets.
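To illustrate the pipeline described above (raw pixels in, class probability out), the forward pass of a convolutional classifier can be sketched in miniature as follows. This is a toy sketch for exposition only, not the model developed in this study: the convolution kernels, pooling, and output weights here are random placeholders, whereas a real DCNN learns these parameters from large sets of labelled images.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image with one kernel."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear activation."""
    return np.maximum(x, 0.0)

def sigmoid(x):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + np.exp(-x))

def predict_malignancy(image, kernels, weights, bias):
    """Forward pass: convolution -> ReLU -> global average pooling -> logistic output."""
    features = np.array([relu(conv2d(image, k)).mean() for k in kernels])
    return sigmoid(features @ weights + bias)

# Toy example: random parameters stand in for learned ones.
rng = np.random.default_rng(0)
image = rng.random((32, 32))                  # stand-in for a sonographic image patch
kernels = rng.standard_normal((4, 3, 3)) * 0.1
weights = rng.standard_normal(4)
bias = 0.0
p = predict_malignancy(image, kernels, weights, bias)
print(f"predicted probability of malignancy: {p:.3f}")
```

In a trained network, stacks of such convolution-and-pooling stages extract progressively more abstract features (edges, textures, nodule-level patterns) before the final classification layer; the key point is that no hand-engineered sonographic features are supplied.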