| Dataset | Radiologist ID | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | Positive predictive value (95% CI) | Negative predictive value (95% CI) | Kappa<sup>a</sup> | F1<sup>b</sup> |
|---|---|---|---|---|---|---|---|---|
| TMUCIH test set (n = 439) | Radiologist 1 | 0.911 (0.881–0.936) | 1.000 (0.990–1.000) | 0.748 (0.672–0.815) | 0.879 (0.839–0.913) | 1.000 (0.975–1.000) | 0.794 | 0.936 |
| | Radiologist 2 | 0.927 (0.899–0.950) | 1.000 (0.990–1.000) | 0.794 (0.721–0.854) | 0.899 (0.860–0.930) | 1.000 (0.976–1.000) | 0.833 | 0.947 |
| | Radiologist 3 aided with THCaDxNLP | 0.929 (0.901–0.952) | 0.919 (0.881–0.948) | 0.948 (0.901–0.977) | 0.970 (0.942–0.987) | 0.865 (0.804–0.912) | 0.849 | 0.944 |
| | Radiologist 4 aided with THCaDxNLP | 0.909 (0.878–0.934) | 0.993 (0.975–0.999) | 0.755 (0.679–0.820) | 0.881 (0.841–0.915) | 0.983 (0.941–0.998) | 0.789 | 0.934 |
| TGH test set (n = 186) | Radiologist 1 | 0.903 (0.851–0.942) | 0.939 (0.871–0.977) | 0.864 (0.774–0.928) | 0.885 (0.807–0.939) | 0.927 (0.848–0.973) | 0.805 | 0.911 |
| | Radiologist 2 | 0.866 (0.808–0.911) | 0.980 (0.928–0.998) | 0.739 (0.634–0.827) | 0.807 (0.724–0.873) | 0.970 (0.896–0.996) | 0.727 | 0.885 |
| | Radiologist 3 aided with THCaDxNLP | 0.930 (0.883–0.962) | 0.898 (0.820–0.950) | 0.966 (0.904–0.993) | 0.967 (0.907–0.993) | 0.895 (0.815–0.948) | 0.86 | 0.931 |
| | Radiologist 4 aided with THCaDxNLP | 0.935 (0.890–0.966) | 1.000 (0.970–1.000) | 0.864 (0.774–0.928) | 0.891 (0.817–0.942) | 1.000 (0.961–1.000) | 0.87 | 0.942 |
| TFCH test set (n = 82) | Radiologist 1 | 0.841 (0.744–0.913) | 0.867 (0.693–0.962) | 0.827 (0.697–0.918) | 0.743 (0.567–0.875) | 0.915 (0.796–0.976) | 0.67 | 0.8 |
| | Radiologist 2 | 0.854 (0.758–0.922) | 0.867 (0.693–0.962) | 0.846 (0.719–0.931) | 0.765 (0.588–0.893) | 0.917 (0.800–0.977) | 0.693 | 0.812 |
| | Radiologist 3 aided with THCaDxNLP | 0.976 (0.915–0.997) | 1.000 (0.905–1.000) | 0.962 (0.868–0.995) | 0.938 (0.792–0.992) | 1.000 (0.942–1.000) | 0.948 | 0.968 |
| | Radiologist 4 aided with THCaDxNLP | 0.988 (0.934–1.000) | 0.967 (0.828–0.999) | 1.000 (0.944–1.000) | 1.000 (0.902–1.000) | 0.981 (0.899–1.000) | 0.974 | 0.983 |
| Weihai test set (n = 343) | Radiologist 1 | 0.924 (0.891–0.950) | 0.958 (0.916–0.983) | 0.892 (0.837–0.934) | 0.894 (0.839–0.935) | 0.957 (0.914–0.983) | 0.849 | 0.925 |
| | Radiologist 2 | 0.787 (0.740–0.829) | 0.599 (0.520–0.674) | 0.966 (0.927–0.987) | 0.943 (0.881–0.979) | 0.717 (0.655–0.774) | 0.57 | 0.733 |
| | Radiologist 3 aided with THCaDxNLP | 0.950 (0.922–0.971) | 0.964 (0.923–0.987) | 0.938 (0.891–0.968) | 0.936 (0.888–0.968) | 0.965 (0.925–0.987) | 0.901 | 0.95 |
| | Radiologist 4 aided with THCaDxNLP | 0.968 (0.943–0.984) | 0.994 (0.967–1.000) | 0.943 (0.898–0.972) | 0.943 (0.898–0.972) | 0.994 (0.967–1.000) | 0.936 | 0.968 |
| Chengde test set (n = 171) | Radiologist 1 | 0.865 (0.805–0.913) | 0.824 (0.726–0.898) | 0.907 (0.825–0.959) | 0.897 (0.808–0.955) | 0.839 (0.748–0.907) | 0.731 | 0.859 |
| | Radiologist 2 | 0.842 (0.779–0.893) | 0.788 (0.686–0.869) | 0.895 (0.811–0.951) | 0.882 (0.787–0.944) | 0.811 (0.717–0.884) | 0.684 | 0.832 |
| | Radiologist 3 aided with THCaDxNLP | 0.889 (0.832–0.932) | 0.894 (0.808–0.950) | 0.884 (0.797–0.943) | 0.884 (0.797–0.943) | 0.894 (0.808–0.950) | 0.778 | 0.889 |
| | Radiologist 4 aided with THCaDxNLP | 0.906 (0.853–0.946) | 1.000 (0.965–1.000) | 0.814 (0.716–0.890) | 0.842 (0.756–0.907) | 1.000 (0.958–1.000) | 0.813 | 0.914 |
<sup>a</sup> Agreement between the predicted classification and the pathology report. <sup>b</sup> Harmonic mean of precision (positive predictive value) and recall (sensitivity).
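To make the footnoted definitions concrete, here is a minimal Python sketch that computes each point estimate in the table from a 2×2 confusion matrix of reader calls against pathology. Several things are assumptions rather than details taken from the source. Kappa is treated as Cohen's kappa, although the footnote does not name the variant. The 95% confidence intervals are not reproduced because the interval method is not stated. The names `ConfusionMatrix` and `diagnostic_metrics` are illustrative. The counts TP = 166, FN = 1, FP = 10, TN = 166 are hypothetical: they were back-calculated so that the rounded outputs match the Weihai test set / Radiologist 4 row, and the study does not report them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 table of reader calls against the pathology reference (positive = malignant)."""

    tp: int  # reader positive, pathology positive
    fn: int  # reader negative, pathology positive
    fp: int  # reader positive, pathology negative
    tn: int  # reader negative, pathology negative

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def diagnostic_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Point estimates only; confidence intervals are not computed here."""
    n = cm.n
    sensitivity = cm.tp / (cm.tp + cm.fn)  # recall
    specificity = cm.tn / (cm.tn + cm.fp)
    ppv = cm.tp / (cm.tp + cm.fp)  # precision
    npv = cm.tn / (cm.tn + cm.fn)
    accuracy = (cm.tp + cm.tn) / n

    # Footnote b: F1 is the harmonic mean of precision (PPV) and recall (sensitivity).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

    # Footnote a, ASSUMED here to be Cohen's kappa: observed agreement (accuracy)
    # corrected for the agreement expected by chance from the marginal totals.
    pred_pos, pred_neg = cm.tp + cm.fp, cm.fn + cm.tn
    true_pos, true_neg = cm.tp + cm.fn, cm.fp + cm.tn
    p_chance = (pred_pos * true_pos + pred_neg * true_neg) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
        "f1": f1,
    }


if __name__ == "__main__":
    # HYPOTHETICAL counts (not reported by the study), back-calculated so the rounded
    # results match the Weihai test set / Radiologist 4 row (n = 343).
    cm = ConfusionMatrix(tp=166, fn=1, fp=10, tn=166)
    for name, value in diagnostic_metrics(cm).items():
        print(f"{name}: {value:.3f}")
    # Expected: accuracy 0.968, sensitivity 0.994, specificity 0.943,
    # ppv 0.943, npv 0.994, kappa 0.936, f1 0.968
```

Since F1 = 2TP / (2TP + FP + FN) and accuracy = (TP + TN) / n, the two coincide in this example only because TP = TN. In general they differ, as the other rows of the table show.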