Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians
