
Enhancing foveal avascular zone analysis for Alzheimer’s diagnosis with AI segmentation and machine learning using multiple radiomic features


Fivefold cross-validation results on the training set

We compared the diagnostic performance of the proposed technique and the baseline techniques for AD diagnosis on the training set. The training set comprised 85 OCTA scans, which were divided into five subsets for fivefold cross-validation. Each diagnosis technique was trained five times, and the mean validation performance was taken as the final diagnostic performance. The training procedure for each technique is described below.
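The fold protocol described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names (`five_fold_indices`, `cross_validate`, `dummy_scorer`) are hypothetical, the split is a plain shuffled split (the text does not say whether folds were stratified by diagnosis), and the scorer is a placeholder standing in for "fit a model on the training folds and return the validation AUC".

```python
import random
from statistics import mean


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def cross_validate(n_samples, train_and_score, n_folds=5, seed=0):
    """Train once per fold; return the per-fold scores and their mean."""
    folds = five_fold_indices(n_samples, n_folds, seed)
    scores = []
    for k, val_idx in enumerate(folds):
        # Every fold except fold k is used for training.
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return scores, mean(scores)


def dummy_scorer(train_idx, val_idx):
    """Placeholder (assumption): a real scorer would train a model and return its validation AUC."""
    return len(val_idx) / (len(train_idx) + len(val_idx))


folds = five_fold_indices(85)
scores, mean_score = cross_validate(85, dummy_scorer)
```

With 85 scans, each of the five validation folds holds 17 scans and each training split holds the remaining 68.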

Training details

Training for baseline 1

For the CNN backbone of baseline 1, we tested four representative models: ResNet31, DenseNet32, EfficientNet33, and Inception34. Each model was initialized with parameters pretrained on the ImageNet dataset and trained with fivefold cross-validation. Each training run lasted 50 epochs and applied the cross-entropy loss41 to a two-dimensional output probability vector for binary classification of positive (AD) and negative (NC) samples. The optimal learning rates were \(10^{-2}\) for EfficientNet33, \(10^{-2}\) for ResNet31, \(10^{-5}\) for Inception34, and \(10^{-2}\) for DenseNet32.
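As a small worked illustration of the loss mentioned above, the following sketch computes the cross-entropy for a two-dimensional output. It assumes the network emits two raw scores (logits) that are turned into probabilities by a softmax, and that index 0 stands for NC and index 1 for AD; both conventions are assumptions for illustration, not details taken from the study.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 (numerically stable form)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class (0 = NC, 1 = AD, by assumption)."""
    probs = softmax(logits)
    return -math.log(probs[label])


# An undecided prediction for an AD sample costs ln(2); a confident correct one costs less.
loss_uncertain = cross_entropy([0.0, 0.0], 1)
loss_confident = cross_entropy([-2.0, 2.0], 1)
```

The loss falls toward zero as the probability assigned to the true class approaches one, which is what drives training toward correct AD/NC separation.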

Training for baseline 2

Baseline 2 required manual extraction of the FAZ binary mask \(S^{ma}_O\) from each OCTA scan. Thus, ophthalmologists extracted the FAZ binary masks from the 85 OCTA scans used in this study. Given the mask \(S^{ma}_O\) provided by the ophthalmologists, the FAZ area was calculated, and an ML model was trained and evaluated for AD diagnosis by fivefold cross-validation. We used LGBM39 as the ML model.

Training for proposed technique

Unlike baseline 2, the proposed technique performs AI-based segmentation: it receives an OCTA scan as input and predicts the FAZ binary mask as output. We trained a segmentation model based on nnUNet43 on 2000 OCTA scan–FAZ mask pairs42. The learning objective was the conventional pixel-wise cross-entropy loss, and training ran for 100 epochs under Adam optimization44 with a learning rate of 0.01. The trained segmentation model then predicted FAZ masks for the 85 evaluation OCTA scans, and multiple radiomic features were extracted from the 85 predicted masks. Finally, the ML model (LGBM39, for a fair comparison with baseline 2) was applied for AD diagnosis with fivefold cross-validation. Training of the proposed technique is illustrated in Fig. 4.

Figure 4

Training overview of proposed diagnosis technique. The proposed technique comprises AI-based FAZ segmentation and ML-based AD diagnosis using multiple radiomic FAZ features. Segmentation and classification loss functions are used as training losses for the AI and ML models, respectively. Data from our hospital are used for training and evaluation with fivefold cross-validation.

Diagnostic performance

The diagnostic performance (AUC) per fold and across folds for the evaluated techniques is listed in Table 4. The AUC of the proposed technique was at least 13 percentage points higher than that of the baselines. The baselines did not provide clinically meaningful results because all their AUC values were below 60%. In contrast, the proposed technique achieved clinically meaningful performance, with AUC values above 70%. The improvement over the baselines was also statistically significant (\(p < 0.05\)). Hence, this is the first technique demonstrating that multiple radiomic FAZ features are meaningful biomarkers for AD diagnosis.

Figure 5 shows the receiver operating characteristic curves of each technique, corresponding to the AUC values averaged across folds. Across the full range of thresholds, the proposed technique demonstrated higher sensitivity than the baselines, confirming its superiority.
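For readers less familiar with the metric, the sketch below shows one standard way to obtain ROC points and the AUC from classifier scores. It uses the rank interpretation of the AUC (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half). The function names and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(false positive rate, sensitivity) pairs, sweeping the threshold from high to low."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy example (made-up values): 1 = AD, 0 = NC.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = auc_score(labels, scores)
curve = roc_points(labels, scores)
```

In this toy example eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9; an AUC of 0.5 corresponds to chance-level ranking.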

Table 4 AUC of differential diagnosis between AD and NC. The mean and standard deviation are calculated from fivefold cross-validation.
Figure 5

Receiver operating characteristic curves of differential diagnosis between AD and NC. The AUC values are \(72.2\pm 4.2\) (\(95\%\) confidence interval, \(66.7-75.5\)), \(52.4\pm 3.9\) (\(95\%\) confidence interval, \(49-55.8\)), and \(59.1\pm 6.2\) (\(95\%\) confidence interval, \(52.9-65.3\)) for the proposed technique, baseline 1 with DenseNet, and baseline 2, respectively.

Analysis of proposed technique and its elements

Performance of different ML models

The proposed technique diagnosed AD by feeding multiple radiomic FAZ features into an ML classifier. The results using LGBM39 as a representative ML model are listed in Table 4, and the performance comparison of other ML models (i.e., XGBoost37 and random forest38) is presented in Table 5. LGBM showed the highest performance, thus being selected as the ML model for the proposed technique. For models other than LGBM, the mean diagnostic performance was at least 60%. Thus, the proposed technique showed higher performance than the baselines (AUC of 60% or less), as shown in Table 4. This demonstrates that the proposed method is superior to the baselines regardless of the underlying ML model.

Table 5 Performance of differential diagnosis between AD and NC using different ML models in the proposed technique. The mean and standard deviation are obtained from fivefold cross-validation.

Ablation study for proposed technique

The proposed technique uses multiple radiomic FAZ features (i.e., area, compactness, eccentricity, roundness, and solidity) instead of only one feature (i.e., area used in baseline 2). Table 6 shows that the diagnostic performance gradually improved when each of these features was added to the proposed technique. Hence, the diagnostic performance was improved by adding the four features to the area, justifying their inclusion in the technique.
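To make the five shape descriptors concrete, the sketch below computes them from a binary mask using common image-analysis conventions. The definitions are assumptions, since this section does not give the authors' formulas: area is the pixel count, the perimeter counts exposed pixel edges, compactness is \(4\pi A/P^2\), eccentricity and the major axis come from the ellipse with the same second moments as the region, roundness is \(4A/(\pi \cdot \text{major axis}^2)\), and solidity is the area divided by the convex-hull area of the pixel corners. The function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order, without repeats."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # Drop points that would make a clockwise or collinear turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def faz_shape_features(mask):
    """Shape descriptors of the nonzero region of a 2D list-of-lists mask (assumed definitions)."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask contains no foreground pixels")
    filled = set(pixels)
    area = len(pixels)
    # Perimeter: number of pixel edges that border background (4-connectivity).
    perimeter = sum(
        1
        for r, c in pixels
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if (r + dr, c + dc) not in filled
    )
    # Second central moments; 1/12 accounts for the unit extent of each pixel.
    mr = sum(r for r, _ in pixels) / area
    mc = sum(c for _, c in pixels) / area
    srr = sum((r - mr) ** 2 for r, _ in pixels) / area + 1 / 12
    scc = sum((c - mc) ** 2 for _, c in pixels) / area + 1 / 12
    src = sum((r - mr) * (c - mc) for r, c in pixels) / area
    spread = math.sqrt(((srr - scc) / 2) ** 2 + src ** 2)
    lam_major = (srr + scc) / 2 + spread
    lam_minor = (srr + scc) / 2 - spread
    major_axis = 4 * math.sqrt(lam_major)
    # Convex hull over pixel corners so that a solid rectangle has solidity 1.
    corners = [(r + dr, c + dc) for r, c in pixels for dr in (0, 1) for dc in (0, 1)]
    hull_area = polygon_area(convex_hull(corners))
    return {
        "area": area,
        "compactness": 4 * math.pi * area / perimeter ** 2,
        "eccentricity": math.sqrt(max(0.0, 1 - lam_minor / lam_major)),
        "roundness": 4 * area / (math.pi * major_axis ** 2),
        "solidity": area / hull_area,
    }


square = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)] for r in range(6)]
l_shape = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
square_features = faz_shape_features(square)
l_features = faz_shape_features(l_shape)
```

The two toy masks show why shape descriptors add information beyond area: the square is convex (solidity 1, eccentricity 0), whereas the L-shaped region has a solidity of 5/7 because its convex hull covers more than the region itself.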

Table 6 Comparison of AD diagnostic performance by including radiomic features. A, compactness; B, eccentricity; C, roundness; D, solidity. The mean and standard deviation are obtained from fivefold cross-validation.

Validity of AI-based segmentation in proposed technique

The proposed approach automatically extracts the FAZ by AI-based segmentation. Table 7 lists the diagnostic performance when the proposed technique instead uses the binary masks manually annotated by an ophthalmologist, as in baseline 2. Using AI-based automatic segmentation rather than manual segmentation raised the AUC of AD diagnosis by 10.3 percentage points. We attribute this improvement to the more accurate and precise AI-based FAZ extraction compared with manual annotation, as explained in the Discussion section.

Table 7 AD diagnostic performance for manual and automatic segmentation. The mean and standard deviation are obtained from fivefold cross-validation.

Comparison of diagnostic performance of proposed methods on holdout test set

We conducted a holdout test on a separate set of 45 OCTA scans. On this set, the proposed method was compared with the baseline35 (i.e., the baseline 2 method, which uses only the area feature) and with ophthalmologists. Each diagnostic technique was evaluated by accuracy, sensitivity, specificity, and AUC.
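The threshold-dependent metrics named above follow directly from the confusion matrix. The sketch below uses the standard definitions; the function names and the toy labels and predictions are illustrative assumptions, not study data.

```python
def confusion_counts(labels, predictions):
    """Return (tp, fp, tn, fn) for binary labels and predictions (1 = AD, 0 = NC)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, fp, tn, fn


def diagnostic_metrics(labels, predictions):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(labels, predictions)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy example (made-up values): 4 AD and 6 NC subjects.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(labels, predictions)
```

Here two of four AD subjects are detected (sensitivity 0.5) and five of six NC subjects are correctly cleared (specificity 5/6), giving an accuracy of 0.7.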

Comparison of diagnostic performance between baseline and proposed method

The diagnostic performance of the baseline35 method (i.e., the baseline 2 method, which uses only the single area feature) and the proposed method (multiple radiomic features) is detailed in Table 8.

Table 8 Diagnostic performance comparison between proposed and baseline method. Accuracy, sensitivity, specificity, and AUC score obtained from the holdout test set. Values in parentheses represent improvements in performance between the proposed and baseline method diagnoses. The mean and standard deviation are obtained from the holdout test.

FAZ binary masks for the holdout dataset were obtained manually and automatically (i.e., by AI-based segmentation) for the baseline and the proposed method. The masks were then used to test the models previously trained in the fivefold cross-validation process. In the holdout test, the proposed method improved the AUC over the baseline by 14.0 percentage points (baseline 58.0% vs proposed 72.0%). It also increased accuracy by 14.1 points (50.7% vs 64.8%), specificity by 5.0 points (78.7% vs 83.7%), and sensitivity by 19.2 points (35.2% vs 54.4%). These results indicate the robustness of the proposed method, which outperformed the baseline on every evaluation metric in the holdout test (\(p < 0.05\)). Therefore, the proposed method provides better FAZ-based diagnostic performance for Alzheimer’s disease than the baseline.

Comparison of diagnostic performance between ophthalmologists and proposed method

Human diagnosis was performed by three experienced ophthalmologists who did not participate in collecting the OCTA holdout test dataset. The ophthalmologists first studied the FAZ binary masks of the 85 labeled training scans and then diagnosed the FAZ binary masks of the 45 unlabeled holdout scans. The comparison between the proposed method and the ophthalmologists is presented in Table 9.

Table 9 Diagnostic performance comparison between proposed method and ophthalmologist. Accuracy, sensitivity, specificity, and AUC score obtained from the holdout test set. Values in parentheses represent improvements in performance between the proposed method and ophthalmologist diagnoses. The mean and standard deviation are obtained from the holdout test.

The proposed method outperformed the ophthalmologists in all metrics (sensitivity, specificity, accuracy, and AUC), with a significant improvement of about 30 percentage points in specificity (ophthalmologists 53.7% vs proposed 83.7%; \(p < 0.05\)). In other words, the proposed method produces far fewer false positive predictions for normal subjects, which saves the cost of unnecessary additional testing. The proposed technique also showed slightly higher sensitivity than the ophthalmologists (52.8% vs 54.4%) and stronger overall discriminative power (AUC 53.2% vs 72.0%). Consequently, the proposed method shows potential utility as a clinical support tool for FAZ-based Alzheimer’s diagnosis in the future.

Figure 6 shows the AUC results for the binary classification of AD and NC using the proposed method and three ophthalmologists. The AUC of the proposed method was 18.8 percentage points higher than the average AUC of the three ophthalmologists (53.2% vs 72.0%) and 14.0 points higher than that of the baseline (58.0% vs 72.0%). The gain over the baseline confirms that considering multiple radiomic features, including area, significantly improves performance over relying on the area alone for Alzheimer’s diagnosis. Together with the gain over the ophthalmologists, this indicates the potential of multiple radiomic features as a novel biomarker in FAZ-based Alzheimer’s diagnosis.

Figure 6

Comparison of the diagnostic performance of the proposed method, the baseline, and the average of three ophthalmologists on the holdout test. The AUC values are \(72.0\pm 4.8\) (\(95\%\) confidence interval, \(67.7-76.2\)), \(58.0\pm 0.009\) (\(95\%\) confidence interval, \(57.9-58.0\)), and \(53.2\pm 21.0\) (\(95\%\) confidence interval, \(32.0-69.5\)) for the proposed technique, the baseline, and the average of the three ophthalmologists, respectively.

Discussion

We showed that multiple radiomic FAZ features can be extracted by an AI model to support AD diagnosis. To the best of our knowledge, this is the first report using multiple radiomic FAZ features for diagnosis in patients with AD. We developed an automatic AD diagnosis technique comprising AI-based FAZ segmentation and ML-based AD diagnosis using the automatically extracted FAZ features.

Clinical implications of multiple radiomic FAZ features

Early detection of AD is of paramount importance, as it allows intervention prior to the onset of irreversible brain degeneration. Nevertheless, the current gold-standard diagnostic methods for AD, such as amyloid PET scans or CSF analysis, are insufficient as early screening tools. The retina, due to its embryological similarities with the brain and its easily and safely examined anatomical features, presents a promising avenue for the early detection of AD. The FAZ is a potential retinal biomarker for AD. The FAZ can be extracted from OCTA, which is a noninvasive retinal imaging modality. A recent meta-analysis revealed an enlargement of the FAZ in AD14. Another meta-analysis reported an enlarged FAZ in patients with mild cognitive impairment but no significant enlargement in AD45, while another meta-analysis showed no significant enlargement of the FAZ in AD46. Although limitations included the heterogeneity of OCTA equipment, diverse scanning protocols, and unmeasured confounders, previous studies only investigated the FAZ area, neglecting the FAZ shape, which may be a reliable indicator of retinal disorders15,17,47.

While area is frequently utilized as a primary metric for characterizing the FAZ, it is essential to recognize the substantial normal variation in FAZ size48. This variability may potentially constrain its utility as a pathological indicator in cross-sectional screening applications49. Evaluating the regularity of the FAZ’s overall shape, measured in terms of roundness or circularity, may offer a more precise indication of disease due to reduced variability within the healthy population50. Consequently, it is imperative to investigate whether biomarkers related to the shape of the FAZ possess diagnostic capabilities in individuals with AD. Recently, not only have there been reports of studies using ophthalmic imaging and AI for the diagnosis of ophthalmic diseases, but there have also been reports on their use for diagnosing AD29,51,52,53. In this study, we first revealed that multiple radiomic FAZ features, including roundness, eccentricity, compactness, and solidity, can improve the AD diagnostic performance compared with the FAZ area alone. Therefore, multiple radiomic FAZ features are useful for diagnosis and should be considered when evaluating the FAZ as new biomarkers for AD.

While our AI-based methodology diagnosed AD with an AUC of \(72.2\pm 4.2\%\), it is important to acknowledge that this figure cannot be compared directly with current gold-standard diagnostic methods. Notably, our diagnosis was based solely on retinal imaging, without traditionally established diagnostic tools for AD such as amyloid PET scans, CSF tapping, brain imaging, or even the Mini-Mental State Examination. Nevertheless, our findings have significant clinical implications. They help bridge a well-recognized diagnostic gap by providing a noninvasive and cost-effective means of screening for AD, circumventing the need for invasive and expensive tests such as PET, CSF tapping, and brain MRI. The results also indicate room for further refinement of AI-based diagnostic techniques, which holds promise for future research on the early detection of AD.

Possible mechanisms for FAZ changes in AD

Vascular dysfunction in patients with AD likely leads to cerebral hypoperfusion during AD development54,55,56,57,58. In vivo and autopsy data have revealed that AD is associated with the deposition of amyloid and collagen within the cerebral capillaries, which can result in cellular apoptosis and vessel dropout59,60,61,62. In addition, various studies have found the accumulation of beta-amyloid plaques in the inner retina of postmortem tissue extracted from patients with AD63,64,65,66. Therefore, FAZ changes in patients with AD may be secondary to retinal degeneration owing to beta-amyloid accumulation within the retina.

Performance improvement by AI-based FAZ segmentation

To evaluate the effectiveness of the AI-based FAZ segmentation integrated into the proposed technique, we compared it with manual FAZ segmentation, obtaining the results listed in Table 7. Manual segmentation was the same as that in baseline 2. Compared with manual segmentation, AI-based segmentation improved the AUC from 61.9% to 72.2% (an improvement of 10.3 percentage points). As shown in Fig. 7, the improvement arose because AI-based FAZ segmentation avoided the inaccuracies and mistakes present in some manual annotations. Nguyen et al.67 reported the high performance of AI-based FAZ segmentation. We likewise observed that AI-based segmentation extracted the FAZ more precisely. Thus, the multiple radiomic FAZ features were determined more precisely, thereby improving AD diagnosis.

Figure 7

Comparison between FAZ segmentation methods. (a) Original OCTA scan and results from (b) manual and (c) AI-based segmentation.

Performance improvement by multiple radiomic features

Unlike previous studies35, we considered multiple FAZ features (i.e., area, roundness, eccentricity, compactness, and solidity) to diagnose AD. Existing techniques relied only on the FAZ area, and their AD diagnostic performance was not high. We demonstrated that various FAZ features contributed to further improving AD diagnosis, as indicated in Table 10. Every feature considered in this study (i.e., roundness, eccentricity, compactness, and solidity) contributed to the diagnosis, individually leading to performance comparable to that of the area. This individual validation may explain the performance improvement achieved by combining features, as shown in Table 6, where the performance gradually improved as more features were added.

Table 10 AD diagnostic performance obtained from every radiomic feature. The mean and standard deviation are obtained from fivefold cross-validation.

Technical implications

Our hybrid technique achieved an AUC of 72.2%, thus improving the AD diagnostic performance using FAZ biomarkers by 13.1% compared with existing techniques. This result holds notable clinical significance because it confirms that the FAZ is a suitable biomarker for AD, even though it was previously overlooked due to its low diagnostic performance in AD diagnosis. Hence, high AD diagnostic performance may be achieved by using FAZ biomarkers along with well-known biomarkers (e.g., global retinal nerve fiber layer, retinal thickness, vascular density, and FAZ area) that have been used for noninvasive AD diagnosis.

Promising performance of additional features in isolated feature analysis

We conducted comparative experiments in which each feature was used alone for diagnosis, covering the previously reported area35 feature and the four additional features introduced in this study (i.e., solidity, compactness, eccentricity, and roundness). The results for each single feature are detailed in Table 10.

Compared with the area, which was previously reported in FAZ-based Alzheimer’s diagnosis, the additional features showed diagnostic potential in the single-feature experiments: solidity reached 64.6% (+ 1.4%), compactness 66.0% (+ 2.8%), and eccentricity 63.5% (+ 0.3%), whereas roundness reached 59.9% (\(-\) 3.3%). Thus, three of the four additional features matched or exceeded the area on their own. This suggests that these features (i.e., solidity, compactness, eccentricity, and roundness) are related to structural changes in the FAZ caused by Alzheimer’s disease that the area alone does not capture. This research not only contributes to FAZ-based Alzheimer’s diagnosis but also presents biomarkers that may be meaningful for detecting structural changes due to other ocular diseases.

Instrumental applicability of the proposed method in real clinical settings

The proposed method yielded a high-specificity model by default, but a high-sensitivity model could be derived by adjusting the decision threshold. Threshold adjustment resulted in a sensitivity of 90% and a specificity of 33%. This means that the model identifies 90% of Alzheimer’s patients while correctly ruling out around 30% of the normal control group. The detected patients can be referred for secondary testing such as amyloid PET scans and CSF analysis, while the cost of secondary testing can be avoided for around 30% of normal subjects. Unlike ophthalmologists, the proposed technique allows its operating point to be tuned through threshold adjustment. As a result, we can provide a model that, for the first time, detects up to 90% of actual Alzheimer’s patients, corresponding to a false negative rate of about 10%.
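The threshold adjustment described above can be sketched as a search for the highest decision threshold that still reaches a target sensitivity. This is an illustrative sketch only: the function names, the toy scores, and the 80% target are assumptions, and the text does not state how the authors selected their operating point.

```python
def rates_at_threshold(labels, scores, threshold):
    """Sensitivity and specificity when 'score >= threshold' is called AD."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    sensitivity = sum(1 for s in pos if s >= threshold) / len(pos)
    specificity = sum(1 for s in neg if s < threshold) / len(neg)
    return sensitivity, specificity


def threshold_for_sensitivity(labels, scores, target):
    """Highest observed score that, used as threshold, gives sensitivity >= target."""
    for t in sorted(set(scores), reverse=True):
        sensitivity, _ = rates_at_threshold(labels, scores, t)
        if sensitivity >= target:
            return t
    raise ValueError("target sensitivity cannot be reached")


# Toy example (made-up scores): 5 AD subjects (label 1) and 3 NC subjects (label 0).
labels = [1, 1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.5, 0.2, 0.7, 0.4, 0.1]
threshold = threshold_for_sensitivity(labels, scores, target=0.8)
sensitivity, specificity = rates_at_threshold(labels, scores, threshold)
false_negative_rate = 1 - sensitivity
false_positive_rate = 1 - specificity
```

Lowering the threshold raises sensitivity at the expense of specificity. The false negative rate is always one minus the sensitivity, and the false positive rate is one minus the specificity, so a screening operating point trades missed patients against unnecessary referrals.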

Comparison between proposed and existing techniques

We compared and analyzed the differences between the proposed and existing techniques regarding various aspects, as summarized in Table 11.

OCTA provides scans in a short time, enabling efficient noninvasive FAZ analysis. In only one other study, OCTA was used for AD diagnosis (third column of Table 11)35. However, that study used only the FAZ area, discarding other radiomic features. We demonstrated the importance of using multiple FAZ features for AD diagnosis by improving the diagnostic performance when using the proposed technique compared with conventional techniques that use a single feature (i.e., baseline 2).

Among existing studies using OCTA, Chan et al.68 and Mirshahi et al.50 used AI-based segmentation to extract the FAZ (fourth column of Table 11). They reported that using AI enabled the extraction of FAZ boundaries with better accuracy than existing signal processing methods, thereby validating the use of AI-based FAZ segmentation in our technique. However, the contribution of the extracted FAZ to diagnosis was not confirmed in those studies. Our study has both technical and clinical significance because we showed that AI-based FAZ segmentation can improve the diagnostic performance for AD.

Shiihara et al.69 and Philip et al.70 extracted multiple FAZ features like in our study (fifth column of Table 11). However, they did not develop an ML model for diagnosing a specific disease using multiple FAZ features (sixth column of Table 11). Shiihara et al.69 found a small difference between individuals in other FAZ features in addition to the area for healthy subjects, thereby suggesting their potential as biomarkers. However, their study was limited to healthy subjects, without confirming the possibility of using multiple biomarkers for diagnosing specific diseases. In contrast, we demonstrated that features other than the FAZ area are useful biomarkers for AD and developed an ML model for diagnosis. Philip et al.70 analyzed whether multiple FAZ features were individually correlated with primary open-angle glaucoma or exfoliation glaucoma, but they did not observe a correlation with a specific disease by combining multiple features. Therefore, they did not validate feature combinations. In addition, they did not implement a technique for disease diagnosis taking those features as inputs. Our study provides clinical and technical significance by overcoming existing limitations in AD diagnosis by implementing an ML model that receives multiple radiomic FAZ features as inputs and provides the AD diagnosis result as output.

Table 11 Characteristics of proposed and existing techniques. Unlike previous studies, our study covers all the listed aspects.

Limitations

A major limitation of our study was the small sample size, which consisted solely of Asian individuals. To mitigate this limitation, we applied fivefold cross-validation and a holdout test. However, the holdout test relies on data from a single institution and lacks external validation. To separate the data as effectively as possible, we collected the data for the fivefold cross-validation experiment and for the holdout test during different periods. Furthermore, the exclusion of patients with known vascular disease was another limitation: we could not evaluate whether these results apply to individuals who may have retinal microvascular alterations from other causes. In addition, the inclusion of participants with cognitive changes and positive biomarkers for AD limited comparisons with subjects at earlier stages who have positive biomarkers, such as those with mild cognitive impairment. Nevertheless, we demonstrated the diagnostic ability of the FAZ with AI for AD, and all individuals in this study, including those in the AD and NC groups, were screened by amyloid PET. In future work, a comparison between patients with mild cognitive impairment and NCs, as well as longitudinal changes in the FAZ in AD, will be considered.


