Cardiopulmonary exercise testing (CPET) has become an essential diagnostic and prognostic tool in clinical practice (1,2). The prognostic value of CPET is established by the inverse relationship of peak oxygen uptake (V̇O2peak) and all-cause mortality (3), as well as the risk for various noncommunicable diseases (e.g., coronary artery disease, type 2 diabetes) (3,4). V̇O2peak reflects the upper ceiling of oxygen supply and utilization and is dependent on the interplay of 1) pulmonary–vascular, 2) mechanical–ventilatory, 3) cardiocirculatory, and 4) muscular systems (5). CPET is thus useful to determine the status quo of an individual’s cardiorespiratory fitness, detect physiological factors limiting V̇O2peak, and quantify changes of such (e.g., due to disease or interventions) (1).
The correct interpretation of the multitude of CPET data and especially their interplay requires extensive training (6). There is currently a lack of sufficiently trained staff, particularly in smaller institutions (7). CPET is thus underused (7). Several attempts have been made to simplify the interpretation of CPET, such as the nine-panel Wasserman plot or numerous decision trees (1,8,9). Although these tools might simplify the interpretation of CPET data by visualizing large amounts of data and decision processes, a substantial understanding of various physiological mechanisms is still required. In addition, this complexity in the interpretation and differences between decision trees as well as thresholds used to signal abnormal responses to exercise can lead to interobserver differences. To exploit the full potential of this clinically highly relevant tool, CPET interpretation needs to be further simplified and standardized to enable broader dissemination. As seen recently, machine learning (ML) seems to be a promising approach to achieve this (10,11).
Inbar et al. (10) developed an algorithm for CPET data able to discriminate between individuals with chronic heart failure and chronic obstructive pulmonary disease, and healthy counterparts. As the authors pointed out, the algorithm may be valuable to identify the investigated chronic conditions in clinical practice (10). Only recently, another important step forward was taken. Portella et al. (11) used ML to classify individuals according to their primary exercise limitation. Extending their approach by categorizing the severity of exercise limitations may be valuable considering that patients often present with a combination of exercise limitations (12). This would be particularly relevant in a patient population that is representative of clinical practice.
Thus, we aimed to provide a proof-of-concept and 1) determine the most important CPET parameters to identify pulmonary–vascular, mechanical–ventilatory, cardiocirculatory, and muscular exercise limitations; 2) create models that can automate the identification of exercise limitations and categorization of their severity; and 3) compare the accuracy of these models to expert consensus in a real-life scenario using CPET data of patients presenting at a pulmonary clinic.
METHODS
Study Design
This cohort study included 200 valid, historical CPET data sets from patients presenting at the Lung Centre (Bogenhausen-Harlaching), München Klinik, Germany. The CPETs were conducted between June 25, 2018, and February 15, 2019. No preselection was done based on the diagnosis, indication for the exercise test, or sex to obtain a real-life situation. The cohort thus included a larger fraction of patients with lung diseases than other diagnoses. However, this may still closely reflect the distribution of disease in patients referred for CPET in clinical practice. Although an ML algorithm should ideally apply to a wide range of individuals, we focused on patients presenting at a lung clinic to provide a proof-of-concept of our method before moving on to larger and more diverse populations. This approach ensures that a diagnosis is available for each patient and allows a more controlled development process with consistent data collection and devices. The cohort was randomly split into a training (n = 100) and a confirmation group (n = 100) for the analyses. This was done to verify that the models not only perform well on the training data but also generalize effectively to the unseen data set. The study was approved by the Ethics Committee of the Technical University Munich (165/19 S-SR), and all procedures followed the Declaration of Helsinki.
Study Participants
CPET data were eligible if the following criteria were fulfilled: technically correct and valid data recording, complete CPET including inspiratory capacity maneuver (described elsewhere (13)), blood gas analysis and clinical report, sufficient patient compliance during testing (i.e., ability to tolerate physical effort and to obtain the target cadence on the cycle ergometer, and sufficient knowledge of the German language), and age >40 yr. The age limit was applied to reduce heterogeneity in the sample and ensure patients and reasons for referral are typical for clinical practice. CPET data were excluded in the case of premature, nonclinically justified test termination (i.e., no apparent symptom or organ limitation). These strict eligibility criteria are unlikely to be met in clinical practice. However, they were chosen to achieve a high number of parameters with valid data that can be included in the ML models. This allows key parameters relevant for identifying organ limitations to be detected from a wide range of parameters. If a subject was excluded, the subsequent subject (i.e., 201st subject) was included until 200 eligible data sets were obtained. More details on the number of CPET data that were excluded are available in the Results section.
Study Procedures
Expert rating of CPET data
Each of the 200 data sets was independently rated by two experts regarding pulmonary–vascular (related to the lungs and blood vessels that supply them), mechanical–ventilatory (related to mechanical aspects of breathing such as airway resistance and respiratory muscle function), cardiocirculatory (related to the cardiovascular system, including the heart, blood vessels, and circulatory function), and muscular limitations (related to muscle pathologies, e.g., metabolic, mitochondrial origin, not deconditioning) based on their experience. Experts perform >1000 CPETs yearly and publish regularly in this research area. The severity of the limitation was graded on a visual analog scale from 0 (no impairment) to 6 (severe impairment) for each category (Supplemental Fig. 1, Supplemental Digital Content, Visual analog scale used by experts to rate limitations in the respective organ limitations, https://links.lww.com/MSS/C911). The 0–6 rating was based on the German school rating system. Rating combined limitations was possible. The mean of the two ratings was calculated for deviations ≤2 points. Substantial deviations of expert ratings (≥3 points) in a patient were resolved by involving a third expert.
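The consensus rule described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were done in R). The function name `consensus_rating` is hypothetical, and because the text does not state how deviations between 2 and 3 points or the third expert's input were handled, the sketch simply flags every deviation above 2 points for third-expert review instead of computing a score.

```python
def consensus_rating(rating_a, rating_b):
    """Combine two expert ratings on the 0-6 scale.

    Returns (score, needs_third_expert). The score is the mean of both
    ratings when they differ by at most 2 points. Larger deviations are
    only flagged, because how the third expert resolved them is not
    specified in the text (assumption: any deviation > 2 is flagged).
    """
    for rating in (rating_a, rating_b):
        if not 0 <= rating <= 6:
            raise ValueError("ratings must lie between 0 and 6")
    deviation = abs(rating_a - rating_b)
    if deviation <= 2:
        return (rating_a + rating_b) / 2, False
    return None, True


# Hypothetical example ratings, not study data
print(consensus_rating(2, 3))
print(consensus_rating(1, 5))
```

Returning a flag rather than a number keeps the sketch from guessing at the resolution procedure.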
Spirometry and CPET
Lung function was assessed via body plethysmography (MasterScreen Body; CareFusion Germany 234 GmbH, Höchberg, Germany). The procedures were performed according to respective guidelines (13–15). Forced expiratory volume in 1 s (FEV1) and forced vital capacity (FVC) were expressed as % of predicted (16).
CPET was performed on an electronically braked cycle ergometer (ER900; ergoline GmbH, Bitz, Germany). The breath-by-breath measurement of gas and airflow parameters was done using the MasterScreen CPX (CareFusion Germany 234 GmbH). Data were stored as 10-s means. Gas and ambient air calibrations were performed in standard fashion daily before the first test. Furthermore, the volume sensor was calibrated before each test. Cardiac function and heart rate were monitored by a 12-lead ECG. Peripheral oxygen saturation was recorded on the finger with a pulse oximeter (Vyaire Medical GmbH, Hoechberg, Germany). No measures of blood lactate concentration were done.
CPET started with a resting phase (2:30 min) used to assess the plausibility of gas and volume parameters while the patient was at rest (1). This phase was followed by 30 s of unloaded cycling. Subsequently, a ramp protocol with constantly increasing workload was initiated aiming at reaching maximum voluntary exertion of the patient within 8–12 min. Depending on the patient’s predicted maximum power, one of six ramp protocols was chosen with increments of 5, 10, 15, 20, 25, or 30 W·min−1. The test ended with a recovery phase consisting of cycling at a low workload. The protocol was chosen based on the physical activity history and subjective evaluation of the patient. Inspiratory capacity maneuvers were performed at rest and every 2 min during the ramp protocol as described elsewhere (17). V̇O2peak was recorded as the mean of the three highest consecutive 10-s intervals of V̇O2 values during the test (30-s mean). V̇O2peak as % of predicted was calculated according to the formulas in Wasserman et al. (18).
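As an illustration of the V̇O2peak definition, the following Python sketch takes a series of 10-s means and returns the highest mean of three consecutive values. Reading "the three highest consecutive 10-s intervals" as the maximum rolling 30-s mean is an assumption, and the function name `vo2_peak` and the example values are hypothetical.

```python
def vo2_peak(vo2_10s_means):
    """Highest mean of three consecutive 10-s V̇O2 values (30-s mean).

    Assumes the input is an ordered list of 10-s means in L/min.
    """
    if len(vo2_10s_means) < 3:
        raise ValueError("need at least three 10-s intervals")
    n_windows = len(vo2_10s_means) - 2
    # Slide a three-interval window over the series and keep the best mean
    return max(sum(vo2_10s_means[i:i + 3]) / 3 for i in range(n_windows))


# Hypothetical series of 10-s means (L/min), not study data
print(vo2_peak([1.0, 1.1, 1.2, 1.3, 1.2]))
```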
Maximum heart rate and respiratory exchange ratio were reported as the highest 10-s interval before the recovery phase of the test. Test duration refers to the time from the end of the unloaded phase to the start of the recovery phase. Ventilatory efficiency (V̇E/V̇CO2) refers to the lowest ratio of ventilation and carbon dioxide production (nadir V̇E/V̇CO2), whereas V̇E/V̇CO2 slope refers to the slope of these two parameters from the beginning of the ramp protocol to the second ventilatory threshold (19). In case a change in the slope of V̇O2 in relation to work rate (in watts) from the beginning of the ramp protocol to the beginning of recovery phase (V̇O2/WR slope) was detected during visual inspection of panel 3 in the nine-panel Wasserman graph, the V̇O2/WR slope up to this point was determined (V̇O2/WR slope 1). Furthermore, V̇O2/WR slope from this point onward was determined (V̇O2/WR slope 2). The percentage change was then calculated based on these two slopes. If the change in slope occurred during the last 30 s before the recovery phase, this was excluded and not considered abnormal (20). Breathing reserve was calculated as 100% − (V̇Epeak [in L·min−1]/(FEV1 [in L] × 35)) × 100 (21). V̇O2/WR slope was calculated from the start of the ramp protocol to the start of the recovery phase.
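The two derived quantities in this paragraph can be sketched in Python as follows. The breathing reserve function follows the formula given above. The exact formula for the percentage change of the V̇O2/WR slope is not stated in the text, so expressing it relative to slope 1 is an assumption. Function names and example values are hypothetical.

```python
def breathing_reserve(ve_peak_l_min, fev1_l):
    """Breathing reserve in percent: 100 - VEpeak / (FEV1 * 35) * 100."""
    if fev1_l <= 0:
        raise ValueError("FEV1 must be positive")
    return 100 - ve_peak_l_min * 100 / (fev1_l * 35)


def slope_change_percent(slope_1, slope_2):
    """Percentage change from V̇O2/WR slope 1 to slope 2.

    Assumption: the change is expressed relative to slope 1; the text
    only states that it was calculated from the two slopes.
    """
    if slope_1 == 0:
        raise ValueError("slope 1 must be non-zero")
    return (slope_2 - slope_1) * 100 / slope_1


# Hypothetical values, not study data
print(breathing_reserve(ve_peak_l_min=70, fev1_l=2.5))
print(slope_change_percent(slope_1=10, slope_2=7))
```

With these example inputs the estimated maximum ventilation is 2.5 × 35 = 87.5 L·min−1, so a peak ventilation of 70 L·min−1 uses 80% of it.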
Statistical Analysis
To provide a proof of concept, a sample size of N = 200 in the present study was chosen based on a prior study including N = 199 examinations to develop a standardized procedure for the interpretation of CPET (9).
Analyses and figures were done in R version 4.1.3 (22). Data are presented as mean (SD) unless stated otherwise. Decision trees are a well-known regression method for building an interpretable predictive model (23). They represent a function that takes all measured variables (Table 1) as an input vector of attribute values and returns a “decision”—a single output value as a predicted limitation score (24).
Full list of all variables included in the analyses.
Variables | Abbreviations in Figures |
---|---|
Sex | — |
Age | — |
Heart rate–reducing drugs (yes/no) | HR_drugs |
Breathing frequency at rest | BF_rest |
Oxygen uptake at rest | V̇O2_rest |
Respiratory exchange ratio at rest | RER_rest |
Respiratory exchange ratio variation at rest in percent | RER_rest_deltapercent |
Minute ventilation volume at rest | VE_rest |
Reached maximal oxygen consumption in percent of predicted | V̇O2peak_percent |
Maximal reached respiratory exchange ratio during exercise | RER_exercise |
Ratings of perceived exertion during exercise (scale 0–10) | RPE_exercise |
Maximum heart rate during exercise | HR_exercise_is |
Lowest V̇E/V̇CO2 ratio during the CPET | nadir_VEVCO2 |
Duration of CPET | duration_exercise |
Exercise oscillatory ventilation (present/absent) | EOV |
Slope of V̇E/V̇CO2 from the beginning of the ramp protocol to the second ventilatory threshold | VE/VCO2slope |
Breathing reserve in percent | BRR |
Difference in arterial oxygen saturation between rest and exercise cessation | DeltaSaO2 |
Slope of increase in V̇O2 to work rate | VO2WRslope1 |
Inflection of slope of V̇O2/work rate in percent | inflectionpercent |
Systolic blood pressure during exercise | RRsys_exercise |
FVC in percent | FVCpercent |
FEV1 in percent | FEV1percent |
Ratio of FEV1 and FVC in percent | FEV1FVCpercent |
Heart rate reserve in percent during exercise | HRRpercent |
Exercise-induced hypertension (yes/no) | Exercise_hypertension |
Exercise-induced hypotension (yes/no) | Exercise_Hypotension |
Dynamic hyperinflation (present/absent) | Dyn_Hyper |
Exhaustion criteria combination (yes/no) | exhaustion_DT |
Clinical exhaustion criteria (yes/no) | exhaustion_clinic |
Exhaustion criteria combination: fulfillment of a combination of cardiometabolic exhaustion criteria. Clinical exhaustion criteria: fulfillment of ≥1 cardiometabolic exhaustion criterion.
The R-package “caret” (version 6.0-91) was used to generate regression-based decision trees (25). The most accurate decision tree of each limitation category, with respect to the smallest root mean square error (RMSE) in comparison to 500 randomly generated decision trees, was presented graphically to show the corresponding thresholds of the parameters included. These decision trees may therefore be useful to identify pulmonary–vascular, mechanical–ventilatory, cardiocirculatory, or muscular limitations and help quantify their severity. Assessed limitations were combined through conditional inference trees using “partykit” (version 1.2-15) (26). Random forests combine several randomized decision trees and aggregate their predictions by averaging (27). Therefore, the feature importance (FI) of all variables can be analyzed for each limitation. We used “randomForest” (version 4.7-1) with default ntree = 500 to train a random forest based on ntree randomized decision trees for imputation of the data, as well as for random forests analysis (28). All models were trained using data of the training group and tested based on data of the confirmation group. Errors of random forests and decision trees are presented as mean absolute error (MAE) and/or RMSE. MAE_0 and RMSE_0 are error measures of the null models that use only the mean of the confirmation group. MAE and RMSE are error measures of the trained models for the confirmation group. Both metrics reflect how far the predicted ratings and the actual ratings (mean expert rating) lie apart. To examine the utility of ML in a real-life scenario, we further generated a combined decision tree that is capable of identifying the presence of multiple limitations within the same patient as well as rating their severity.
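To make the error measures concrete, the following Python sketch computes MAE and RMSE for a set of predictions and for a null model that always predicts the mean of the confirmation group. It is an illustration only and does not reproduce the R workflow; the function names and the example ratings are hypothetical.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error between expert ratings and predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between expert ratings and predictions."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))


def null_model_errors(actual):
    """MAE_0 and RMSE_0: errors when predicting the group mean for everyone."""
    group_mean = sum(actual) / len(actual)
    constant = [group_mean] * len(actual)
    return mae(actual, constant), rmse(actual, constant)


# Hypothetical expert ratings and model predictions, not study data
expert = [0, 2, 4, 6]
model = [1, 2, 3, 5]
print(mae(expert, model), rmse(expert, model))
print(null_model_errors(expert))
```

A trained model is only informative if its errors fall below those of the null model.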
RESULTS
Twenty-nine CPETs were excluded because of incompleteness (missing inspiratory capacity maneuver (n = 19), missing blood gas analysis (n = 2), incomplete examination report (n = 8)), 32 because of a lack of compliance during CPET, and 21 because of invalid/implausible data (calibration error (n = 4), implausible heart rate data (n = 3), implausible CPET trajectory (n = 10), technical artifacts (n = 4)) until the target number of valid CPET data sets was achieved. Reasons for referral were preoperative examination (i.e., patients planned for lung resection due to lung carcinoma), evaluation of therapy progress, and unexplained dyspnea. The cohort characteristics are presented in Table 2 and medical diagnoses in Supplemental Table 1 (Supplemental Digital Content, https://links.lww.com/MSS/C911).
Characteristics of the full cohort (N = 200) and stratified by group.
Variable | Total | Training Group (n = 100) | Confirmation Group (n = 100) |
---|---|---|---|
Female sex, n (%) | 97 (48.5) | 52 (52.0) | 45 (45.0) |
Age, Mdn (IQR) [min, max], yr | 69 (61–75) [41, 94] | 68 (61–74) [42, 85] | 71 (62–77) [41, 94] |
Body mass index, Mdn (IQR), kg·m−2 | 25.3 (22.3–30.7) | 25.7 (22.3–30.8) | 24.3 (22.3–30.3) |
FEV1, Mdn (IQR), % of pred. | 80.8 (61.4–94.8) | 80.7 (67.7–94.5) | 81.0 (57.6–94.9) |
FVC, Mdn (IQR), % of pred. | 90.6 (73.7–100.6) | 91.2 (78.4–101.3) | 86.6 (72.1–99.9) |
FEV1/FVC, Mdn (IQR), % of pred. | 94.3 (79.7–105.1) | 92.5 (83.2–103.9) | 95.1 (76.9–105.8) |
V̇E/V̇CO2 slope, Mdn (IQR) | 35.0 (31.0–42.0) | 35.0 (31.1–41.6) | 35.0 (30.6–43.3) |
V̇O2/WR slope | 9.5 (1.6) | 9.5 (1.5) | 9.5 (1.6) |
P max, Mdn (IQR), W | 89.0 (67.0–119.0) | 92.5 (70.5–122.8) | 88.0 (60.5–113.5) |
V̇O2peak, Mdn (IQR), L·min−1 | 1.20 (0.97–1.45) | 1.23 (1.03–1.46) | 1.17 (0.95–1.44) |
V̇O2peak, Mdn (IQR), mL·min−1·kg−1 | 15.8 (12.6–19.7) | 16.2 (12.6–20.3) | 15.6 (12.3–19.2) |
V̇O2peak, % of pred. | 78.5 (22.6) | 79.7 (22.7) | 77.3 (22.5) |
RER | 1.1 (0.1) | 1.12 (0.1) | 1.10 (0.1) |
HRmax, % of pred. | 95.9 (14.3) | 95.6 (14.0) | 96.2 (14.6) |
Medication | |||
β-Blockers, n (%) | 61 (30.5) | 28 | 33 |
Antiarrhythmics, n (%) | 93 (46.5) | 43 | 50 |
Lipid-lowering drugs, n (%) | 66 (33.0) | 32 | 34 |
Antidiabetics, n (%) | 28 (14.0) | 16 | 12 |
Diuretics, n (%) | 60 (30.0) | 26 | 34 |
Antihypertensives, n (%) | 91 (45.5) | 48 | 43 |
PDE5-antagonists, n (%) | 6 (3.0) | 4 | 2 |
Endothelin-receptor antagonists, n (%) | 4 (2.0) | 2 | 2 |
Guanylate-cyclase stimulators, n (%) | 1 (0.5) | 1 | 0 |
Neprilysin inhibitors, n (%) | 1 (0.5) | 0 | 1 |
All data are presented as mean (SD) unless stated otherwise.
Expert Review of Exercise Limitations
Descriptive statistics of expert ratings are displayed in Table 3. Differences between the expert rating and the decision tree-based rating of organ limitations are shown in Supplemental Figure 2, Supplemental Digital Content, https://links.lww.com/MSS/C911.
Descriptive statistics of expert ratings as limitation points stratified by limitation category for the full cohort (N = 200).
Limitation Category | Mean (SD) [Min, Max] | Sum of Limitation Points | Mean (SD) Deviation between Experts |
---|---|---|---|
Pulmonary–vascular | 1.8 (1.7) [0, 6] | 357.5 | 1.1 (1.2) |
Mechanical–ventilatory | 1.8 (1.8) [0, 6] | 353.5 | 1.0 (1.2) |
Cardiocirculatory | 1.0 (1.3) [0, 6] | 205.5 | 1.1 (1.2) |
Muscular | 0.1 (0.3) [0, 2.5] | 14.5 | 0.4 (0.9) |
Limitation-Specific Decision Trees
Pulmonary–vascular decision tree
Our trained random forest analysis yielded nadir V̇E/V̇CO2 (FI 0.91) and V̇E/V̇CO2 slope (FI 0.77) as the most relevant parameters to detect pulmonary–vascular limitations (Fig. 1A). The null model with only the mean of the confirmation group had an RMSE_0 of 1.78. After training, the error decreased by 50% to an RMSE of 0.89. The decision tree with the lowest RMSE in this category (RMSE = 0.93) is displayed in Figure 1B.
Mechanical–ventilatory decision tree
To detect mechanical–ventilatory limitations, breathing reserve (FI 1.05), FEV1 (FI 1.00), and FVC (FI 0.44) were the most important parameters (Fig. 2A). RMSE_0 was 1.76 and RMSE was 1.03 (41% decrease). The RMSE of the most accurate decision tree was 1.28 (Fig. 2B).
Cardiocirculatory decision tree
V̇O2peak (FI 0.56), percentage change of V̇O2/WR slope (FI 0.37), and V̇O2/WR slope (FI 0.24) had the highest FI to detect cardiocirculatory limitations (Fig. 3A). RMSE_0 and RMSE were 1.24 and 0.99, respectively (20% decrease). The most accurate decision tree in this category had an RMSE of 1.26 (Fig. 3B).
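The percentage decreases reported for the three random forest models follow directly from the RMSE_0 and RMSE values given above. The short Python sketch below recomputes them; the function name `relative_reduction` is hypothetical, and the input values are the ones reported in this section.

```python
def relative_reduction(rmse_null, rmse_model):
    """Percentage decrease of the trained model's RMSE versus the null model."""
    return (rmse_null - rmse_model) / rmse_null * 100


# (RMSE_0, RMSE) pairs as reported for the random forest models
reported = {
    "pulmonary-vascular": (1.78, 0.89),
    "mechanical-ventilatory": (1.76, 1.03),
    "cardiocirculatory": (1.24, 0.99),
}

for category, (rmse_null, rmse_model) in reported.items():
    decrease = relative_reduction(rmse_null, rmse_model)
    print(f"{category}: {decrease:.1f}% decrease")
```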
Muscular limitations decision tree
It was not possible to train a model as indicated by a larger RMSE after training. Muscular limitations were largely absent in our cohort as evident by the expert rating (Table 3 and Supplemental Fig. 2, Supplemental Digital Content, Expert ratings for each patient in the training group compared with the rating of the decision tree for all four limitation categories, https://links.lww.com/MSS/C911).
Explorative Analyses
Using an 80:20 split (80% of the cohort in the training data set and 20% in the confirmation data set) with fivefold cross-validation to perform the previously presented analyses yielded a similar RMSE for the pulmonary–vascular decision tree (1.00 ± 0.21 vs 0.93), the mechanical–ventilatory decision tree (1.31 ± 0.20 vs 1.28), and the cardiocirculatory decision tree (1.31 ± 0.15 vs 1.26). Detailed results are presented in Supplemental Figures 3–8, Supplemental Digital Content, Random forests and most accurate decision trees of the pulmonary–vascular, mechanical–ventilatory, and cardiocirculatory limitation categories, https://links.lww.com/MSS/C911.
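An 80:20 split with fivefold cross-validation can be sketched in Python as follows: the cohort is shuffled once, divided into five equal folds, and each fold serves once as the confirmation set while the remaining four form the training set. This is a generic illustration and not the authors' R implementation. The function name `five_fold_splits` and the seed are hypothetical.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (training, confirmation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    # Deal the shuffled indices into n_folds groups of (nearly) equal size
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        confirmation = folds[k]
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, confirmation


for training, confirmation in five_fold_splits(200):
    print(len(training), len(confirmation))
```

For 200 data sets, every fold contains 160 training and 40 confirmation indices, and each data set appears in exactly one confirmation set.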
Combined Decision Tree
Figure 4 shows the combined decision tree based on all 200 CPET data sets. The analyses yielded nine categories of limitations (displayed as boxplot columns; see Supplemental Table 2, Supplemental Digital Content, Limitations categories of the combined decision tree based on 200 CPET data sets, https://links.lww.com/MSS/C911).
DISCUSSION
The major novel findings were as follows: 1) we found good trainability of random forests and decision trees for detecting specific limitation patterns with accuracies comparable to expert consensus for pulmonary–vascular, mechanical–ventilatory, and cardiocirculatory but not muscular limitations; 2) nadir ventilatory efficiency for CO2 and ventilatory efficiency slope for CO2 were central to identify pulmonary–vascular limitations; breathing reserve, FEV1, and FVC for mechanical–ventilatory limitations; and V̇O2peak, O2 uptake/work rate slope, and % change of the latter for cardiocirculatory limitations; 3) decision trees yielded parameter thresholds for the interpretation of organ-specific limitations and their severity; and 4) we demonstrated the feasibility/ability of ML techniques to create algorithms performing a comprehensive cross-category quantification of organ limitations reflecting a real-life situation.
Expert Review of Exercise Limitations
Although all reviewers were experts in CPET, the data indicate that opinions may diverge when it comes to rating the degree of limitation. This again highlights the difficulty of CPET interpretation and the need for simplifying this process.
Limitation-Specific Decision Trees
We defined a total of eight crucial parameters to identify pulmonary–vascular, mechanical–ventilatory, and cardiocirculatory but not muscular limitations. The RMSE of the best decision tree of each of the aforementioned limitation categories ranged from 0.93 to 1.28 points indicating comparable accuracy of ML approaches to expert ratings (mean difference ranging from 1.0 to 1.1; SD, 1.2 points).
Pulmonary–vascular decision tree
The pulmonary–vascular model showed the best trainability. Nadir V̇E/V̇CO2 and V̇E/V̇CO2 slope were the most important parameters to predict these limitations. In the most accurate decision tree, the most severe pulmonary–vascular limitations were seen in patients with both high nadir V̇E/V̇CO2 and steeper V̇E/V̇CO2 slope. This is in line with the current scientific consensus (30). Ventilatory efficiency is recognized as a central predictor of pulmonary vascular conditions, that is, pulmonary arterial hypertension or chronic thromboembolic pulmonary hypertension, and its use is widely established (30). However, different ways of expressing ventilatory efficiency exist (described elsewhere (30–32)). V̇E/V̇CO2 slope from the onset of incremental exercise to the second ventilatory threshold was used in this article. For this form, consensus regarding thresholds used to indicate pathological ventilatory inefficiency is lacking (30). Our results suggest the use of 34 for nadir V̇E/V̇CO2 and 42 for V̇E/V̇CO2 slope allowing for differentiation between patients with pulmonary–vascular limitation scores of 0.75 and 3.1 as well as 2.3 and 3.9, respectively (no to mild limitation vs moderate to severe limitation). To our knowledge, these are the first data-based cutoffs for a cohort resembling real-life patients. Most available studies focused on specific diseases. For instance, Dumitrescu et al. (33) found that a threshold for nadir V̇E/V̇CO2 of 35.5 had the highest sensitivity and specificity for detecting pulmonary arterial hypertension in patients with systemic sclerosis (N = 173). We aimed at detecting pulmonary–vascular limitations in general and not specific conditions. In this case, a nadir V̇E/V̇CO2 of 34 might be most suitable. 
Considering also that, in Figure 1B, a V̇E/V̇CO2 slope of 42 was used to distinguish individuals with the most severe limitation in this category, a higher threshold than, for example, the value of 34 used to predict cardiac-related mortality in heart failure (34) seems to be justified. Importantly, these two parameters are not specific to pulmonary–vascular limitations but may also be abnormal in cardiac limitations (e.g., heart failure (34)) or patients presenting with combined limitations (e.g., chronic obstructive pulmonary disease with pulmonary hypertension (35)). In such cases, parameters specific to other organ systems may help rule out other limitations or identify combined organ limitations as they often occur in clinical practice (discussed later on).
Mechanical–ventilatory decision tree
The parameters with the highest FI in the random forest of the mechanical–ventilatory category are also in line with the current scientific consensus (14). FEV1 and FVC are key parameters of spirometry and routinely used to diagnose ventilatory dysfunction at rest (14). Breathing reserve provides additional insight into ventilatory function during exercise, with lower values indicating a ventilatory limitation in clinical populations (1). FEV1 can be used to identify patients with the most severe mechanical–ventilatory limitation (severity score of 4.1) in the most accurate decision tree. Breathing reserve was also a node in this tree and located below FEV1. This makes sense from a clinical standpoint because patients with substantial ventilatory dysfunction at rest (as evident by subnormal FEV1) would be expected to have the most severe limitation in this category. However, normal lung function at rest does not rule out that lung function becomes limiting during intensive exercise. Here, breathing reserve comes into play. Thus, for patients with moderate mechanical–ventilatory limitation (limitation score of 2.8), a low breathing reserve as well as V̇O2peak below 96% of predicted were critical. CPET is therefore most valuable for identifying organ limitations when combined with spirometry at rest (1).
Cardiocirculatory decision tree
Cardiocirculatory limitations were best detected through V̇O2peak, inflection of V̇O2/WR slope in percent, and V̇O2/WR slope. Although low V̇O2peak is not specific to cardiocirculatory limitations, the latter two parameters are (36). V̇O2 failing to increase linearly with WR has been highlighted as a sign of cardiocirculatory limitation already by Wasserman (5). Percentage change of V̇O2/WR slope was also an important node in a decision tree by Schmid et al. (9). In addition to these parameters, heart rate and oxygen pulse responses during exercise are attenuated (5,36). Moreover, the oxygen uptake efficiency slope (slope of regression line between log10 V̇E and V̇O2) is flattened, and the ventilatory threshold 1 (defined by V-slope method) is reached at a lower relative exercise intensity in patients with this limitation (36). To differentiate between specific conditions, other parameters than the inflection of V̇O2/WR slope and V̇O2/WR slope may be more adequate (36). However, if the aim is to identify underlying organ limitations, for example, in an individual with unexplained dyspnea, the parameters proposed in this study (i.e., V̇O2peak, inflection of V̇O2/WR slope in percent, and V̇O2/WR slope) may be preferred. Interestingly, V̇O2peak followed by FEV1 were the only two parameters included in the most accurate decision tree to categorize patients according to the severity of their cardiocirculatory limitation. Patients with V̇O2peak <63% of predicted but without apparent mechanical–ventilatory limitation (in the present study: FEV1 > 67% of predicted) were the most limited in this category (limitation score of 3.9).
Muscular limitations decision tree
No decision trees could be trained for muscular limitations. This may be explained by the low number of patients with muscular limitations as well as their mild severity, evident by a mean limitation score of 0.1 (0.3) on a scale from 0 to 6 (see Table 3 and Supplemental Fig. 2, Supplemental Digital Content, Expert ratings for each patient in the training group compared with the rating of the decision tree for all four limitation categories, https://links.lww.com/MSS/C911). To tackle this in future studies, it may be advisable to specifically include patients with muscular pathologies, that is, mitochondrial myopathies or McArdle disease (37), and a wide range of severity.
Although less likely in this case, another explanation might be that the parameters included in the models are too unspecific to identify muscular limitations. Although this is not done routinely, including measurements of blood lactate concentration might be helpful to identify muscular limitations using ML. Drawing blood lactate samples at set time points throughout CPET might reveal an altered anaerobic metabolism and might therefore be useful to rule out other organ limitations (37). To assess general muscular deconditioning, measuring handgrip strength might help to detect strength deficits and is time-saving and inexpensive (29,38). However, it needs to be investigated whether these measurements improve the models in detecting myopathies and/or deconditioning.
Explorative Analyses
Including more patients in the training group (training-confirmation ratio, 80:20) yielded similar RMSEs compared with the 50:50 split. This indicates that larger sample sizes alone may not improve the precision of these models. Only few patients were severely limited in one or several of the organ systems (Table 3). It seems thus more important to also increase the variance of limitations in the cohort. A greater number of expert ratings for each patient would likewise be valuable considering the deviation between experts (mean discrepancy ranged between 1.0 and 1.1 points).
Combined Decision Tree
In clinical practice, patients commonly present with a combination of exercise limitations; for example, patients with chronic obstructive pulmonary disease may develop peripheral limitations secondary to respiratory limitations (12). The patient’s medical condition may thus not be described by a single limitation category but a combination of several. The combined decision tree in Figure 4 allows the classification of exercise intolerance into three categories: pulmonary–vascular, mechanical–ventilatory, and cardiocirculatory limitations. Furthermore, a subclassification is done, making it possible to reflect combinations of limitation categories. These results are promising as they indicate that a real-life classification using ML approaches is possible.
To conclude this section, we want to point out another particular strength of a combined algorithm. Individual parameters, for example, ventilatory efficiency, not only reflect pulmonary–vascular limitations but could also be altered through processes induced by healthy aging (39). Moreover, depending on the context, individual CPET parameters may connote abnormality, whereas, in fact, the subject is limitation-free and vice versa. For instance, breathing reserve less than 30% is not rare in athletes and suggests great effort as well as motivation rather than mechanical–ventilatory limitations (40). On the other hand, breathing reserve was weakly associated with dyspnea burden in patients with chronic obstructive pulmonary disease (41). Our combined algorithm incorporates numerous CPET parameters (Table 1) in the identification of specific limitation patterns. Although a single parameter may not be sufficient to rule in a specific limitation category, additional parameters may be used to rule out other categories. This may be an advantage over traditional decision trees and correspond to the expert approach.
Synthesis of Our Findings with Available ML Algorithms
As mentioned in the Introduction section, two research groups developed similar algorithms. Inbar et al. (10) provided a proof of concept that ML may be used to successfully discriminate between healthy individuals, patients with heart failure, and those with chronic obstructive pulmonary disease. However, they did not report key features of their models. Portella et al. (11) used ML to classify individuals according to their main exercise limitation into four categories (cardiac, pulmonary, others, and normal response). Their method is different from ours because they analyzed the average data from 30-s intervals of selected CPET parameters (11). In contrast, we used precalculated parameters that are commonly used to detect exercise limitations, which hampers the direct comparison of results. Our key features of the pulmonary–vascular limitations model are partly in line with theirs. Although parameters describing the trajectories of V̇O2 and V̇CO2 were also important in our model, they additionally found % of O2 pulse predicted and maximum respiratory exchange ratio to be relevant for detecting limitations in this category (11). However, it should be noted that Portella et al. (11) did not differentiate between pulmonary–vascular and mechanical–ventilatory limitations, as was done in our study. Similarly, the key features of the cardiocirculatory limitations identified in our study are somewhat consistent with those in Portella et al. (11). V̇O2peak is included in both studies, whereas V̇O2/WR slope was only relevant in our model and V̇E slope as well as heart rate slope only in theirs.
Taken together, Inbar et al. (10) and Portella et al. (11) demonstrated the potential of ML algorithms to differentiate not only between selected conditions but also between primary organ limitations, mirroring the work of experts in clinical practice. These are the first crucial layers in the process of developing an accurate and implementable algorithm to detect exercise limitations using CPET data.
In line with this, our findings add another layer showcasing the potential of ML to identify and rate the severity of pulmonary–vascular, mechanical–ventilatory, or cardiocirculatory exercise limitations. Moreover, our research shows the capability of ML to detect and quantify even combined exercise limitations within a patient.
Although all three studies highlight the promising prospects of ML together with CPET data in clinical practice, there is also consensus that a heterogeneous patient population encompassing exercise limitations in all four categories and various degrees of severity may be helpful in the advancement of such ML algorithms. Moreover, standardized expert ratings of limitation and severity would advance the development process.
Limitations
First, 29 data sets were excluded because of incompleteness before obtaining 200 valid data sets. This may suggest limited applicability in clinical practice. However, our analyses reduced the multitude of parameters to only a few, meaning that fewer patients will be excluded in the future, for example, because of a missing inspiratory capacity maneuver, as this is not part of our key parameters. On the other hand, 32 tests were excluded because of lack of compliance. Although insufficient compliance prevents the adequate determination of V̇O2peak, it remains to be investigated whether our algorithm may also work with such data. Second, a larger sample size of patients and expert ratings will be beneficial to improve these models. This would yield a greater variance in the type and severity of limitations. Implementing an international database would facilitate this and may ultimately make the ML-based interpretation of CPET data ready for use in clinical practice. Finally, patients with unexplained dyspnea have been shown to be referred for CPET more often than those with other symptoms, for example, fatigue or chest pain (42). This could have introduced bias.
CONCLUSIONS
This study identified a few robust parameters crucial for identifying exercise limitations that are also considered important in clinical practice. Furthermore, we defined data-based cutoffs for relevant parameters that proved helpful for ML-based CPET interpretation. Combining these two aspects in the presented ML models allows for an automated interpretation, categorizing patients by the degree of impairment in the respective limitation category and facilitating clinical interpretation of CPET and decision making. These key aspects set our method apart from available decision trees and recommendations for CPET interpretation that are mostly based on expert experience. Finally, cross-category decision trees may be possible, improving real-life classification of patients. These findings may be generalizable to patients presenting at lung clinics in real-life practice, provided that CPET data are complete and valid. This study may help CPET become a more frequently used instrument for assessing CRF and organ limitations in patients with cardiovascular or pulmonary conditions.
We thank Dr. Johann-Jakob Schmid and Mirko Gadza of Schiller AG, Baar, for their support in soliciting and implementing the Innosuisse project.
This study was funded by Innosuisse – Swiss Innovation Agency (Project No. 28081.1). R. K. was funded by the Swiss National Science Foundation (Grant P2BSP3_191755). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The authors declare no conflict of interest. The results of the study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. The results of the present study do not constitute endorsement by the American College of Sports Medicine.
F. S. and A.-K. B. wrote the original draft of the manuscript. A.-K. B. was responsible for data curation and screened the CPET data. M. N.-H. performed the statistical analyses, created the figures, and contributed to writing, review, and editing. V. R. planned and supervised the statistical analyses. A. S.-T., F. J. M., and A. H. were responsible for the conceptualization, supervision, rating of limitation categories, review, and editing. R. K. contributed to conceptualization, review, and editing. D. D. contributed the scale for the rating of organ limitations, review, and editing. A. S.-T. was responsible for funding acquisition. All authors read and approved the final version of the manuscript.
Data availability: The data underlying this article will be shared on reasonable request to the corresponding author.
REFERENCES