The data used in this study have been described previously [3]. In summary, this study presents a secondary analysis of data obtained through a cross-sectional, self-administered paper survey conducted in collaboration with the patient organization German League against Rheumatism (Deutsche Rheuma-Liga, Landesvertretung Brandenburg) and outpatient rheumatologists. The survey was part of an exploratory mixed-methods study spanning more than 2 years that investigated the acceptance of, opportunities for, and barriers to the implementation of TM [3]. Data collection took place between September 1 and December 30, 2019. A preliminary version of the questionnaire was developed by a team of two healthcare researchers and two rheumatologists, informed by insights gathered from expert interviews [3]. This draft was then reviewed and revised by the German League against Rheumatism (Deutsche Rheuma-Liga, Landesvertretung Brandenburg e.V.); the feedback of the patient representatives was discussed during a teleconference and incorporated into the questionnaire. To refine the questionnaire further, a pretest involving 30 RMD patients was conducted to assess the questionnaire’s clarity, wording and the exhaustiveness of the predefined response options. Minor revisions followed to improve its precision and relevance. The final version of the questionnaire (Supplementary Material), spanning five pages, comprised 24 questions grouped into four sections: (1) medical care, (2) technology usage, (3) telemedicine and (4) personal data. Response options were nominal or ordinal, and the questionnaire additionally featured open-ended questions allowing participants to express their thoughts freely.
The questionnaire was accompanied by study information, including a definition of telemedicine illustrated with practical examples: “Telemedicine involves the utilization of information and communication technology in medical treatment to bridge geographical gaps. For instance, this could involve a video consultation with a physician for a visual joint examination or a phone conversation with a doctor to assess the effectiveness of prescribed medication”. This contextual information was intended to clarify the survey’s scope and purpose for participants.
The survey’s inclusion criteria required participants to meet the following conditions: (1) receiving treatment in rheumatology care; (2) being 18 years or older; and (3) residing in Germany. Sampling followed a non-probability, voluntary approach, in collaboration with (1) the working groups of the patient organization German League against Rheumatism, (2) outpatient rheumatology practices and (3) inpatient rheumatology wards. The questionnaires were provided to representatives from these institutions, who then distributed them to individuals meeting the specified inclusion criteria.
Data selection/population considered
From the aforementioned German nationwide survey, a dataset of 438 patients was available. Each patient answered 24 questions on socio-demographic and health characteristics. Individuals with a missing answer or who answered “do not know” regarding TM try were treated as a distinct category, resulting in three categories: “yes”, “no” and “not answered/do not know”.
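The three-category outcome recoding described above can be sketched as follows (a minimal Python illustration with hypothetical variable names, not the authors’ actual R preprocessing):

```python
def recode_tm_try(response):
    """Collapse raw answers to the TM-try question into three classes.

    Missing answers (None or empty strings) and "do not know" are merged
    into a single "not answered/do not know" category, as described above.
    """
    if response in ("yes", "no"):
        return response
    return "not answered/do not know"


# Hypothetical raw responses:
raw = ["yes", "no", None, "do not know", "yes", ""]
outcome = [recode_tm_try(r) for r in raw]
# outcome now contains only the three analysis categories
```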
Statistical analysis
All statistical analyses were performed using R software version 4.1.2 (R Core Team, Vienna, Austria) for Windows 10.
Machine learning algorithm selection
To select the best ML algorithm, a subset of the dataset was used, including only RMD patients who answered “yes” or “no” to TM try; 294 patients (67.1%) were considered for this algorithm selection. A total of 12 ML algorithms were used to identify key features contributing to TM try, namely logistic regression, Lasso regression, ridge regression, support vector machine (SVM) using a linear classifier, SVM using a polynomial basis kernel, SVM using a radial basis kernel, random forest, neural network, AdaBoost, k-nearest neighbors, naive Bayes and extreme gradient boosting (XGBoost).
Downsampling was used to produce a balanced 80%/20% train/test split [6]. The training split was used to generate the learned models, while the testing dataset was used for the validation phase to assess the performance of each model in predicting the class labels (answering yes or no to TM try). One-hot encoding was applied to all categorical variables with more than two categories, treating missing data as its own category and eliminating one category for each factor to avoid multicollinearity [7]. Collinear covariates with a variance inflation factor > 2.5 were excluded from the analysis [8]. Continuous variables were standardized to zero mean and unit variance before ML [9].
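Two of the preprocessing steps above, one-hot encoding with a dropped reference category and standardization to zero mean and unit variance, can be sketched in plain Python (a simplified illustration only; the study used R, and the downsampling and VIF-filtering steps are omitted here):

```python
def one_hot_drop_first(values, categories):
    """One-hot encode `values`, dropping the first category as the
    reference level to avoid perfect multicollinearity (the dummy
    variable trap). Missing data can simply be listed as its own
    category, as in the analysis described above."""
    reference, *kept = categories
    return [[1 if v == c else 0 for c in kept] for v in values]


def standardize(xs):
    """Center a continuous variable to zero mean and scale it to unit
    (population) variance."""
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - m) / sd for x in xs]


# Example: a categorical variable with a "missing" category.
encoded = one_hot_drop_first(
    ["yes", "no", "missing", "yes"], ["yes", "no", "missing"]
)  # two dummy columns remain, for "no" and "missing"
scaled = standardize([20.0, 30.0, 40.0])
```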
Nested cross-validation was used for each model to limit overfitting [10]. More specifically, each model underwent a tenfold cross-validation during which the classifier hyperparameters were tuned. A random-search approach was used to determine the combination of hyperparameters maximizing accuracy [7]. To ensure robust results, the cross-validation was repeated ten times, each time with a different random number generator seed [7].
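The tuning loop can be sketched as follows: for each randomly drawn hyperparameter setting, a tenfold cross-validation estimates accuracy, and the whole procedure is repeated with different seeds. This is a deliberately simplified pure-Python sketch using a toy one-dimensional threshold “classifier” (all names hypothetical), not the caret implementation used in the study:

```python
import random


def kfold(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def random_search_cv(X, y, k=10, n_draws=20, seed=0):
    """Random search over one hyperparameter (a decision threshold),
    scored by mean k-fold cross-validated accuracy. A real analysis
    would tune e.g. regularization strength or tree depth instead."""
    rng = random.Random(seed)
    best_t, best_acc = None, -1.0
    for _ in range(n_draws):
        t = rng.uniform(min(X), max(X))  # random hyperparameter draw
        fold_accs = []
        for fold in kfold(len(X), k, rng):
            preds = [1 if X[i] > t else 0 for i in fold]
            hits = sum(p == y[i] for p, i in zip(preds, fold))
            fold_accs.append(hits / len(fold))
        acc = sum(fold_accs) / len(fold_accs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy separable data; the search is repeated with different seeds,
# mirroring the ten repetitions described above.
X = [float(i) for i in range(20)]
y = [0] * 10 + [1] * 10
results = [random_search_cv(X, y, seed=s) for s in range(10)]
```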
Evaluation indicators, including the area under the receiver operating characteristic curve (AUROC), precision (positive predictive value), recall (sensitivity), balanced accuracy, F1 score, Cohen’s kappa, specificity, detection rate, detection prevalence, no information rate, average precision and prevalence, were calculated to compare the ML algorithms. The best-performing model was selected based on the mean AUROC.
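Most of these indicators can be derived directly from a 2×2 confusion matrix; the following Python sketch (independent of the R packages used in the study) makes the definitions explicit:

```python
def binary_metrics(tp, fp, fn, tn):
    """Evaluation indicators derived from a 2x2 confusion matrix,
    treating the first class (e.g. "yes" to TM try) as positive."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)      # positive predictive value
    recall = tp / (tp + fn)         # sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    prevalence = (tp + fn) / n
    # Chance agreement for Cohen's kappa:
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": 2 * precision * recall / (precision + recall),
        "kappa": (accuracy - expected) / (1 - expected),
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "prevalence": prevalence,
        "no_information_rate": max(prevalence, 1 - prevalence),
    }


# Hypothetical confusion matrix for illustration:
m = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```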
Lasso and ridge regression were performed using the glmnet package version 4.1-6 [11]. The XGBoost algorithm was computed using the xgboost package version 1.6.0.1 [12]. The caret package version 6.0-93 was used to perform all other ML models as well as to calculate the confusion matrix [13]. The pROC package version 1.18.0 [14] was used to calculate the AUROC, while the MLmetrics package version 1.1.1 [15] was used to compute evaluation metrics not provided by the caret package.
Identification of TM try predictors
For the best-performing ML model, a multinomial/multiclass ML approach was performed considering the following three classes: “yes”, “no” and “not answered/do not know”. Both one-vs.-one and one-vs.-rest strategies were used. Hence the following binary classifications were performed: no vs. rest, yes vs. rest, not answered/do not know vs. rest, yes vs. no, yes vs. not answered/do not know, and no vs. not answered/do not know. For each classification, feature importance was investigated using Shapley additive explanations (SHAP), which show each feature’s impact on the model prediction [16, 17]. This analysis indicates to what extent, and in which direction (wanting to use TM versus not wanting to use TM), a given feature influences the ML model.
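The six binary tasks listed above follow directly from combining the one-vs.-rest and one-vs.-one decompositions of the three classes, as a short Python sketch shows (illustrative only, not the authors’ R code):

```python
from itertools import combinations

CLASSES = ["yes", "no", "not answered/do not know"]


def binary_tasks(classes):
    """Enumerate the binary problems for a multiclass outcome:
    one-vs.-rest (each class against all others) followed by
    one-vs.-one (every pair of classes)."""
    one_vs_rest = [(c, "rest") for c in classes]
    one_vs_one = list(combinations(classes, 2))
    return one_vs_rest + one_vs_one


tasks = binary_tasks(CLASSES)
# 3 one-vs.-rest + 3 one-vs.-one = 6 binary classifications, as in the text
```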