Machine learning-based prediction model for myocardial ischemia under high altitude exposure: a cohort study


Study design and population

This study used a prospective cohort design. The sample population was drawn from soldiers who received health examinations at the 920th Hospital of the Joint Logistics Support Force between January 2022 and June 2022 and who were scheduled to undergo high-altitude training (at an altitude of 3000–3500 m) within six months. Inclusion criteria: 1. Male or female military personnel aged 18 to 60 years; 2. Underwent a health examination at the 920th Hospital between January 2022 and June 2022; 3. Scheduled to undergo high-altitude training within 6 months of the health examination; 4. Completed the first self-reported questionnaire; 5. Completed the second health examination and questionnaire at the high-altitude training site. Exclusion criteria: 1. History of structural heart disease, hypertension, coronary artery disease, chronic obstructive pulmonary disease, pneumonia, or asthma; 2. Abnormal electrocardiogram (ECG) findings; 3. Did not ultimately participate in high-altitude training; 4. Developed acute mountain sickness, severe training injuries, or COVID-19 during the study period.

In total, 4000 individuals (3800 men and 200 women, aged 18 to 54 years) participated in the health examination. The examination included chest X-rays, electrocardiograms, ultrasounds, and hematological tests. After careful screening, we excluded a total of 1093 individuals from the study. Specifically, we excluded 189 individuals (both male and female) with a history of pneumonia, asthma, or hypertension, abnormal electrocardiogram findings (including ST-T changes, abnormal Q waves, and arrhythmia), or myocardial hypertrophy, as well as 904 individuals who did not ultimately participate in high-altitude training. The remaining 2907 soldiers completed the first self-reported electronic questionnaire, which covered smoking status, duration and amount of smoking, recent physical ability test scores (3 km test, sit-ups, and serpentine run; Table 1), chest tightness or chest pain, altitude of residence, education level, marital status, and the Scale for Outcomes in Parkinson's Disease for Autonomic Symptoms (SCOPA-AUT) score [4]. A medical team performed a health examination on the population at the high-altitude training site from December 2022 to January 2023, which included chest X-rays, electrocardiograms, heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), oxygen saturation (OS), and a second self-reported electronic questionnaire. We also obtained electronic medical records from the previous six months and excluded 52 individuals with acute mountain sickness, severe training injuries, or COVID-19 infections, resulting in a final sample of 2855 individuals (2810 men and 45 women; Fig. 1; Additional file 1).
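The participant flow described above can be checked with a few lines of arithmetic. The snippet below is only an illustrative Python sketch (the study itself used R); the counts come from the text, while the variable names are assumptions chosen for readability.

```python
# Participant flow as reported in the text; variable names are illustrative.
examined = 4000
excluded_medical = 189        # medical history or abnormal baseline findings
excluded_no_training = 904    # did not take part in high-altitude training
excluded_during_study = 52    # acute mountain sickness, severe injury, or COVID-19

excluded_at_screening = excluded_medical + excluded_no_training
completed_questionnaire = examined - excluded_at_screening
final_sample = completed_questionnaire - excluded_during_study

men, women = 2810, 45

print(excluded_at_screening)        # 1093
print(completed_questionnaire)      # 2907
print(final_sample)                 # 2855
print(men + women == final_sample)  # True
```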

Table 1 Characteristics of participants in the training and test datasets.
Figure 1 Workflow for data management and myocardial ischemia prediction model development. Data came from 4000 adults who completed physical exams at the 920th Hospital of the Joint Logistics Support Force between January and June 2022 and were preparing to enter the plateau within 6 months. After exclusion, 2855 people remained. Following preprocessing, participants' data (n = 2855) were randomly assigned to a prediction model training dataset (n = 2141) and a test dataset (n = 714) in a 3:1 ratio. Fivefold cross-validation was used for training and selecting the prediction model, and five classification algorithms were evaluated. Feature selection was conducted using the Recursive Feature Elimination (RFE) algorithm. The final prediction model was validated using the test dataset. LR, logistic regression; RF, random forest; XGBoost, eXtreme gradient boosting; KNN, K-nearest neighbor; SVM, support vector machine.

Outcome variable

In the present study, myocardial ischemia was diagnosed from electrocardiogram (ECG) results using the following criteria: (1) horizontal or downsloping ST depression greater than 0.5 mm at the J-point in at least two contiguous leads, and (2) T wave inversion at least 1 mm deep in at least two contiguous leads with a dominant R wave (R/S ratio > 1) [5].

Given the challenges of large-scale population screening at high altitudes, we were restricted to ECG alone for detecting myocardial ischemia. The definition of myocardial ischemia in this research is based on resting ECG examinations conducted during the high-altitude training period. Participants were required to abstain from any form of physical training, including sports activities, for a minimum of 48 h prior to the ECG examination.
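As a rough illustration of how the two ECG criteria could be checked on per-lead measurements, consider the Python sketch below. It is not the authors' procedure: the data layout, the field names (`st_depression_mm`, `st_slope`, `t_inversion_mm`, `rs_ratio`), the helper `lead`, and the definition of "contiguous" as adjacency in the two lead orderings shown are all assumptions made for the example.

```python
# Assumption: two leads count as contiguous when they are adjacent in one of
# these orderings. The text does not specify how contiguity was defined.
PRECORDIAL = ["V1", "V2", "V3", "V4", "V5", "V6"]
LIMB = ["aVL", "I", "II", "aVF", "III"]


def contiguous_pairs():
    """Return all adjacent lead pairs in the assumed orderings."""
    pairs = []
    for group in (PRECORDIAL, LIMB):
        pairs.extend(zip(group, group[1:]))
    return pairs


def st_criterion(leads):
    """Horizontal or downsloping ST depression > 0.5 mm in two contiguous leads."""
    def hit(name):
        m = leads.get(name)
        return (m is not None
                and m["st_depression_mm"] > 0.5
                and m["st_slope"] in ("horizontal", "downsloping"))
    return any(hit(a) and hit(b) for a, b in contiguous_pairs())


def t_wave_criterion(leads):
    """T wave inversion >= 1 mm in two contiguous leads with R/S ratio > 1."""
    def hit(name):
        m = leads.get(name)
        return (m is not None
                and m["t_inversion_mm"] >= 1.0
                and m["rs_ratio"] > 1.0)
    return any(hit(a) and hit(b) for a, b in contiguous_pairs())


def lead(st=0.0, slope="upsloping", t_inv=0.0, rs=2.0):
    """Hypothetical helper that builds one lead's measurements."""
    return {"st_depression_mm": st, "st_slope": slope,
            "t_inversion_mm": t_inv, "rs_ratio": rs}


# Made-up measurements, not study data.
example = {
    "V4": lead(st=0.8, slope="horizontal"),
    "V5": lead(st=0.6, slope="downsloping"),
    "V6": lead(t_inv=1.2),
}
print(st_criterion(example))      # True: V4 and V5 are adjacent and both qualify
print(t_wave_criterion(example))  # False: only V6 shows T wave inversion
```

The sketch deliberately returns the two criteria separately; how they are combined into a final diagnosis follows the criteria as the authors applied them and is not encoded here.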

Candidate predictors

In this prospective cohort study, we used clinical results from the two health examinations, conducted before and after entry into high-altitude areas, as the initial variables. Variables such as the number of health examinations and home address were excluded, as were highly correlated variables such as education level and marital status. Univariate analysis was performed, and only hematological test results with a P-value < 0.001 were retained among the final variables. Subsequently, feature selection and engineering were applied to select the most pertinent features and to create new variables that could prove useful for the predictive model. To predict the risk of myocardial ischemia upon entering high-altitude areas, we included 27 variables in the machine learning analysis, including age, gender, body mass index (BMI), SBP, DBP, HR, OS, Highland acclimatization training, Altitude of original station, 3 km test in Non-High Altitude (Score), Sit-ups in Non-High Altitude (Score), Serpentine Run in Non-High Altitude (Score), and SCOPA-AUT score in Non-High Altitude (Table 1).

Sample size estimation

The use of machine learning models to predict binary clinical outcomes has gained significant attention in recent years, but determining the appropriate sample size for these models remains a challenge. While no established method for determining sample size currently exists, a conventional approach suggests a minimum of 10 outcome events per variable when building a prediction model for a binary clinical outcome [6]. In this study, we used the R package pmsampsize to calculate a required sample size of 1100 for a traditional logistic regression model; our final sample (n = 2855) exceeded this requirement.
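The events-per-variable rule of thumb mentioned above reduces to simple arithmetic, sketched here in Python. This does not reproduce the pmsampsize calculation, which uses different criteria; the function names are illustrative, and the outcome prevalence of 12.5% is a purely hypothetical value, since the text does not report the figure used.

```python
import math


def min_events(n_predictors, events_per_variable=10):
    """Minimum number of outcome events under the events-per-variable rule."""
    return n_predictors * events_per_variable


def min_sample_size(n_predictors, outcome_prevalence, events_per_variable=10):
    """Smallest sample expected to contain the required number of events."""
    if not 0 < outcome_prevalence <= 1:
        raise ValueError("outcome_prevalence must be in (0, 1]")
    events = min_events(n_predictors, events_per_variable)
    return math.ceil(events / outcome_prevalence)


# 27 candidate predictors, as stated in the text.
print(min_events(27))              # 270

# Hypothetical prevalence of 12.5%, chosen only to illustrate the formula.
print(min_sample_size(27, 0.125))  # 2160
```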

Datasets

We preprocessed the data by standardizing it and dividing it randomly into a training set (75%) and a test set (25%). The outcome variable was dichotomized, and continuous and categorical variables were compared using appropriate statistical tests. Among the hematological findings, only variables that were significant in univariate analysis (P < 0.001) were carried forward. Prediction models were built on the training set with the candidate algorithms and selected features and were then evaluated on the test set. All analyses were conducted in R.
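A stratified 75/25 split followed by standardization might look like the Python sketch below. It is an illustration under stated assumptions rather than the authors' R pipeline: the function names are invented, the labels are synthetic, and fitting the scaler on the training values only is one common choice that the text neither confirms nor rules out.

```python
import random
from statistics import mean, pstdev


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(values):
    """Return mean and standard deviation (1.0 if the feature is constant)."""
    mu, sd = mean(values), pstdev(values)
    return mu, (sd if sd > 0 else 1.0)


def standardize(values, mu, sd):
    """Apply z-score scaling with previously fitted parameters."""
    return [(v - mu) / sd for v in values]


# Synthetic labels for illustration: 200 positive and 800 negative cases.
labels = [1] * 200 + [0] * 800
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))     # 750 250
print(sum(labels[i] for i in test_idx))  # 50 positives in the test set
```

Fitting the scaling parameters on the training set and reusing them for the test set keeps information from the test set out of model development.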

Identification and validation of the prediction model

This study aimed to develop a machine learning model to classify individuals as having myocardial ischemia or not. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) [7]. The dataset was randomly divided into training and test sets using stratified sampling. We fitted LR [8], RF [9], XGBoost [10], KNN [11], and SVM [12] models to the training set and validated them on the test set. The best algorithm was selected based on AUC and calibration curve performance.
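For readers unfamiliar with the AUC, it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The Python sketch below computes it directly from that definition; the function name and the toy scores are illustrative and are not study results.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; rank-based formulas give the same value more efficiently.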

We used the Recursive Feature Elimination (RFE) algorithm [13] to identify the most influential variables and obtain a more parsimonious, clinically feasible model. The resulting model, based on these variables, was compared with the model built on the full variable set and found to be well suited for practical use.
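The core loop of recursive feature elimination can be sketched as follows: fit a model on the current feature set, drop the least important feature, and repeat until the desired number remains. In this Python illustration the model is a deliberately simple stand-in (absolute difference of class means per feature, which assumes standardized inputs), not any of the algorithms used in the study; the function names, feature names, and data are all made up.

```python
from statistics import mean


def centroid_importance(X, y, features):
    """Toy linear model: importance is |mean of positives - mean of negatives|."""
    importance = {}
    for f in features:
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        importance[f] = abs(mean(pos) - mean(neg))
    return importance


def recursive_feature_elimination(X, y, features, n_keep,
                                  importance_fn=centroid_importance):
    """Repeatedly refit on the remaining features and drop the weakest one."""
    remaining = list(features)
    eliminated = []
    while len(remaining) > n_keep:
        importance = importance_fn(X, y, remaining)  # refit on current subset
        weakest = min(remaining, key=lambda f: importance[f])
        remaining.remove(weakest)
        eliminated.append(weakest)
    return remaining, eliminated


# Made-up, already-scaled data with hypothetical feature names.
X = [
    {"hr": 1.0, "sbp": 0.2, "noise": 0.5},
    {"hr": 0.9, "sbp": 0.4, "noise": 0.4},
    {"hr": 0.1, "sbp": 0.1, "noise": 0.5},
    {"hr": 0.0, "sbp": 0.3, "noise": 0.4},
]
y = [1, 1, 0, 0]

kept, dropped = recursive_feature_elimination(X, y, ["hr", "sbp", "noise"], n_keep=1)
print(kept)     # ['hr']
print(dropped)  # ['noise', 'sbp']
```

Any model that exposes a per-feature importance can be passed as `importance_fn`, which is how the same loop would wrap a logistic regression or tree-based learner.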

We used R (version 4.2.0) with the tidymodels framework (version 1.0.0) and the tidyverse and caret packages for data analysis, implementing fivefold cross-validation (Additional file 1).
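Fivefold cross-validation partitions the training data into five folds and uses each fold once for validation while training on the other four. The Python sketch below shows only the index bookkeeping; it is not the tidymodels implementation, and the function name and seed are assumptions. The training set size of 2141 is taken from the text.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Training set size reported in the text.
splits = list(k_fold_indices(2141, k=5))
print([len(validation) for _, validation in splits])  # [429, 428, 428, 428, 428]
```

Each observation appears in exactly one validation fold, so averaging a metric over the five folds uses every training case once for evaluation.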

Ethics approval and consent to participate

In accordance with the ethical guidelines of the Declaration of Helsinki, the experimental protocol was developed and approved by the Human Ethics Committee of the 920th Hospital of the Joint Logistics Support Force (Lot no. 2022-135-01). Written informed consent was obtained from all participants or their guardians.


