Childhood asthma is a chronic respiratory disease that often persists throughout an individual’s lifetime, imposing a significant burden on both patients and healthcare systems. Despite numerous treatment options, no curative therapies are currently available, and patients often require ongoing treatment to manage their symptoms and prevent exacerbations. Early identification of children at risk of developing asthma is therefore of great importance: it enables earlier prognostication, may reduce uncertainty for families, and could potentially improve long-term outcomes for patients and their families.
Identification of early-life pediatric asthma using ML
ML has emerged as a promising tool in various medical settings, owing to its ability to capture complex and non-linear relationships among multiple predictors and their synergistic effects on the target variable. The highly heterogeneous nature of asthma outcomes, which result from the interplay of genetic, environmental, and clinical factors, makes asthma a prime candidate for ML prediction models. While several studies have utilized ML to predict various clinical outcomes of pediatric asthma, including hospitalization, exacerbation, response to treatment, and remission,16 research on the early identification of pediatric asthma through ML predictive modeling with an extensive set of predictors collected over time remains scarce.17 Consequently, there is a pressing need for further exploration of ML’s potential in predicting early-life asthma risk, which could aid in timely and personalized prognosis by distinguishing between children most likely to have transient vs. persistent symptoms. This view is consistent with previous studies that have highlighted the value of ML in medical decision-making.4,16,17
By leveraging the CHILD study cohort, we have identified a short list of salient features and ML models that offer promising prospects for identifying children at high risk of asthma. Analysis of these features across different time points revealed that predicting an asthma diagnosis is feasible but challenging before 1 year of age. However, accurate prediction of asthma diagnosis at age 5 years is achievable with high sensitivity, specificity, and precision (AUROC > 0.9, AUPRC > 0.8) when clinical information on parental asthma, wheezing, atopy, respiratory infections, and antibiotic use is available at age 3 years.
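For readers less familiar with the two summary metrics quoted above, the following is a minimal sketch of how AUROC and AUPRC (computed here as average precision) can be obtained from predicted risk scores. It is an illustration only, not the evaluation code used in this study; the function names and the four-sample toy data are assumptions made for the example, and tied scores are handled only in the AUROC function.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a random positive case outranks a random negative one."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    # Count pairwise wins; a tie between a positive and a negative counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """AUPRC estimated as the mean precision at the rank of each positive case."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    # Rank cases from highest to lowest predicted risk (assumes no tied scores).
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` cases
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical toy data: 1 = asthma diagnosis at age 5, 0 = no diagnosis.
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(auroc(y_true, y_score))                        # 0.75
    print(round(average_precision(y_true, y_score), 3))  # 0.833
```

AUPRC is the more informative of the two when the outcome is uncommon, because its chance-level value equals the outcome prevalence rather than 0.5.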
Identification of important predictors for pediatric asthma
Our ML models demonstrate the significance of a short list of established predictors of asthma and are highly consistent with previous research on risk and protective factors while demonstrating the limited predictive benefit of other correlated factors.18,19,20,21,22,23,24,25,26,27,28
Among the features that our models identified as most important for predicting asthma were wheezing status, atopic status, parental asthma, history of respiratory infection, and antibiotic use. Parental (primarily maternal) asthma emerged as the earliest and most consistent predictor of asthma diagnosis at age 5 years in our models, in agreement with established research.18,19
While our models revealed an association between maternal psychological stress and childhood asthma, in agreement with previous research,20,21 this relationship appeared to be limited to the first year of life. Our findings support the notion that early antibiotic exposure is associated with an increased risk of childhood asthma,8,22 and suggest that strategies to mitigate the impacts of antibiotics may be useful. In addition, our models showed that lower respiratory tract illness in early childhood is associated with an increased risk of developing asthma, in agreement with previous studies linking early lower respiratory illness to later-life asthma.23,24 Finally, our models supported the protective effect of exclusive breastfeeding and longer gestational age, consistent with previous research.25,26,27,28
Strengths
Our study has several notable strengths, including its longitudinal design, which enables investigation through data collected at multiple time points with short intervals. Specifically, we tracked participants from birth up to 4 years of age to predict physician-diagnosed asthma at the 5-year mark. By collecting data at various time points, we were able to uncover the relative importance of putative predictors across these time intervals, as well as identify trends in asthma predictive capacity using multiple ML algorithms, both individual and ensemble.
Additionally, our study adopted an agnostic approach to evaluate the importance of all 132 variables without any manual intervention during the feature selection process. Feature selection was driven by the ML algorithms themselves, which allowed important variables to be identified automatically, without prior bias or preconceptions. Notably, this methodology recovered well-established predictors of asthma while also revealing less well-known factors such as maternal stress, gestational age, and jaundice as significant contributors to asthma prediction.29
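One common way to rank variables without manual intervention is permutation importance: each feature is shuffled in turn, and the resulting drop in a performance metric is taken as its importance. The sketch below illustrates that idea under stated assumptions and is not necessarily the procedure used in this study. The `predict` function is a hypothetical stand-in for any fitted model, and the two feature columns (`wheeze`, `noise`) and their values are made up for the example.

```python
import random


def auroc(y_true, y_score):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in AUROC when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = auroc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the outcome.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = auroc(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical toy data: column 0 = wheeze (informative), column 1 = noise.
    X = [[i % 2, (i // 2) % 2] for i in range(20)]
    y = [row[0] for row in X]

    def predict(row):
        # Stand-in for a fitted model that relies on the wheeze column only.
        return float(row[0])

    for name, value in zip(["wheeze", "noise"], permutation_importance(predict, X, y)):
        print(name, round(value, 3))
```

A feature the model does not rely on yields an importance of zero, while shuffling an informative feature degrades the AUROC. Correlated features can share or mask each other's importance, so rankings of this kind are best read alongside clinical knowledge.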
Lastly, the development and testing of ML predictive models is a complex process that requires careful consideration of several critical factors. Our study meets these criteria, including the use of an adequate sample size, high-quality training data, algorithm-based selection of input features, cross-validated tuning and validation of ML algorithms, appropriate and careful study design, and continual clinical involvement during ML model development.4,16,17,30 By meeting these requirements, our models achieve high performance and provide reliable and accurate predictions, making them highly applicable in clinical settings. Furthermore, our approach not only contributes to the field of asthma prediction but also serves as a valuable blueprint for the development of ML models in other areas of healthcare.
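Cross-validated tuning and validation depend on splitting the data so that every fold reflects the outcome prevalence, which matters when the positive class is a minority. The following is a minimal sketch of stratified k-fold splitting, assuming binary labels; the function name, the fold count, and the 10-in-50 toy prevalence are illustrative assumptions rather than details taken from this study.

```python
import random


def stratified_kfold(y, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with class balance kept in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        # Shuffle the indices of one class, then deal them out across the folds.
        indices = [i for i, value in enumerate(y) if value == label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


if __name__ == "__main__":
    # Hypothetical labels: 10 children with an asthma diagnosis, 40 without.
    y = [1] * 10 + [0] * 40
    for train, test in stratified_kfold(y, k=5, seed=1):
        positives = sum(y[i] for i in test)
        print(len(train), len(test), positives)
```

Hyperparameters would then be chosen on the training portion of each split and performance reported on the held-out portion, so that the reported metrics are never computed on data used for tuning.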
Limitations
While the present study utilized data from the large-scale observational CHILD Cohort, it is important to recognize that a large proportion of the data used for developing the ML models was derived from questionnaires and clinical assessments completed by parents and clinicians. Despite rigorous quality control measures to mitigate survey bias, the data may still be susceptible to other forms of bias, including social desirability bias, response bias, and recall bias, which can introduce subjectivity and noise into the data and ultimately reduce the predictive performance of the ML models.
To overcome this limitation and potentially enhance the predictive performance of the models, future studies may benefit from incorporating objective measurements, such as biological and genetic markers. However, these types of measurements are often associated with high costs, time constraints, and specialized equipment requirements, which may restrict their availability to a much smaller subset of individuals and limit the generalizability of the resulting models to a broad population such as the one studied here. Nevertheless, further research is warranted to evaluate the feasibility and potential benefits of including such measurements to improve the predictive performance of the ML models, particularly at earlier stages of asthma development.