
Towards Automated Vocal Mode Classification in Healthy Singing Voice-An XGBoost Decision Tree-Based Machine Learning Classifier




J Voice. 2023 Nov 10:S0892-1997(23)00281-3.


doi: 10.1016/j.jvoice.2023.09.006.


Online ahead of print.


Jeroen Sol et al.


J Voice.



Abstract

Auditory-perceptual assessment is widely used in clinical and pedagogical practice for speech and singing voice, yet several studies have shown poor intra- and inter-rater reliability in both clinical and singing voice contexts. Recent advances in artificial intelligence and machine learning offer models for automated classification that have demonstrated discriminatory power in both pathological and healthy voice. This study develops and tests an XGBoost decision-tree-based machine learning classifier for automated vocal mode classification in healthy singing voice. Classification models trained on mel-frequency cepstral coefficients (MFCCs), MFCC-Zero-Time Windowing, glottal features, voice quality features, and α-ratios achieved an average F1-score of 92% in distinguishing metallic from non-metallic singing for male singers and 87% for female singers. The model distinguished vocal modes with average F1-scores of 70% and 69% for male and female samples, respectively. Model performance was compared with human auditory-perceptual assessments of 64 corresponding samples performed by 41 professional singers. On task-matched problems, the model performed close to or below the level of the human assessors. The XGBoost gains observed across the tested features indicate that the most important attributes for these classification problems were MFCCs and α-ratios between high- and low-frequency energy; models trained on only these features achieved performance that was not statistically significantly different from that of the best tested models. The best automated models in this study do not yet match human auditory-perceptual discrimination, but they improve on previously reported average F1-scores for automated classification in singing voice.
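The abstract reports its results as average F1-scores. As a minimal sketch of how such a figure can be computed, the Python below takes the unweighted (macro) mean of per-class F1-scores. The abstract does not say whether the study averaged this way, so macro averaging is an assumption here, and the function names and labels are hypothetical, invented for illustration only.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical labels for illustration only; these are not data from the study.
y_true = ["metallic", "metallic", "metallic",
          "non-metallic", "non-metallic", "non-metallic"]
y_pred = ["metallic", "metallic", "non-metallic",
          "non-metallic", "non-metallic", "metallic"]

print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Because every class counts equally under macro averaging, a rarely sung vocal mode would weigh as much as a common one. A support-weighted average would instead favor the more frequent classes, so the two schemes can differ when classes are imbalanced.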


Keywords:

Artificial intelligence; Complete vocal technique; Machine learning; Singing voice; Vocal modes.


Conflict of interest statement

Declaration of Competing Interest: This research was conducted as part of the master’s thesis project of one co-author (JS) in Computing Science: Data Science at Radboud University, with co-author LB as supervisor; beyond this, co-authors JS and LB have no conflicts of interest to declare. During the study, co-author MA was employed on a postdoctoral grant from the Danish Innovation Foundation (ref. no. 8054-00039B), which was awarded in part to Nottingham University Hospitals NHS, with which MA holds an honorary research contract, and in part to Complete Vocal Institute, by which co-author CS is employed.


