
MP3 – lossy vs. lossless audio formats in machine learning


We want to build an ML model that recognises certain things from a person's voice. The feature extraction is based on a proprietary algorithm. So far we have always used WAV files, but we keep asking ourselves whether we could also use a lossy format such as MP3. Since we probably won’t be able to collect the same data again, and we can’t foresee what information a future version of our proprietary algorithm might need, we are concerned that a format like MP3 would discard too much information. What do you think?

I am less interested in a definitive answer, since I know one isn’t possible while certain details are still open. It’s more about sharing experiences and exchanging ideas. For example, someone working in machine learning may have found that the parts of the signal MP3 discards usually have little or no effect on ML algorithms; that is just one possibility. I haven’t really found anything on this topic online.

This isn’t strictly an either/or decision about which format to choose, but one article, for example, recommends the following: “Data augmentation is a well known practice in machine learning. We can simply treat the coded version of our audio data as an augmentation of the data like we would by adding traffic noise or simulating the echoes of a room on clean speech. For example, augmenting your training data with just a few different Opus quality levels will improve the classification of all Opus test samples.” I find that very interesting.
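To make the article's idea concrete, here is a minimal sketch of codec-as-augmentation: every clean clip is kept, and several lossy versions at different "quality levels" are added with the same label. A real pipeline would run each clip through an actual MP3 or Opus encode/decode round trip, which is not shown here. Instead, `fake_lossy_codec` is a hypothetical stand-in that only mimics two typical effects of lossy coding (loss of high-frequency detail and coarser amplitude resolution). The names `fake_lossy_codec`, `augment_with_codec` and `QUALITY_LEVELS` are all illustrative assumptions, not part of any library.

```python
import math
from typing import List, Sequence, Tuple

Signal = List[float]

# Hypothetical quality levels as (smoothing window, quantisation bits).
# With a real codec these would be bitrates or encoder quality settings.
QUALITY_LEVELS: List[Tuple[int, int]] = [(3, 12), (5, 8), (9, 6)]


def fake_lossy_codec(signal: Sequence[float], window: int, bits: int) -> Signal:
    """Hypothetical stand-in for an MP3/Opus encode-decode round trip.

    Real codecs use psychoacoustic models; this only mimics losing
    high-frequency detail (moving-average low-pass filter) and
    amplitude resolution (uniform quantisation to `bits` bits).
    """
    half = window // 2
    n = len(signal)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    levels = 2 ** (bits - 1)
    # Quantise, then clamp to the usual [-1.0, 1.0] float audio range.
    return [max(-1.0, min(1.0, round(x * levels) / levels)) for x in smoothed]


def augment_with_codec(
    clips: Sequence[Sequence[float]], labels: Sequence[str]
) -> Tuple[List[Signal], List[str]]:
    """Keep each clean clip and add one degraded copy per quality level."""
    out_x: List[Signal] = []
    out_y: List[str] = []
    for clip, label in zip(clips, labels):
        out_x.append(list(clip))  # the lossless original stays in the set
        out_y.append(label)
        for window, bits in QUALITY_LEVELS:
            out_x.append(fake_lossy_codec(clip, window, bits))
            out_y.append(label)  # coding does not change the label
    return out_x, out_y


if __name__ == "__main__":
    sr = 16000  # assumed sample rate in Hz
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
    xs, ys = augment_with_codec([tone], ["speaker_a"])
    print(len(xs), ys)  # 4 clips, all labelled "speaker_a"
```

One design point follows directly from the concern in the question: the coded versions are generated *from* the lossless originals. Keeping the WAV masters means new codecs or quality levels can be added later, whereas audio that was only ever stored as MP3 cannot be turned back into the original signal.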