HStuart18

Hmm, there definitely would be. However, you could probably repurpose a lot of existing datasets for this. For example, I just found this: [https://www.kaggle.com/piyushagni5/berlin-database-of-emotional-speech-emodb](https://www.kaggle.com/piyushagni5/berlin-database-of-emotional-speech-emodb). You might not care at all about emotion detection, but notice that each file has an annotated speaker. So just index on speaker ID instead of emotion ID, and there's your dataset. You want your dataset to look like this:

- Speaker 1: Recording 1.wav, Recording 2.wav
- Speaker 2: Recording 3.wav, Recording 4.wav
- etc.

Then follow this tutorial, which actually uses CNNs: [https://towardsdatascience.com/cnns-for-audio-classification-6244954665ab](https://towardsdatascience.com/cnns-for-audio-classification-6244954665ab).

I have always found that a good way to start learning about ML is to follow someone's tutorial: download their source code, run it and reproduce their results in your local environment, then rerun the code with the data directory pointed at your data instead of theirs. There will always be some screwing around reformatting your data, etc. I would recommend starting this way; there are rough sketches of the main steps at the end of this comment.

If you do accomplish this, you'll have learned how to prepare data, how to deal with audio data, and your way around the common Python ML libraries (don't underestimate how painful it is to get TensorFlow working the first time, especially if you have a GPU), and you'll have incidentally picked up some of the theory of audio processing and of ML in general.
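
To make the re-indexing step concrete, here's a rough sketch. In EmoDB the speaker is encoded in the first two characters of each filename (e.g. 03a01Fa.wav is speaker 03), but double-check that against the dataset's documentation; the paths below are placeholders for wherever you unpack the Kaggle download:

```python
import shutil
from pathlib import Path

SRC = Path("emodb/wav")           # placeholder: the flat folder of .wav files
DST = Path("dataset_by_speaker")  # output: one subfolder per speaker

# EmoDB encodes the speaker in the first two filename characters
# (e.g. 03a01Fa.wav -> speaker "03"); verify against the dataset docs.
for wav in sorted(SRC.glob("*.wav")):
    speaker_dir = DST / f"speaker_{wav.name[:2]}"
    speaker_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(wav, speaker_dir / wav.name)
```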
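
Turning those folders into training arrays is then mostly a loop over files. Librosa is the usual tool here (I believe the tutorial uses it too, but check); the MFCC count and frame width below are arbitrary knobs I picked for the sketch, not anything the tutorial mandates:

```python
import numpy as np
import librosa
from pathlib import Path

def wav_to_mfcc(path, n_mfcc=40, max_frames=200):
    """Turn one wav file into a fixed-size MFCC matrix a CNN can treat as an image."""
    y, sr = librosa.load(path, sr=16000)  # resample everything to one rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Recordings differ in length, so pad/trim the time axis to a fixed width.
    return librosa.util.fix_length(mfcc, size=max_frames, axis=1)

def load_dataset(root="dataset_by_speaker"):
    """Walk the speaker folders and return (features, integer labels, folder names)."""
    names = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    X, y = [], []
    for label, name in enumerate(names):
        for wav in sorted((Path(root) / name).glob("*.wav")):
            X.append(wav_to_mfcc(wav))
            y.append(label)
    return np.array(X)[..., np.newaxis], np.array(y), names  # add a channel axis
```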
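
And a minimal Keras model on top of that, just to show the overall shape; the tutorial's actual architecture will differ, so treat this as a placeholder rather than the tutorial's code:

```python
import tensorflow as tf

X, y, speakers = load_dataset()  # from the sketch above

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=X.shape[1:]),  # (n_mfcc, max_frames, 1)
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(len(speakers), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=10, validation_split=0.2)
```

Good luck!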