| Abstract: |
Genre classification involves identifying the distinctive stylistic elements and musical characteristics that define a genre, and it supports an understanding of that genre's historical context, cultural influences, and musical evolution. This study addresses the challenge of classifying Ethiopian music genres by their melodic structures using deep learning techniques. The main objective was to develop a deep learning model that classifies audio into six Ethiopian music genre classes: Ancihoye Lene, Ambassel Major, Ambassel Minor, Bati, Tizita Major, and Tizita Minor. To this end, we first prepared a dataset of 3952 audio recordings, comprising 533 tracks of Ethiopian Orthodox church music and 3419 samples of secular Ethiopian music. From each sample we extracted 46 low-level and mid-level audio features: chroma short-time Fourier transform (STFT), root mean square energy (RMSE), spectral centroid, spectral bandwidth, spectral roll-off, zero-crossing rate, and mel-frequency cepstral coefficients (MFCC) 1 through 40. The choice of features was guided by Ethiopian music experts and by preliminary experiments that highlighted the importance of tonality features. A 30-second segment of each recording was used for feature extraction, and the resulting datasets were stored in both CSV and JSON formats for further processing. We developed and compared four deep learning models: a convolutional neural network (CNN), a recurrent neural network (RNN), a parallel RNN–CNN architecture, and a long short-term memory (LSTM) network. In our experiments, the LSTM model performed best, reaching a classification accuracy of 97% using the 40 MFCC features extracted from the audio datasets. [ABSTRACT FROM AUTHOR] |
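
The feature list in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the column names are hypothetical, the RMSE feature is assumed to mean the root mean square energy of a frame (the usual sense in audio feature extraction), and the test signal is a synthetic sine wave rather than a music recording. Only two of the simpler features are computed, using the Python standard library.

```python
import math

# Hypothetical column names for the 46 features named in the abstract:
# six spectral/temporal descriptors plus MFCC 1 through 40.
BASE_FEATURES = [
    "chroma_stft", "rmse", "spectral_centroid",
    "spectral_bandwidth", "rolloff", "zero_crossing_rate",
]
FEATURE_NAMES = BASE_FEATURES + [f"mfcc{i}" for i in range(1, 41)]


def rms_energy(frame):
    """Root mean square energy of one frame of samples (assumed meaning of RMSE)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def sine_wave(freq_hz, sample_rate, n_samples):
    """Synthetic test tone standing in for an audio segment."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    signal = sine_wave(440.0, 22050, 22050)  # one second of a 440 Hz tone
    print(len(FEATURE_NAMES), "feature columns")
    print("rmse:", rms_energy(signal))
    print("zero_crossing_rate:", zero_crossing_rate(signal))
```

In practice the spectral features and MFCCs would come from an audio analysis library applied to the 30-second segments, with one row of 46 values written per recording.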