Speech emotion classification, also known as Speech Emotion Recognition (SER), has become increasingly important with the growing prevalence of human–machine interaction, particularly in healthcare, online education, and customer service. This study develops a robust speech emotion classification system by employing Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and a Decision Tree–based Bagging algorithm for classification. The proposed approach is designed to address two challenges: low classification accuracy, especially under speaker-independent conditions, and the limited availability of labeled emotional speech data. The research workflow comprises speech signal preprocessing, MFCC feature extraction, dataset partitioning through bootstrap resampling, ensemble model training, and performance evaluation using accuracy, precision, recall, and F1-score. Experimental results on a balanced dataset of five emotion classes (anger, disgust, fear, happy, and sad) show that the proposed model achieves an overall accuracy of 61.04%. The fear and happy classes are classified most effectively, each with a recall of 0.75, whereas the anger class exhibits the lowest performance, with an F1-score of 0.49. Confusion matrix analysis further reveals substantial acoustic overlap among several emotion categories, most notably the frequent misclassification of sad as disgust or anger. In conclusion, integrating MFCC features with the Bagging algorithm improves model stability and robustness; however, further optimization of the acoustic features and hyperparameters is required to raise overall classification accuracy.
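To make the described pipeline concrete, the following is a minimal sketch, assuming a Python stack with librosa for MFCC extraction and scikit-learn's BaggingClassifier over DecisionTreeClassifier; the study does not name its tooling or hyperparameters, so every identifier, parameter value, and the stand-in data below are illustrative rather than the authors' implementation.

```python
import numpy as np
import librosa
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def extract_mfcc(path, n_mfcc=13):
    """Summarize one speech file as a fixed-length vector:
    per-coefficient mean and standard deviation of the MFCCs over all frames."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Stand-in features: in practice X would be built with extract_mfcc() over the
# labeled corpus. Random vectors keep this sketch runnable without audio data.
rng = np.random.default_rng(0)
classes = ["anger", "disgust", "fear", "happy", "sad"]
X = rng.normal(size=(500, 26))   # 500 utterances, 2 * 13 MFCC statistics each
y = rng.choice(classes, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Bagging: each decision tree is fit on a bootstrap resample of the training
# set, and predictions are combined by majority vote across the ensemble.
model = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,   # hypothetical value; the paper's setting is not stated
    bootstrap=True,
    random_state=0,
)
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))  # per-class precision/recall/F1
```

The bootstrap resampling inside BaggingClassifier corresponds to the dataset-partitioning step in the workflow above, and the majority vote across trees is what provides the stability and robustness the study attributes to the ensemble.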