This study presents a comparative evaluation of three machine learning approaches for recognizing emotional states in speech: Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Random Forest (RF), using the RAVDESS dataset. The methodology combines audio feature extraction, model training with and without hyperparameter tuning, and evaluation using accuracy, precision, recall, and F1-score. Before tuning, SVM achieved the highest accuracy at 79%, followed by MLP at 76% and RF at 71%. After optimization, only SVM improved, reaching 80%, while MLP and RF showed little or no gain. Confusion-matrix analysis revealed that SVM produced the most evenly distributed predictions and the fewest misclassifications, particularly for the "calm" and "happy" emotion categories. These findings offer empirical support for SVM as a robust baseline for speech emotion recognition in localized settings, along with insights into model optimization and development that could inform future applications in speech-based human–computer interaction.
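The comparison described above can be sketched with scikit-learn. This is a minimal illustration, not the paper's actual pipeline: the RAVDESS audio features (e.g. MFCCs extracted with a library such as librosa) are stood in for by synthetic data, and the model settings shown are assumptions rather than the tuned hyperparameters from the study.

```python
# Hypothetical sketch of the model-comparison setup described in the abstract.
# Synthetic features replace real RAVDESS audio features (an assumption);
# classes are made weakly separable so the models have signal to learn.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_classes = 8  # RAVDESS covers 8 emotion labels (neutral, calm, happy, ...)
X = rng.normal(size=(400, 40))             # placeholder for 40 features per clip
y = rng.integers(0, n_classes, size=400)   # placeholder emotion labels
X += y[:, None] * 0.5                      # inject class-dependent signal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Untuned baselines; hyperparameter tuning (e.g. GridSearchCV) would follow.
models = {
    "SVM": SVC(kernel="rbf", C=10),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {results[name]:.2f}")
```

In the actual study, per-class precision, recall, and F1-score (e.g. via `sklearn.metrics.classification_report`) and a confusion matrix would accompany the accuracy comparison.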
Copyright © 2026