Voice-based gender recognition is increasingly important in biometrics, security, forensics, and human–computer interaction. Although humans readily distinguish male from female voices, automatic classification remains challenging because of voice variability and high-dimensional acoustic data. This study investigates how feature selection affects the performance and efficiency of a Random Forest classifier for gender classification. The dataset, obtained from Kaggle, consists of 3,168 balanced voice samples with 23 acoustic features. Pearson’s correlation analysis was used to select the five features most strongly associated with the target variable. Random Forest classification was then performed on both the full set of 22 features and the reduced set of 5 features. The accuracy gain was marginal (98% to 99%), but computation time fell from 0.3 to 0.1 seconds, a reduction of roughly 67%. These findings suggest that lightweight correlation-based feature selection can simplify models and support faster real-time applications without compromising predictive performance. The study's main contribution is therefore efficiency rather than accuracy, offering methodological insight for designing scalable and inclusive voice-based gender recognition systems.
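The correlation-based selection step described above can be illustrated with a minimal sketch. It is not the study's implementation: the helper names `pearson` and `select_top_k`, the column names, and the synthetic data are all assumptions made for illustration, and the Random Forest training and timing steps are not shown. The sketch ranks features by the absolute value of their Pearson correlation with a binary (0/1) target and keeps the top k.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def select_top_k(columns, target, k):
    """Return the k feature names with the largest |r| against the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic stand-in data (hypothetical; NOT the Kaggle voice dataset).
rng = random.Random(0)
target = [rng.randint(0, 1) for _ in range(200)]  # binary label encoded as 0/1
columns = {
    "strong": [t + rng.gauss(0, 0.1) for t in target],  # tracks the label closely
    "weak": [t + rng.gauss(0, 1.0) for t in target],    # tracks the label loosely
    "noise": [rng.gauss(0, 1.0) for _ in target],       # unrelated to the label
}

selected = select_top_k(columns, target, k=2)
print(selected)
```

In the study's setting, `k` would be 5 and `columns` would hold the acoustic features, with the selected subset then passed to the classifier. Ranking by absolute correlation keeps strongly negative associations as well as positive ones.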
Copyright © 2026