In the wake of contemporary challenges such as the COVID-19 pandemic, understanding children’s mental health through non-verbal forms of expression such as drawing has become paramount. This study enhances pediatric psychological assessments by employing an ensemble of deep learning models to interpret children’s drawings, aiming for early detection of psychological states. Traditional drawing analysis methods are often subjective, variable, and time-consuming. To address these limitations, we developed an ensemble model that combines the strengths of the VGG16, VGG19, and MobileNet architectures using a hard voting mechanism. This approach reduces bias and enhances reliability by integrating the unique capabilities of each model. Our methodology involved rigorous data collection through a custom Android application, followed by exploratory data analysis, data preprocessing, and comprehensive model evaluation. The ensemble model was trained and validated on the diverse Kids’ Hand Movement Dataset (KHMD), demonstrating superior accuracy and robustness in classifying drawings that indicate various psychological conditions. It significantly outperformed the individual models, achieving a 99% accuracy rate. These findings underscore the potential of advanced machine learning techniques to provide more accurate and bias-free insights into children’s psychological health, suggesting that ensemble learning can greatly improve the precision of pediatric psychological evaluations. Future work will explore expanding the dataset and employing more sophisticated ensemble methods to further enhance diagnostic accuracy.
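The hard voting mechanism named above can be illustrated with a minimal sketch, assuming each of the three networks returns a vector of class scores for a drawing. The function names, the three-class example scores, and the tie-breaking rule are illustrative assumptions and are not taken from the study's implementation.

```python
from collections import Counter
from typing import Sequence


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score (first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def hard_vote(per_model_scores: Sequence[Sequence[float]]) -> int:
    """Majority vote over each model's predicted class for one drawing.

    per_model_scores[m][c] is model m's score for class c.
    Tie-breaking (an assumption, not from the study): among the
    most-voted classes, choose the one with the highest summed score.
    """
    # Each model casts one vote: its highest-scoring class.
    votes = [argmax(scores) for scores in per_model_scores]
    counts = Counter(votes)
    top = max(counts.values())
    tied = [cls for cls, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    # Break ties using the total score each tied class received.
    return max(tied, key=lambda c: sum(s[c] for s in per_model_scores))


# Hypothetical scores for one drawing over three classes.
vgg16_scores = [0.7, 0.2, 0.1]      # votes for class 0
vgg19_scores = [0.4, 0.5, 0.1]      # votes for class 1
mobilenet_scores = [0.6, 0.3, 0.1]  # votes for class 0

prediction = hard_vote([vgg16_scores, vgg19_scores, mobilenet_scores])
print(prediction)  # 0, since two of the three models vote for class 0
```

With an odd number of models and two classes, a tie cannot occur; with more classes, all three models may disagree, which is why the sketch includes an explicit tie-breaking rule.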