Electric Vehicle Charging Stations (EVCS) are key enablers of sustainable transportation, yet accurately forecasting their energy demand remains challenging because of complex spatiotemporal variability. This study introduces a hybrid deep learning framework, Two-Fold EfficientNetV2 BiGRU with Attention (TF-EffBiGRU-AttNet), optimized with the Self-Adaptive Hippopotamus Optimization Algorithm (SA-HOA), to improve both prediction accuracy and computational efficiency in EVCS energy demand forecasting. The main objective is to integrate multi-scale spatial learning, bidirectional temporal modeling, and adaptive feature prioritization within a single architecture capable of robust and interpretable forecasting. The model's novelty lies in two components: two-fold spatial feature extraction using EfficientNetV2, and dynamic optimization through SA-HOA, which adaptively balances exploration and exploitation during training. Experimental validation on two real-world datasets, from Palo Alto and Perth, shows that the proposed model consistently outperforms state-of-the-art baselines. For the 7-1 forecasting task, TF-EffBiGRU-AttNet achieved the lowest errors among the compared models: MAE of 0.012 and RMSE of 0.051 on Palo Alto, and MAE of 0.029 and RMSE of 0.12 on Perth. For the 30-7 task, it achieved MAE of 0.0332, RMSE of 0.1654, and MAPE of 0.20% on Palo Alto, and MAE of 0.0235, RMSE of 0.0824, and MAPE of 0.37% on Perth, reducing RMSE by over 60% relative to Bi-LSTM and EfficientNet. SA-HOA also improved optimization efficiency, reaching a best fitness value of 0.0003 and reducing convergence time to 1.2 seconds, outperforming PSO, GWO, and HOA. These results highlight the framework's ability to capture spatial-seasonal and nonlinear dependencies while maintaining low computational overhead. The findings indicate the model's potential as a robust, adaptive, and scalable solution for intelligent EV energy demand forecasting in support of smart grid planning and sustainable energy management.
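The results above are reported as MAE, RMSE, MAPE, and a relative RMSE reduction. The following is a minimal sketch of the standard definitions of these metrics, not the study's evaluation code. It also includes a sliding-window helper that reads the "7-1" and "30-7" task labels as (input steps, forecast horizon) pairs; that reading is an assumption, as are all function names (`mae`, `rmse`, `mape`, `relative_reduction`, `sliding_windows`), which are hypothetical. Skipping zero-valued targets in MAPE is likewise an illustrative choice to avoid division by zero.

```python
import math
from typing import List, Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumption: zero-valued targets are skipped to avoid division by zero.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` lowers an error relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


def sliding_windows(
    series: Sequence[float], n_in: int, n_out: int
) -> List[Tuple[List[float], List[float]]]:
    """Split a series into (input, target) pairs.

    Assumption: a task label such as "7-1" means n_in=7 past steps
    are used to predict the next n_out=1 steps.
    """
    windows = []
    for start in range(len(series) - n_in - n_out + 1):
        x = list(series[start:start + n_in])
        y = list(series[start + n_in:start + n_in + n_out])
        windows.append((x, y))
    return windows


if __name__ == "__main__":
    actual = [2.0, 4.0, 5.0, 4.0]
    forecast = [2.5, 3.5, 5.0, 4.5]
    print(mae(actual, forecast))   # 0.375
    print(mape(actual, forecast))  # 12.5
```

Under this reading, a 30-7 task would use `sliding_windows(series, 30, 7)`, and a claim such as "over 60% RMSE reduction" corresponds to `relative_reduction(baseline_rmse, proposed_rmse) > 60`.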