Accurate and efficient classification of retinal fundus images plays a critical role in supporting the early diagnosis of ocular diseases. However, models that rely on a single deep learning backbone often struggle to capture the multi-scale and heterogeneous characteristics of retinal lesions, leading to unstable performance across visually similar disease classes. To address this limitation, this study proposes a novel feature-level fusion framework that integrates complementary representations from DenseNet121 and EfficientNetV2-s, followed by classification with XGBoost. The fusion pipeline extracts 1024-dimensional features from DenseNet121 and 1280-dimensional features from EfficientNetV2-s and concatenates them into a single 2304-dimensional feature vector. Experiments were conducted on a dataset of 10,247 retinal fundus images spanning six categories: Central Serous Chorioretinopathy, Diabetic Retinopathy, Macular Scar, Retinitis Pigmentosa, Retinal Detachment, and Healthy. The proposed fusion model achieved an accuracy of 91.60%, outperforming DenseNet121 with XGBoost (91.31%) and EfficientNetV2-s with XGBoost (89.70%). Moreover, the fusion strategy demonstrated improved class-level stability, particularly for visually similar retinal disorders on which the single-backbone models exhibited higher misclassification rates. This study contributes a lightweight yet effective multi-backbone feature-level fusion approach that enhances discriminative representation and classification stability without increasing model complexity. In addition, XGBoost introduces a tree-based decision mechanism that is inherently more interpretable than conventional fully connected layers, offering potential advantages for clinical analysis. Overall, the results highlight the effectiveness of multi-backbone feature fusion as a reliable strategy for automated retinal disease classification.
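The concatenation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: random arrays stand in for the backbone outputs, the names `fuse_features` and `train_xgboost` are hypothetical, and the XGBoost hyperparameters are placeholder values, since the text does not report them. Only the feature sizes (1024, 1280, 2304) and the six-class setting come from the text.

```python
import numpy as np

DENSENET_DIM = 1024      # DenseNet121 feature size stated in the text
EFFICIENTNET_DIM = 1280  # EfficientNetV2-s feature size stated in the text
NUM_CLASSES = 6          # six retinal categories stated in the text


def fuse_features(densenet_feats, efficientnet_feats):
    """Concatenate per-image feature vectors from the two backbones."""
    if densenet_feats.shape[0] != efficientnet_feats.shape[0]:
        raise ValueError("both feature sets must describe the same images")
    # Feature-level fusion: join along the feature axis, one row per image.
    return np.concatenate([densenet_feats, efficientnet_feats], axis=1)


def train_xgboost(fused, labels):
    """Fit a gradient-boosted tree classifier on the fused features.

    Hyperparameters are illustrative assumptions, not values from the study.
    The import is local so the fusion step runs without xgboost installed.
    """
    from xgboost import XGBClassifier

    clf = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
    clf.fit(fused, labels)  # labels are integers in [0, NUM_CLASSES)
    return clf


# Random stand-ins for features extracted from 8 images (assumption).
rng = np.random.default_rng(0)
n_images = 8
densenet_feats = rng.normal(size=(n_images, DENSENET_DIM)).astype(np.float32)
efficientnet_feats = rng.normal(size=(n_images, EFFICIENTNET_DIM)).astype(np.float32)

fused = fuse_features(densenet_feats, efficientnet_feats)
print(fused.shape)  # (8, 2304)
```

In a real pipeline the two arrays would come from the pooled outputs of the pretrained backbones, and `train_xgboost(fused, labels)` would be called on the training split.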