The large volume of Internet traffic infected with malware disrupts Internet activity, and undetected malware can shut down systems and leak sensitive data. Mitigating such threats requires a reliable classification model. The Compact Convolutional Transformer (CCT) is a deep learning architecture that combines encoder layers from the Vision Transformer with traditional convolutional layers and is designed to classify effectively even on relatively small datasets. However, when tested on a combination of the MalImg dataset and the benign class from the DikeDataset, the original CCT model showed signs of overfitting. To address this, this study proposes a modification to the CCT architecture that adds an average pooling mechanism in parallel with the existing sequence pooling layer; the outputs of both pooling layers are concatenated before being passed to the classification layer. Experiments were conducted under several conditions, including full training without early stopping. The results show that the modified CCT with average pooling reduces overfitting and improves accuracy, indicated by a longer training duration before convergence (8+ epochs) and an increase in test accuracy of 3.35% with early stopping and 5.72% without early stopping. This performance improvement was statistically validated using Welch's t-test on evaluation accuracy (p = 5.83×10⁻⁶) and on average validation loss (p = 7.54×10⁻⁷), both significant at p < 0.05. However, after stratification and class weighting were applied, the evaluation accuracy of the modified CCT decreased by 1.74% from the baseline and the validation loss increased significantly (p = 0.0009, p < 0.05), showing that the result is sensitive to dataset balance.
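The dual-pooling modification described above can be sketched numerically: sequence pooling computes an attention-weighted sum over the token sequence, average pooling takes the unweighted mean, and the two feature vectors are concatenated for the classifier head. This is a minimal NumPy illustration under assumed dimensions (49 tokens, 64-dim embeddings); the attention projection `w` is random here, whereas in the actual CCT it is a learned linear layer, and the function names are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def seq_pool(tokens, w):
    # Sequence pooling (as in CCT): attention-weighted sum over tokens.
    # tokens: (n, d); w: (d,) projection vector (learned in practice).
    attn = softmax(tokens @ w)      # (n,) attention weight per token
    return attn @ tokens            # (d,) weighted combination of tokens

def dual_pool(tokens, w):
    # Proposed modification: run average pooling in parallel with
    # sequence pooling and concatenate the two feature vectors.
    sp = seq_pool(tokens, w)        # (d,) sequence-pooled features
    ap = tokens.mean(axis=0)        # (d,) average-pooled features
    return np.concatenate([sp, ap]) # (2d,) input to the classification layer

rng = np.random.default_rng(0)
tokens = rng.standard_normal((49, 64))  # e.g. 49 patch tokens, 64-dim each
w = rng.standard_normal(64)             # stand-in for the learned projection
features = dual_pool(tokens, w)
print(features.shape)  # (128,)
```

The classifier thus sees a feature vector twice the embedding width, combining the attention-selected summary with a uniform summary of all tokens.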
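The statistical validation uses Welch's t-test, which, unlike Student's t-test, does not assume equal variances between the two groups of run results. A minimal pure-Python sketch of the statistic and the Welch–Satterthwaite degrees of freedom follows; the sample values are made up for illustration, and in practice the p-value is obtained from the t-distribution with `df` degrees of freedom (e.g. via `scipy.stats`).

```python
import math
from statistics import mean, variance

def welch_t_test(a, b):
    # Welch's t statistic for two samples with (possibly) unequal variances
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)   # sample variances (n-1 denominator)
    n1, n2 = len(a), len(b)
    se2 = v1 / n1 + v2 / n2             # squared standard error of the difference
    t = (m1 - m2) / math.sqrt(se2)
    # Welch–Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical per-run evaluation accuracies for baseline vs. modified model
baseline = [0.921, 0.918, 0.925, 0.919, 0.922]
modified = [0.955, 0.951, 0.958, 0.953, 0.956]
t, df = welch_t_test(baseline, modified)
print(t, df)  # large |t| with small df -> small p under the t-distribution
```

A p-value below 0.05, as reported for both evaluation accuracy and average validation loss, rejects the hypothesis that the two models perform equally on average.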
Copyright © 2025