Journal : Journal of Computer Networks, Architecture and High Performance Computing

Hyperparameter Sensitivity of Vanilla Knowledge Distillation for Compact CNNs on CIFAR-100
Fauzan, Mochamad Rizal; Rachman, Raden Muhammad Rafi; Saputra, Shifa Rangga; Nugraha, Daffa Irsyad
Journal of Computer Networks, Architecture and High Performance Computing Vol. 8 No. 2 (2026): Research Paper April 2026
Publisher : Information Technology and Science (ITScience)

DOI: 10.47709/cnahpc.v8i2.8239

Abstract

Knowledge distillation has become an effective strategy for improving compact convolutional neural networks, yet the performance of vanilla knowledge distillation in lightweight image classification is still often reported using default hyperparameter settings without systematic justification. This study addresses the limited empirical understanding of how two core vanilla knowledge distillation hyperparameters, temperature scaling (T) and loss balancing (α), affect compact convolutional neural networks under a unified experimental setting. Using CIFAR-100 as the benchmark dataset, a ResNet-50 teacher was employed to distill knowledge into two lightweight student models, MobileNetV2 and ShuffleNetV2 ×1.0. Performance was evaluated using top-1 accuracy, top-5 accuracy, parameter count, and inference latency. The teacher achieved 81.24% top-1 accuracy and 96.05% top-5 accuracy. Under the default distillation setting, MobileNetV2 improved from 79.18% to 80.83% top-1 accuracy and from 95.77% to 96.40% top-5 accuracy, while reducing latency from 3.98 ms to 3.44 ms. ShuffleNetV2 ×1.0 improved from 77.00% to 78.36% top-1 accuracy and from 94.81% to 95.45% top-5 accuracy, with only a marginal latency increase from 4.23 ms to 4.29 ms. To examine hyperparameter sensitivity, an ablation study was conducted on MobileNetV2 with T = 2, 4, and 6, and α = 0.3, 0.5, and 0.7. The best configuration was obtained at T = 4 and α = 0.3, yielding 80.88% top-1 accuracy and 96.51% top-5 accuracy. These results show that vanilla knowledge distillation consistently improves compact convolutional neural networks, but its effectiveness depends strongly on careful hyperparameter selection rather than inherited default settings.
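
To make the roles of T and α concrete, the following is a minimal sketch of a vanilla knowledge distillation loss for a single example, written in plain Python. It is not taken from the paper. The abstract does not say which term α weights, so the sketch assumes α scales the soft (teacher-matching) term and 1 − α scales the hard-label cross-entropy; the opposite convention is also common. The function names `log_softmax` and `kd_loss`, the T² scaling of the soft term, and the toy logits are likewise illustrative assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.3):
    """Vanilla KD loss for one example (assumed convention, see lead-in).

    loss = alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)
    """
    # Hard-label cross-entropy, computed at temperature 1.
    ce = -log_softmax(student_logits)[label]

    # KL divergence between temperature-softened teacher and student outputs.
    log_p_teacher = log_softmax(teacher_logits, T)
    log_p_student = log_softmax(student_logits, T)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )

    # The T^2 factor is a common choice that keeps the soft-term gradient
    # scale comparable across temperatures (assumption, not from the abstract).
    return alpha * (T ** 2) * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    # Toy logits for a 4-class problem (hypothetical values).
    teacher = [4.0, 1.5, 0.5, -1.0]
    student = [2.5, 2.0, 0.0, -0.5]

    # Same grid as the ablation described in the abstract.
    for T in (2.0, 4.0, 6.0):
        for alpha in (0.3, 0.5, 0.7):
            loss = kd_loss(student, teacher, label=0, T=T, alpha=alpha)
            print(f"T={T:.0f} alpha={alpha:.1f} loss={loss:.4f}")
```

A larger T flattens both distributions, so the student is pushed to match the teacher's relative ranking of non-target classes rather than only its top prediction, while α sets how much that signal counts against the ground-truth label. In a real training loop the same expression would be averaged over a mini-batch and differentiated by a deep learning framework.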