International Journal of Informatics and Communication Technology (IJ-ICT)
Vol 15, No 1: March 2026

Leveraging distillation token and weaker teacher model to improve DeiT transfer learning capability

Gavra Reswara, Christopher (Unknown)
Putra Kusuma, Gede (Unknown)



Article Info

Publish Date
01 Mar 2026

Abstract

Recently, distilling knowledge from convolutional neural networks (CNN) has positively impacted the data-efficient image transformer (DeiT) model. Due to the distillation token, this method is capable of boosting DeiT performance and helping DeiT to learn faster. Unfortunately, a distillation procedure with that token has not yet been implemented in the DeiT for transfer learning to the downstream dataset. This study proposes implementing a distillation procedure based on a distillation token for transfer learning. It boosts DeiT performance on downstream datasets. For example, our proposed method improves the DeiT B 16 model performance by 1.75% on the OxfordIIIT-Pets dataset. Furthermore, we present using a weaker model as a teacher of the DeiT. It could reduce the transfer learning process of the teacher model without reducing the DeiT performance too much. For example, DeiT B 16 model performance decreased by only 0.42% on Oxford 102 Flowers with EfficientNet V2S compared to RegNet Y 16GF. In contrast, in several cases, the DeiT B 16 model performance could improve with a weaker teacher model. For example, DeiT B 16 model performance improved by 1.06% on the OxfordIIIT-Pets dataset with EfficientNet V2S compared to RegNet Y 16GF as a teacher model.

Copyrights © 2026






Journal Info

Abbrev

IJICT

Publisher

Subject

Computer Science & IT

Description

International Journal of Informatics and Communication Technology (IJ-ICT) is a common platform for publishing quality research paper as well as other intellectual outputs. This Journal is published by Institute of Advanced Engineering and Science (IAES) whose aims is to promote the dissemination of ...