Optimizing of IndoBERT Embedding with Ditto Whitening for Measuring Research Title Similarity
Ishak, Rezqiwati; Bengnga, Amiruddin
Jambura Journal of Electrical and Electronics Engineering Vol 8, No 1 (2026): Januari - Juni 2026
Publisher: Electrical Engineering Department, Faculty of Engineering, State University of Gorontalo

DOI: 10.37905/jjeee.v8i1.35554

Abstract

Measuring the semantic similarity of research titles is a crucial component in maintaining academic originality and preventing topic duplication in higher education. However, embeddings from IndoBERT, a pretrained Indonesian language model, are known to suffer from anisotropy, which causes many titles to receive high similarity scores despite being semantically distinct. This study aims to improve the quality of IndoBERT embeddings through Ditto Whitening and to evaluate the impact on research title similarity measurement. The dataset comprises 7,785 undergraduate thesis titles collected from six disciplinary domains; embeddings were produced with mean pooling and L2 normalization, and were evaluated before and after whitening. An intrinsic evaluation assessed embedding isotropy, the cosine similarity distribution, global bias toward the mean vector, and hubness, supported by visualizations of the embedding space using t-SNE, UMAP, and cosine similarity heatmaps. Experimental results show substantial improvements in embedding quality: Cosine Pair Mean fell from 0.559 to −0.000145, MeanCos-to-Mean fell from 0.748 to 0.0068, and Hubness Skew fell from 1.60 to 0.68. The isotropy of the embeddings also increased markedly, reflecting a more uniform vector distribution. These findings indicate that Ditto Whitening effectively improves the isotropy of IndoBERT embeddings and enhances the accuracy of research title similarity detection and academic document retrieval systems, thereby supporting topic management and research quality assurance in higher education.
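
To make the metrics in the abstract more concrete, the following is a minimal sketch of how whitening can pull the average pairwise cosine similarity and the mean cosine to the mean vector toward zero. It is an illustration under stated assumptions, not the authors' method: it uses plain PCA whitening as a stand-in for Ditto Whitening, replaces IndoBERT embeddings with synthetic vectors that share a common offset, and the functions `pca_whiten`, `cosine_pair_mean`, and `mean_cos_to_mean` are hypothetical names whose definitions are one plausible reading of the metric names in the abstract.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def pca_whiten(x, eps=1e-8):
    """Plain PCA whitening (an assumed stand-in, not Ditto Whitening).

    Centers the rows, then rotates and rescales them so that the
    sample covariance becomes approximately the identity matrix.
    """
    centered = x - x.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (x.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.maximum(eigvals, 0.0)
    # Divide column j of the eigenvector matrix by sqrt(eigenvalue j).
    w = eigvecs / np.sqrt(eigvals + eps)
    return centered @ w


def cosine_pair_mean(x):
    """Mean cosine similarity over all distinct ordered pairs (assumed definition)."""
    u = l2_normalize(x)
    sims = u @ u.T
    n = u.shape[0]
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))


def mean_cos_to_mean(x):
    """Mean cosine between each unit vector and the mean direction (assumed definition)."""
    u = l2_normalize(x)
    m = u.mean(axis=0)
    m = m / max(np.linalg.norm(m), 1e-12)
    return float((u @ m).mean())


# Synthetic "anisotropic" embeddings: Gaussian noise plus a shared offset,
# so every vector points in roughly the same direction.
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 16)) + 3.0

before = (cosine_pair_mean(raw), mean_cos_to_mean(raw))
whitened = pca_whiten(raw)
after = (cosine_pair_mean(whitened), mean_cos_to_mean(whitened))

print("before whitening:", before)
print("after whitening: ", after)
```

On this toy data both values start high because of the shared offset and drop close to zero after whitening, which is the same qualitative pattern the abstract reports for Cosine Pair Mean and MeanCos-to-Mean. The specific numbers depend on the synthetic data and say nothing about the paper's results.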