Jurnal Teknik Informatika (JUTIF)
Vol. 7 No. 3 (2026): JUTIF Volume 7, Number 3, June 2026

Comparative Evaluation of Sparse, Dense, and Hybrid Retrieval Models on Indonesian Wikipedia

Tino Saputra (Magister Computer, Computer Science, Esa Unggul University, Jakarta, Indonesia)
Eric Julianto (Magister Computer, Computer Science, Esa Unggul University, Jakarta, Indonesia)
Ari Widjonarko (Master of Data Science, Data Science and AI, Monash University, Melbourne, Australia)
Budi Tjahjono (Computer Science, Esa Unggul University, Jakarta, Indonesia)



Article Info

Publish Date
15 Jun 2026

Abstract

This study presents a comparative evaluation of Information Retrieval (IR) models on the Indonesian Wikipedia corpus, covering sparse, dense, and hybrid retrieval approaches. The evaluated methods are TF-IDF and BM25 as sparse models, SBERT (MiniLM) as a dense retrieval model, and hybrid retrieval implemented through score fusion. The dataset consists of 713,044 Wikipedia articles, and the experiments use 1,000 test queries. Performance is measured with Precision@10 (P@10) and Mean Reciprocal Rank (MRR). Among the individual models, BM25 achieves the highest performance, with a P@10 of 0.973 and an MRR of 0.9174, significantly outperforming TF-IDF and SBERT. Hybrid retrieval yields a slight further improvement: the BM25 + SBERT combination reaches a P@10 of 0.979 and an MRR of 0.9253 at higher α values. These findings indicate that lexical matching remains dominant in encyclopedic corpora, while semantic representations contribute complementary gains. However, the gain from hybrid retrieval is marginal relative to the additional computational cost of dense embedding and score fusion, which points to a trade-off between effectiveness and efficiency. For low-resource languages such as Indonesian, lexical retrieval therefore remains highly reliable, and hybrid approaches offer incremental improvements. The study provides practical guidelines for developing efficient, scalable, and reliable Information Retrieval systems for Indonesian Wikipedia and other low-resource language corpora.
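The score fusion and the two metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how scores are normalized or which component α weights, so the sketch assumes min-max normalization and a convex combination in which α weights the sparse (BM25) score. The function names and the toy scores are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Set


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; constant scores all map to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(sparse: Dict[str, float], dense: Dict[str, float], alpha: float) -> List[str]:
    """Rank documents by alpha * sparse + (1 - alpha) * dense.

    Assumption: alpha weights the sparse score; the source does not say.
    Documents missing from one retriever get 0.0 for that component.
    """
    s, d = min_max(sparse), min_max(dense)
    fused = {
        doc: alpha * s.get(doc, 0.0) + (1.0 - alpha) * d.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: List[List[str]], relevants: List[Set[str]]) -> float:
    """Average reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, q) for r, q in zip(rankings, relevants)) / len(rankings)


# Toy scores for a single query (hypothetical values).
bm25_scores = {"a": 12.0, "b": 7.0, "c": 2.0}
sbert_scores = {"a": 0.2, "b": 0.9, "c": 0.5}

lexical_heavy = fuse(bm25_scores, sbert_scores, alpha=0.7)
semantic_heavy = fuse(bm25_scores, sbert_scores, alpha=0.2)
```

Under these assumptions, sweeping `alpha` from 0 to 1 moves the ranking from purely dense to purely sparse, which is one way to read the abstract's remark that the best hybrid scores occur at higher α values.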

Copyrights © 2026





