JISA (Jurnal Informatika dan Sains)
Vol 8, No 1 (2025): JISA (Jurnal Informatika dan Sains)

Hybrid Feature Combination of TF-IDF and BERT for Enhanced Information Retrieval Accuracy

Aprilio, Pajri
Felix, Michael
Nugraha, Putu Surya
Fahmi, Hasanul



Article Info

Publish Date
27 Jun 2025

Abstract

Text representation is a critical component of Natural Language Processing tasks such as information retrieval and text classification. Traditional approaches such as Term Frequency-Inverse Document Frequency (TF-IDF) represent term importance simply and efficiently but cannot capture semantic meaning. In contrast, deep learning models such as Bidirectional Encoder Representations from Transformers (BERT) produce context-aware embeddings that improve semantic understanding but may overlook exact term relevance. This study proposes a hybrid approach that combines TF-IDF and BERT through a weighted feature-level fusion strategy. The TF-IDF vectors are reduced in dimension using Truncated Singular Value Decomposition and aligned with the BERT vectors. The combined representation is used to train a fully connected neural network for binary classification of document relevance. The model was evaluated on the CISI benchmark dataset and compared with standalone TF-IDF and BERT models. Experimental results show that the hybrid model achieved a training accuracy of 97.43 percent and the highest test accuracy, 80.02 percent, outperforming both standalone models. These findings indicate that combining lexical and contextual features can improve classification accuracy and generalization. The approach offers a more robust basis for real-world information retrieval systems in which both term specificity and contextual relevance matter.
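The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the two representations are weighted or aligned, so the sketch assumes L2-normalized vectors joined by weighted concatenation with a hypothetical weight `alpha`. The helper names (`tfidf_matrix`, `truncated_svd`, `fuse`), the toy documents, and the 8-dimensional random vectors standing in for BERT embeddings are all illustrative assumptions.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Build a dense TF-IDF matrix (rows = documents) with smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for tok, count in Counter(toks).items():
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0
            X[i, index[tok]] = count * idf
    return X


def truncated_svd(X, k):
    """Keep only the top-k singular components of each row of X."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]


def l2_normalize(X):
    """Scale each row to unit length so both feature types are comparable."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)


def fuse(tfidf_reduced, bert_vectors, alpha=0.5):
    """Weighted feature-level fusion by concatenation (an assumed scheme)."""
    return np.hstack([
        alpha * l2_normalize(tfidf_reduced),
        (1.0 - alpha) * l2_normalize(bert_vectors),
    ])


docs = [
    "information retrieval with term weighting",
    "contextual embeddings for semantic search",
    "library science and document indexing",
    "term weighting improves document retrieval",
]
tfidf_reduced = truncated_svd(tfidf_matrix(docs), k=3)

# Placeholder only: random vectors stand in for BERT embeddings, which in
# practice would come from a pretrained encoder (typically 768 dimensions).
rng = np.random.default_rng(0)
bert_vectors = rng.normal(size=(len(docs), 8))

fused = fuse(tfidf_reduced, bert_vectors, alpha=0.6)
print(fused.shape)  # (4, 11)
```

Each row of `fused` would then serve as the input to a relevance classifier. If the reduced TF-IDF vectors were given the same dimensionality as the BERT vectors, a weighted sum would be another plausible reading of "aligned"; the abstract does not say which was used.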

Copyrights © 2025






Journal Info

Abbrev

JISA

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering

Description

JISA (Jurnal Informatika dan Sains) is an electronic journal that publishes research articles in informatics and the sciences, including software engineering, multimedia, networking, and soft computing. The journal is published by Program Studi Teknik Informatika Universitas ...