JOURNAL OF APPLIED INFORMATICS AND COMPUTING
Vol. 9 No. 4 (2025): August 2025

Implementation of Text Mining for Evaluating the Relevance Between News Headlines and Content on a Web-Based Platform

Purnawati, Desak Gede Inten
Singgih Putri, Desy Purnami
Piarsa, I Nyoman



Article Info

Publish Date
05 Aug 2025

Abstract

Technological advances in the Industrial Revolution 4.0 era have significantly changed how society accesses and consumes information, particularly through online news portals. This study analyzes the relevance between news headlines and article content on Indonesian online news platforms using text mining techniques and similarity checking methods. To improve the accuracy of the relevance assessment, it uses two deep learning models: Long Short-Term Memory (LSTM) and IndoBERT. The data were collected from three leading Indonesian news portals (detik.com, kompas.com, and suara.com) and comprise 52,242 articles from the entertainment and national news categories, gathered between July 1 and September 30, 2024. The dataset includes the attributes headline, category, publication date, author, article URL, and news content. The research process consists of several stages: data collection through web scraping, data pre-processing (cleaning the category, author, and content columns), content summarization, text similarity calculation, and data labeling into three classes, namely relevan (relevant), berlebihan (exaggerated), and nonrelevan (not relevant). Evaluation results show that the IndoBERT model outperforms LSTM, with a training accuracy of 0.9048 and a training loss of 0.2514, and a validation accuracy of 0.8604 and a validation loss of 0.4039. These findings indicate that IndoBERT is effective in assessing the coherence between news headlines and content in today’s digital age.
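
The similarity and labeling stages named in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions, because the abstract does not state which similarity measure or cut-offs were used. Cosine similarity over term-frequency vectors stands in for the unspecified measure. The thresholds RELEVANT_MIN and EXAGGERATED_MIN are hypothetical. Mapping mid-range scores to the berlebihan class is also an assumption. The function names are illustrative only and are not taken from the article.

```python
import math
import re
from collections import Counter

# Hypothetical cut-offs: the abstract does not report the thresholds used.
RELEVANT_MIN = 0.5
EXAGGERATED_MIN = 0.2


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty text has no direction to compare.
    return dot / (norm_a * norm_b)


def label_pair(headline: str, summary: str) -> str:
    """Assign one of the three class names from the abstract by score band."""
    score = cosine_similarity(headline, summary)
    if score >= RELEVANT_MIN:
        return "relevan"
    if score >= EXAGGERATED_MIN:
        return "berlebihan"  # Assumed mapping for partial overlap.
    return "nonrelevan"


if __name__ == "__main__":
    headline = "harga beras naik"
    summary = "beras impor tiba di pelabuhan"
    print(round(cosine_similarity(headline, summary), 3), label_pair(headline, summary))
```

In a pipeline like the one described, labels produced this way would serve as training targets for the LSTM and IndoBERT classifiers. A bag-of-words score ignores word order and synonyms, so it is only a rough proxy for headline-content relevance.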

Copyrights © 2025






Journal Info

Abbrev

JAIC

Publisher

Subject

Computer Science & IT

Description

Journal of Applied Informatics and Computing (JAIC) Volume 2, Number 1, July 2018. Contains articles drawn from research in the field of Informatics Technology and Applied Computing, with e-ISSN: 2548-9828. There are 3 articles that have been substantially reviewed by the editorial team and ...