Journal of Information Systems and Informatics
Vol 7 No 3 (2025): September

Enhancing News Similarity with Chunking Strategy and Hyperparameter Setting on Hybrid SBERT - Node2Vec Model

Permadi Supriyo, Reza Ananta
Setijohatmo, Urip Teguh
Maspupah, Asri



Article Info

Publish Date
25 Sep 2025

Abstract

The proliferation of online news necessitates accurate article similarity systems to combat information overload, yet models based solely on semantic content often ignore crucial structural context such as news source and publication date. This research proposes and evaluates a hybrid embedding model that integrates semantic representations from Sentence-BERT (SBERT) with structural representations from Node2Vec. A series of quantitative experiments was conducted on the challenging, multilingual SPICED dataset to determine the optimal model configuration. Using Mean Squared Error (MSE) for evaluation, the results show that a per-paragraph chunking strategy yielded the best performance. This finding was corroborated by the identical performance of the best fixed-size chunking configuration (450 characters with an overlap of 64), a chunk size that aligns closely with the dataset's average paragraph length. Furthermore, a community-focused (BFS-like) Node2Vec configuration (p=1.0, q=2.0, l=60) was identified as optimal for the structural component. Notably, the final hybrid model (MSE = 0.1434) outperformed both the purely semantic model (MSE = 0.1449) and the purely structural model (MSE = 0.2512). This study concludes that the fusion of content and context provides the most comprehensive and accurate representation for news similarity detection.
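As an illustration only, the fixed-size chunking configuration and the MSE metric mentioned in the abstract might be sketched as follows. This is not the authors' implementation: the function names `chunk_text` and `mean_squared_error` are hypothetical, and the sketch assumes that the overlap of 64 is measured in characters, which the abstract does not state explicitly.

```python
def chunk_text(text: str, size: int = 450, overlap: int = 64) -> list[str]:
    """Split text into character windows of at most `size` characters.

    Consecutive windows share `overlap` characters (assumed unit: characters).
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window start advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


def mean_squared_error(predicted: list[float], target: list[float]) -> float:
    """Average of squared differences between predicted and target scores."""
    if not predicted or len(predicted) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)
```

In such a setup, each chunk would presumably be embedded separately and the resulting similarity scores compared against the dataset's reference scores with `mean_squared_error`; how chunk embeddings are pooled and fused with the Node2Vec vectors is not specified in the abstract.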

Copyrights © 2025






Journal Info

Abbrev

isi

Publisher

Subject

Computer Science & IT

Description

Journal-ISI is a scientific journal that publishes original ideas and research on the latest research and technological developments in the fields of information systems, information technology, informatics engineering, computer science, and industrial engineering ...