Permadi Supriyo, Reza Ananta
Unknown Affiliation



Enhancing News Similarity with Chunking Strategy and Hyperparameter Setting on Hybrid SBERT - Node2Vec Model
Permadi Supriyo, Reza Ananta; Setijohatmo, Urip Teguh; Maspupah, Asri
Journal of Information System and Informatics Vol 7 No 3 (2025): September
Publisher : Universitas Bina Darma

DOI: 10.51519/journalisi.v7i3.1180

Abstract

The proliferation of online news necessitates accurate article similarity systems to combat information overload, yet models based solely on semantic content often ignore crucial structural context such as news source and publication date. This research proposes and evaluates a hybrid embedding model that integrates semantic representations from Sentence-BERT (SBERT) with structural representations from Node2Vec. A series of quantitative experiments was conducted on the challenging, multilingual SPICED dataset to determine the optimal model configuration. Using Mean Squared Error (MSE) for evaluation, the results show that a per-paragraph chunking strategy yielded the best performance. This strategy's effectiveness was validated by the identical performance of the optimal fixed-size chunk (450 characters with an overlap of 64), a size that aligns closely with the dataset's average paragraph length. Furthermore, a community-focused (BFS-like) Node2Vec configuration (p=1.0, q=2.0, l=60) was identified as optimal for the structural component. The final hybrid model (MSE = 0.1434) outperformed both the purely semantic model (MSE = 0.1449) and the purely structural model (MSE = 0.2512). This study concludes that the fusion of content and context provides the most comprehensive and accurate representation for news similarity detection.
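
The two chunking strategies and the MSE evaluation named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. All function names are hypothetical, the overlap is assumed to be measured in characters, paragraphs are assumed to be separated by blank lines, and fusion by plain concatenation is an assumption, since the abstract does not say how the SBERT and Node2Vec vectors are combined.

```python
import math


def chunk_by_paragraph(text):
    """Per-paragraph chunking: split on blank lines (assumed delimiter), drop empties."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk_fixed(text, size=450, overlap=64):
    """Fixed-size chunking with a sliding character window.

    Consecutive chunks share `overlap` characters (assumed unit: characters).
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window has reached the end of the text
    return chunks


def fuse(semantic_vec, structural_vec):
    """Hypothetical fusion step: concatenate the two embeddings."""
    return list(semantic_vec) + list(structural_vec)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mse(predicted, gold):
    """Mean Squared Error between predicted similarity scores and gold labels."""
    if len(predicted) != len(gold) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(predicted)
```

In such a setup, each chunk would be embedded by an SBERT model and each article node by Node2Vec. The fused vectors of two articles would then be compared with `cosine`, and the resulting scores evaluated against the dataset labels with `mse`.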