Journal of Computing Theories and Applications
Vol. 2 No. 2 (2024): JCTA 2(2) 2024

Comprehensive Evaluation of LDA, NMF, and BERTopic's Performance on News Headline Topic Modeling

Babalola, Olusola
Ojokoh, Bolanle
Boyinbode, Olutayo



Article Info

Publish Date
23 Nov 2024

Abstract

Topic modeling is an integral component of text mining, employing diverse algorithms to uncover hidden themes within texts. This study compares the performance of prominent topic modeling techniques on news headlines, which are characterized by brevity and a specific linguistic style. Because the corpus originates from a non-native English-speaking country, the task carries an additional layer of complexity. Our research explores the feasibility of employing a committee approach for topic modeling, evaluating the efficacy and challenges of various methods in practical settings. We applied three techniques, Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and BERTopic, to create models with a fixed number of topics (n=40). These models were then tested on approximately 150,000 news headlines. To assess topic coherence, we utilized Word2Vec, human evaluators, and two large language models. Statistical tests confirmed the significance and impact of our findings. BERTopic's coherence was only slightly higher than NMF's, but BERTopic consistently outperformed both NMF and LDA according to human and LLM evaluations. The notable disparity between LDA's performance and that of BERTopic and NMF underscores the importance of carefully selecting a topic modeling technique, as the choice can significantly influence the outcome of the analysis.
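
The abstract does not state how Word2Vec was turned into a coherence score. One common choice, assumed here purely for illustration, is the mean pairwise cosine similarity between the embeddings of a topic's top words. The sketch below uses only the Python standard library. The function names `cosine` and `embedding_coherence`, the three-dimensional toy vectors, and the two example topics are all hypothetical stand-ins, not the authors' data or implementation.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_coherence(top_words, vectors):
    """Mean pairwise cosine similarity over the topic words that have a vector.

    Returns None when fewer than two of the words are in the vocabulary,
    because no pair exists to compare.
    """
    known = [w for w in top_words if w in vectors]
    pairs = list(combinations(known, 2))
    if not pairs:
        return None
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Hypothetical 3-dimensional vectors standing in for trained Word2Vec embeddings.
toy_vectors = {
    "election": [0.9, 0.1, 0.0],
    "senate": [0.8, 0.2, 0.1],
    "governor": [0.85, 0.15, 0.05],
    "football": [0.0, 0.1, 0.9],
    "market": [0.1, 0.9, 0.1],
}

# Hypothetical top words for two topics, e.g. produced by two different models.
focused_topic = ["election", "senate", "governor"]
mixed_topic = ["election", "football", "market"]

focused_score = embedding_coherence(focused_topic, toy_vectors)
mixed_score = embedding_coherence(mixed_topic, toy_vectors)
print(f"focused: {focused_score:.3f}, mixed: {mixed_score:.3f}")
```

Under this assumed measure, a topic whose top words sit close together in embedding space scores higher than one that mixes unrelated words. Averaging the score over all 40 topics of each model would give one number per model to compare, alongside the human and LLM judgments the abstract describes.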

Copyrights © 2024






Journal Info

Abbrev

jcta

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management

Description

Journal of Computing Theories and Applications (JCTA) is a refereed, international journal that covers all aspects of foundations, theories and the practical applications of computer science. FREE OF CHARGE for submission and publication. All accepted articles will be published online and accessed ...