Topic modelling is an important approach for extracting latent thematic structure from text corpora, including religious texts, which are characterized by dense semantics and short documents. This study compares the performance of four topic modelling methods, namely Latent Dirichlet Allocation (LDA), Biterm Topic Model (BTM), Combined Topic Model (CombinedTM), and BERTopic, in extracting topics from the Indonesian translation of the Qur’an. The dataset consists of 6,236 verses, with each verse treated as a single document. Topic quality is evaluated using two main metrics: coherence score (C_v) and topic diversity. The experimental results show that CombinedTM achieves the highest coherence score, with a maximum of approximately 0.52 at K = 10 topics, followed by BTM, whose coherence scores remain relatively high and stable (around 0.50) across some of the tested numbers of topics. LDA yields the highest topic diversity, exceeding 0.90, but lower coherence scores than the other models, indicating its limitations in preserving semantic coherence in short texts. Meanwhile, BERTopic exhibits consistently high topic diversity (0.85–0.88) across different numbers of topics, although its bag-of-words–based coherence scores do not always increase significantly. These findings indicate that the choice of topic modelling method should be aligned with the characteristics of the corpus and the objectives of the thematic analysis, particularly for short-form religious texts.
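To make the topic diversity metric concrete, the following is a minimal Python sketch. It assumes the commonly used definition, the proportion of unique words among the top-N words of all topics; the abstract does not state the exact formula or the value of N used in the study. The function name `topic_diversity`, the parameter `top_n`, and the toy topics are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_n: int = 10) -> float:
    """Return the fraction of unique words among the top_n words of all topics.

    Assumed definition (not confirmed by the study): a value near 1.0 means
    the topics share almost no top words; a value near 0.0 means the topics
    are largely redundant.
    """
    # Collect the top_n words of every topic into one flat list.
    top_words = [word for topic in topics for word in topic[:top_n]]
    if not top_words:
        raise ValueError("at least one non-empty topic is required")
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics, not taken from the study's results.
toy_topics = [
    ["allah", "rahmat", "ampun", "iman"],
    ["shalat", "zakat", "iman", "taqwa"],
    ["surga", "neraka", "hari", "akhir"],
]

# "iman" appears in two topics, so 11 of the 12 top words are unique.
score = topic_diversity(toy_topics, top_n=4)
print(f"topic diversity: {score:.3f}")
```

Under this definition, a diversity above 0.90 (as reported for LDA) would mean that fewer than one in ten top words is repeated across topics. High diversity alone does not imply that individual topics are coherent, which is why the two metrics are reported together.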
Copyrights © 2026