The Qur'an, comprising over 80,000 words, 6,236 verses, and 114 surahs, is a multifaceted and deeply significant text whose understanding requires attention to historical context, classical Arabic, and exegesis. Various methods have been used to analyze and classify its content, including K-Nearest Neighbor (KNN) and Latent Semantic Analysis (LSA). This research investigates the effectiveness of combining KNN with LSA for multi-label topic classification of Qur'anic verses. KNN alone achieved a micro-averaged F1-score of 0.49, with comparatively reliable performance on topics such as "aqidah" (creed) and "worldly matters." Applying LSA with 100 components reduced performance: the micro-averaged F1-score fell to 0.43 and the Hamming loss rose to 0.1657. As the number of LSA components increased to 200 and 300, performance recovered, with micro-averaged F1-scores of 0.45 and 0.47 and Hamming loss values of 0.1507 and 0.1466, respectively. These results indicate that the effect of LSA on KNN depends on the number of retained components, with higher component counts yielding better results, although none of the tested LSA configurations exceeded the micro-averaged F1-score of KNN alone.
Copyright © 2024
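To make the evaluated pipeline concrete, the following is a minimal sketch of LSA followed by multi-label KNN, together with the two reported metrics. The abstract does not specify preprocessing, term weighting, the number of neighbours, or the label decision rule, so this sketch assumes smoothed TF-IDF weighting, LSA computed with a plain NumPy SVD, cosine similarity, and a per-label majority vote among neighbours. All function names, the toy documents, and the two label columns are hypothetical illustrations, not the authors' implementation. The `n_components` argument plays the role of the 100, 200, and 300 LSA components compared in the study.

```python
import numpy as np

def _count(docs, index):
    # Term-count matrix (documents x vocabulary); terms unseen at fit time are ignored.
    M = np.zeros((len(docs), len(index)))
    for r, doc in enumerate(docs):
        for w in doc:
            if w in index:
                M[r, index[w]] += 1.0
    return M

def fit_tfidf(train_docs):
    # Assumed weighting: smoothed IDF = log((1 + n) / (1 + df)) + 1.
    vocab = sorted({w for d in train_docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    df = np.count_nonzero(_count(train_docs, index), axis=0)
    idf = np.log((1 + len(train_docs)) / (1 + df)) + 1.0
    return index, idf

def transform_tfidf(docs, index, idf):
    return _count(docs, index) * idf

def fit_lsa(X, n_components):
    # LSA: X = U S V^T; keep the top n_components right singular vectors.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_components].T  # shape (n_terms, n_components)

def knn_predict(X_train, Y_train, X_test, n_neighbors=3, threshold=0.5):
    # Multi-label KNN: a label is predicted when at least `threshold`
    # of the nearest neighbours (by cosine similarity) carry it.
    def normalize(M):
        norms = np.linalg.norm(M, axis=1, keepdims=True)
        return M / np.where(norms == 0, 1.0, norms)
    sims = normalize(X_test) @ normalize(X_train).T
    nn = np.argsort(-sims, axis=1)[:, :n_neighbors]
    votes = Y_train[nn].mean(axis=1)  # (n_test, n_labels) neighbour vote fractions
    return (votes >= threshold).astype(int)

def micro_f1(Y_true, Y_pred):
    # Micro-averaged F1 pools TP/FP/FN over all labels: 2TP / (2TP + FP + FN).
    tp = np.sum((Y_true == 1) & (Y_pred == 1))
    fp = np.sum((Y_true == 0) & (Y_pred == 1))
    fn = np.sum((Y_true == 1) & (Y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0  # defined as 0.0 here if no positives

def hamming_loss(Y_true, Y_pred):
    # Fraction of (verse, label) pairs that are predicted incorrectly.
    return float(np.mean(Y_true != Y_pred))

# Toy usage: four hypothetical "verses" as token lists and two hypothetical
# label columns (e.g. column 0 = "aqidah", column 1 = "worldly matters").
train_docs = [
    ["god", "one", "believe"],
    ["god", "worship", "pray"],
    ["trade", "wealth", "market"],
    ["wealth", "charity", "pray"],
]
Y_train = np.array([[1, 0], [1, 0], [0, 1], [1, 1]])

index, idf = fit_tfidf(train_docs)
X_train = transform_tfidf(train_docs, index, idf)
V = fit_lsa(X_train, n_components=4)  # 4 = rank of this toy matrix
Z_train = X_train @ V
Y_hat = knn_predict(Z_train, Y_train, Z_train, n_neighbors=1)
print(micro_f1(Y_train, Y_hat), hamming_loss(Y_train, Y_hat))
```

In a real evaluation the LSA basis and IDF weights would be fitted on training verses only and applied to held-out verses. Retaining fewer components compresses the term space more aggressively, which is one plausible reason why the 100-component setting lost more label information than the 200- and 300-component settings.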