Ideological polarization on social media platforms such as Twitter, particularly in discussions of Islamic ideologies in Indonesia, has led to the rapid spread of da’wah. It has also made it difficult to classify tweets into distinct Islamic ideologies, such as Liberal Islam and Moderate Islam (Wasathiyyah), and effective methods for accurately classifying such nuanced content are still lacking. To address this problem, this research developed and evaluated a machine learning model to compare a traditional word vectorization method (TF-IDF) with a modern text embedding model (Nomic Embed v2). The study followed the Knowledge Discovery in Databases (KDD) framework: relevant tweets were scraped using the Twitter API and annotated by ideology, and the dataset was then preprocessed with case folding, stopword removal, and symbol removal. Classification was carried out using a Support Vector Machine (SVM) model, and cross-validation was used to assess its accuracy. The findings indicate that the embedding model improved accuracy by providing nuanced semantic context for the tweets, suggesting that modern semantic models can outperform traditional methods in classifying complex, context-dependent texts.
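As an illustration of the preprocessing and TF-IDF steps named above, the following is a minimal sketch and not the study's actual code. The stopword list, the example tweets, and the function names `preprocess` and `tfidf` are hypothetical, and the weighting shown (relative term frequency times `log(N / df)`) is only one common TF-IDF variant; the abstract does not state which variant was used. The SVM, cross-validation, and embedding steps are omitted because they depend on external libraries.

```python
import math
import re
from collections import Counter

# Hypothetical, very small stopword list used only for this illustration.
STOPWORDS = {"yang", "dan", "di", "itu"}


def preprocess(tweet):
    """Apply case folding, symbol removal, and stopword removal to one tweet."""
    text = tweet.lower()                   # case folding
    text = re.sub(r"[^a-z\s]", " ", text)  # symbol removal (assumption: digits are dropped too)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical example tweets, not taken from the study's dataset.
tweets = [
    "Kajian Islam MODERAT di Jakarta!!!",
    "Diskusi Islam liberal di kampus???",
]
tokens = [preprocess(t) for t in tweets]
vectors = tfidf(tokens)
```

In this sketch a term that appears in every document, such as "islam" here, receives a weight of zero, which shows why purely frequency-based vectors can miss the context that separates closely related classes.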
Copyright © 2025