The emergence of social media encourages governments to use it to disseminate information to their citizens. This information must be beneficial to the people in order to maintain government-to-citizen relationships. Classifying social media posts makes it possible to categorize the types of posts. Research has been conducted on local governments' social media accounts, yet the text processing in that research needs further exploration. Term weighting and word embedding are implemented in this research. The purpose is to compare two term weighting schemes, term frequency-inverse document frequency (TF-IDF) and Okapi BM25, with the word embedding doc2vec in producing features for short text classification. This study presents the feature selection process, shows how to assess classification models, and identifies the best model for the short text classification problem. There are six classes for categorizing 1,000 short texts from 91 accounts. The measurements, i.e., precision, recall, F1, macro-averages, micro-averages, and AUC, were calculated for each model. The results show that the SVM with a linear kernel and TF-IDF features performs best, slightly better than logistic regression, with 0.572 and 0.766 on macro-average recall and micro-average recall, respectively.
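The gap between the macro-average and micro-average recall reported above is typical when classes are imbalanced: macro-averaging weights every class equally, while micro-averaging weights every text equally. The following is a minimal sketch of the two averages for single-label classification. The function name `recall_averages` and the toy labels are hypothetical and serve only as illustration; they are not the study's data, class names, or code.

```python
from collections import Counter


def recall_averages(y_true, y_pred):
    """Return (macro_recall, micro_recall) for single-label multi-class predictions."""
    # Number of true examples per class (the denominator of per-class recall).
    support = Counter(y_true)
    # Number of correctly predicted examples per class.
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # Per-class recall = correct predictions for the class / true examples of the class.
    per_class = {c: true_pos[c] / support[c] for c in support}

    # Macro-average: unweighted mean of the per-class recalls.
    macro = sum(per_class.values()) / len(per_class)
    # Micro-average: pool all examples, then compute recall once.
    micro = sum(true_pos.values()) / len(y_true)
    return macro, micro


# Hypothetical, imbalanced toy labels (assumed for illustration only).
y_true = ["info"] * 6 + ["event"] * 2 + ["greeting"] * 2
y_pred = ["info"] * 6 + ["event", "info"] + ["info", "info"]

macro, micro = recall_averages(y_true, y_pred)
print(f"macro recall = {macro:.3f}, micro recall = {micro:.3f}")
```

In this toy case the majority class is always recovered while the two small classes are mostly missed, so the micro-average is noticeably higher than the macro-average, the same direction as the difference in the reported scores.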
Copyrights © 2020