Twitter is a microblogging social network where a single tweet can contain up to 280 characters. Indonesia has emerged as the country with the fifth-largest number of Twitter users. Companies can draw on this large user base when creating new business strategies to serve their customers, but some social media users object to revealing their identities. This problem can be addressed by developing a system that classifies users based on their tweets; such a system is also useful because it saves time. The system uses the BM25 method to calculate the similarity between documents and K-Nearest Neighbors (KNN) to classify the data. The system was evaluated on 1000 documents using K-Fold Cross Validation with K = 10, so that each fold has 900 training documents and 100 testing documents. A second test varied the number of neighbors k over the values 1, 3, 5, 7, 10, 20, 30, 40 and 50; the results show that the optimal value is k = 3. At k = 3, the accuracy, precision, recall and F-Measure averaged over the 10 cross-validation folds are 68.6%, 67.63%, 71.52% and 69.34%, respectively.
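The pipeline described above (BM25 similarity, KNN voting, 10-fold splitting) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the BM25 parameters k1 = 1.2 and b = 0.75, the non-negative IDF variant, the tie-breaking rule and the unshuffled contiguous folds are all assumptions made for illustration, and the tokenized documents are expected to be prepared beforehand.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every document against the query with BM25.

    k1, b and the IDF variant are assumed defaults, not values from the paper.
    docs_tokens must be a non-empty list of non-empty token lists.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        # Each distinct query term contributes once (query term frequency is ignored).
        for term in set(query_tokens):
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def knn_classify(query_tokens, train_docs, train_labels, k=3):
    """Return the majority label among the k training documents with the highest BM25 score."""
    scores = bm25_scores(query_tokens, train_docs)
    ranked = sorted(range(len(train_docs)), key=lambda i: scores[i], reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    # On a tied vote, the label of the higher-ranked neighbor wins.
    return votes.most_common(1)[0][0]


def k_fold_splits(n_items, n_folds=10):
    """Yield (train_indices, test_indices) for contiguous, unshuffled folds.

    Assumes n_items is divisible by n_folds, as with 1000 documents and 10 folds.
    """
    fold_size = n_items // n_folds
    for fold in range(n_folds):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test = list(range(start, stop))
        train = [i for i in range(n_items) if not start <= i < stop]
        yield train, test
```

With 1000 documents, `k_fold_splits(1000, 10)` yields ten splits of 900 training and 100 testing indices each, matching the setup described above. Calling `knn_classify` with k = 3 on each testing document corresponds to the neighbor value reported as optimal.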
Copyrights © 2020