Twitter is a microblogging service that is currently popular and has proven to be a very fast channel for spreading information. Information published and circulated through it is unrestricted and highly varied: news, opinions, questions, criticisms, and comments, whether positive or negative. Classification is a text mining task that groups content according to the similarity of its text. Classification allows tweets on Twitter to be grouped by category; for example, content about football, basketball, and chess is grouped into the sports category. The classification procedure begins with preprocessing, followed by term weighting, and then categorization based on cosine similarity calculations. Preprocessing itself consists of several phases: document cleaning, tokenizing, stopword removal, and stemming. The term weighting method used in this thesis is Term Frequency-Inverse Document Frequency (TF-IDF), and the classification method is K-Nearest Neighbor (K-NN). K-NN classifies a data item based on training data that has already been labeled. Accuracy was tested on a total of 140 tweets, of which 100 were training data and 40 were testing data, with k set to 1, 3, 5, and 7. The resulting accuracy was 75.0% for k = 1, 72.5% for k = 3, 62.5% for k = 5, and 55.0% for k = 7.
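The pipeline described above (preprocessing, TF-IDF weighting, cosine similarity, and a K-NN vote) can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis. The stopword list, the toy tweets, the function names, and the smoothed IDF variant `log(N/df) + 1` are all assumptions made for illustration, and stemming is omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "is", "in", "at", "on", "and", "to", "of"}

def preprocess(text):
    """Document cleaning, tokenizing, and stopword removal (no stemming here)."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())  # drop URLs and mentions
    tokens = re.findall(r"[a-z]+", text)                     # keep alphabetic tokens
    return [t for t in tokens if t not in STOPWORDS]

def fit_idf(docs):
    """IDF per term over tokenized training docs (assumed variant: log(N/df) + 1)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are ignored."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(tweet, train_vecs, train_labels, idf, k=1):
    """Label a tweet by majority vote among its k most similar training tweets."""
    query = tfidf(preprocess(tweet), idf)
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy labeled tweets (invented for this sketch, not the thesis data).
train = [
    ("The striker scored a late goal in the football match", "sports"),
    ("Basketball playoffs start tonight", "sports"),
    ("Chess grandmaster wins the tournament", "sports"),
    ("Parliament passes the new budget law", "politics"),
    ("The president signs the election reform law", "politics"),
    ("Minister announces budget for election", "politics"),
]
train_tokens = [preprocess(text) for text, _ in train]
train_labels = [label for _, label in train]
idf = fit_idf(train_tokens)
train_vecs = [tfidf(tokens, idf) for tokens in train_tokens]

for k in (1, 3):
    print(k, knn_classify("Who won the football match tonight?",
                          train_vecs, train_labels, idf, k=k))
```

Accuracy for a given k would then be the fraction of testing tweets whose predicted label matches the true label, computed once for each value of k.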
Copyrights © 2019