The popularity of YouTube as the largest video-sharing website in the world gives spammers opportunities to profit from it illegally by posting spam comments on YouTube videos. Spam comments are a serious nuisance for channel owners, and their variants are becoming harder to detect. One such variant uses abbreviations, symbols, terms, or misspelled words to evade detection. This research evaluates several classification techniques and employs a text normalization method called TextExpansion to address this problem. It uses the YouTube Spam Collection dataset from the UCI Machine Learning Repository, which is composed of five datasets, each containing text comments extracted from YouTube videos (Psy, Katy Perry, LMFAO, Eminem, and Shakira). The evaluation results show that applying TextExpansion yields the highest accuracy, 90.23%. To determine the impact of applying TextExpansion, a t-test was conducted for each dataset. For every dataset, the t-test gives P(T<=t) two-tail < 0.05, which indicates that text normalization with TextExpansion has a statistically significant impact.
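As a rough illustration of the kind of text normalization described above, the following Python sketch expands abbreviated tokens through a lookup table before a comment would be passed to a classifier. It is not the TextExpansion method itself: the `LINGO` table, its entries, and the `normalize` function are hypothetical names introduced here only for illustration.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch);
# the dictionaries actually used by TextExpansion are not given here.
LINGO = {
    "u": "you",
    "ur": "your",
    "plz": "please",
    "pls": "please",
    "chk": "check",
    "vid": "video",
    "sub": "subscribe",
    "thx": "thanks",
}

# A token is a run of lowercase letters, digits, or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalize(comment: str) -> str:
    """Lowercase a comment, drop symbols, and expand known abbreviations."""
    tokens = TOKEN_RE.findall(comment.lower())
    return " ".join(LINGO.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(normalize("Plz chk out my vid & SUB to me!!!"))
```

Under these assumptions, an obfuscated comment and its plainly written counterpart map to the same token sequence, which is the property a downstream classifier would benefit from.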
Copyrights © 2018