One of the problems that has tarnished journalism is plagiarism committed by journalists in the news articles they write. In the past, plagiarism was not strictly monitored, so news articles could be reused freely. News agencies can no longer ignore such cases, which makes plagiarism detection important to implement. The method used to detect plagiarism in this study is BM25. The calculation begins with text preprocessing, then computes term frequency and inverse document frequency, applies BM25 weighting, and finally calculates the percentage of plagiarism. Testing was carried out at threshold values of 75%, 50%, and 25%, and the plagiarism percentages obtained with BM25 were compared with those obtained with cosine similarity. On average, the BM25 results are closer to the threshold, differing from it by 6.12%, 9.77%, and 10.01%, respectively. These results indicate that BM25 performs better than cosine similarity, whose results differ from the threshold by 14.25%, 26.43%, and 32.36%. The average precision of BM25 at the three thresholds is 0.87, 0.80, and 0.63.
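The pipeline described above (preprocessing, term frequency, inverse document frequency, BM25 weighting, percentage) can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the parameter values `k1 = 1.5` and `b = 0.75`, the smoothed IDF variant, the simple regex tokenizer, and in particular the way a score is turned into a percentage (dividing by the suspect article's score against itself) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and keep alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    # k1 and b are common defaults, assumed here; the study's values are not given.
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            # Smoothed IDF (always positive); an assumed variant.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = 1 - b + b * len(doc) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def similarity_percent(suspect, sources):
    # Hypothetical normalization: score of each source relative to the
    # suspect article's score against itself, capped at 100.
    query = tokenize(suspect)
    corpus = [tokenize(s) for s in sources] + [query]
    scores = bm25_scores(query, corpus)
    self_score = scores[-1]
    if self_score == 0:
        return [0.0] * len(sources)
    return [min(100.0, 100.0 * s / self_score) for s in scores[:-1]]


sources = [
    "the minister opened the new bridge on monday",
    "stock prices fell sharply after the announcement",
]
suspect = "the minister opened the new bridge on monday"
percentages = similarity_percent(suspect, sources)
flagged = [p >= 75.0 for p in percentages]  # 75% is one of the thresholds tested
```

Under these assumptions an article identical to a source scores 100% against it, and an article sharing no terms with a source scores 0%. Comparing each percentage against a threshold such as 75%, 50%, or 25% then gives the plagiarism decision.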
Copyrights © 2019