Stemmers are used in many document-processing tasks, such as information retrieval, question answering, spell checking, language translation, document clustering, and document classification. Stemming methods based on word morphology have several shortcomings. They remove prefixes incorrectly from root words beginning with the letters “k”, “t”, “s”, and “p”, and they remove suffixes incorrectly, especially the suffixes “-kan” and “-an”. To handle these problems, this research proposes a stemmer that applies two-level morphology to root words beginning with “k”, “t”, “s”, and “p”, and uses prefix and suffix combination rules to decide which suffix to remove from a word. For example, the prefix “di-” should be paired only with the suffix “-kan”, not with the suffix “-an”. In the experiments, the proposed stemmer achieved an accuracy of 95.5%, compared with 82.5% for the earlier stemmer based on word morphology.
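The two ideas in the abstract, restoring a dropped root-initial consonant and rejecting invalid prefix and suffix pairs, can be illustrated with a minimal Python sketch. This is not the authors' stemmer. The lexicon, the prefix and suffix lists, `INVALID_PAIRS`, and `RESTORE` are hypothetical and deliberately tiny; the only disallowed pair encoded is the abstract's own example (“di-” with “-an”). The restoration step assumes the standard Indonesian nasal assimilation of the prefix meN- (for example meN- + tulis → menulis), where the root-initial “k”, “t”, “s”, or “p” disappears. Instead of a full two-level morphology analyzer, the sketch simply generates both the stripped form and the form with the consonant restored, and accepts the first candidate found in the lexicon.

```python
"""Toy Indonesian stemmer sketch (hypothetical rules and lexicon, not the paper's)."""

# Hypothetical root-word lexicon; a real stemmer would use a full dictionary.
LEXICON = {"kirim", "tulis", "sapu", "pukul", "ambil", "makan"}

# Hypothetical prefix and suffix lists; longer prefixes are tried first.
PREFIXES = ["meng", "meny", "mem", "men", "me", "di", "ber", "ter"]
SUFFIXES = ["kan", "an", "i"]

# Illustrative invalid prefix+suffix pairs: only the abstract's example.
INVALID_PAIRS = {("di", "an")}

# Assumed nasal-assimilation map: after stripping these prefixes, the root's
# dropped initial consonant (k, s, t, p) may need to be restored.
RESTORE = {"meng": "k", "meny": "s", "men": "t", "mem": "p"}


def candidates(word):
    """Yield (prefix, root, suffix) analyses, skipping invalid pairs."""
    for suffix in SUFFIXES + [""]:
        if suffix and not word.endswith(suffix):
            continue
        body = word[: len(word) - len(suffix)] if suffix else word
        for prefix in PREFIXES + [""]:
            if prefix and not body.startswith(prefix):
                continue
            if (prefix, suffix) in INVALID_PAIRS:
                continue  # e.g. "di-" must not combine with "-an"
            rest = body[len(prefix):]
            yield prefix, rest, suffix
            if prefix in RESTORE:
                # Two-level-style alternative: restore the dropped consonant.
                yield prefix, RESTORE[prefix] + rest, suffix


def stem(word):
    """Return the first candidate root found in LEXICON, else the word itself."""
    if word in LEXICON:
        return word
    for _prefix, root, _suffix in candidates(word):
        if root in LEXICON:
            return root
    return word


if __name__ == "__main__":
    for w in ["menulis", "mengirim", "menyapu", "memukul", "mengambil", "dimakan"]:
        print(w, "->", stem(w))
```

With this toy lexicon, `stem("menulis")` returns `"tulis"` rather than the incorrect `"ulis"`, and `candidates("dimakan")` never proposes the analysis di- + mak + -an. Checking candidates against a lexicon is a simplification; the paper's rules may decide differently.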
Copyright © 2013