Articles

Improving Computational Efficiency and Accuracy of Damerau-Levenshtein Distance for Indonesian Spelling Correction using Cosine Similarity
Husni Husni; Yoga Dwitya Pramudita; Mohammad Syarief; Army Justitia; Ika Oktavia Suzanti
Journal of Innovation Information Technology and Application (JINITA) Vol 7 No 2 (2025): JINITA, December 2025
Publisher : Politeknik Negeri Cilacap

DOI: 10.35970/jinita.v7i2.2893

Abstract

Spelling correction is an automatic feature that detects spelling errors and, where necessary, suggests replacement words. It is one of the crucial preprocessing phases in text mining. The Damerau-Levenshtein Distance method is a spelling correction method with high accuracy. It uses four edit operations: insertion, deletion, substitution, and transposition. The basic approach to detecting spelling errors in Indonesian is dictionary lookup. Despite its accuracy, the Damerau-Levenshtein Distance method is computationally slow. Furthermore, when several dictionary words are at the same distance from the target word, it is difficult to decide which suggestion is most appropriate. To overcome these problems, we introduce a caching mechanism that stores previously calculated corrections, speeding up computation. In addition, we use the cosine similarity method to rank the words returned by Damerau-Levenshtein Distance. Integrating caching with cosine similarity ranking raised accuracy significantly, from 72.13% to 83.60%, improving both efficiency and effectiveness.
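The abstract does not give the paper's implementation. The following minimal Python sketch illustrates, under stated assumptions, the pipeline the abstract describes:

- a restricted Damerau-Levenshtein distance (the optimal string alignment variant) covering the four operations;
- a cache over previously computed corrections;
- cosine similarity to order candidates that share the same distance.

The toy dictionary DICTIONARY, the helper names, and the max_distance=2 cut-off are hypothetical. Using character-bigram count vectors for cosine similarity is also an assumption. The paper may use a different vector representation, distance variant, or cache design.

```python
from collections import Counter
from functools import lru_cache
from math import sqrt

# Hypothetical toy dictionary; the paper's Indonesian lexicon is not reproduced here.
DICTIONARY = ("makan", "makam", "makna", "minum", "main", "mana")


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    insertion, deletion, substitution, and adjacent transposition each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def bigrams(word: str) -> Counter:
    """Character-bigram counts (an assumed vector representation)."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


@lru_cache(maxsize=None)  # cache: a repeated misspelling skips the dictionary scan
def suggest(word: str, max_distance: int = 2) -> tuple:
    """Dictionary words within max_distance, ordered by distance, with ties
    broken by descending cosine similarity (then alphabetically)."""
    query = bigrams(word)
    candidates = []
    for entry in DICTIONARY:
        dist = osa_distance(word, entry)
        if dist <= max_distance:
            candidates.append((dist, -cosine(query, bigrams(entry)), entry))
    candidates.sort()
    return tuple(entry for _, _, entry in candidates)
```

For example, suggest("mkaan") lists "makan" first, since a single transposition repairs it. A second call with the same word is answered from the cache, which suggest.cache_info() reports as a hit. Returning a tuple keeps cached results immutable, so callers cannot corrupt them.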
Comparison of Word2Vec and GloVe performance in Bi-LSTM models for Indonesian news classification
Muhammad Faris Wafda; Husni; Ika Oktavia Suzanti; Firdaus Solihin; Mula'ab; Army Justitia
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control Vol. 11, No. 3, August 2026 (Article in Progress)
Publisher : Universitas Muhammadiyah Malang

DOI: 10.22219/kinetik.v11i3.2608

Abstract

The explosion in the volume of textual data from digital news makes it challenging to classify content automatically and efficiently. For the task of classifying Indonesian-language news, this study compares the performance of several word embeddings when applied to a Bidirectional Long Short-Term Memory (Bi-LSTM) model: Word2Vec with the CBOW and Skip-Gram architectures, and GloVe. The dataset consists of 6,715 pre-processed news articles from an Indonesian news portal, divided into five categories. The model was trained on 80% of the data using K-Fold Cross Validation (K=5), while the remaining 20% was held out for testing. The experimental findings indicate that the Bi-LSTM model combined with CBOW embeddings yielded the best performance, achieving 95.16% accuracy and a 95.15% F1-Score. The Skip-Gram model followed with solid performance, achieving 93.30% accuracy and the fastest computation time. Conversely, the model using pre-trained GloVe embeddings performed worst, achieving 88.98% accuracy. This result suggests that embeddings trained on domain-specific data capture local context more effectively. The study concludes that selecting a word embedding method trained specifically on local datasets is an important step toward optimal accuracy in Indonesian news text classification.
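The abstract's evaluation protocol can be read as a hold-out split followed by K-fold cross-validation on the training portion. The minimal sketch below assumes that reading. The function name holdout_then_kfold, the fixed seed, and the simple non-stratified round-robin folds are illustrative assumptions; the study may have used stratified folds or a library splitter instead.

```python
import random


def holdout_then_kfold(n_samples: int, test_ratio: float = 0.2, k: int = 5, seed: int = 42):
    """Hypothetical helper: hold out a test set, then build k (train, validation)
    index splits over the remaining samples. Not stratified by category."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    n_test = round(n_samples * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Round-robin assignment yields folds whose sizes differ by at most one.
    folds = [train_idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((fit, folds[i]))  # (training indices, validation indices)
    return train_idx, test_idx, splits


train_idx, test_idx, splits = holdout_then_kfold(6715)
print(len(train_idx), len(test_idx))    # 5372 1343
print([len(val) for _, val in splits])  # [1075, 1075, 1074, 1074, 1074]
```

With 6,715 articles, this reading gives 1,343 test articles and five validation folds drawn from the remaining 5,372. The test set is never seen during cross-validation.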