This study develops an intelligent application for detecting semantic similarity between undergraduate thesis titles using Natural Language Processing (NLP) and machine learning techniques. The need for such a system arises from the growing number of thesis title submissions in Indonesian universities, which increases the risk of duplication and strains manual novelty-verification processes. Development followed a Research and Development (R&D) approach with four stages: needs analysis, NLP model development, implementation, and evaluation. A dataset of 114 thesis titles was collected from official academic archives, of which 87 remained after data cleaning and were used for model benchmarking. A Sentence-BERT model (IndoSBERT) serves as the core of the semantic similarity engine; it achieved an accuracy of 93% and an F1-score of 0.90, outperforming traditional approaches such as TF-IDF and LSA. The system was evaluated against ISO/IEC 25010 and showed strong performance in functional suitability, time behavior (average response time of 1.82 s), reliability (100% uptime over 24 h), and usability, which 25 respondents assessed with the System Usability Scale (SUS), giving a score of 80 (excellent). The results indicate that the proposed system can help study programs identify potential topic duplication and strengthen academic governance. However, the small dataset and its single-domain scope (engineering and informatics education) limit the model’s generalizability. Future work may use larger multi-domain datasets and extend novelty checking beyond titles to proposals and abstracts. The study contributes practical automation support and technological innovation for academic quality assurance.
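To make the comparison with traditional approaches concrete, the following is a minimal sketch of a TF-IDF baseline of the kind the abstract mentions, not the IndoSBERT engine itself. It scores a new title against an archive by cosine similarity and flags matches above a threshold. The function names, the whitespace tokenizer, the smoothed IDF formula, the threshold of 0.5, and the sample titles are all assumptions chosen for illustration; none of them are taken from the study.

```python
import math
from collections import Counter


def tfidf_vectors(titles):
    """Return one sparse TF-IDF vector (dict: term -> weight) per title."""
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [title.lower().split() for title in titles]
    n_docs = len(tokenized)
    # Document frequency: number of titles that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_similar(new_title, archive, threshold=0.5):
    """Return (score, title) pairs from the archive at or above the threshold.

    The threshold is a hypothetical value; a real system would tune it
    on labeled pairs.
    """
    vectors = tfidf_vectors(archive + [new_title])
    query = vectors[-1]
    scored = [(cosine(query, vec), title)
              for vec, title in zip(vectors[:-1], archive)]
    return sorted((pair for pair in scored if pair[0] >= threshold),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical titles, invented for this example.
    archive = [
        "web based learning system for vocational schools",
        "image classification with convolutional neural networks",
    ]
    for score, title in flag_similar("web based learning system", archive):
        print(f"{score:.2f}  {title}")
```

A sentence-embedding engine would keep the cosine and threshold steps and replace `tfidf_vectors` with dense embeddings from the model. That substitution is what lets paraphrased titles with little word overlap still score as similar, which a term-matching baseline like this one cannot do.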
Copyright © 2025