Nur Wahidah, Inaya
Unknown Affiliation



Implementation of Semantic Search in an Academic Repository Using Sentence-BERT and FAISS
Lubis, Ihsan; Lubis, Husni; Nur Wahidah, Inaya
Sinkron : jurnal dan penelitian teknik informatika Vol. 10 No. 2 (2026): Article Research April, 2026
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v10i2.15940

Abstract

Academic repositories serve as centralized platforms for storing and managing scientific documents, including research papers, reports, and administrative records. Traditional keyword-based search, however, often fails to return relevant results because it does not capture the contextual meaning of a query, so documents are missed when the query terms differ from the terms they contain. To overcome this limitation, this study introduces a semantic search approach for academic repositories that combines Sentence-BERT as the text embedding model with FAISS as the vector similarity search engine. In the proposed system, documents stored in a MySQL database are first preprocessed to remove HTML tags and then converted into semantic vector representations using Sentence-BERT. These vectors are indexed with FAISS, which enables fast similarity search over the embeddings and retrieves documents by meaning rather than by exact keyword match. The system architecture uses FastAPI as the backend service for indexing, searching, and evaluation, while CodeIgniter 4 serves as the frontend framework through which administrators and end users manage documents. Evaluation was carried out on three test sets of ten queries each. Performance was measured using Recall@K, normalized Discounted Cumulative Gain (nDCG), Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), and search latency. Experimental results show that the system achieved an average Recall@K of 0.61, a MAP of 0.39, and a No-Hit rate of 0.033, which over 30 queries corresponds to about one query without a hit, so nearly all queries retrieved results. Although the nDCG value declined in the third test set, the system consistently returned relevant documents.
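The ranking metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: it assumes binary relevance judgments (a document is either relevant to a query or not), it assumes a "no-hit" means that none of the top-K results is relevant, and the function names and document IDs are hypothetical. MRR and MAP are obtained by averaging `reciprocal_rank` and `average_precision` over all queries in a test set.

```python
import math

# Illustrative sketch only: binary relevance is assumed, and all
# function names and document IDs here are hypothetical.

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@rank taken at each rank holding a relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """DCG of the top k results divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1)
              if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

def no_hit_rate(runs, k):
    """Share of queries whose top k results contain no relevant document
    (assumed definition; the abstract does not define the metric)."""
    misses = sum(1 for ranked, relevant in runs
                 if not set(ranked[:k]) & set(relevant))
    return misses / len(runs)

# One hypothetical query: the system returned four documents, two are relevant.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(ranked, relevant, k=3))
print(reciprocal_rank(ranked, relevant))
print(average_precision(ranked, relevant))
print(ndcg_at_k(ranked, relevant, k=4))
```

Under the assumed no-hit definition, a single miss among 30 queries gives 1/30, which rounds to the reported 0.033.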