Journal : JOURNAL OF APPLIED INFORMATICS AND COMPUTING

Performance Comparison of Embeddings and Keyword Selection Methods in Enterprise Document
Cristin, Putri; Natalia, Brenda; Limantara, Joseph Clio; Sarwosri
Journal of Applied Informatics and Computing Vol. 9 No. 4 (2025): August 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i4.9971

Abstract

Keyword extraction is widely used in domains such as social media and e-commerce, but its application to enterprise document retrieval remains limited. Most organizations still depend on structured systems or rule-based approaches for indexing, which often lack semantic understanding and scalability. While several techniques such as TextRank and RAKE have been explored, few studies assess their effectiveness for operational document retrieval in institutional settings, which leaves a research gap. This study investigates the use of KeyBERT to extract keywords from university documents, including SOPs, manuals, and guidelines. KeyBERT leverages transformer-based embeddings to generate semantically relevant keywords and is chosen for its ease of use, model flexibility, and independence from labeled data. It also supports diversification strategies such as Maximal Marginal Relevance (MMR) and MaxSum to reduce redundancy and increase keyword variety. We evaluate six embedding models combined with three keyword selection methods: Cosine similarity, MMR, and MaxSum. The best F1 score of 0.78 is achieved using Cosine with the paraphrase-MiniLM-L3-v2 model, with an average extraction time of 184.02 seconds. These findings highlight the effectiveness of combining lightweight embeddings with strategic keyword selection for enterprise-scale document indexing.
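The difference between plain cosine ranking and MMR selection can be illustrated with a minimal sketch. This is not the authors' code and not KeyBERT's internal implementation: the two-dimensional vectors, the candidate phrases, and the function names `top_n_cosine` and `mmr` are all hypothetical, chosen only to show how a diversity penalty changes which keywords are kept. In practice the vectors would come from an embedding model such as the ones compared in the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_n_cosine(doc_vec, cand_vecs, top_n):
    """Rank candidates purely by similarity to the document."""
    ranked = sorted(cand_vecs, key=lambda k: cosine(doc_vec, cand_vecs[k]), reverse=True)
    return ranked[:top_n]

def mmr(doc_vec, cand_vecs, top_n, diversity=0.5):
    """Greedy MMR: trade relevance to the document against
    similarity to keywords that were already selected."""
    relevance = {k: cosine(doc_vec, v) for k, v in cand_vecs.items()}
    selected = [max(relevance, key=relevance.get)]  # most relevant candidate first
    remaining = [k for k in cand_vecs if k != selected[0]]
    while remaining and len(selected) < top_n:
        def score(k):
            redundancy = max(cosine(cand_vecs[k], cand_vecs[s]) for s in selected)
            return (1 - diversity) * relevance[k] - diversity * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy, hand-made vectors (assumption: stand-ins for real sentence embeddings).
doc_vec = [1.0, 0.3]
candidates = {
    "leave policy": [1.0, 0.1],
    "leave policies": [1.0, 0.15],  # near-duplicate of the phrase above
    "approval form": [0.6, 0.8],
}

print(top_n_cosine(doc_vec, candidates, 2))    # ['leave policies', 'leave policy']
print(mmr(doc_vec, candidates, 2, diversity=0.7))  # ['leave policies', 'approval form']
```

In this toy case cosine ranking returns two near-duplicate phrases, while MMR replaces the second with a less similar one. Whether that extra variety improves retrieval quality is an empirical question; the abstract reports that plain cosine ranking gave the best F1 on the authors' data.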