Arhan Windu Rizki Putra Budianto
Politeknik Negeri Malang

Published: 2 documents


AI Based Digital Book Indexing System Using YAKE and WORD2VEC Methods
Mohammad Alfarizi Abdullah; Ulla Delfana Rosiani; Vit Zuraida; Arhan Windu Rizki Putra Budianto; Rizki Putri Ramadhani
Journal of Informatics and Vocational Education Vol. 9 No. 1 (2026): Journal of Informatics and Vocational Education - March
Publisher : Informatics Education Department, Faculty of Teacher Training and Education, Universitas Sebelas Maret

DOI: 10.20961/joive.v9i1.3026

Abstract

Polinema Press, the publishing unit of the State Polytechnic of Malang (Polinema), needs an efficient way to generate book indexes automatically, because the current manual indexing process is time-consuming. This research develops an AI-based automatic indexing system that uses the YAKE (Yet Another Keyword Extractor) and Word2Vec methods to improve the accuracy and efficiency of index generation. The system processes digital books in PDF format in four stages: (1) text preprocessing (text extraction, stopword removal, tokenization), (2) keyword extraction using YAKE, based on statistical features such as word frequency and position, (3) final keyword selection by measuring semantic similarity with Word2Vec, and (4) compilation of an alphabetical index with the page numbers where each keyword appears. The generated indexes are evaluated against manual indexes using cosine similarity. The system was tested on 37 digital books. The best configuration combined YAKE and Word2Vec with phrases of 2-3 words, reaching cosine similarity of up to 0.91, precision of up to 0.38, and an average processing time of less than 4 seconds per document. These results indicate that the system can produce relevant, fast, and contextual indexes when compared with manual indexes. It is expected to reduce the manual workload at Polinema Press and to serve as a reference for applying natural language processing (NLP) to Indonesian-language documents.
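Stage (4) of the pipeline described in this abstract, compiling an alphabetical index with page numbers, can be illustrated with a minimal sketch. It assumes the keywords have already been selected by the YAKE and Word2Vec stages and that the text of each page has already been extracted from the PDF. The function name `build_index`, the sample pages, and the whole-phrase matching rule are hypothetical choices for illustration, not the authors' implementation.

```python
import re
from collections import defaultdict


def build_index(pages, keywords):
    """Map each keyword to the 1-based page numbers where it occurs."""
    index = defaultdict(list)
    for page_no, text in enumerate(pages, start=1):
        lowered = text.lower()
        for kw in keywords:
            # Assumption: case-insensitive whole-phrase matching. A real
            # system would probably normalise or stem the text first.
            pattern = r"\b" + re.escape(kw.lower()) + r"\b"
            if re.search(pattern, lowered):
                index[kw].append(page_no)
    # Alphabetical order, ignoring case.
    return sorted(index.items(), key=lambda item: item[0].lower())


# Hypothetical page texts standing in for extracted PDF pages.
pages = [
    "Machine learning is introduced here.",
    "Neural networks extend machine learning.",
    "This page covers data mining.",
]
keywords = ["neural networks", "machine learning", "data mining"]

for phrase, page_numbers in build_index(pages, keywords):
    print(f"{phrase}: {', '.join(map(str, page_numbers))}")
```

Keywords that never appear on any page are simply left out of the result, which seems a reasonable default for a back-of-book index.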

Automatic Indexing of Digital Books using RAKE and Word2Vec
Arhan Windu Rizki Putra Budianto; Ulla Delfana Rosiani; Vit Zuraida; Rizki Putri Ramadhani; Mohammad Alfarizi Abdullah
Journal of Informatics and Vocational Education Vol. 9 No. 1 (2026): Journal of Informatics and Vocational Education - March
Publisher : Informatics Education Department, Faculty of Teacher Training and Education, Universitas Sebelas Maret

DOI: 10.20961/joive.v9i1.3050

Abstract

Manual indexing of digital books is time-consuming and prone to inconsistency. To address this, this study developed an automatic indexing system using the RAKE (Rapid Automatic Keyword Extraction) method and Word2Vec. The system accepts PDF files as input, performs text preprocessing, and extracts key phrases using RAKE. These phrases are then filtered by semantic relevance to a specified topic using an Indonesian-language Word2Vec model. Users can manually add phrases and select the relevant ones for inclusion in the final index. The resulting index lists phrases, page numbers, and relevance scores, and is inserted as an additional page at the end of the PDF document. Evaluation compared the system-generated index with the author’s manual index using precision, recall, and cosine similarity. Although precision and recall were very low, a cosine similarity score of 0.69 suggests some semantic similarity between the system output and the author’s index.
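The gap between low exact-match scores and a higher cosine similarity can be illustrated with a small sketch. The abstract does not say how the vectors for cosine similarity were built, so the word-count vectors used here are an assumption, and the function names and sample phrases are hypothetical.

```python
import math
from collections import Counter


def precision_recall(system, manual):
    """Exact-match precision and recall over lowercased phrases."""
    sys_set = {p.lower() for p in system}
    man_set = {p.lower() for p in manual}
    hits = len(sys_set & man_set)
    precision = hits / len(sys_set) if sys_set else 0.0
    recall = hits / len(man_set) if man_set else 0.0
    return precision, recall


def cosine_similarity(system, manual):
    """Cosine similarity between word-count vectors of the two indexes.

    Assumption: bag-of-words vectors; the study may have used another
    representation, such as Word2Vec embeddings.
    """
    a = Counter(w for p in system for w in p.lower().split())
    b = Counter(w for p in manual for w in p.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical system output and manual index.
system_index = ["machine learning model", "neural network", "data mining"]
manual_index = ["machine learning", "neural networks", "data mining", "decision tree"]

precision, recall = precision_recall(system_index, manual_index)
similarity = cosine_similarity(system_index, manual_index)
print(f"precision={precision:.2f} recall={recall:.2f} cosine={similarity:.2f}")
```

In this toy example only one phrase matches exactly, yet most individual words are shared, so the cosine score comes out noticeably higher than precision or recall. That is one plausible reading of the pattern reported in the abstract, not a confirmed explanation.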