Scientific Journal of Informatics
Vol. 9 No. 1 (2024): Jurnal Ilmiah Informatika

Novel Popularity Prediction Based on Text Features Using the Random Forest Method

Nadya Elfareta Azarin (Universitas Halu Oleo)
Rizal Adi Saputra (Universitas Halu Oleo)
Subardin Subardin (Universitas Halu Oleo)



Article Info

Publish Date
17 May 2025

Abstract

In today's digital era, a novel's popularity is often measured by reader response and sales. This research develops a model that predicts novel popularity from text features, with the aim of identifying the key factors that contribute to popularity and giving authors and publishers insight into what influences reader acceptance. The method used is Random Forest, a machine learning algorithm suited to both classification and regression. The proposed method integrates text features, such as extracted keywords and sentiment analysis, into a Random Forest framework to predict popularity with high accuracy. The dataset consists of novel information including title, genre, number of pages, and text fields such as a summary or description. The data is preprocessed to address issues such as missing values and duplicates. Features are extracted by applying tokenization and stemming and then converting the text into TF-IDF vectors. A Random Forest model is built on these features, and its parameters are optimized through cross-validation. The experimental results show that the Random Forest model predicts novel popularity with a satisfactory level of accuracy. Text features, such as keyword frequency and sentiment, contributed significantly to the model's predictive ability.
These findings provide valuable insight to authors and publishers in understanding reader preferences and the potential success of a novel.
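
The pipeline described in the abstract (cleaning, TF-IDF features, a Random Forest, and cross-validated parameter tuning) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, the records in `RAW` and the helper `clean` are hypothetical and invented purely for demonstration, the parameter grid is arbitrary, and stemming and sentiment features are omitted because scikit-learn does not provide them out of the box.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy records (description, popular?), invented for illustration only.
RAW = [
    ("A thrilling murder mystery in a quiet village", 1),
    ("A thrilling murder mystery in a quiet village", 1),  # exact duplicate
    (None, 0),                                             # missing description
    ("A gripping detective thriller full of twists", 1),
    ("A thrilling chase across a dangerous city", 1),
    ("An exciting mystery with a shocking ending", 1),
    ("A slow catalogue of garden tools", 0),
    ("A dull manual about tax forms", 0),
    ("A dry list of railway timetables", 0),
    ("A tedious report on office furniture", 0),
]


def clean(records):
    """Drop rows with a missing description and exact duplicate rows."""
    seen, out = set(), []
    for text, label in records:
        if text is None or (text, label) in seen:
            continue
        seen.add((text, label))
        out.append((text, label))
    return out


rows = clean(RAW)
texts = [text for text, _ in rows]
labels = [label for _, label in rows]

# TfidfVectorizer tokenizes and lowercases the text, then builds TF-IDF vectors.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Assumed, arbitrary grid; the abstract does not say which parameters were tuned.
param_grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 5],
}

# cv=2 only because the toy dataset is tiny; real data would allow more folds.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
print("Prediction:", search.predict(["A thrilling mystery with a twist"]))
```

On a real dataset, structured fields such as genre and number of pages could be combined with the TF-IDF vectors (for example through scikit-learn's `ColumnTransformer`), and the fitted forest's `feature_importances_` attribute would be one way to inspect which terms contribute most to the predictions.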

Copyrights © 2024






Journal Info

Abbrev

JIMI

Publisher

Subject

Computer Science & IT

Description

Topics cover the following areas (but are not limited to):
1. Information Technology (IT)
   a. Software engineering
   b. Game
   c. Information Retrieval
   d. Computer network
   e. Telecommunication
   f. Internet
   g. Wireless technology
   h. Network security
   i. Multimedia technology
   j. Mobile Computing
   k. ...