Contact Name
Mesran
Contact Email
mesran.skom.mkom@gmail.com
Phone
-
Journal Mail Official
jurnal.bits@gmail.com
Editorial Address
-
Location
Kota Medan,
Sumatera Utara
INDONESIA
Building of Informatics, Technology and Science
ISSN : 2684-8910     EISSN : 2685-3310     DOI : -
Core Subject : Science
Building of Informatics, Technology and Science (BITS) is an open-access medium for publishing scientific articles that report research results in information technology and computing. Papers submitted to this journal are first checked for plagiarism and peer-reviewed to maintain quality. The journal is managed by Forum Kerjasama Pendidikan Tinggi (FKPT) and published twice a year, in June and December. The journal is expected to advance research and make a real contribution to improving research resources in the field of information technology and computing.
Arjuna Subject : -
Articles 974 Documents
Comparing the Performance of XGBoost and Naive Bayes in Sentiment Analysis of TikTok Comments on Ibu Kota Nusantara (IKN) with Imbalanced Data Novi Purnamasari; Nirwana Hendrastuty
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.9488

Abstract

The growth of social media has generated diverse public responses regarding the development of Indonesia’s new capital city, Ibu Kota Nusantara (IKN), particularly on TikTok, a platform with high user interaction. This study aims to compare the performance of Naive Bayes and eXtreme Gradient Boosting (XGBoost) algorithms in sentiment analysis of TikTok comments related to IKN development under imbalanced data conditions. The dataset consists of 1,132 comments that underwent preprocessing, including case folding, text cleaning, tokenization, normalization, and stemming. Feature extraction was performed using the Term Frequency–Inverse Document Frequency (TF-IDF) method, generating 1,926 features to represent word importance. The classification process used an 80:20 split for training and testing data. The results show that Naive Bayes achieved an accuracy of 61.23%, while XGBoost obtained a slightly higher accuracy of 62.11%. XGBoost improved recall in the negative class (from 0.21 to 0.40) and neutral class (from 0.11 to 0.26), although the improvement remains limited. The difference in accuracy between the models is relatively small and does not indicate a significant overall performance improvement. This study is limited by the relatively small dataset size and imbalanced class distribution, which may affect data representativeness and model generalization. Therefore, the results are not yet optimal for broader real-world applications.
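The comparison pipeline described in this abstract (TF-IDF features, an 80:20 split, Naive Bayes versus a boosted-tree classifier) can be sketched on toy data. Everything below is hypothetical: the comments, labels, and parameters are invented for illustration, and scikit-learn's GradientBoostingClassifier stands in for XGBoost, which is a separate library.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Hypothetical, already-preprocessed comments (the study used 1,132 real
# TikTok comments); labels: 1 = positive, 0 = neutral, -1 = negative.
docs = [
    "pembangunan ikn bagus sekali", "proyek hebat maju terus",
    "ibu kota baru keren modern", "dukung penuh pembangunan ini",
    "biasa saja tidak istimewa", "masih menunggu hasil nyata",
    "belum tahu dampaknya nanti", "netral saja lihat perkembangan",
    "buang anggaran negara saja", "proyek gagal rakyat susah",
    "tidak setuju pemborosan uang", "pembangunan ini merugikan rakyat",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0, -1, -1, -1, -1]

# TF-IDF feature extraction (the paper obtained 1,926 features)
X = TfidfVectorizer().fit_transform(docs)

# 80:20 split for training and testing, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0)

nb = MultinomialNB().fit(X_tr, y_tr)
# Boosted trees need dense input here; stand-in for XGBoost
gb = GradientBoostingClassifier(random_state=0).fit(X_tr.toarray(), y_tr)

acc_nb = accuracy_score(y_te, nb.predict(X_te))
acc_gb = accuracy_score(y_te, gb.predict(X_te.toarray()))
```

On the paper's real, imbalanced dataset the same structure would additionally call for per-class precision/recall (e.g. `classification_report`), since overall accuracy hides the weak negative- and neutral-class recall the authors report.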
Predictive Modeling of National University Rankings Using Ensemble Machine Learning and Multi-Dimensional Institutional Performance Indicators: Evidence from Japan Bernadus Gunawan Sudarsono; Raditya Galih Whendasmoro
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.9525

Abstract

The global higher education landscape is becoming increasingly competitive in attracting outstanding students, qualified faculty, and international research collaborations. University ranking systems serve as strategic instruments for assessing institutional performance and as a basis for public policy. However, traditional ranking approaches employing linear aggregate scores often oversimplify the complex relationships among indicators such as research, internationalization, and graduate outcomes. This study develops a data-driven predictive model to map the non-linear relationships among university performance indicators. The research employs a quantitative predictive analytics approach using a dataset of 52 Japanese universities from the 2024–2026 period, encompassing the variables Research_Impact_Score, Employment_Rate, Intl_Student_Ratio, Institution_Age, Institution_Type, and Region, with National_Rank as the target variable. The research stages include data preprocessing (handling missing values, encoding, scaling), feature engineering (including Institutional Age), regression model development (Linear, Ridge, Lasso, SVR) as well as ensemble models (Random Forest and Gradient Boosting), evaluation using RMSE, MAE, and R², and explainable analysis based on feature importance. The results indicate that the Gradient Boosting model delivers the best performance with an RMSE of 1.175117, MAE of 1.087856, and R² of 0.994988, followed by Random Forest with an RMSE of 1.436536 and R² of 0.992510. Traditional linear regression models demonstrate significantly lower performance (R² 0.657519), confirming the superiority of non-linear approaches in modeling complex relationships among indicators. Stability testing using K-Fold Cross Validation yields an average RMSE of 1.1045 with a difference of 0.4493 between folds, indicating model consistency. 
Feature contribution analysis reveals that Research_Impact_Score is the dominant factor with a contribution of 97.94%, followed by Employment_Rate at 1.81%, while internationalization indicators and geographical factors contribute minimally. These findings confirm that research performance constitutes the primary determinant of university rankings, whereas employability and internationalization serve as supporting factors. This study demonstrates that ensemble-based machine learning models are effective in predicting national rankings accurately and interpretably. This approach offers a multidimensional evaluation framework that is more representative than linear aggregate scores, and provides policy implications for enhancing research quality, curriculum relevance, and internationalization strategies of higher education institutions.
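The modeling approach in this abstract (a Gradient Boosting regressor on institutional indicators, evaluated with RMSE/MAE/R² and explained via feature importances) can be sketched on synthetic data. The numbers below are not the paper's data: they are randomly generated for 52 hypothetical universities, with the target rank deliberately constructed so that the research variable dominates, echoing the reported finding.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(0)
n = 52  # same number of universities as the paper, but synthetic values
research = rng.uniform(0, 100, n)     # stand-in for Research_Impact_Score
employment = rng.uniform(50, 100, n)  # stand-in for Employment_Rate
intl_ratio = rng.uniform(0, 30, n)    # stand-in for Intl_Student_Ratio
X = np.column_stack([research, employment, intl_ratio])

# Latent quality dominated by research impact; rank 1 = best university
latent = 0.95 * research + 0.04 * employment + 0.01 * intl_ratio
rank = latent.argsort()[::-1].argsort() + 1.0

model = GradientBoostingRegressor(random_state=0).fit(X, rank)
pred = model.predict(X)

rmse = float(np.sqrt(np.mean((rank - pred) ** 2)))
mae = mean_absolute_error(rank, pred)
r2 = r2_score(rank, pred)
# Explainability step: per-feature contribution to the fitted trees
importance = model.feature_importances_
```

Because the synthetic target is built mostly from the research variable, `importance[0]` dominates, mirroring the paper's 97.94% contribution for Research_Impact_Score; a real replication would also need the K-Fold cross-validation step the authors use, since in-sample R² alone overstates fit.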
Evaluating KNN and Logistic Regression for Diabetes Classification with Standardized Preprocessing: A Performance–Interpretability Trade-off Alif Zayyin Kamandani; Egia Rosi Subhiyakto
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.9534

Abstract

Although K-Nearest Neighbors (KNN) and Logistic Regression have been widely used in diabetes classification, studies that systematically combine a standardized preprocessing pipeline—including median imputation, feature standardization, and stratified data splitting—and evaluate the trade-off between predictive performance and model interpretability remain limited. This study aims to compare the performance of both algorithms in classifying diabetes status using the Pima Indians Diabetes dataset, which consists of 768 samples with eight numerical attributes. The research stages include data exploration, handling missing values using median imputation, feature standardization using StandardScaler, and stratified data splitting with a ratio of 80:20. Model evaluation is conducted using accuracy, precision, recall, F1-score, confusion matrix, and ROC-AUC metrics. The experimental results show that KNN with an optimal parameter of K=21 achieves an accuracy of 75.97%, an F1-score of 61.86%, and a ROC-AUC of 0.8120, while Logistic Regression achieves an accuracy of 70.78%, an F1-score of 54.55%, and a ROC-AUC of 0.8130. Although KNN demonstrates higher predictive performance, Logistic Regression provides advantages in interpretability through model coefficients, where the variables Glucose (β=1.1825) and BMI (β=0.6887) are identified as the main predictors of diabetes risk. These findings indicate a clear trade-off between accuracy and interpretability, suggesting that KNN is more suitable for high-accuracy prediction tasks, while Logistic Regression is more appropriate in clinical contexts requiring transparency and model accountability.
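The standardized pipeline this abstract describes (median imputation, StandardScaler, stratified 80:20 split, then KNN with K=21 versus Logistic Regression) maps directly onto scikit-learn pipelines. The sketch below uses synthetic data with 8 numerical features, not the Pima Indians dataset, so the accuracies it produces are illustrative only; the coefficient readout at the end shows the interpretability advantage the authors attribute to Logistic Regression.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 8))  # 8 numeric attributes, as in Pima
# Outcome driven mainly by feature 1, weakly by feature 5 (hypothetical)
y = (X[:, 1] + 0.5 * X[:, 5] + rng.normal(scale=0.5, size=n) > 0).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing values

# Stratified 80:20 split, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Median imputation -> standardization -> classifier, for both models
knn = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(),
                    KNeighborsClassifier(n_neighbors=21)).fit(X_tr, y_tr)
logreg = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(),
                       LogisticRegression()).fit(X_tr, y_tr)

acc_knn = accuracy_score(y_te, knn.predict(X_te))
acc_lr = accuracy_score(y_te, logreg.predict(X_te))
# Interpretability: standardized coefficients name the main predictors,
# analogous to the paper's Glucose and BMI betas
coefs = logreg[-1].coef_[0]
```

Putting the imputer and scaler inside the pipeline matters: fitting them on the training fold only avoids the test-set leakage that fitting on the full dataset would introduce.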
Hybrid Feature Selection with Metaheuristics for Improving the Accuracy of Diabetes Disease Prediction Ida Maratul Khamidah; Suci Ramadhani; Aulia Khoirunnita
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.9541

Abstract

Early diagnosis of diabetes mellitus is crucial to prevent severe complications and reduce long-term healthcare costs, making accurate and efficient predictive models an important research focus in medical data analytics. However, one of the main challenges in diabetes prediction lies in the presence of irrelevant and redundant features within medical datasets, which can degrade classification accuracy, increase computational complexity, and reduce model generalizability. To address this issue, this study proposes a Hybrid Feature Selection (HFS) approach that integrates filter-based methods and meta-heuristic optimization to identify an optimal subset of features for diabetes prediction. In the proposed framework, statistical filter techniques combining Chi-square and Mutual Information are first employed to rank and reduce feature dimensionality by selecting the most relevant attributes. Subsequently, a Genetic Algorithm (GA) is applied to further optimize the feature subset by maximizing classification accuracy while minimizing the number of selected features. The effectiveness of the proposed HFS approach is evaluated using the Pima Indian Diabetes Dataset, consisting of 768 instances and 8 clinical features, and tested across multiple machine learning classifiers, including Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost. Experimental results demonstrate that the proposed HFS significantly improves predictive performance compared to baseline models without feature selection. Specifically, the Random Forest classifier achieved the highest accuracy of 79.22%, compared to 74.03% in the baseline model, representing an improvement of approximately 5.2 percentage points. Additionally, notable improvements were observed in F1-score and AUC, with AUC increasing from 0.8336 to 0.8403. Beyond accuracy gains, the proposed method reduced feature dimensionality from 8 to 5 features, resulting in lower computational cost and faster model training time.
These findings indicate that the hybrid integration of filter-based selection and meta-heuristic optimization provides a robust and efficient solution for feature selection in medical prediction tasks. Overall, the proposed HFS framework offers a promising approach for developing accurate, efficient, and reliable decision-support systems for early diabetes diagnosis.
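The two-stage scheme in this abstract (a Chi-square + Mutual Information filter, then a meta-heuristic wrapper) can be sketched compactly. Everything below is hypothetical: the data is synthetic with only two truly informative features, and the single-bit mutation loop is a greatly simplified stand-in for the paper's Genetic Algorithm (no crossover or population), kept only to show the fitness function that trades accuracy against subset size.

```python
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n, p = 300, 8
X = rng.uniform(0, 1, size=(n, p))  # chi2 requires non-negative inputs
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)  # only features 0 and 3 matter

# --- Filter stage: combine Chi-square and Mutual Information ranks ---
chi_scores, _ = chi2(X, y)
mi_scores = mutual_info_classif(X, y, random_state=7)
combined = chi_scores.argsort().argsort() + mi_scores.argsort().argsort()
keep = np.sort(combined.argsort()[::-1][:6])  # top 6 go to the wrapper

# --- Wrapper stage: simplified stand-in for the paper's GA ---
def fitness(mask):
    """CV accuracy of Random Forest, penalized by subset size."""
    if mask.sum() == 0:
        return 0.0
    clf = RandomForestClassifier(n_estimators=50, random_state=7)
    acc = cross_val_score(clf, X[:, keep[mask]], y, cv=3).mean()
    return acc - 0.01 * mask.sum()  # prefer smaller subsets

mask = np.ones(len(keep), dtype=bool)
best = fitness(mask)
for _ in range(10):
    cand = mask.copy()
    cand[rng.integers(len(keep))] ^= True  # flip one bit (mutation)
    f = fitness(cand)
    if f > best:
        mask, best = cand, f

selected = keep[mask]  # final feature subset
```

The `- 0.01 * mask.sum()` term encodes the paper's dual objective (maximize accuracy, minimize feature count) in one scalar fitness value, which is what lets any evolutionary search, including a full GA, prune redundant features rather than keeping them all.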