Contact Name
Mesran
Contact Email
mesran.skom.mkom@gmail.com
Phone
-
Journal Mail Official
jurnal.bits@gmail.com
Editorial Address
-
Location
Medan City,
North Sumatra
INDONESIA
Building of Informatics, Technology and Science
ISSN: 2684-8910     EISSN: 2685-3310     DOI: -
Core Subject: Science
Building of Informatics, Technology and Science (BITS) is an open-access journal that publishes scientific articles reporting research results in information technology and computing. Papers submitted to this journal are checked for plagiarism and peer-reviewed to maintain their quality. The journal is managed by Forum Kerjasama Pendidikan Tinggi (FKPT) and is published twice a year, in June and December. The journal aims to advance research and make a real contribution to strengthening research resources in information technology and computing.
Arjuna Subject: -
Articles: 953 documents
Collaboration between Convolutional Neural Network and Semantic Search for English Hadith Search Using Automatic Topic Classification, TF-IDF, and Sentence-BERT
Razaka, Akmal Sidki; Lhaksmana, Kemas Muslim
Building of Informatics, Technology and Science (BITS) Vol 7 No 3 (2025): December 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i3.8861

Abstract

This research develops an English-language hadith search system that returns results that are not only syntactically accurate but also contextually appropriate. The system combines a convolutional neural network (CNN) with two text representation methods: Term Frequency–Inverse Document Frequency (TF-IDF) and Sentence-BERT (SBERT). The CNN classifies hadiths into seven main categories based on chapter titles. In the semantic retrieval stage, TF-IDF and SBERT are used to represent both the hadith texts and the user queries, and the two representations are compared using cosine similarity. Testing used five queries common in Islamic studies, and the results were evaluated manually for semantic similarity. The tuned CNN achieved a classification accuracy of 94%. Although the TF-IDF approach produced higher similarity scores, SBERT returned more relevant results in semantic search. These results indicate that TF-IDF is superior in terms of speed, whereas SBERT captures sentence context more deeply. This research contributes to the development of meaning-based hadith search systems and underlines the importance of a semantic approach in religious text search. Future work could extend the system to multilingual support and evaluate it at a larger scale.
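The retrieval stage can be illustrated with a minimal sketch of TF-IDF vectors ranked by cosine similarity. It assumes lowercase whitespace tokenization, raw term counts, and the smoothed IDF formula idf(t) = ln((1 + N) / (1 + df(t))) + 1 (the form scikit-learn's TfidfVectorizer uses by default); the three short texts and the query are hypothetical placeholders, not the study's corpus. For SBERT, the sparse vectors would be replaced by dense sentence embeddings while the cosine ranking step stays the same.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase, turn punctuation into spaces, then split on whitespace.
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.split()

def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf_vector(text, idf):
    # Raw term frequency times IDF; terms never seen in the corpus are dropped.
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (score, doc_index) pairs sorted from most to least similar."""
    idf = fit_idf(docs)
    doc_vecs = [tfidf_vector(d, idf) for d in docs]
    q = tfidf_vector(query, idf)
    return sorted(((cosine(q, dv), i) for i, dv in enumerate(doc_vecs)), reverse=True)

docs = [
    "Actions are judged by intentions",
    "Prayer is the pillar of the religion",
    "Fasting in Ramadan is an obligation",
]
ranking = rank("what is said about prayer", docs)
print(docs[ranking[0][1]])  # Prayer is the pillar of the religion
```

Query words absent from the corpus ("what", "said", "about") simply contribute nothing, which is one reason lexical TF-IDF can miss paraphrased matches that a sentence-embedding model would catch.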
Speech Emotion Classification Using MFCC Feature Extraction and Bagging-Based Ensemble Learning
Haristyawan, Ivan; Arriyanti, Eka; Wahyuni, Wahyuni
Building of Informatics, Technology and Science (BITS) Vol 7 No 3 (2025): December 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i3.8878

Abstract

Speech emotion classification, also known as Speech Emotion Recognition (SER), has become increasingly important as human–machine interaction grows, particularly in healthcare, online education, and customer service. This study aims to develop a robust speech emotion classification system that uses Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and a Decision Tree–based Bagging algorithm for classification. The approach targets low classification accuracy, especially under speaker-independent conditions and when labeled emotional speech data are scarce. The workflow comprises speech signal preprocessing, MFCC feature extraction, dataset partitioning with bootstrap sampling, ensemble model training, and evaluation using accuracy, precision, recall, and F1-score. Experiments on a balanced dataset of five emotion classes (anger, disgust, fear, happy, and sad) show that the proposed model achieves an overall accuracy of 61.04%. Fear and happy are classified effectively, each with a recall of 0.75, whereas anger shows the lowest performance, with an F1-score of 0.49. Confusion matrix analysis further reveals substantial acoustic overlap among several emotion categories; in particular, sad is frequently misclassified as disgust or anger. In conclusion, integrating MFCC features with the Bagging algorithm improves model stability and robustness, but further optimization of acoustic features and hyperparameters is required to raise overall classification accuracy.
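The bagging step can be sketched as bootstrap resampling plus majority voting. To keep the example short and dependency-free, it swaps the study's decision trees for one-split decision stumps, and it assumes each utterance has already been reduced to a fixed-length vector of MFCC summary statistics (for example, librosa.feature.mfcc averaged over frames). The two-dimensional feature values and the sad/anger labels are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Choose the (feature, threshold) split with the fewest training errors;
    each side of the split predicts its majority label."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label

def fit_bagging(X, y, n_estimators=25, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        # Bootstrap sample: n row indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, row):
    # Majority vote across all base learners.
    return Counter(m(row) for m in models).most_common(1)[0][0]

X = [[0.9, 5.1], [1.0, 4.8], [1.2, 5.0], [4.9, 1.1], [5.0, 0.9], [5.2, 1.0]]
y = ["sad", "sad", "sad", "anger", "anger", "anger"]
ensemble = fit_bagging(X, y, n_estimators=25, seed=0)
print(predict_bagging(ensemble, [0.5, 5.5]), predict_bagging(ensemble, [5.5, 0.5]))
```

Because each base learner sees a different bootstrap sample, individual learners vary, and the vote averages out that variance; this is the stability benefit the abstract attributes to Bagging.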
Customer Sentiment Analysis of E-Commerce Products Using the Naïve Bayes Method and Word Embedding
Harpad, Bartolomius; Azahari, Azahari; Salmon, Salmon
Building of Informatics, Technology and Science (BITS) Vol 7 No 3 (2025): December 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i3.8879

Abstract

This study analyzes customer sentiment toward e-commerce products using the Naïve Bayes method combined with Word Embedding techniques to improve the semantic understanding of Indonesian-language customer reviews. The research is motivated by the rapid growth of e-commerce, which has created a strong need to understand consumer opinions expressed in online reviews. The main challenge in sentiment analysis lies in the complexity of natural language, such as informal words, abbreviations, and diverse emotional expressions. This study uses 40,607 Tokopedia customer reviews across five product categories with three sentiment labels (positive, neutral, and negative). The research stages include data collection, text preprocessing (case folding, tokenization, stopword removal, stemming, and slang normalization), feature representation using Word2Vec and FastText, and classification using Multinomial Naïve Bayes. Experimental results show that Word2Vec with Naïve Bayes achieved an accuracy of 87.92%, while FastText with Naïve Bayes raised accuracy to 91.52%. The FastText-based model handled morphological variations and non-standard spellings better, making it more effective for Indonesian customer review texts. The WordCloud visualization is dominated by words such as “sesuai” (appropriate), “barang” (item), and “cepat” (fast), indicating customer satisfaction with product conformity and service speed. The confusion matrix shows a bias toward the positive class caused by data imbalance: the model still struggles to recognize the neutral and negative classes. Overall, this study demonstrates that integrating Word Embedding with Naïve Bayes improves classification performance and provides richer semantic representations than traditional Bag of Words approaches.
This approach could be applied to develop data-driven recommendation systems and marketing strategies within Indonesia’s e-commerce ecosystem.
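For reference, the Multinomial Naïve Bayes classifier itself can be sketched over plain token counts, i.e. the Bag of Words baseline the abstract compares against. This is a hedged sketch with add-one (Laplace) smoothing; the tiny Indonesian training set is hypothetical. Note that scikit-learn's MultinomialNB rejects negative feature values, and dense Word2Vec/FastText vectors usually contain them, so how the study fed embeddings into the classifier is not described in the abstract and is not reproduced here.

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing over token counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(k / len(docs)) for c, k in class_counts.items()}
    log_likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return priors, log_likelihoods

def predict_mnb(model, tokens):
    priors, log_likelihoods = model
    scores = {
        # log P(c) + sum of log P(w | c); out-of-vocabulary tokens are skipped
        c: priors[c] + sum(log_likelihoods[c][w] for w in tokens if w in log_likelihoods[c])
        for c in priors
    }
    return max(scores, key=scores.get)

train_docs = [
    ["barang", "sesuai", "cepat"],
    ["pengiriman", "cepat", "mantap"],
    ["barang", "rusak", "kecewa"],
    ["lambat", "kecewa"],
    ["barang", "biasa", "saja"],
]
train_labels = ["positive", "positive", "negative", "negative", "neutral"]
model = train_mnb(train_docs, train_labels)
print(predict_mnb(model, ["cepat", "sesuai"]))  # positive
print(predict_mnb(model, ["rusak", "kecewa"]))  # negative
```

The class prior term log P(c) is also where the positive-class bias noted in the abstract enters: with imbalanced data, the majority class starts every comparison with a head start.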
Classification of Diabetes Diseases Based on Medical Features Using Optimized Support Vector Machine
Arfyanti, Ita; Yusnita, Amelia; Adytia, Pitrasacha
Building of Informatics, Technology and Science (BITS) Vol 7 No 3 (2025): December 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i3.8880

Abstract

Diabetes mellitus is a chronic disease caused by impaired glucose metabolism and has become a global health threat whose prevalence rises every year. According to WHO and IDF, the number of people living with diabetes is projected to reach 783 million by 2045. This situation calls for accurate and efficient early detection systems to support medical decision-making. This study develops an optimized Support Vector Machine (SVM) classification model to improve the accuracy and interpretability of diabetes prediction. The dataset is the Pima Indians Diabetes Dataset, which contains eight medical features, including glucose level, blood pressure, and body mass index (BMI). The research stages include data preprocessing, class balancing with the Synthetic Minority Over-sampling Technique (SMOTE), hyperparameter optimization with GridSearchCV, and interpretability analysis with SHapley Additive exPlanations (SHAP). The optimized SVM with a Radial Basis Function (RBF) kernel achieved an accuracy of 82%, and recall for the diabetes class rose substantially, from 0.564 to 0.83, after optimization. An Area Under the ROC Curve (AUC) of 0.871 indicates that the model separates the positive and negative classes effectively. The SHAP analysis shows that Glucose, Age, BMI, and Diabetes Pedigree Function are the most influential features. These findings show that combining normalization, class balancing, hyperparameter optimization, and interpretability analysis yields a reliable and transparent SVM model. The model has strong potential for implementation in Clinical Decision Support Systems (CDSS) for accurate and explainable early diabetes detection.
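The SMOTE balancing step can be sketched in a few lines: each synthetic sample lies on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is a minimal illustration, not the study's implementation (in practice a library such as imbalanced-learn's SMOTE would be used); the two-feature points, loosely styled as glucose and BMI values, are hypothetical.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a random minority
    sample and one of its k nearest minority-class neighbours (Euclidean)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class rows: [glucose, BMI]
minority = [[148.0, 33.6], [183.0, 23.3], [137.0, 43.1], [168.0, 38.0]]
new_samples = smote(minority, n_new=4, k=2, seed=0)
print(len(new_samples))  # 4
```

Since every synthetic point is an interpolation between two real minority samples, it stays inside the region those samples span. Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.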
Comparison of Naïve Bayes, Logistic Regression, SVM, XGBoost, and SVM-XGBoost Models for Tunaiku Sentiment Analysis
Melapa, Yabes Aryanto; Wibowo, Setyoningsih; Sari, Nur Latifah Dwi Mutiara
Building of Informatics, Technology and Science (BITS) Vol 7 No 3 (2025): December 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i3.8914

Abstract

Sentiment analysis can reveal how users perceive fintech services such as Tunaiku by evaluating customer reviews. This study compares the performance of several sentiment classification algorithms to determine the best model for classifying Tunaiku app user reviews. The dataset consists of 18,458 Tunaiku app user reviews collected from the Google Play Store. Five classification algorithms are compared: Naïve Bayes, Logistic Regression, Support Vector Machine (SVM), XGBoost, and a hybrid SVM-XGBoost model. The research stages include text preprocessing, feature extraction using TF-IDF, and training classification models validated with cross-validation. Model performance is evaluated using accuracy, precision, recall, and F1-score. Naïve Bayes (91.96%), Logistic Regression (92.81%), SVM (92.56%), and XGBoost (92.52%) all performed well, while the hybrid SVM-XGBoost model performed best, with the highest accuracy of 93.05%. These findings indicate that the hybrid approach is more effective for analyzing user review sentiment and could serve as a basis for decisions to improve Tunaiku's service quality in line with user needs.
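The four evaluation metrics can be made concrete with a small sketch that computes them from true and predicted labels, treating one class as positive. The labels below are hypothetical, and the abstract does not say how per-class scores were averaged across classes or folds, so averaging is left out.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall, and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = ["positive", "positive", "positive", "negative",
          "negative", "positive", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "negative",
          "positive", "positive", "negative", "positive"]
scores = binary_metrics(y_true, y_pred)
# 4 true positives, 1 false positive, 1 false negative, 6 of 8 correct:
# accuracy 0.75, precision 0.8, recall 0.8, F1 0.8 (up to float rounding)
print(scores)
```

Under cross-validation, these scores would be computed on each held-out fold and then averaged.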
Decision Support System for Selecting the Best Head of Study Program Using the MOORA and MOOSRA Methods
Karim, Abdul; Hidayatullah, Muhammad; Kurniawan Nasution, Muhammad Bobbi; Esabella, Shinta
Building of Informatics, Technology and Science (BITS) Vol 7 No 3 (2025): December 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i3.8928

Abstract

The Head of the Study Program plays one of the most important roles in a university and is the highest leader within the study program structure. The Head leads the study program as an organizational unit and is responsible for its administration, including coordinating all study program activities and managing lecture schedules, practicum schedules, and lecture evaluation results. The selection of the Head of the Study Program must therefore be accurate to avoid errors in the selection process. The stability of a study program depends heavily on the role and reputation of its lecturers, especially those responsible for the program's core courses, so lecturer participation in the selection is essential. Because university management also has a stake in the selection, a methodological aid is needed to accommodate both the aspirations of the lecturers and the interests of university management. A reward system is also a crucial element for motivating improvement and further increasing performance. This reward system is expected to make the Head of the Study Program more productive, so that the university's vision and mission for development can be properly achieved and implemented.
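The abstract names MOORA and MOOSRA but does not list the criteria, weights, or candidates, so the sketch below uses hypothetical values. It applies the standard vector normalisation x*_ij = x_ij / sqrt(sum_i x_ij^2), then scores each alternative with the MOORA ratio system (weighted benefit sum minus weighted cost sum) and with MOOSRA (weighted benefit sum divided by weighted cost sum).

```python
import math

def normalise(matrix):
    """Vector normalisation: x_ij / sqrt(sum over alternatives of x_ij^2)."""
    n_criteria = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    return [[row[j] / norms[j] for j in range(n_criteria)] for row in matrix]

def moora_moosra(matrix, weights, is_benefit):
    moora, moosra = [], []
    for row in normalise(matrix):
        benefit = sum(w * x for w, x, b in zip(weights, row, is_benefit) if b)
        cost = sum(w * x for w, x, b in zip(weights, row, is_benefit) if not b)
        moora.append(benefit - cost)   # MOORA ratio system: difference
        moosra.append(benefit / cost)  # MOOSRA: simple ratio (needs cost > 0)
    return moora, moosra

# Hypothetical criteria: teaching score (benefit), publications (benefit),
# complaints received (cost).
candidates = ["A", "B", "C"]
matrix = [[85, 4, 2], [90, 6, 1], [78, 3, 3]]
weights = [0.5, 0.3, 0.2]
is_benefit = [True, True, False]

moora, moosra = moora_moosra(matrix, weights, is_benefit)
moora_rank = [candidates[i] for i in sorted(range(3), key=lambda i: moora[i], reverse=True)]
moosra_rank = [candidates[i] for i in sorted(range(3), key=lambda i: moosra[i], reverse=True)]
print(moora_rank, moosra_rank)  # ['B', 'A', 'C'] ['B', 'A', 'C']
```

In this toy case, B is best on every criterion, so both methods agree. With real data, the difference-based and ratio-based scores can order candidates differently, which is one reason to compute both.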
Trilobite Fossil Period Prediction Using XGBoost with Geological–Geospatial Feature Selection and Hyperparameter Tuning
Ramadhan, Naufal Rizky; Pramudya, Elkaf Rahmawan
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.8862

Abstract

This study investigates the use of the Extreme Gradient Boosting (XGBoost) algorithm to predict the time period of trilobite fossils from geological and geospatial data. The challenges addressed include the high complexity of paleontological data, missing values, and class imbalance in the target variable time_period, all of which can degrade predictive performance. The objective is to develop an accurate and robust fossil age prediction model through systematic data preprocessing, feature selection, and model optimization. The dataset was obtained from Kaggle, with longitude, latitude, lithology, environment, and collection_type as the main features. The workflow includes data cleaning, missing value imputation, categorical feature encoding, a stratified train–test split, and class imbalance handling through class weight adjustment. The XGBoost model was trained on the training set and optimized with RandomizedSearchCV to find the best hyperparameter configuration. On the test set, the tuned XGBoost model achieved an accuracy of 95%, precision of 90%, recall of 93%, and an F1-score of 91%, outperforming the model without hyperparameter tuning. These results show that combining geological–geospatial feature selection with hyperparameter tuning in XGBoost effectively improves trilobite fossil period prediction. The study is expected to offer a computational aid for paleontology that supports more objective, efficient, and data-driven fossil period determination.
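The abstract does not specify the exact class weighting scheme, so the sketch below assumes the common "balanced" rule w_c = n_samples / (n_classes * n_c), the same formula used by scikit-learn's compute_class_weight with "balanced". The period labels are hypothetical. The resulting per-sample weights could then be passed as the sample_weight argument of XGBClassifier.fit.

```python
from collections import Counter

def balanced_class_weights(labels):
    """w_c = n_samples / (n_classes * n_c): rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

def sample_weights(labels):
    # Expand class weights into one weight per training sample.
    weights = balanced_class_weights(labels)
    return [weights[c] for c in labels]

labels = ["Cambrian"] * 6 + ["Ordovician"] * 3 + ["Devonian"]
weights = balanced_class_weights(labels)
per_sample = sample_weights(labels)
print(weights)
```

With this rule, each class contributes the same total weight (n_samples / n_classes), so the single Devonian sample counts as much as the six Cambrian samples combined.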
Reversible Data Hiding in T1-Weighted MRI Images Using Spatial Fuzzy C-Means and Selective Histogram Shifting
Suharyoto, Aufa Fadholi; Pramudya, Elkaf Rahmawan
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.8863

Abstract

Transmitting medical images over telemedicine networks increases the risk of data leakage and manipulation of sensitive information. This study develops a Reversible Data Hiding (RDH) framework that integrates Spatial Fuzzy C-Means (SFCM), Selective Histogram Shifting, and a measurable Distortion Control Mechanism to secure T1-weighted brain MRI images. The method prioritizes preserving the intensity characteristics of the Region of Interest (ROI) and full reversibility over embedding capacity. SFCM generates ROI and Non-Region of Interest (NROI) maps from the intensity distribution, with parameters adjusted adaptively for each slice. Data are embedded selectively in the NROI using histogram shifting, while ROI areas remain unmodified. An Adaptive Feedback Control mechanism monitors the image-quality metrics SNR, CNR, and GLCM texture features against conservative thresholds (ΔSNR ≤ 2.0%, ΔCNR ≤ 1.0%) to ensure ROI stability. Experimental evaluation on the OASIS-1 dataset shows that the method achieves an average PSNR of 54.13 dB, SSIM of 0.9996, and NCC of 0.9999, with an embedding capacity of 630 bits per slice (0.007–0.013 bpp within the NROI). Reversibility verification confirms perfect recovery (maximum pixel difference = 0) for all samples. Batch testing on five slices shows consistent performance across varying intensity characteristics, with ΔSNR and ΔCNR remaining at 0.0%. These results indicate that the method maintains the technical integrity of the ROI and pixel-perfect reversibility, although its limited capacity suits only lightweight metadata such as integrity hashes and patient identifiers. The study is limited to technical evaluation without clinical validation by radiologists, and testing was restricted to the T1-weighted MRI modality.
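The selective embedding step builds on classic peak/zero histogram shifting. The sketch below shows only that core on a flat list of 8-bit pixel values. It omits the SFCM ROI/NROI map and the feedback control, assumes an empty histogram bin exists to the right of the peak, and assumes the peak and zero values are kept as side information for extraction. The pixel values and payload bits are hypothetical.

```python
from collections import Counter

def find_peak_zero(pixels, max_value=255):
    """Peak = most frequent value; zero = first empty bin to its right.
    Raises StopIteration if no empty bin exists (assumed not to happen)."""
    hist = Counter(pixels)
    peak = max(range(max_value + 1), key=lambda v: hist[v])
    zero = next(v for v in range(peak + 1, max_value + 1) if hist[v] == 0)
    return peak, zero

def embed(pixels, bits, peak, zero):
    out, k = [], 0
    for p in pixels:
        if peak < p < zero:
            out.append(p + 1)            # shift right to free the bin peak + 1
        elif p == peak and k < len(bits):
            out.append(p + bits[k])      # bit 0 stays at peak, bit 1 moves to peak + 1
            k += 1
        else:
            out.append(p)
    if k < len(bits):
        raise ValueError("payload exceeds capacity (number of peak-valued pixels)")
    return out

def extract(stego, n_bits, peak, zero):
    bits, restored = [], []
    for p in stego:
        if p == peak and len(bits) < n_bits:
            bits.append(0)
            restored.append(p)
        elif p == peak + 1 and len(bits) < n_bits:
            bits.append(1)
            restored.append(peak)
        elif peak + 1 < p <= zero:
            restored.append(p - 1)       # undo the shift
        else:
            restored.append(p)
    return bits, restored

pixels = [10, 12, 12, 13, 12, 14, 15, 12, 11, 13]
peak, zero = find_peak_zero(pixels)      # peak = 12, zero = 16
payload = [1, 0, 1, 1]
stego = embed(pixels, payload, peak, zero)
bits, restored = extract(stego, len(payload), peak, zero)
print(bits == payload, restored == pixels)  # True True
```

Capacity equals the number of peak-valued pixels, and each pixel moves by at most one grey level. This is why the scheme achieves exact recovery and a high PSNR but only a small payload.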
Android Malware Detection Based on a Soft Voting Ensemble of LightGBM, Logistic Regression, and CatBoost
Danendra, Ardian; Pramudya, Elkaf Rahmawan
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.8865

Abstract

The Android operating system faces serious challenges from increasingly complex and diverse malware. This research proposes an Android malware detection system based on a soft voting ensemble of three algorithms (LightGBM, Logistic Regression, and CatBoost) that improves detection accuracy while maintaining computational efficiency. The dataset is CCCS-CIC-AndMal-2020, a highly imbalanced collection of over 400,000 Android application samples. The model uses hybrid features that combine static information (permissions, intents, and API calls from the AndroidManifest) with dynamic behavior (memory activity, runtime API calls, logcat, and network traffic in an emulated environment), balancing low extraction cost against improved robustness to obfuscation. Multi-stage preprocessing (IQR capping 40×, StandardScaler, RFE to 150 features, SMOTE 30%) improves data quality and reduces dimensionality by 56% without losing important information. The ensemble is trained with F1-Macro-based weights (33.46% LightGBM, 30.99% Logistic Regression, 35.55% CatBoost), which are close to an equal 1:1:1 weighting. On the test set the model performs very well: Accuracy 95.58%, Balanced Accuracy 92.21%, F1-Macro 0.9208, True Positive Rate 100%, and False Alarm Rate 0.00%. Together these metrics indicate that the model detects all malware samples without raising false positives on benign applications, making it suitable for production deployment. This research demonstrates the effectiveness of an efficient soft voting ensemble (only three models) for Android malware detection, evaluated with multi-dimensional metrics that are representative for imbalanced data.
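Weighted soft voting averages the class probabilities of the base models, using the ensemble weights, and picks the class with the highest average. The sketch below plugs in the weights reported in the abstract. The per-model probabilities for a single application are hypothetical, and the trained models themselves are not reproduced. In scikit-learn, the same idea corresponds to VotingClassifier with voting="soft" and a weights list.

```python
def soft_vote(prob_lists, weights):
    """Weighted average of per-class probabilities; returns (class index, averaged probs)."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Weights from the abstract: LightGBM, Logistic Regression, CatBoost
weights = [0.3346, 0.3099, 0.3555]
# Hypothetical [benign, malware] probabilities for one application
probs = [
    [0.40, 0.60],  # LightGBM
    [0.70, 0.30],  # Logistic Regression
    [0.35, 0.65],  # CatBoost
]
label, avg = soft_vote(probs, weights)
print(label, avg)  # 1 = malware; avg is roughly [0.4752, 0.5248]
```

Here Logistic Regression leans benign, but the two boosted models outweigh it. Because the three weights are nearly equal, the result is close to what an unweighted average would give, consistent with the abstract's 1:1:1 remark.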
Performance Comparison of CatBoost, XGBoost, LightGBM, and Random Forest Algorithms in Predicting AIDS Infection Risk in a Health Dataset
Yulianto, Pramudya Ridwan; Astuti, Yani Parti
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.8975

Abstract

This study predicts AIDS infection risk using four tree-based algorithms (CatBoost, XGBoost, LightGBM, and Random Forest) on a medical and demographic dataset of 2,139 observations and 23 variables. The research process includes data exploration, cleaning, handling extreme values with the interquartile range (IQR) method, normalization with RobustScaler, and class balancing with the Synthetic Minority Over-sampling Technique (SMOTE). Because the dataset is imbalanced, model evaluation emphasizes not only accuracy but also Recall, F1-Score, and AUC-ROC to better assess detection of the infected class. Before SMOTE, all models achieved high accuracy but relatively low recall for the positive class. After resampling, CatBoost improved the most: recall rose from 63% to 77% and F1-Score from 72% to 79%, with an overall accuracy of 90%. In comparison, XGBoost reached an accuracy of 88.63% with a more moderate recall improvement, while LightGBM and Random Forest showed consistent but smaller gains, indicating that combining SMOTE with CatBoost is more effective at minimizing False Negatives in AIDS infection cases. The main contribution of this study is the integration of robust outlier handling, feature normalization, and class balancing within a structured experimental framework, with a specific emphasis on optimizing sensitivity to make early detection more reliable in clinical screening contexts.
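IQR-based handling of extreme values can be sketched as capping (winsorizing) each value to the fences [Q1 - k*IQR, Q3 + k*IQR] rather than dropping rows. The abstract does not state the multiplier, so k = 1.5 (Tukey's conventional value) is an assumption, as are the CD4-count-style sample values. Quartiles use Python's statistics.quantiles with method="inclusive".

```python
import statistics

def iqr_cap(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] instead of removing them."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    capped = [min(max(v, lower), upper) for v in values]
    return capped, (lower, upper)

cd4 = [350, 420, 390, 410, 380, 2000, 400]  # hypothetical values with one extreme
capped, (lower, upper) = iqr_cap(cd4)
print(lower, upper)  # 340.0 460.0
print(capped)        # the 2000 becomes 460.0; the rest are unchanged
```

Capping keeps every observation, which matters in a small dataset like this one (2,139 rows), while limiting the pull of extreme values before scaling. RobustScaler, which centres on the median and scales by the IQR, is itself less sensitive to any outliers that remain.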