Contact Name
Mesran
Contact Email
mesran.skom.mkom@gmail.com
Phone
-
Journal Mail Official
jurnal.bits@gmail.com
Editorial Address
-
Location
Kota Medan,
Sumatera Utara
INDONESIA
Building of Informatics, Technology and Science
ISSN : 2684-8910     EISSN : 2685-3310     DOI : -
Core Subject : Science,
Building of Informatics, Technology and Science (BITS) is an open-access journal that publishes scientific articles presenting research results in information technology and computing. Papers submitted to the journal are checked for plagiarism and peer-reviewed to maintain quality. The journal is managed by Forum Kerjasama Pendidikan Tinggi (FKPT) and published twice a year, in June and December. The journal is expected to advance research and make a real contribution to improving research resources in the field of information technology and computing.
Arjuna Subject : -
Articles 926 Documents
Penerapan Algoritma Random Forest dalam Prediksi Curah Hujan untuk Mendukung Analisis Cuaca [Application of the Random Forest Algorithm in Rainfall Prediction to Support Weather Analysis] Torhino, Rizal; Andono, Pulung Nurtantio
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6404

Abstract

Indonesia's climate diversity leads to different rainfall patterns in each region. This makes accurate rainfall prediction, which is important for effective infrastructure planning and disaster mitigation, a major challenge. The purpose of this research is to analyze the rainfall potential in Purwodadi Sub-district using the Random Forest algorithm. Several weather parameters (air pressure, temperature, humidity, and wind speed) are used as inputs, while rainfall is the target variable in the prediction process. The dataset was obtained from NASA Prediction Of Worldwide Energy Resources (POWER) and covers the period from 2000 to 2022. The data is divided into 70% for training and 30% for testing. The Random Forest algorithm was used to classify the likelihood of rain based on existing weather conditions. The Random Forest model achieved 100% accuracy on the training data and 92% on the test data, indicating excellent prediction performance. The confusion matrix confirmed that the majority of the model's predictions matched the actual data. This finding shows that the weather parameters used are effective in predicting rainfall in Purwodadi Sub-district. This research contributes to improving the accuracy of rainfall prediction and opens up opportunities for developing better weather prediction models that involve more parameters or use other algorithms for more in-depth performance evaluation.
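The 70% / 30% division described in this abstract can be illustrated with a minimal sketch. The abstract does not say whether the split was random or chronological, so the seeded random shuffle below is an assumption, and the records are hypothetical placeholders rather than NASA POWER data.

```python
import random


def split_70_30(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records:
# (air pressure, temperature, humidity, wind speed, rain label)
records = [
    (1010.0 - i % 7, 26.0 + i % 5, 70 + i % 20, 2.0 + i % 3, i % 2)
    for i in range(100)
]

train, test = split_70_30(records)
print(len(train), len(test))  # 70 30
```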
Support Vector Machine and Naïve Bayes for Personality Classification Based on Social Media Posting Patterns Nugroho, Bayu Seno; Maharani, Warih
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6411

Abstract

This research investigates the use of Support Vector Machine (SVM) and Naive Bayes models to classify personality traits based on social media posting patterns. The study combines textual features obtained from the Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) methods with feature expansion using the Linguistic Inquiry and Word Count (LIWC) tool, and assesses their influence on classification accuracy. Personality characteristics were mapped from social media posts using the Big Five Inventory (BFI-44). The findings show that the SVM model using the TF-IDF + LIWC feature set provides the best performance, achieving 76.60% accuracy on the base model with a linear kernel. In comparison, the Naive Bayes model performed best with the same feature set, achieving 59.57% accuracy with a smoothing parameter of 1e-2. Although oversampling improved recall and precision, undersampling was found to have a negative effect on model performance. These findings highlight the benefits of combining TF-IDF and LIWC features, which improves model effectiveness, with SVM producing the best overall results in personality classification from social media data.
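For readers unfamiliar with TF-IDF weighting, the following is a minimal sketch of one common textbook variant (term frequency normalised by document length, times the natural log of the inverse document frequency). It is not necessarily the variant used in the study, and the token lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenised posts
posts = [
    ["i", "love", "music"],
    ["i", "love", "hiking"],
    ["i", "hate", "rain"],
]
weights = tf_idf(posts)
print(weights[0])
```

A term that appears in every post (here "i") receives a weight of zero, while terms unique to one post receive the largest weights.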
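Peningkatan Akurasi Temu Kembali Citra Berbasis Konten dengan Modifikasi Kontras Histogram Equalization dan Fast Fourier Transform [Improving the Accuracy of Content-Based Image Retrieval with Histogram Equalization Contrast Modification and Fast Fourier Transform] Hartono, Budi; Lusiana, Veronica; Eniyati, Sri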
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6418

Abstract

Content-Based Image Retrieval (CBIR) searches an image database for images based on their visual content. This study aims to develop a retrieval system using Fast Fourier Transform (FFT) for image texture feature extraction. The test images and the image database consist of four Batik motif textures, and contrast is modified using Histogram Equalization. The level of similarity between the test image and the image database is calculated using Manhattan Distance. The study results show a difference in the accuracy of the retrieval results between images without and with contrast modification. In images with contrast modification, the accuracy of the search results increases by 71.4%. System performance is evaluated based on the level of accuracy calculated using the Precision, Recall, and F1-score values. Further research is still needed to test the accuracy of image retrieval results, especially in pre-processing image textures with other batik motifs.
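The similarity ranking with Manhattan Distance mentioned in this abstract can be sketched as follows. The feature vectors and motif labels are hypothetical stand-ins for FFT texture features; the abstract does not describe the actual feature layout.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query, database, top_k=2):
    """Return the labels of the top_k database entries closest to the query."""
    ranked = sorted(database, key=lambda entry: manhattan(query, entry[1]))
    return [label for label, _ in ranked[:top_k]]


# Hypothetical (label, feature vector) pairs
database = [
    ("motif_a", [0.9, 0.1, 0.3]),
    ("motif_b", [0.2, 0.8, 0.5]),
    ("motif_c", [0.8, 0.2, 0.4]),
]
query = [0.85, 0.15, 0.3]
print(retrieve(query, database))  # ['motif_a', 'motif_c']
```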
Predicting Diabetes with Machine Learning: Evaluating Tree-Based and Ensemble Models with Custom Metrics and Statistical Validation Airlangga, Gregorius
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6419

Abstract

This study investigates the predictive performance of machine learning models in diagnosing diabetes using the Pima Indians Diabetes Dataset. Seven models, including Logistic Regression, Random Forest, Gradient Boosting, XGBoost, LightGBM, Stacking Classifier, and Voting Classifier, were evaluated. A 10-fold cross-validation strategy was employed to ensure robust and reliable performance assessment. The evaluation incorporated standard metrics such as accuracy, precision, recall, F1 score, and ROC AUC, as well as a custom metric designed to prioritize recall while maintaining precision, addressing the clinical importance of minimizing false negatives. LightGBM and Random Forest emerged as the top-performing individual models, achieving competitive scores across metrics. Ensemble methods, particularly the Stacking Classifier, demonstrated robustness by leveraging the complementary strengths of base models. Statistical validation using the Friedman test confirmed significant differences in model rankings, with a test statistic of 22.77 and a p-value of 0.00088. However, pairwise comparisons using the Wilcoxon signed-rank test revealed that the differences between top models, such as LightGBM and Random Forest, were not statistically significant. These results emphasize the effectiveness of tree-based and ensemble models in addressing clinical diagnostic challenges. The study highlights the importance of using a custom metric to align model evaluation with clinical priorities. Future work should explore hybrid modeling approaches and larger datasets to further enhance predictive accuracy and generalizability in real-world healthcare applications.
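The reported Friedman result can be checked with a short calculation. The sketch below assumes the usual chi-square approximation with k - 1 = 6 degrees of freedom for the seven models; the abstract does not state the degrees of freedom, so that is an assumption. For an even number of degrees of freedom the chi-square survival function has a closed form, which avoids any external library.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Accumulate sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Statistic from the abstract; df = 7 models - 1 (assumed)
p_value = chi2_sf_even_df(22.77, 6)
print(round(p_value, 5))  # 0.00088
```

Under that assumption the computed p-value agrees with the 0.00088 reported in the abstract.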
Analisis Sentimen Opini Publik Program Makan Siang Gratis dengan Random Forest Pada Media [Sentiment Analysis of Public Opinion on the Free Lunch Program Using Random Forest on Media] Azhari, Muhamad; Parjito, Parjito
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6423

Abstract

The "Free Lunch Program," introduced as part of the 2024 Indonesian election campaign, became a hot topic on social media, especially on the platform X. This program aims to improve children's health and nutrition while reducing stunting rates by providing free lunches and milk to children and pregnant women. A study was conducted to analyze public sentiment regarding the program using the Random Forest algorithm. The data consisted of 9,347 tweets collected through web crawling. The analysis revealed that the majority of sentiments were negative (8,021 entries), while positive sentiments accounted for only 430 entries. The preprocessing steps included data cleaning, case folding, tokenization, stopword removal, and stemming. The imbalance between positive and negative sentiment data was addressed using the Synthetic Minority Over-sampling Technique (SMOTE), resulting in a more balanced dataset. After applying SMOTE, the model achieved 100% accuracy, with significant improvements in precision, recall, and F1-Score. The analysis showed that positive sentiments focused on the program's health and educational benefits, while negative sentiments highlighted criticism of implementation and budget allocation. This study demonstrates the value of sentiment analysis in evaluating social programs and understanding public perceptions.
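The core idea behind SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration and not the implementation used in the study; the two-dimensional points are hypothetical, whereas real inputs would be text feature vectors.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [3.0, 3.0]]
new_points = smote(minority, n_new=5)
print(len(new_points))  # 5
```

If the classes in the abstract were balanced on the full labelled set, 8,021 - 430 = 7,591 synthetic positive samples would be needed; whether SMOTE was applied to the full set or only to a training split is not stated.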
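Performa Random Forest dan XGBoost pada Deteksi Penipuan E-Commerce Menggunakan Augmentasi Data CGAN [Performance of Random Forest and XGBoost in E-Commerce Fraud Detection Using CGAN Data Augmentation] Sarmini, Sarmini; Sunardi, Sunardi; Fadlil, Abdul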
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6430

Abstract

Fraud detection in e-commerce faces great challenges due to data imbalance, where legitimate transactions far outnumber fraudulent transactions. This research explores the use of Conditional Generative Adversarial Network (CGAN) to generate synthetic fraudulent transaction data to address the imbalance problem. By increasing the amount of data in the minority class, this research aims to improve the performance of two widely used machine learning algorithms, namely Random Forest and XGBoost. The dataset consists of 23,634 transactions, with 22,412 non-fraud transactions and 1,222 fraudulent transactions. Accuracy, precision, recall, and F1-score were used to assess the performance of the models in detecting fraud on the imbalanced and augmented datasets. The results show that augmentation of data with CGAN significantly improves the performance of both models, especially in improving recall for fraudulent transactions. On the original unbalanced dataset, Random Forest and XGBoost showed low recall (12.81% and 13.08%), with accuracy of 95.35% and 95.32% respectively. However, after augmentation, recall improved to 95.15% for Random Forest and 95.22% for XGBoost, with F1-score of 97.47% and 97.42% respectively, and accuracy of 97.50% for Random Forest and 97.42% for XGBoost. XGBoost showed a slight advantage in precision and recall over Random Forest, especially on the augmented dataset. These findings confirm the effectiveness of CGAN as a data augmentation method in improving fraud detection performance and offer a robust solution to address data imbalance in the financial sector.
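The class counts in this abstract explain why accuracy of about 95% can coexist with recall near 13% on the original data. As a rough illustration (computed on the full dataset, whereas the study's figures presumably come from a test split), a trivial classifier that labels every transaction as non-fraud already reaches nearly 95% accuracy while detecting no fraud at all:

```python
def majority_baseline(total, minority):
    """Accuracy and minority recall of a model that always predicts the majority class."""
    majority = total - minority
    accuracy = majority / total
    minority_recall = 0.0  # no minority example is ever predicted
    return majority, accuracy, minority_recall


# Counts taken from the abstract
legit, accuracy, fraud_recall = majority_baseline(total=23_634, minority=1_222)
print(legit, round(accuracy * 100, 2), fraud_recall)  # 22412 94.83 0.0
```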
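Analisis Sentimen Masyarakat terhadap Penggunaan Sepeda Listrik pada Anak-Anak di Media Sosial X Menggunakan Metode SVM [Public Sentiment Analysis of Electric Bicycle Use by Children on Social Media X Using the SVM Method] Lestari, Rohmah Dewi; Isnain, Auliya Rahman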
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6431

Abstract

The use of electric bicycles among children is becoming increasingly popular in Indonesia. While offering practicality and mobility efficiency, their usage raises safety concerns, especially for children on the road. Public opinions on this issue are widely discussed on social media platform X (Twitter), with some supporting their use due to practicality and eco-friendliness, while others advocate stricter regulations to ensure children’s safety. This study analyzes public sentiment toward the use of electric bicycles for children using the Support Vector Machine (SVM) method. Data was collected through a crawling process on social media X using the Tweet Harvest tool, resulting in 3,565 entries. The data underwent preprocessing and translation into English for sentiment analysis using TextBlob. Sentiments were labeled, identifying 1,737 negative sentiments (64.24%) and 967 positive sentiments (35.76%). The dataset was divided into 80% for training and 20% for testing. An SVM model with a linear kernel was applied for classification. Performance evaluation using a confusion matrix showed 0.84 accuracy, precision scores of 0.84 (negative) and 0.85 (positive), recall scores of 0.92 (negative) and 0.71 (positive), and F1-scores of 0.88 (negative) and 0.78 (positive). The findings reveal that public sentiment predominantly reflects concerns about children’s safety risks.
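The F1-scores in this abstract follow from the reported precision and recall as their harmonic mean. A minimal sketch of that relationship is shown below; because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall values from the abstract
f1_negative = f1_score(0.84, 0.92)
f1_positive = f1_score(0.85, 0.71)
print(round(f1_negative, 3), round(f1_positive, 3))  # 0.878 0.774
```

The negative-class value rounds to the reported 0.88. The positive-class value recomputed from rounded inputs is about 0.77, which is consistent with the reported 0.78 once rounding of the inputs is taken into account.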
A Comparative Analysis of Diabetes Prediction through Deep Learning Architectures Airlangga, Gregorius
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6446

Abstract

Diabetes prediction plays a vital role in healthcare, enabling early diagnosis and timely interventions to mitigate the risks associated with the disease. This study investigates the application of advanced machine learning architectures to predict diabetes using the Pima Indians Diabetes Dataset, a widely used benchmark for medical diagnostics. Five models: Deep Neural Network (DNN), Convolutional Neural Network (CNN) with Attention, LSTM with Residual Connections, Bidirectional LSTM (BiLSTM) with Attention, and GRU with Dense Layers were developed and evaluated on multiple performance metrics, including accuracy, precision, recall, F1 score, and ROC AUC. A stratified five-fold cross-validation strategy was employed to ensure robustness, while SHAP analysis was conducted to enhance interpretability. Among the models, the GRU with Dense Layers achieved superior performance, recording the highest accuracy (76.17%), F1 score (69.85%), and ROC AUC (83.52%). SHAP analysis revealed Glucose as the most influential feature, with significant interactions identified between Glucose and Pregnancies, aligning with established medical insights. Statistical analysis confirmed the reliability of the results, with all metrics demonstrating statistically significant improvements over a baseline of random chance (p < 0.05). These findings underscore the efficacy of GRU-based models in capturing complex patterns in medical data while maintaining computational efficiency. Future work will explore hybrid architectures and larger datasets to enhance generalizability and real-world applicability, contributing to more effective decision-making in healthcare.
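Stratified k-fold cross-validation keeps the class proportions roughly equal in every fold. The sketch below shows one simple way to build such folds by dealing each class's indices round-robin; it omits shuffling for brevity, and the labels are a hypothetical toy example rather than the dataset used in the study.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds so each fold mirrors the class balance."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal this class's samples across the folds in turn
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical labels: 10 negative (0) and 5 positive (1) samples
labels = [0] * 10 + [1] * 5
folds = stratified_folds(labels, k=5)
for fold in folds:
    print(sorted(fold), [labels[i] for i in sorted(fold)])
```

Each fold serves once as the validation set while the remaining four folds are used for training.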
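Perbandingan Model Machine Learning dalam Analisis Sentimen Pada Kasus Monkeypox di Media Sosial X [Comparison of Machine Learning Models in Sentiment Analysis of the Monkeypox Case on Social Media X] Prasetyoningrum, Devi; Andono, Pulung Nurtantio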
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6447

Abstract

Monkeypox, or mpox, is a zoonotic disease caused by the monkeypox virus, a member of the genus Orthopoxvirus. Monkeypox became a global concern after cases of transmission were reported in various countries, sparking widespread discussion on social media X. This platform is often used by the public to disseminate information and express concerns related to the disease. This study aims to compare the performance of several models in sentiment analysis related to the Monkeypox case on social media X. The models tested include Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naïve Bayes, and Random Forest (RF). The data consisted of tweets containing opinions or information about Monkeypox, which were processed through text normalization, stopword removal, and stemming. Feature weighting was then carried out using the TF-IDF technique and feature selection using the Chi-Square method, resulting in an optimal number of features of 652. The results of the analysis show that SVM provides the highest accuracy of 83%, a 3% increase over the previous number of features, which was 500. Although KNN and Naïve Bayes showed significant improvements, Random Forest did not experience any significant change in its performance. The study concluded that SVM is the most effective model in analyzing Monkeypox-related sentiment on social media X. For future research, it is recommended to explore deep learning techniques and the use of larger datasets to improve the accuracy and depth of sentiment analysis.
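Chi-Square feature selection scores each term by how strongly its presence depends on the class label, then keeps the highest-scoring terms. The sketch below uses the textbook 2x2 contingency-table formulation, which may differ from the implementation used in the study; the document counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty, so no association can be measured
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical counts for one term and one sentiment class
score = chi_square_2x2(a=30, b=10, c=20, d=40)
independent = chi_square_2x2(a=25, b=25, c=25, d=25)
print(round(score, 4), independent)  # 16.6667 0.0
```

Ranking all terms by this score and keeping the top entries would yield a reduced feature set such as the 652 features mentioned in the abstract.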
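Perbandingan Efficientnet, Visual Geometry Group 16, dan Residual Network 50 Untuk Klasifikasi Kendaraan Bermotor [Comparison of EfficientNet, Visual Geometry Group 16, and Residual Network 50 for Motor Vehicle Classification] Andrianto, Andrianto; Tahyudin, Imam; Karyono, Giat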
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6450

Abstract

This study compares the performance of three Convolutional Neural Network (CNN) models—EfficientNet, VGG16, and ResNet50—in motor vehicle classification tasks using the "Car vs Bike" dataset. Transfer learning was applied using pretrained weights from ImageNet. The results indicate that VGG16 achieved the best performance with 95% accuracy, precision of 0.95, recall of 0.96, and an F1-score of 0.95, demonstrating high balance in recognizing both classes. ResNet50 attained 87% accuracy on the test dataset with a precision of 0.89, recall of 0.84, and an F1-score of 0.87, offering a trade-off between accuracy and computational efficiency. Conversely, EfficientNet exhibited the lowest performance with 50% accuracy, failing to recognize the "Car" class effectively, as evidenced by precision and recall values of 0.00. Factors such as architectural complexity, dataset bias, and computational efficiency influenced these outcomes. This study reinforces previous findings on the strengths and weaknesses of CNN models in motor vehicle classification applications. Furthermore, it highlights the importance of balanced data management and model selection tailored to specific application requirements. However, the dataset's limitation of only two classes and reliance on transfer learning remain areas for future improvement. These findings provide valuable insights for developing intelligent transportation systems requiring high accuracy and efficiency.
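The EfficientNet figures in this abstract (50% accuracy with precision and recall of 0.00 for "Car") match the pattern produced when a model assigns every image to a single class. The sketch below reproduces that pattern under the assumption of a balanced test set; the counts of 100 images per class are hypothetical, since the abstract does not give the test set size.

```python
def evaluate(tp, fp, fn, tn):
    """Accuracy, precision, and recall for the positive class from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # undefined case reported as 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical balanced test set with "Car" as the positive class:
# all 100 cars are predicted as "Bike" (fn) and all 100 bikes are correct (tn).
accuracy, car_precision, car_recall = evaluate(tp=0, fp=0, fn=100, tn=100)
print(accuracy, car_precision, car_recall)  # 0.5 0.0 0.0
```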