Contact Name
Siti Maesaroh
Contact Email
siti.maesaroh@mercubuana.ac.id
Phone
+6282125242949
Journal Mail Official
collabits-fasilkom@mercubuana.ac.id
Editorial Address
Jl. Raya Meruya Selatan, Kembangan, Jakarta 11650
Location
Kota Adm. Jakarta Barat, DKI Jakarta, Indonesia
Journal Collabits
ISSN: 3062-8601 | EISSN: 3046-6709 | DOI: http://dx.doi.org/10.22441/collabits
Journal Collabits is a journal that discusses cybersecurity strategies for improving performance and reliability in the implementation of artificial intelligence (AI), business intelligence (BI), and data science technologies. It is managed by the Faculty of Computer Science (Fakultas Ilmu Komputer, FASILKOM), which comprises two study programs: Informatics Engineering (Teknik Informatika, TI) and Information Systems (Sistem Informasi, SI). With the rapid growth in the use of these technologies, cybersecurity is becoming increasingly important for safeguarding the integrity, confidentiality, and availability of data. The journal explores approaches, tools, and best practices for securing AI, BI, and data science systems, including threat detection, data encryption, access management, and disaster recovery. It also analyzes the impact of cybersecurity policy on technological innovation and offers recommendations for improving security in a continually evolving digital ecosystem.
Articles 9 Documents
Comparison of Random Forest and Naive Bayes Algorithms in Classification of Song Popularity on the Spotify Platform
Saputro, Janu Ilham; Fantomi, Rian; Simbolon, Saoloan; Arizka, Puput Nur; Ramadani, Berlina
Journal Collabits Vol 3, No 1 (2026)
Publisher : Journal Collabits

DOI: 10.22441/collabits.v3i1.37578

Abstract

This study uses machine learning to classify Spotify songs by popularity. Given the volume of music data available, musicians and other artists benefit from knowing whether a song is likely to become popular. The dataset contains 8,778 songs, each described by features such as artist popularity, artist follower count, and other track details. The study evaluates two classification algorithms: Random Forest and Naive Bayes. The main features used to build the models are artist popularity, artist followers, explicit, album total tracks, and track number. The experimental results show that Random Forest outperforms Naive Bayes, with an accuracy of 76.54% versus 72.21%. Random Forest also achieves a higher F1-score for both popularity classes. These findings suggest that ensemble-based models such as Random Forest handle the features of music popularity data better than basic probabilistic models.
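As an illustration only (this is not the authors' code), accuracy and per-class F1-scores of the kind reported in this abstract could be computed along the following lines. The function name `per_class_f1` and the toy labels are hypothetical and are not drawn from the study's data.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return (accuracy, {label: f1}) for two equal-length label lists."""
    pairs = Counter(zip(y_true, y_pred))   # counts of (true, predicted) pairs
    labels = set(y_true) | set(y_pred)
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    f1 = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(pairs[(t, c)] for t in labels if t != c)
        fn = sum(pairs[(c, p)] for p in labels if p != c)
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Toy labels (hypothetical): 1 = popular, 0 = not popular.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, f1 = per_class_f1(y_true, y_pred)
```

Reporting F1 per class alongside accuracy matters when the two popularity classes are not equally frequent, since accuracy alone can hide weak performance on the smaller class.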
COMPARATIVE ANALYSIS OF LINEAR REGRESSION AND RANDOM FOREST FOR USED CAR PRICE PREDICTION
Syamsudi, Muhammad Faris Adjil; Daffa, Bimo Arya; Jarodi, Wisnu
Journal Collabits Vol 3, No 1 (2026)
Publisher : Journal Collabits

DOI: 10.22441/collabits.v3i1.37646

Abstract

Background: Manual estimation of used car prices is often subjective and prone to human bias because the used car market has a complex pricing structure with non-linear depreciation. Objective: This study conducts a comparative analysis of the Linear Regression and Random Forest algorithms to develop a more objective pricing model. Methods: The study uses a Kaggle dataset of 5,000 entries with features such as manufacturer, model, engine size, and mileage. The methodology includes data cleaning, feature engineering, and outlier removal using the interquartile range (IQR) method. The data were split 80:20 for training and testing. Results: "Year of Manufacture" was identified as the feature with the greatest influence on price, and the evaluation showed a large performance gap: Linear Regression achieved 82.33% accuracy, while Random Forest achieved 99.60%. Conclusion: Random Forest captures non-linear patterns and complex relationships in used car pricing better than Linear Regression, although Linear Regression remains quite reliable for general trends.
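The IQR-based outlier removal and 80:20 split mentioned in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the multiplier `k=1.5` is the common convention and is not stated in the abstract, and the function names, seed, and toy prices are hypothetical.

```python
import random
import statistics

def remove_iqr_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed convention."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 80:20 into train and test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

# Toy prices (hypothetical); the last value is an obvious outlier.
prices = [9500, 10200, 11000, 11800, 12500, 13100, 14000, 14900, 15500, 98000]
cleaned = remove_iqr_outliers(prices)
train, test = split_80_20(cleaned)
```

Removing outliers before the split, as done here, is one possible ordering; the abstract does not say which order the study used.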
Implementation of DBSCAN Clustering and Random Forest Algorithm for Mapping and Predicting Shooting Incidents in New York
Rangkuti, Azka Niaji; Arifin, Samoedra Cakra; Putra, Muhammad Ramadansyah Kurnia; Natalia, Nila
Journal Collabits Vol 3, No 1 (2026)
Publisher : Journal Collabits

DOI: 10.22441/collabits.v3i1.37587

Abstract

Shooting incidents in crowded, densely populated urban areas pose serious threats to public safety and social security. New York State, which includes large metropolitan areas and suburban regions, experiences complex spatial and temporal crime patterns that are difficult to identify using traditional crime analysis methods that rely only on descriptive statistics and manual hot spot identification. This study proposes a data-driven quantitative approach to mapping and predicting shooting incidents by integrating spatial clustering and machine learning techniques. Density-based clustering methods are applied to the geographic coordinates of shooting incidents to identify areas with high incident concentrations while filtering out isolated events as noise. The resulting spatial clusters are interpreted as hotspot locations and used as reference labels for a supervised classification model. A Random Forest algorithm is then used to predict hotspot and non-hotspot locations using spatial and temporal features, including geographic position and time of occurrence. The model is evaluated using standard classification performance measures, including accuracy, precision, recall, F1 score, and confusion matrix analysis.
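The density-based labelling step described in this abstract can be illustrated with a small DBSCAN sketch. It is not the authors' implementation: the `eps` and `min_pts` values are illustrative (the abstract reports none), the points are toy planar coordinates using Euclidean distance (real latitude and longitude would need a geographic distance), and the supervised Random Forest step is omitted.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:     # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels

# Toy coordinates (hypothetical): two dense groups and one isolated incident.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (9.0, 0.0)]
labels = dbscan(points, eps=0.2, min_pts=3)
hotspot = [label != NOISE for label in labels]   # reference labels for a classifier
```

The `hotspot` flags play the role of the reference labels that the abstract says are passed to the supervised model.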
Analisis Popularitas Lagu Spotify Berdasarkan Fitur Audio Menggunakan Random Forest [Analysis of Spotify Song Popularity Based on Audio Features Using Random Forest]
Rahma Putri, Anggi Beauty; Putri, Deswita Nindya; Rahmadani, Nia Putri
Journal Collabits Vol 3, No 1 (2026)
Publisher : Journal Collabits

DOI: 10.22441/collabits.v3i1.37647

Abstract

The rapid growth of digital music streaming platforms such as Spotify has significantly increased competition among songs, making popularity an important yet difficult aspect to predict. Understanding the factors that influence song popularity is essential for musicians, producers, and digital platforms in developing effective promotion strategies and recommendation systems. This study aims to analyze the relationship between Spotify audio features and song popularity using a data science approach. The dataset used in this study consists of songs described by various audio features, including danceability, energy, loudness, tempo, acousticness, instrumentalness, valence, and track duration, with popularity serving as the target variable. An exploratory data analysis (EDA) was conducted to examine the distribution of popular and non-popular songs, analyze correlations among audio features, and visualize the relationships between selected audio features and popularity. The results show that the dataset is highly imbalanced, with non-popular songs dominating the overall distribution. Correlation analysis indicates strong relationships between certain audio features, particularly between energy and loudness, while the linear correlation between individual audio features and popularity is relatively weak. Scatter plot visualizations suggest that popular songs tend to have higher levels of danceability, energy, and loudness compared to non-popular songs. However, no single feature is sufficient to explain popularity on its own, indicating that song popularity is influenced by a combination of multiple audio characteristics. This research provides an initial insight into the relationship between Spotify audio features and song popularity and serves as a foundation for future studies applying machine learning models, such as Random Forest, for popularity prediction.
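The correlation analysis described in this abstract can be illustrated with a hand-written Pearson coefficient. This is a sketch only: the function name `pearson` and the toy values are hypothetical and are not taken from the study's dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy values (hypothetical): energy, loudness in dB, and a popular/non-popular flag.
energy = [0.2, 0.4, 0.6, 0.8]
loudness = [-14.0, -10.0, -7.0, -4.0]
popular = [0, 1, 0, 1]

r_energy_loudness = pearson(energy, loudness)   # strong feature-to-feature link
r_energy_popular = pearson(energy, popular)     # weaker link to the target
```

A weak linear coefficient against the target does not rule out a useful non-linear or combined effect, which is consistent with the abstract's point that no single feature explains popularity on its own.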
Analysis and Prediction of Customer Churn in the Telecommunications Industry Using Logistic Regression and Random Forest
Nabila, Celsi Alisa; Santoso, Ryno Julian; Nafisa, Sabila Alya; Roza, Yuni
Journal Collabits Vol 3, No 1 (2026)
Publisher : Journal Collabits

DOI: 10.22441/collabits.v3i1.37599

Abstract

Customer churn represents a major challenge for telecommunication companies because of its significant influence on revenue stability and customer retention efforts. Intense competition among service providers has increased the need for reliable predictive models capable of identifying customers with a high probability of terminating their subscriptions. This study focuses on the analysis and prediction of customer churn by applying machine learning techniques to the Telco Customer Churn dataset. The research workflow includes data preprocessing stages such as duplicate removal, treatment of missing values, and transformation of both categorical and numerical features. Exploratory data analysis supported by visualization techniques is employed to examine customer behavior and feature relationships. Subsequently, the dataset is partitioned into training and testing subsets using an 80:20 stratified split. A preprocessing pipeline is applied, incorporating feature scaling for numerical variables and one-hot encoding for categorical variables. Predictive models are developed using Logistic Regression and Random Forest algorithms, and their performance is assessed through accuracy measurements and classification reports. The results indicate that the Random Forest model delivers better predictive performance than Logistic Regression, demonstrating its effectiveness in modeling complex data patterns. Overall, the study confirms that machine learning-based approaches can serve as effective tools for churn prediction and offer meaningful insights to support strategic decision-making in customer retention within the telecommunication sector.
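The 80:20 stratified split and one-hot encoding mentioned in this abstract can be sketched in plain Python as follows. This is an assumption-laden illustration, not the study's pipeline: the function names, the seed, and the toy rows with their `contract` and `churn` fields are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, test_ratio=0.2, seed=0):
    """Split rows so each class keeps roughly the same share in train and test."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for group in by_class.values():
        group = group[:]
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

def one_hot(rows, key):
    """Encode a categorical field as 0/1 columns, one column per category."""
    categories = sorted({row[key] for row in rows})
    encoded = [[1 if row[key] == c else 0 for c in categories] for row in rows]
    return encoded, categories

# Toy rows (hypothetical): 5 churners and 15 non-churners.
contracts = ["Month-to-month", "One year", "Two year"]
rows = [{"contract": contracts[i % 3], "churn": "Yes" if i % 4 == 0 else "No"}
        for i in range(20)]

train, test = stratified_split(rows, "churn")
encoded, categories = one_hot(train, "contract")
```

Deriving the category list from the training rows only, as done here, avoids letting information from the test subset shape the preprocessing.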
A Data Science Approach to Cancer Patient Classification Using Support Vector Machine and Random Forest
Anggraini, Devi Dwi; Salsabila, Mutiara Rizky; Kamila, Keisya Rizkia
Journal Collabits Vol 3, No 1 (2026)
Publisher : Journal Collabits

DOI: 10.22441/collabits.v3i1.37642

Abstract

The increasing availability of healthcare data has encouraged the application of data science and machine learning techniques in medical research. Cancer patient datasets contain numerical demographic and clinical attributes that can be utilized for classification tasks; however, complex feature relationships and limited feature relevance remain key challenges. This study aims to analyze cancer patient data and compare the performance of Support Vector Machine and Random Forest algorithms for gender classification. The dataset used in this study consists of numerical features, including patient age, tumor size, number of examined lymph nodes, number of positive lymph nodes, body mass index, and survival duration measured in months. The research methodology includes data preprocessing, exploratory data analysis, model development, and performance evaluation. Feature normalization and data splitting are applied to ensure a fair comparison between models, while exploratory analysis is conducted to examine data distribution and relationships among variables. Both classification models are trained under identical experimental settings and evaluated using accuracy as the primary performance metric. The results indicate that both algorithms are capable of classifying cancer patient gender with satisfactory accuracy. Support Vector Machine demonstrates slightly better performance compared to Random Forest, suggesting its effectiveness in handling numerical data with complex decision boundaries. The findings highlight the importance of appropriate algorithm selection and feature utilization in healthcare data analysis.
Laptop Price Prediction Based on Specifications: A Comparison of Random Forest and Linear Regression
Putra, Bagas Pratama; Mahfuzh, Ilham Miftahali; Kurniawan, Agus Fahrizal; Budiman, Ramdani
Journal Collabits Vol 3, No 1 (2026)
Publisher : Journal Collabits

DOI: 10.22441/collabits.v3i1.37603

Abstract

This study investigates the prediction of laptop prices based on hardware specifications by comparing the performance of Linear Regression and Random Forest algorithms. The dataset consists of both numerical and categorical features, including brand, processor type, RAM capacity, storage configuration, screen size, and other relevant attributes that influence pricing. Data preprocessing was conducted through data cleaning, handling missing values, and transforming categorical variables using one-hot encoding. The dataset was then divided into training and testing sets with a 70:30 ratio to evaluate model generalization. Exploratory data analysis was performed using visualizations such as average price per brand, correlation heatmaps of numerical features, and scatter plots comparing actual and predicted prices. Model performance was evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²) on both training and testing data. The results indicate that the Random Forest model achieves higher predictive accuracy compared to Linear Regression, as it is more effective in capturing non-linear relationships and complex feature interactions. In contrast, Linear Regression tends to underperform due to its linear assumptions when applied to heterogeneous laptop specification data. These findings suggest that ensemble-based models are more suitable for laptop price prediction tasks involving diverse and non-linear feature patterns.
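The three evaluation metrics named in this abstract (MAE, RMSE, and R²) follow standard definitions and can be computed as in the sketch below. The function name `regression_metrics` and the toy prices are hypothetical and are not results from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length lists of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2

# Toy prices (hypothetical), in arbitrary currency units.
actual = [500.0, 750.0, 1000.0, 1250.0]
predicted = [550.0, 700.0, 1000.0, 1300.0]
mae, rmse, r2 = regression_metrics(actual, predicted)
```

Computing the same metrics on both training and testing data, as the abstract describes, helps reveal overfitting: a large gap between the two sets suggests the model has memorized the training data.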
Data-Oriented Classification of Red Wine Quality Using Machine Learning
Ammar, Fajar; Wira, Raja; Charllo, Christian
Journal Collabits Vol 3, No 1 (2026)
Publisher : Journal Collabits

DOI: 10.22441/collabits.v3i1.37621

Abstract

This study examines the use of supervised machine learning to classify the quality level of red wine based on measurable physicochemical properties. The analysis is conducted using the winequality-red.csv dataset, which contains laboratory-based measurements such as acidity components, alcohol percentage, and sulfur dioxide levels. The primary goal of this research is to explore the contribution of these attributes to wine quality and to compare the classification results produced by different machine learning models. The research procedure involves initial data inspection, feature preparation, exploratory analysis, model training using Logistic Regression and Random Forest, and performance assessment through accuracy, precision, recall, and F1-score indicators. The results show that the Random Forest classifier yields more consistent and reliable classification outcomes than Logistic Regression. These findings suggest that machine learning techniques can support objective quality evaluation processes in the food and beverage industry.
COMPARATIVE ANALYSIS OF PUBLIC SENTIMENT TOWARDS SRI MULYANI AND PURBAYA AS FINANCE MINISTERS ON THE X PLATFORM USING THE INDOBERTWEET MODEL
Zamzami, Muhammad Aryaka; Maesaroh, Siti; Managas, Dendy Jonas
Journal Collabits Vol 3, No 1 (2026)
Publisher : Journal Collabits

DOI: 10.22441/collabits.v3i1.37962

Abstract

The development of social media has positioned platform X (Twitter) as a primary source for expressing public opinion toward government figures and policies. This study aims to analyze public sentiment toward two Indonesian public figures, Sri Mulyani Indrawati and Purbaya Yudhi Sadewa, by utilizing the transformer-based IndoBERTweet model. The data were collected from January 1, 2025, to November 1, 2025. A total of 11,000 tweets related to Sri Mulyani were collected; however, only 2,500 tweets were used for data processing and model training, with a maximum limit of 1,000 tweets per month. Meanwhile, 650 tweets were obtained for Purbaya Yudhi Sadewa. This research employs a supervised learning approach with labeled data consisting of positive, negative, and neutral sentiment classes. Minimal preprocessing was applied, considering that IndoBERTweet is specifically designed to handle the characteristics of social media text. The model was trained for five epochs and evaluated using accuracy, precision, recall, and F1-score metrics. The results indicate that the IndoBERTweet model can classify sentiment effectively, particularly on the Sri Mulyani dataset, which contains a larger volume of data and achieves an accuracy of over 82%. In contrast, the model’s performance on the Purbaya Yudhi Sadewa dataset shows a lower accuracy of 71%, influenced by the limited amount of data. This study confirms that the quantity and distribution of data significantly affect the performance of transformer-based sentiment analysis models. Based on the sentiment classification results, public sentiment toward Sri Mulyani Indrawati tends to be dominated by negative and neutral sentiments, while sentiment toward Purbaya Yudhi Sadewa shows a distribution dominated by neutral and positive sentiments.
