Articles

Robust Fan Actuator Prediction in Smart Greenhouses Using Machine Learning: A Comparative Analysis of Ensemble and Linear Models Airlangga, Gregorius
Journal of Information System Research (JOSH) Vol 6 No 1 (2024): October 2024
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/josh.v6i1.6095

Abstract

The increasing demand for sustainable agriculture has driven the development of smart greenhouses equipped with automated systems for climate control. A critical component of these systems is the fan actuator, which regulates airflow and stabilizes the internal climate. This study explores the use of machine learning models for predicting the activation status of fan actuators based on environmental data collected from a smart greenhouse. We evaluate several machine learning models, including Support Vector Machine (SVM), Random Forest, Gradient Boosting, XGBoost, and Logistic Regression, under real-world conditions simulated by adding noise and label corruption to the dataset. The dataset was augmented and balanced using the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalances. Results indicate that ensemble methods, particularly XGBoost and Random Forest, outperform simpler models in terms of accuracy, precision, recall, and F1 score. XGBoost achieved the highest accuracy at 94.47%, while Random Forest followed closely with 94.29%. The study demonstrates that these models are robust to data imperfections and can be effectively employed for real-time fan actuator control. However, further validation is needed to generalize the findings to different greenhouse environments. The research highlights the potential of machine learning models to improve operational efficiency in smart farming, offering insights into how these technologies can support more sustainable agricultural practices.
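The abstract balances the classes with SMOTE. A minimal stdlib sketch of the core idea follows: each synthetic row is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours (Euclidean distance). The feature names, values, `k`, and seed are hypothetical and are not taken from the study; in practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used.

```python
import math
import random


def k_nearest(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] by Euclidean distance, excluding itself."""
    ranked = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [j for _, j in ranked[:k]]


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class rows by interpolating between a sample and a neighbour."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        gap = rng.random()  # random position on the segment between the two samples
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


# Hypothetical minority-class rows ("fan on"): [temperature_C, humidity_pct]
fan_on = [[31.0, 62.0], [32.5, 60.0], [30.8, 65.0], [33.1, 58.0]]
new_rows = smote(fan_on, n_synthetic=3, k=2)
print(new_rows)
```

Because every synthetic row is a convex combination of two real minority rows, it always falls inside the range already spanned by that class.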
A Comparative Analysis of Machine Learning Models for Predicting Student Performance: Evaluating the Impact of Stacking and Traditional Methods Airlangga, Gregorius
Brilliance: Research of Artificial Intelligence Vol. 4 No. 2 (2024): Brilliance: Research of Artificial Intelligence, Article Research November 2024
Publisher : Yayasan Cita Cendekiawan Al Khwarizmi

DOI: 10.47709/brilliance.v4i2.4669

Abstract

This study investigates the application of machine learning models to predict student performance using socio-economic, demographic, and academic factors. Various models were developed and evaluated, including Linear Regression, Random Forest, Gradient Boosting, XGBoost, LightGBM, Support Vector Regressor, and a Stacking Regressor. The models were assessed using key evaluation metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), R-squared (R²), Mean Squared Log Error (MSLE), and Mean Absolute Percentage Error (MAPE). The Support Vector Regressor demonstrated the best overall performance, with an MAE of 4.3091, RMSE of 5.4110, and an R² of 0.8685, surpassing even the more complex ensemble models. Similarly, Linear Regression achieved strong results, with an MAE of 4.3154 and an R² of 0.8685. In contrast, the Stacking Regressor, while effective, did not significantly outperform its base models, achieving an MAE of 4.5340 and an R² of 0.8563, highlighting that greater model complexity does not necessarily lead to better predictive power. The analysis also revealed that MAPE was highly sensitive to outliers in the dataset, indicating the need for robust data preprocessing to handle extreme values. These results suggest that, in educational data mining, simpler models can often match or exceed the performance of more complex methods. Future research should investigate advanced ensembling strategies and feature engineering techniques to further enhance the accuracy and reliability of student performance predictions.
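The five regression metrics named in the abstract can be written out directly. The sketch below uses the standard definitions (MAPE is expressed here as a percentage) and two hypothetical sets of exam scores, not the study's data, with identical absolute errors. It illustrates one way MAPE becomes unstable: because it divides by the true value, a single small target inflates it sharply while MAE and RMSE do not change.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, R², MSLE and MAPE (in percent) using their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1 - ss_res / ss_tot,
        # MSLE assumes every value is > -1 (log1p is undefined otherwise)
        "MSLE": sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / n,
        # MAPE divides by |y_true|, so a target close to zero dominates the average
        "MAPE_pct": 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical scores: same absolute errors (5 points each), one very low true score
base = regression_metrics([70, 85, 60, 90], [65, 80, 65, 95])
near_zero = regression_metrics([70, 85, 2, 90], [65, 80, 7, 95])
print(base["MAPE_pct"], near_zero["MAPE_pct"])
```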
Spam Detection on YouTube Comments Using Advanced Machine Learning Models: A Comparative Study Airlangga, Gregorius
Brilliance: Research of Artificial Intelligence Vol. 4 No. 2 (2024): Brilliance: Research of Artificial Intelligence, Article Research November 2024
Publisher : Yayasan Cita Cendekiawan Al Khwarizmi

DOI: 10.47709/brilliance.v4i2.4670

Abstract

The exponential growth of user-generated content on platforms like YouTube has led to an increase in spam comments, which negatively affect the user experience and content moderation efforts. This research presents a comprehensive comparative study of various machine learning models for detecting spam comments on YouTube. The study evaluates a range of traditional and ensemble models, including Linear Support Vector Classifier (LinearSVC), RandomForest, LightGBM, XGBoost, and a VotingClassifier, with the goal of identifying the most effective approach for automated spam detection. The dataset consists of labeled YouTube comments, which were preprocessed and represented using Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. Each model was trained and evaluated using stratified 10-fold cross-validation to ensure robustness and generalizability. LinearSVC outperformed all other models, achieving an accuracy of 95.33% and an F1-score of 95.32%. The model demonstrated superior precision (95.46%) and recall (95.33%), making it highly effective in distinguishing between spam and legitimate comments. The results highlight the potential of LinearSVC for real-time spam detection systems, offering a reliable balance between accuracy and computational efficiency. Furthermore, the study suggests that while ensemble models like RandomForest and VotingClassifier performed well, they did not surpass the simpler LinearSVC model in this context. Future work will explore the incorporation of deep learning techniques, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), to capture more complex patterns and further enhance spam detection accuracy on social media platforms like YouTube.
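TF-IDF turns each comment into a weighted term vector before a linear classifier such as LinearSVC sees it. The sketch below implements one common variant, assumed here rather than confirmed by the abstract: raw term counts, the smoothed IDF `ln((1 + N) / (1 + df)) + 1`, and L2 normalisation of each row (the defaults of scikit-learn's `TfidfVectorizer`). The comments are invented examples, and any other preprocessing the study performed is not reproduced.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep tokens of two or more word characters
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf(docs):
    """Return (vocabulary, L2-normalised TF-IDF rows) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        vectors.append([v / norm for v in row])
    return vocab, vectors


# Hypothetical comments: two spam-like, one legitimate
comments = [
    "Check out my channel for free gift cards",
    "Great song, I love this video",
    "Free gift cards, subscribe to my channel now",
]
vocab, X = tfidf(comments)
```

Terms shared by many comments (such as "channel" here) receive a lower IDF than terms unique to one comment, so rarer, more distinctive words carry more weight in each row.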
EVALUATING MACHINE LEARNING MODELS FOR PREDICTING SLEEP DISORDERS IN A LIFESTYLE AND HEALTH DATA CONTEXT Airlangga, Gregorius
JIKO (Jurnal Informatika dan Komputer) Vol 7, No 1 (2024)
Publisher : Universitas Khairun

DOI: 10.33387/jiko.v7i1.7870

Abstract

Sleep disorders significantly impact public health, but their detection is often complicated by the multifaceted nature of causative factors. This study investigates the efficacy of various machine learning (ML) models in identifying sleep disorders based on comprehensive lifestyle and health data. We employed a dataset comprising 400 individual records with features including demographic information, sleep metrics, lifestyle factors, and health parameters. The dataset distinguished between individuals with no sleep disorder, insomnia, and sleep apnea. We evaluated a broad spectrum of ML models including logistic regression, decision trees, ensemble methods like RandomForest and GradientBoosting, support vector machines, and neural networks. The models' performances were assessed using accuracy, precision, recall, and F1 score metrics. Results indicated that ensemble methods, particularly RandomForest and XGBClassifier, outperformed other models in terms of accuracy, precision, and F1 scores, achieving values as high as 0.93. These methods proved effective in managing the complexity and variability of the dataset, thereby suggesting their robustness in clinical predictive analytics. The study's findings advocate for the use of advanced ensemble techniques in developing diagnostic tools for sleep disorders, highlighting their potential to enhance predictive accuracy and reliability in real-world healthcare settings. Further research is recommended to optimize these models and explore their integration into clinical practice.
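For a three-class problem (no disorder, insomnia, sleep apnea), precision, recall, and F1 are computed per class and then averaged. The abstract does not state the averaging scheme, so the sketch below assumes a macro average (unweighted mean over classes); a weighted average is an equally plausible choice. The label sequences are hypothetical.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Accuracy, per-class (precision, recall, F1), and their macro averages."""
    report = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro average: every class counts equally, regardless of its size
    macro = tuple(sum(report[c][i] for c in labels) / len(labels) for i in range(3))
    return accuracy, report, macro


labels = ["None", "Insomnia", "Sleep Apnea"]
y_true = ["None", "None", "Insomnia", "Sleep Apnea", "Insomnia", "None"]
y_pred = ["None", "Insomnia", "Insomnia", "Sleep Apnea", "Insomnia", "None"]
accuracy, report, macro = per_class_metrics(y_true, y_pred, labels)
```

Here one "None" record is misclassified as insomnia, which lowers recall for "None" and precision for "Insomnia" while leaving "Sleep Apnea" untouched.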
EVALUATING HYBRID NEURAL NETWORK ARCHITECTURES FOR PREDICTING SLEEP DISORDERS FROM STRUCTURED DATA Airlangga, Gregorius
JIKO (Jurnal Informatika dan Komputer) Vol 7, No 1 (2024)
Publisher : Universitas Khairun

DOI: 10.33387/jiko.v7i1.7873

Abstract

The accurate diagnosis of sleep disorders is crucial for effective treatment and management, yet current methods often rely on subjective assessments and are not always reliable. This research examines the efficacy of various neural network architectures, including dense networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), and innovative hybrid models, in predicting sleep disorders from structured health data. Our study compares the performance of these models using accuracy, precision, recall, and F1 score on a dataset of 400 individuals with detailed sleep and lifestyle data. Our findings demonstrate that while traditional models such as dense networks and CNNs adapted to structured data yield robust results, hybrid models, particularly the CNN-Transformer, significantly outperform the others. This model integrates convolutional layers with the Transformer's attention mechanism, excelling at handling complex feature interactions and providing superior predictive performance, with F1 score and accuracy reaching as high as 0.91. Conversely, RNN models, designed to capture temporal dependencies, were less effective, underscoring the importance of aligning model selection with data characteristics. This suggests that for datasets without strong temporal structure, models that exploit spatial relationships or attention mechanisms are more suitable. This study not only advances our understanding of neural network applications in medical diagnostics but also highlights the potential of hybrid models in enhancing diagnostic accuracy. These insights could lead to significant improvements in the early detection and treatment of sleep disorders, thereby enhancing patient outcomes and contributing to the broader field of medical informatics.
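The CNN-Transformer's advantage is attributed to attention. A minimal sketch of scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`, is shown below in plain Python. The abstract gives no architectural details, so everything here is an assumption for illustration: each row is a hypothetical 2-dimensional embedding of one feature "token" (for example, the output of a convolutional layer), and the learned Q/K/V projections of a real Transformer are replaced by the identity.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return (outputs, attention weights); each output row is a weighted mix of V's rows."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights


# Hypothetical embeddings for three feature tokens (e.g. sleep duration, stress, BMI)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(X, X, X)  # self-attention with identity projections
```

Each token's output mixes information from every other token, weighted by similarity, which is one way such a model can capture interactions between features without assuming any temporal order.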
Evaluating the Efficacy of Traditional Machine Learning Models in Speaker Recognition: A Comparative Study Using the LibriSpeech Dataset Airlangga, Gregorius
Brilliance: Research of Artificial Intelligence Vol. 3 No. 2 (2023): Brilliance: Research of Artificial Intelligence, Article Research November 2023
Publisher : Yayasan Cita Cendekiawan Al Khwarizmi

DOI: 10.47709/brilliance.v3i2.3488

Abstract

The efficacy of machine learning models in speaker recognition tasks is critical for advancements in security systems, biometric authentication, and personalized user interfaces. This study provides a comparative analysis of three prominent machine learning models: Naive Bayes, Logistic Regression, and Gradient Boosting, using the LibriSpeech test-clean dataset, a corpus of read English speech derived from audiobooks and designed for training and evaluating speech recognition systems. Mel-Frequency Cepstral Coefficients (MFCCs) were extracted from the audio samples as features representing the power spectrum of the speakers' voices. The models were evaluated on precision, recall, F1-score, and accuracy to determine how well they identify speakers. Results indicate that Logistic Regression outperformed the other models, achieving nearly perfect scores across all metrics, suggesting a strong capability for linear classification in high-dimensional spaces. Naive Bayes also demonstrated high efficiency and robustness despite its inherent assumption of feature independence, while Gradient Boosting showed slightly lower performance, potentially due to model complexity and overfitting. The study underscores the potential of simpler machine learning models to achieve high accuracy in speaker recognition tasks, particularly where computational resources are limited. However, limitations such as the controlled nature of the dataset and the focus on a single feature type were noted, with recommendations for future research to include more diverse environmental conditions and feature sets.
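The abstract notes that Naive Bayes assumes feature independence. For continuous features such as MFCCs, a common variant is Gaussian Naive Bayes, where each class's log-likelihood is simply a sum of per-feature normal log-densities. The abstract does not say which variant was used, so Gaussian NB is an assumption here, as are the small `eps` added to each variance and the toy two-coefficient feature vectors.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Gaussian Naive Bayes: each feature is modelled as an independent normal per class."""

    def __init__(self, eps=1e-9):
        self.eps = eps  # added to every variance so it never becomes zero

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params = {}
        for label, rows in groups.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / n + self.eps
                for col, m in zip(columns, means)
            ]
            self.params[label] = (math.log(n / len(X)), means, variances)
        return self

    def _log_posterior(self, label, x):
        log_prior, means, variances = self.params[label]
        # Independence assumption: the joint log-likelihood is a plain sum over features
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )

    def predict(self, x):
        return max(self.params, key=lambda label: self._log_posterior(label, x))


# Hypothetical per-utterance features, e.g. the means of two MFCCs
X = [[-250.0, 80.0], [-245.0, 82.0], [-255.0, 78.0],
     [-300.0, 60.0], [-310.0, 58.0], [-305.0, 62.0]]
y = ["speaker_a"] * 3 + ["speaker_b"] * 3
model = GaussianNaiveBayes().fit(X, y)
```

Because the model stores only a mean and a variance per feature and class, training and prediction are cheap, which is consistent with the abstract's point about efficiency.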
Advanced Seismic Data Analysis: Comparative study of Machine Learning and Deep Learning for Data Prediction and Understanding Airlangga, Gregorius
Brilliance: Research of Artificial Intelligence Vol. 3 No. 2 (2023): Brilliance: Research of Artificial Intelligence, Article Research November 2023
Publisher : Yayasan Cita Cendekiawan Al Khwarizmi

DOI: 10.47709/brilliance.v3i2.3501

Abstract

This study delves into the application of machine learning (ML) and deep learning (DL) techniques for the analysis of seismic data, aiming to identify and categorize patterns and anomalies within seismic events. Using a robust dataset, we applied three distinct clustering approaches: K-Means, DBSCAN, and an Autoencoder-based method, each offering unique perspectives on the data. K-Means clustering provided a fundamental partitioning of the data into five predefined clusters, facilitating the identification of broad seismic patterns. DBSCAN, a density-based clustering algorithm, offered insights into the spatial distribution and density of seismic events, adeptly pinpointing anomalies and outliers that signify unusual seismic activity. The Autoencoder, leveraging deep learning, excelled in capturing complex and non-linear relationships within the data, revealing subtle patterns not immediately apparent through traditional methods. The effectiveness of these clustering techniques was quantitatively evaluated using the Silhouette Score and the Davies-Bouldin Score, alongside visual assessments through PCA and t-SNE for dimensionality reduction. The results indicated that while K-Means provided clear partitioning, DBSCAN excelled in outlier detection, and the Autoencoder offered a balanced approach with its nuanced analysis capabilities. Our comprehensive analysis underscores the significance of employing a multi-methodological approach in seismic data analysis, as each method contributes uniquely to the understanding of seismic events. The insights gained from this study are valuable for enhancing predictive models and improving disaster risk management strategies in seismology. Future research directions include the integration of additional seismic features, validation against larger datasets, and the development of hybrid models to further refine the predictive accuracy of seismic event analysis.
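The clusterings were scored with the Silhouette Score (higher is better, range -1 to 1) and the Davies-Bouldin Score (lower is better). The sketch below implements both standard definitions in plain Python and applies them to a hypothetical 2-D toy set under a sensible and a deliberately poor labelling. The seismic features and the K-Means/DBSCAN/Autoencoder pipelines themselves are not reproduced; scikit-learn exposes equivalent functions as `silhouette_score` and `davies_bouldin_score` in `sklearn.metrics`.

```python
import math


def silhouette_score(X, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = sorted(set(labels))
    scores = []
    for i, (x, li) in enumerate(zip(X, labels)):
        same = [math.dist(x, X[j]) for j, lj in enumerate(labels) if lj == li and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(x, X[j]) for j, lj in enumerate(labels) if lj == c) / labels.count(c)
            for c in clusters if c != li
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin_score(X, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance ratio."""
    clusters = sorted(set(labels))
    centroids, scatter = {}, {}
    for c in clusters:
        pts = [x for x, lab in zip(X, labels) if lab == c]
        centroids[c] = [sum(col) / len(pts) for col in zip(*pts)]
        scatter[c] = sum(math.dist(p, centroids[c]) for p in pts) / len(pts)
    return sum(
        max((scatter[c] + scatter[o]) / math.dist(centroids[c], centroids[o])
            for o in clusters if o != c)
        for c in clusters
    ) / len(clusters)


pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
print(silhouette_score(pts, good), davies_bouldin_score(pts, good))
print(silhouette_score(pts, bad), davies_bouldin_score(pts, bad))
```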
Comparative Analysis of Machine Learning Models for Real-Time Disaster Tweet Classification: Enhancing Emergency Response with Social Media Analytics Airlangga, Gregorius
Brilliance: Research of Artificial Intelligence Vol. 4 No. 1 (2024): Brilliance: Research of Artificial Intelligence, Article Research May 2024
Publisher : Yayasan Cita Cendekiawan Al Khwarizmi

DOI: 10.47709/brilliance.v4i1.3669

Abstract

In the realm of disaster management, the real-time analysis of social media data, particularly from Twitter, has become indispensable. This study investigates the efficacy of various machine learning models in classifying tweets pertaining to disaster scenarios, with the goal of bolstering emergency response systems. A dataset of tweets, categorized as related or unrelated to disasters, underwent a rigorous preprocessing regimen to facilitate the evaluation of five distinct machine learning models: Naïve Bayes, Random Forest, Logistic Regression, Support Vector Machines (SVM), and Long Short-Term Memory (LSTM) networks. The performance of these models was assessed based on accuracy, precision, recall, and F1 score. The results indicated that the SVM model excelled, achieving an accuracy of 89%, precision of 88%, recall of 89%, and an F1 score of 88%, making it the most robust for text classification tasks within the context of disaster-related data. The LSTM model also performed notably well, with an accuracy of 87%, precision of 86%, recall of 87%, and F1 score of 86%, underscoring the potential of deep learning models in processing sequential data. In comparison, Naïve Bayes, Random Forest, and Logistic Regression models demonstrated moderate performance, with accuracy and F1 scores in the range of 76-77% and 72-73%, respectively. These insights are crucial for the development of advanced social media monitoring tools that can significantly enhance the timeliness and precision of crisis response. The research not only highlights the necessity of selecting appropriate machine learning models for specific NLP tasks but also sets the stage for future investigations into the integration of hybrid analytical frameworks. This study establishes a foundation for leveraging machine learning to transform social media data into actionable intelligence, thereby contributing to more effective disaster management and community safety strategies.
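Because the abstract reports both precision and recall, the F1 figures can be cross-checked with the usual harmonic-mean formula. This assumes F1 was computed from the same (possibly averaged) precision and recall; if the paper instead averages per-class F1 scores, which the abstract does not state, the relation holds only approximately.

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P + R} \\
F_1^{\text{SVM}} &= \frac{2(0.88)(0.89)}{0.88 + 0.89} = \frac{1.5664}{1.77} \approx 0.88497 \\
F_1^{\text{LSTM}} &= \frac{2(0.86)(0.87)}{0.86 + 0.87} = \frac{1.4964}{1.73} \approx 0.86497
\end{aligned}
```

Both values round to the reported 88% and 86%, so the published precision, recall, and F1 figures are mutually consistent.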
Comparative Analysis of Machine Learning Algorithms for Multi-Class Tree Species Classification Using Airborne LiDAR Data Airlangga, Gregorius
Brilliance: Research of Artificial Intelligence Vol. 4 No. 1 (2024): Brilliance: Research of Artificial Intelligence, Article Research May 2024
Publisher : Yayasan Cita Cendekiawan Al Khwarizmi

DOI: 10.47709/brilliance.v4i1.3673

Abstract

Forests hold vital ecological significance, and the ability to accurately classify tree species is integral to conservation and management practices. This research investigates the application of machine learning techniques to airborne Light Detection and Ranging (LiDAR) data for the multi-class classification of tree species, specifically Alder, Aspen, Birch, Fir, Pine, Spruce, and Tilia. High-density LiDAR data from varied forest landscapes were subjected to a rigorous preprocessing and noise reduction protocol, followed by feature extraction to discern structural characteristics indicative of species identity. We assessed the performance of six machine learning models: Logistic Regression, Decision Tree, Random Forest, Support Vector Classifier (SVC), k-Nearest Neighbors (KNN), and Gradient Boosting. The analysis was based on metrics of accuracy, precision, recall, and F1 score. Logistic Regression and Random Forest models outperformed others, achieving accuracies of 0.81, precision of 0.80, recall of 0.81, and an F1 score of 0.80. In contrast, the KNN algorithm had the lowest accuracy of 0.60, precision and recall of 0.60, and an F1 score of 0.59. These results demonstrate the robustness of Logistic Regression and Random Forest for classifying complex LiDAR datasets. The study underscores the potential of these models to support ecological monitoring, enhance forest management, and aid in biodiversity conservation. Future research directions include the fusion of LiDAR data with other environmental variables, application of deep learning for improved feature extraction, and validation of the models across broader species and geographical ranges. This research marks a significant step towards leveraging advanced machine learning to interpret and utilize LiDAR data for environmental and ecological applications.
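KNN scored lowest in this comparison. One general property worth keeping in mind (not a claim about the study's cause) is that KNN votes using raw distances, so features on large numeric scales can dominate. The sketch below uses two hypothetical LiDAR-derived features (tree height in metres and a crown ratio between 0 and 1) and shows that z-score standardisation, fitted on the training rows only, can flip the prediction for the same query point. The species, values, and `k=3` are illustrative assumptions.

```python
import math
from collections import Counter


def standardize(train, test):
    """Z-score each feature using means and standard deviations from the training rows only."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical features: [tree_height_m, crown_ratio]
train_X = [[25.0, 0.30], [26.0, 0.32], [24.0, 0.28], [18.0, 0.60], [19.0, 0.62], [17.0, 0.58]]
train_y = ["Pine", "Pine", "Pine", "Birch", "Birch", "Birch"]
query = [22.0, 0.59]

raw_pred = knn_predict(train_X, train_y, query)  # distances dominated by height
scaled_train, (scaled_query,) = standardize(train_X, [query])
scaled_pred = knn_predict(scaled_train, train_y, scaled_query)  # both features weigh in
print(raw_pred, scaled_pred)
```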
Evaluating Machine Learning Models for Mental Health Diagnostics: A Comparative Analysis and Visual Insights Airlangga, Gregorius
KLIK: Kajian Ilmiah Informatika dan Komputer Vol. 4 No. 4 (2024): February 2024
Publisher : STMIK Budi Darma

DOI: 10.30865/klik.v4i4.1702

Abstract

This study addresses the critical challenge of enhancing mental health diagnostics amidst a surge in global mental disorder prevalence. With mental health conditions predicted to become the leading cause of disability by 2030, there is an urgent need for more effective diagnostic methods that transcend the limitations of traditional frameworks, such as subjectivity and clinician bias. Leveraging the capabilities of machine learning (ML) to analyze complex datasets, this research aims to fill the gap in the comparative effectiveness of various ML models, particularly within the context of imbalanced mental health datasets. We systematically evaluated the performance of diverse ML models—including Random Forest, Gradient Boosting, Support Vector Machines, and others—on a rich dataset embodying a wide spectrum of symptoms and diagnoses. Through advanced data preprocessing techniques, such as innovative handling of missing values and categorical encoding, coupled with RandomizedSearchCV for model optimization, we provided a comprehensive analysis of the models' effectiveness. The application of oversampling strategies addressed the challenge of dataset imbalance, ensuring realistic clinical scenario evaluations. The study's findings are presented through detailed model performance metrics and visual analytics, such as symptom distribution visualizations and correlation cluster maps, enhancing interpretability and clinical relevance. The discussion section explores the practical applicability of these findings in clinical settings, acknowledging limitations and outlining future research directions. In conclusion, the study presents a nuanced narrative of ML model selection and performance evaluation complexities. The superior performance of ensemble methods like Random Forest and Gradient Boosting classifiers for certain diagnoses demonstrates the potential of ML in mental health diagnostics. 
However, the varied performance across models underscores the importance of context-specific model selection, considering the trade-offs between accuracy, interpretability, and computational efficiency. This research contributes significantly to the field of mental health diagnostics by highlighting models with the greatest promise for clinical application and by providing a framework for future advancements integrating ML into mental health diagnostics.
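The study tunes its models with RandomizedSearchCV. The underlying idea is simple: draw a fixed number of hyperparameter configurations at random from user-defined distributions, score each (normally by cross-validation), and keep the best. The stdlib sketch below shows only that loop. The search space is hypothetical, and `toy_cv_score` is an explicit stand-in for a real cross-validated score; it is not derived from the paper.

```python
import random


def random_search(param_space, score_fn, n_iter=20, seed=42):
    """Sample n_iter configurations and return (best_params, best_score, history).

    param_space maps each hyperparameter name to a function that takes an RNG
    and returns one sampled value.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    history = []
    for _ in range(n_iter):
        params = {name: sample(rng) for name, sample in param_space.items()}
        score = score_fn(params)
        history.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history


# Hypothetical search space for a random-forest-style model
space = {
    "n_estimators": lambda rng: rng.randint(50, 500),
    "max_depth": lambda rng: rng.choice([None, 5, 10, 20]),
    "min_samples_leaf": lambda rng: rng.randint(1, 10),
}


def toy_cv_score(params):
    # Stand-in for a cross-validated metric; a real search would fit and
    # evaluate a model on each fold here.
    depth_penalty = 0.0 if params["max_depth"] in (10, 20) else 0.05
    return 0.9 - depth_penalty - 0.005 * params["min_samples_leaf"] + params["n_estimators"] / 1e5


best, best_score, history = random_search(space, toy_cv_score, n_iter=25)
print(best, round(best_score, 4))
```

Random search evaluates a fixed budget of configurations instead of every grid combination, which keeps tuning cost predictable when several models are compared.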