Articles

Knowledge Dictionary for Information Extraction on the Arabic Text Data
Saputra, Wahyu Syaifullah Jauharis; Arifin, Agus Zainal; Yuniarti, Anny
Makara Journal of Technology Vol. 16, No. 2
Publisher : UI Scholars Hub

Abstract

Information extraction is an early stage of textual data analysis. It is required to obtain information from textual data that can be used in subsequent analysis, such as classification and categorization. Textual data is strongly influenced by its language. Arabic is gaining significant attention in many studies because it differs greatly from other languages and, in contrast to them, tools and research for Arabic are still lacking. The information extracted using the knowledge dictionary is a concept of expression. A knowledge dictionary is usually constructed manually by an expert, which takes a long time and is specific to a single problem. This paper proposes a method for automatically building a knowledge dictionary. The knowledge dictionary is formed by classifying sentences that share the same concept, on the assumption that they will have a high similarity value. The extracted concepts can be used as features for subsequent computational processes such as classification or categorization. The dataset used in this paper was an Arabic text dataset. The extraction result was tested using a decision tree classifier; the highest precision obtained was 71.0% and the highest recall was 75.0%.
FORECASTING SALES USING SARIMA MODELS AT THE SINAR PAGI BUILDING MATERIALS STORE
Aminullah, Ahmad Adiib; Idhom, Mohammad; Saputra, Wahyu Syaifullah Jauharis
JIKO (Jurnal Informatika dan Komputer) Vol 7, No 2 (2024)
Publisher : Universitas Khairun

DOI: 10.33387/jiko.v7i2.8266

Abstract

Sinar Pagi Building Materials Store faces the challenge of maintaining optimal stock levels to avoid overstock and understock, which affect customer satisfaction and operational efficiency. This study applies the Seasonal Autoregressive Integrated Moving Average (SARIMA) method to forecast sales at the store. Leveraging its ability to model seasonal patterns in historical sales data, various SARIMA models were analyzed and compared using the Akaike Information Criterion (AIC) and Root Mean Square Error (RMSE). The dataset was divided in a 95:5 ratio into training and testing sets for evaluation. The results show that, in SARIMA notation (p,d,q)(P,D,Q), the best model is (1,0,0). This model is the most suitable, with the lowest AIC of 1245 and the lowest RMSE of 7.95 among the SARIMA models identified by the model looping test. The other models have higher AIC and RMSE values: model (1,0,1) has an AIC of 1246 and an RMSE of 8.05, while model (0,0,1) has an AIC of 1252 and an RMSE of 8.15. The lower the AIC and RMSE values, the better the model. This shows a superior balance between model complexity and prediction accuracy. The model captures seasonal patterns in the sales data, providing a reasonably good prediction framework. This study shows that the SARIMA (1,0,0) model forecasts sales accurately enough for Sinar Pagi Building Materials Store to make more reliable sales predictions, which can help in inventory planning and marketing strategies.
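The selection rule described in this abstract (prefer the lowest AIC and the lowest RMSE) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `rmse` helper and the `candidates` table are hypothetical names, the figures are copied from the abstract (the 8.15 value for model (0,0,1) is taken to be its RMSE), and only the non-seasonal orders are shown because the seasonal orders are not legible in the source.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# (p, d, q) -> (AIC, RMSE), values as reported in the abstract
candidates = {
    (1, 0, 0): (1245, 7.95),
    (1, 0, 1): (1246, 8.05),
    (0, 0, 1): (1252, 8.15),
}

# Pick the order that minimises each criterion separately.
best_by_aic = min(candidates, key=lambda order: candidates[order][0])
best_by_rmse = min(candidates, key=lambda order: candidates[order][1])
print(best_by_aic, best_by_rmse)
```

Both criteria agree here, which is why a single model can be reported as best; when they disagree, the choice between them is a judgment call.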
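PENERAPAN DATA MINING UNTUK PREDIKSI HASIL PANEN BUDIDAYA PERIKANAN DARI MITRA PANEN MENGGUNAKAN ALGORITMA SUPPORT VECTOR REGRESSION [Application of Data Mining to Predict Aquaculture Harvest Yields from Harvest Partners Using the Support Vector Regression Algorithm]
Suprapto, Claudia Millennia; Saputra, Wahyu Syaifullah Jauharis; Aditiawan, Firza Prima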
J-Icon : Jurnal Komputer dan Informatika Vol 12 No 2 (2024): Oktober 2024
Publisher : Universitas Nusa Cendana

DOI: 10.35508/jicon.v12i2.13187

Abstract

PT. Adma Digital Solusi is a company that serves as a harvest partner for cultivators in agriculture, animal husbandry, and fisheries, supporting the planning and control of supply chain output. In the digital era, planning and controlling the company's fishery supply chain output requires a range of technologies and information systems, so that fish resources are planned and controlled effectively and efficiently in decision making. In this research, a machine learning method based on the Support Vector Regression (SVR) algorithm is implemented to predict the harvest yields of the aquaculture partners of PT. Adma Digital Solusi. SVR applies the Support Vector Machine (SVM) approach to regression problems. The forecasting process uses the SVR() model with the following parameters: a polynomial kernel, C set to 100, gamma set to auto, degree set to 3, epsilon set to 0.1, and coef0 set to 1. The model is then trained with the fit function on the x train and y train data, producing a MAPE of 0.12865018182566176 and an R2 value of 0.9998831470091238, which indicates very good and accurate predictive capability. Knowing the estimated aquaculture harvest lets harvest partners adjust production and marketing strategies to maximize profits. It can also help harvest partners manage risk, because they can prepare for situations in which the harvest does not match the estimates.
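The two evaluation metrics quoted in this abstract, MAPE and R2, can be computed as in the sketch below. It is a stdlib-only illustration with made-up numbers, not the study's data or code; the helper names `mape` and `r2_score` are hypothetical, and the abstract does not say whether its MAPE is expressed as a fraction or a percentage, so the fraction form used here is an assumption.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error as a fraction (0.05 means 5%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical harvest weights, for illustration only
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]

error = mape(y_true, y_pred)
fit = r2_score(y_true, y_pred)
print(error, fit)
```

An R2 close to 1 means the predictions explain almost all of the variance in the observed values, while MAPE reports the average relative size of the errors.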
Indonesian Sign Language (BISINDO) Classification Using Xception Transfer Learning Architecture
Amelia, Meisya Vira; Saputra, Wahyu Syaifullah Jauharis; Hindrayani, Kartika Maulida; Riyantoko, Prismahardi Aji
International Journal of Advances in Data and Information Systems Vol. 6 No. 2 (2025): August 2025 - International Journal of Advances in Data and Information Systems
Publisher : Indonesian Scientific Journal

DOI: 10.59395/ijadis.v6i2.1392

Abstract

Human communication generally relies on speech. This does not apply to deaf people, who depend on sign language for daily interactions, and not everyone is able to understand sign language. In higher education, the lack of individuals proficient in sign language often creates inequality in the learning process for deaf students. This limitation can be addressed by fostering a more inclusive environment, for example by implementing a sign language translation system. This study therefore aimed to develop a machine learning model capable of detecting and translating Indonesian Sign Language (BISINDO) alphabet gestures. The model was built using transfer learning with the Xception Convolutional Neural Network (CNN) architecture. The dataset consisted of 26 BISINDO alphabet gestures with a total of 650 images. The model was evaluated using K-Fold cross-validation and achieved an F1-score of 94% during testing.
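K-Fold cross-validation, as mentioned in this abstract, splits the data into k parts and uses each part once as the test set. The sketch below shows the index bookkeeping only, under stated assumptions: the number of folds is not given in the abstract, so k = 5 is hypothetical, as is the `k_fold_indices` helper; only the total of 650 images comes from the source.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # every k-th index per fold
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# 650 images as in the abstract; k=5 is an assumed value
splits = list(k_fold_indices(650, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

A stratified variant, which keeps the share of each gesture class equal across folds, is a common refinement for classification data; the abstract does not say which variant was used.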
Classification of Road Damage in Sidoarjo Using CNN Based on Inception Resnet-V2 Architecture
Zahrah, Fathima; Diyasa, I Gede Susrama Mas; Saputra, Wahyu Syaifullah Jauharis
Signal and Image Processing Letters Vol 7, No 1 (2025)
Publisher : Association for Scientific Computing Electrical and Engineering (ASCEE)

DOI: 10.31763/simple.v7i1.123

Abstract

Road damage is a serious issue in Sidoarjo Regency, posing risks to road users' safety. This study aims to classify road surface conditions using a Convolutional Neural Network (CNN) model based on the Inception ResNet-V2 architecture. The research develops an image-based classification model by combining secondary data from Kaggle and primary data obtained through Google Street View API scraping, along with training strategies such as data augmentation, class balancing, early stopping, and model checkpointing. A total of 885 images were used, categorized into three classes: potholes, cracks, and undamaged roads. The model was trained over 20 epochs with early stopping triggered at epoch 15, when validation accuracy reached 95.95%. Evaluation on the test set showed a test accuracy of 83%. The undamaged road class achieved the highest performance with an F1-score of 0.89, while the pothole class recorded an F1-score of 0.79. The lowest performance was observed in the cracked road class, with an F1-score of 0.65, indicating the model's limited ability to detect fine crack features. This limitation is likely due to class imbalance and visual similarity between classes. Although the model demonstrated good generalization for the two majority classes, the performance gap between validation and test accuracy highlights the need to improve detection for minority classes. Future work is recommended to explore advanced augmentation techniques, increase the representation of minority class data, and consider alternative architectures or ensemble methods to enhance the model’s sensitivity to subtle road damage features.
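The per-class F1-scores reported in this abstract can be combined into an unweighted (macro) average to see how much the weak crack class pulls the overall picture down. The sketch below is simple arithmetic on the reported figures; the macro average it prints is derived here and is not a result stated by the authors, and the `f1_score` helper is a hypothetical name.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Per-class F1-scores as reported in the abstract
per_class_f1 = {"undamaged": 0.89, "pothole": 0.79, "crack": 0.65}

# Macro average: every class counts equally, regardless of its size
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
weakest = min(per_class_f1, key=per_class_f1.get)
print(round(macro_f1, 3), weakest)
```

Because the macro average ignores class size, it is more sensitive to minority-class performance than overall accuracy, which is consistent with the gap the abstract describes.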
Multimodal Detection of Covert Online Gambling Advertisements Using Faster R-CNN and Tr-OCR
Maldini, Andry Syva; Saputra, Wahyu Syaifullah Jauharis; Prasetya, Dwi Arman
bit-Tech Vol. 8 No. 1 (2025): bit-Tech
Publisher : Komunitas Dosen Indonesia

DOI: 10.32877/bt.v8i1.2769

Abstract

The increasing prevalence of online gambling advertisements on social media has led to the use of covert strategies, such as embedding visual watermarks and employing euphemistic language, to bypass traditional detection methods, rendering manual moderation ineffective. This study proposes an AI-based automated detection system designed to identify both explicit and obfuscated gambling content. The system operates in three stages: (1) Object detection: Faster R-CNN, using a ResNet-50 backbone and Feature Pyramid Network (FPN), detects gambling-related visual elements, such as watermarks and logos; (2) Text extraction: A Transformer-based Optical Character Recognition (TrOCR) model is employed to extract textual content from images and video frames, even in the presence of visual distortions; and (3) Text classification: A BERT-based Natural Language Processing (NLP) model is used to identify gambling-related language within the extracted text. The dataset, manually collected and annotated, was augmented with Roboflow to improve model robustness and generalization. Experimental results show that the Faster R-CNN model achieved an average precision of 98.1%, TrOCR demonstrated a Character Error Rate (CER) of 4.6% and a Word Error Rate (WER) of 29%, while the BERT classifier reached an impressive 99% accuracy with high precision and recall. The system was integrated into a Flask-based web application that allows real-time analysis of both image and video inputs. This system presents strong potential to support automated content moderation and curb the spread of online gambling advertisements on digital platforms, contributing to safer online spaces.
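Character Error Rate and Word Error Rate, the two OCR metrics quoted in this abstract, are both edit distances normalised by the reference length. The sketch below is a generic stdlib illustration, not the study's evaluation code; the function names and the sample strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word Error Rate: word edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

# Hypothetical OCR output with a single misread character
reference = "claim your bonus today"
hypothesis = "claim your bonvs today"
print(cer(reference, hypothesis), wer(reference, hypothesis))
```

A single wrong character makes the whole word count as an error, so WER is typically much higher than CER on the same output, which matches the direction of the gap between the 4.6% CER and 29% WER reported above.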
Prediksi Laju Inflasi di Jawa Timur Menggunakan Model N-BEATS dan Optimasi Optuna: Prediction of Inflation Rate in East Java Using the N-BEATS Model and Optuna Optimization
Riswanda, Mohammad Nizar; Trimono, Trimono; Saputra, Wahyu Syaifullah Jauharis
MALCOM: Indonesian Journal of Machine Learning and Computer Science Vol. 5 No. 3 (2025): MALCOM July 2025
Publisher : Institut Riset dan Publikasi Indonesia

DOI: 10.57152/malcom.v5i3.2141

Abstract

Inflation is an important indicator that affects the stability and economic growth of a region. Accurate inflation prediction is needed to support the formulation of appropriate economic policy. This study proposes using the N-BEATS (Neural Basis Expansion Analysis for Time Series) model optimized with Optuna to predict inflation in East Java Province. The data are a univariate time series, namely the monthly inflation rate from January 2005 to December 2024, obtained from Statistics Indonesia (Badan Pusat Statistik, BPS). Model performance was evaluated using the Mean Absolute Percentage Error (MAPE) metric. Unlike traditional models such as ARIMA and LSTM, N-BEATS relies on a feedforward neural network with a residual block architecture that can reconstruct the past (backcast) and predict the future (forecast). Hyperparameter optimization with Optuna significantly improved model accuracy. The results show that the optimized N-BEATS achieved a MAPE of 8.97%, better than the baseline N-BEATS (11.05%), ARIMA (16.95%), and LSTM (12.23%). These findings indicate that N-BEATS with Optuna is effective at improving inflation prediction accuracy and can serve as an important aid for regional economic planning.
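The backcast and forecast mechanism mentioned in this abstract can be written compactly. The equations below follow the commonly published N-BEATS formulation rather than anything stated in the abstract itself, and the notation is an assumption: block $\ell$ receives input $x_\ell$ and emits a backcast $\hat{x}_\ell$ and a partial forecast $\hat{y}_\ell$, with $x_1$ the observed lookback window and $L$ the number of blocks. The MAPE definition used for evaluation is included for reference.

```latex
\begin{align}
  x_{\ell+1} &= x_\ell - \hat{x}_\ell, \qquad \ell = 1, \dots, L-1
    && \text{(each block removes the part of the input it explains)} \\
  \hat{y} &= \sum_{\ell=1}^{L} \hat{y}_\ell
    && \text{(the final forecast sums the partial forecasts)} \\
  \mathrm{MAPE} &= \frac{100\%}{n} \sum_{t=1}^{n}
    \left| \frac{y_t - \hat{y}_t}{y_t} \right|
    && \text{(evaluation metric over } n \text{ test points)}
\end{align}
```

Note that MAPE is undefined when an observed value $y_t$ is zero and becomes unstable when it is close to zero, which matters for inflation series that can cross zero.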
Klasterisasi Produktivitas Daerah di Jawa Tengah Berdasarkan Ketenagakerjaan Menggunakan K-Means dan Average Linkage [Clustering Regional Productivity in Central Java Based on Employment Using K-Means and Average Linkage]
Nashrullah, Ahmad Firqi; Mahardhika, Rivaldi Dwi; Rusdiyanto, Nur Rahmat; May Wara, Shindi Shella; Saputra, Wahyu Syaifullah Jauharis
JURNAL DIFERENSIAL Vol 7 No 2 (2025): November 2025
Publisher : Program Studi Matematika, Universitas Nusa Cendana

DOI: 10.35508/jd.v7i2.22516

Abstract

This study employs K-Means and Agglomerative Clustering (Average Linkage) to group regions based on variables such as the number of residents, unemployment rate, and other supporting indicators. The data are normalized and evaluated using the Silhouette Score metric, yielding three optimal clusters. Average Linkage (0.3596) outperforms K-Means (0.2627). The Average Linkage results indicate that cluster 1 is characterized by stable productivity and low unemployment, cluster 2 consists solely of Semarang City with the highest Human Development Index and wages, and cluster 3 comprises underdeveloped areas with high unemployment and low wages. This clustering is highly beneficial for supporting more targeted data-driven regional development policies.
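The Silhouette Score used in this abstract to compare the two clusterings can be computed as in the sketch below. It is a stdlib-only illustration on toy points, not the study's data; the function name is hypothetical, Euclidean distance is an assumption, and singleton clusters are scored 0, a common convention that is relevant here because one reported cluster contains a single city.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the members, excluding itself
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        if len(clusters[label]) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = mean_dist(i, clusters[label])  # cohesion within own cluster
        b = min(mean_dist(i, members)      # separation from nearest other cluster
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: two well-separated pairs of points
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(good, bad)
```

Scores range from -1 to 1, and higher is better, so the labelling with the larger score is preferred, which is the comparison the abstract makes between Average Linkage and K-Means.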
Segmentasi Faktor Perceraian berdasarkan Provinsi di Indonesia Tahun 2024 dengan K-Means dan DBSCAN [Segmentation of Divorce Factors by Province in Indonesia in 2024 with K-Means and DBSCAN]
Rizkiyah, Selly; Indira; Putri, Milla Akbarany Bakhtiar; Wara, Shindi Shella May; Saputra, Wahyu Syaifullah Jauharis
INDONESIAN JOURNAL ON DATA SCIENCE Vol. 3 No. 2 (2025): Indonesian Journal On Data Science
Publisher : Lembaga Penelitian dan Pengabdian Kepada Masyarakat Universitas Achmad Yani Yogyakarta

DOI: 10.30989/ijds.v3i2.1654

Abstract

Divorce is a complex social phenomenon that continues to increase in Indonesia. Based on data from 34 provinces, divorce is influenced by various factors, both internal and external to the household. This research aims to describe the main factors causing divorce, based on national data and a review of relevant literature, using machine learning methods, specifically unsupervised learning in the form of clustering. The dominant factors found include constant disputes and arguments, economic problems, domestic violence, abandonment by one of the parties, and infidelity. The research applies the K-Means and DBSCAN algorithms and compares their results. Comparing Silhouette Scores, the best model is DBSCAN, with a score of 0.331. The optimal DBSCAN clustering was obtained with an epsilon of 2.9 and a minimum of 2 samples. The clustering results were then analyzed further to evaluate the data distribution and identify the dominant characteristics of each cluster. These findings indicate the need for a multidisciplinary approach to understanding and addressing divorce in Indonesia in order to reduce the divorce rate and improve the quality of family life.
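To make the role of the two DBSCAN parameters concrete, the sketch below implements the basic algorithm with the standard library. It is a teaching sketch, not the study's code: the `dbscan` function and the toy points are hypothetical, Euclidean distance is an assumption, and only the parameter values (epsilon 2.9, minimum samples 2) come from the abstract.

```python
import math

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # A point counts as its own neighbour, as in common implementations.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point
            if labels[j] is not None:
                continue                # already assigned; do not expand
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep growing
    return labels

# Toy 2-D points: two dense groups and one isolated point
points = [(0, 0), (1, 0), (2, 0), (20, 0), (21, 0), (50, 50)]
labels = dbscan(points, eps=2.9, min_samples=2)
print(labels)
```

Unlike K-Means, the number of clusters is not chosen in advance: it follows from epsilon and the minimum sample count, and points that belong to no dense region are left as noise.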