Journal : International Journal of Data Science, Engineering, and Analytics (IJDASEA)

Model Selection For Forecasting Rainfall Dataset
Amri Muhaimin; Hendri Prabowo; Suhartono
International Journal of Data Science, Engineering, and Analytics, Vol. 1 No. 1 (2021)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v1i1.2

Abstract

The objective of this research is to find the best method for forecasting rainfall at the Wonorejo reservoir in Surabaya. Time series and causal approaches, using both statistical and machine learning methods, are compared. Time series regression (TSR), autoregressive integrated moving average (ARIMA), linear regression (LR), and transfer function (TF) models are used as statistical methods. Feedforward neural network (FFNN) and deep feedforward neural network (DFFNN) models are used as machine learning methods. Statistical methods are used to capture linear patterns, whereas machine learning methods are used to capture nonlinear patterns. Hourly rainfall data from the Wonorejo reservoir is used as a case study. The data has a seasonal pattern, i.e. monthly seasonality. Based on cross-validation and information criteria, the results show that DFFNN using the time series approach produces more accurate forecasts than the other methods. In general, machine learning methods have better accuracy than statistical methods. The research also identifies the parameter settings that work best for building a neural network model. Moreover, these results are not in line with the findings of the M3 and M4 competitions, i.e. that more complex methods do not necessarily produce better forecasts than simpler methods.
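
The abstract above selects a model by cross-validation but does not say which scheme was used. As a minimal sketch, assuming a rolling-origin (expanding window) scheme with one-step-ahead forecasts, model selection by out-of-sample error could look like the following. The two forecasters and the rainfall values are illustrative placeholders, not the paper's models or data.

```python
import math

def rolling_origin_rmse(series, forecaster, min_train):
    """One-step-ahead RMSE over expanding training windows."""
    squared_errors = []
    for t in range(min_train, len(series)):
        train = series[:t]            # everything observed before time t
        prediction = forecaster(train)
        squared_errors.append((series[t] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def naive(train):
    """Forecast the next value as the last observed value."""
    return train[-1]

def historical_mean(train):
    """Forecast the next value as the mean of all past values."""
    return sum(train) / len(train)

# Made-up rainfall-like series, for illustration only.
rain = [0.0, 2.0, 4.0, 0.0, 2.0, 4.0, 0.0, 2.0]

scores = {
    "naive": rolling_origin_rmse(rain, naive, min_train=3),
    "mean": rolling_origin_rmse(rain, historical_mean, min_train=3),
}
best = min(scores, key=scores.get)  # candidate with the lowest RMSE
```

Any candidate model that maps a training window to a forecast can be passed in as `forecaster`, so statistical and machine learning models can be scored on the same splits.
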
Negative Binomial Time Series Regression – Random Forest Ensemble in Intermittent Data
Amri Muhaimin; Prismahardi Aji Riyantoko; Hendri Prabowo; Trimono Trimono
International Journal of Data Science, Engineering, and Analytics, Vol. 1 No. 2 (2021)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v1i2.10

Abstract

An intermittent dataset is challenging to forecast because it contains many zeros. Examples of intermittent data include sales data and rainfall data, since in both cases no value may be recorded in a given period. In this research, a model is built to address this problem using an ensemble approach. Intermittent data mostly follows the Negative Binomial distribution because the variance of the data exceeds its mean. We use two datasets: rainfall data and sales data. Our approach builds a base model from time series regression with a Negative Binomial response, and then augments the base model with a tree-based model, namely random forest. We compare the results with two benchmark methods, the Croston method and Single Exponential Smoothing (SES). Our approach outperforms the benchmarks, based on the metric values, by 1.79 and 7.18.
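
For readers unfamiliar with the two benchmarks named above, here is a minimal sketch of the Croston method and SES in their textbook form. The smoothing constant `alpha=0.1` and the initialization choices are assumptions; the abstract does not state the settings the authors used.

```python
def ses(demand, alpha=0.1):
    """Single Exponential Smoothing: return the smoothed level after the last period."""
    level = demand[0]                      # initialize with the first observation
    for d in demand[1:]:
        level = level + alpha * (d - level)
    return level

def croston(demand, alpha=0.1):
    """Croston's method: forecast of demand per period after the last observation.

    Nonzero demand sizes and the intervals between them are smoothed
    separately; the forecast is their ratio.
    """
    size = None       # smoothed nonzero demand size
    interval = None   # smoothed number of periods between nonzero demands
    gap = 1           # periods elapsed since the last nonzero demand
    for d in demand:
        if d > 0:
            if size is None:               # first nonzero demand initializes both
                size, interval = float(d), float(gap)
            else:
                size = size + alpha * (d - size)
                interval = interval + alpha * (gap - interval)
            gap = 1
        else:
            gap += 1
    if size is None:                       # series with no demand at all
        return 0.0
    return size / interval

# Made-up intermittent series: a demand of 3 every third period.
sales = [0, 0, 3, 0, 0, 3, 0, 0, 3]
croston_forecast = croston(sales)
ses_forecast = ses(sales)
```

Unlike SES, Croston's estimates are updated only in periods with nonzero demand, which is why it is a common baseline for series with many zeros.
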
Water Availability Forecasting Using Univariate and Multivariate Prophet Time Series Model for ACEA (European Automobile Manufacturers Association)
Prismahardi Aji Riyantoko; Tresna Maulana Fahrudin; Kartika Maulida Hindrayani; Amri Muhaimin; Trimono
International Journal of Data Science, Engineering, and Analytics, Vol. 1 No. 2 (2021)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v1i2.12

Abstract

Time series analysis is one method for forecasting data. The ACEA company held a competition in which it opened its water availability data for forecasting. The dataset, Aquifers-Petrignano in Italy, comes from the water resources field and has five parameters: rainfall, temperature, depth to groundwater, drainage volume, and river hydrometry. In this research, we forecast the depth to groundwater using univariate and multivariate time series approaches with the Prophet method. Prophet is a library developed by a team at Facebook. We also apply several other steps to clean the data and prepare it for forecasting: handling missing data, transformation, differencing, time series decomposition, lag determination, stationarity checks, and the Augmented Dickey-Fuller (ADF) test. These steps are used to ensure that the data does not cause problems during forecasting. We obtained results using both the univariate and multivariate Prophet methods. The multivariate approach gives an MAE of 0.82 and an RMSE of 0.99, which is better than the forecast from univariate Prophet.
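
The two error measures reported above, MAE and RMSE, can be sketched as follows using their standard definitions. The sample values are made up for illustration and are not taken from the paper.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily than MAE."""
    mean_squared = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_squared)

# Illustrative depth-to-groundwater values (made up).
observed = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]

mae_value = mae(observed, forecast)
rmse_value = rmse(observed, forecast)
```

RMSE is never smaller than MAE on the same errors, which is consistent with the reported pair of 0.82 and 0.99.
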
Metric Comparison For Text Classification
Amri Muhaimin; Tresna Maulana Fahrudin; Trimono; Prismahardi Aji Riyantoko; Kartika Maulida Hindrayani
International Journal of Data Science, Engineering, and Analytics, Vol. 2 No. 1 (2022)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v2i1.34

Abstract

Text classification has been popular in recent years. To classify text, the first step is to convert the text into numerical values. Values that can be used include Term Frequency, Inverse Document Frequency, Term Frequency – Inverse Document Frequency, and the raw frequency of the word itself. This study aims to determine which metric value is best for text classification. The methods used are Naïve Bayes, Logistic Regression, and Random Forest. The evaluation scores used are accuracy and Area Under Curve (AUC). The results show that some metric values produce similar evaluation scores. Another finding is that Random Forest is the best of the methods compared, and that the best metric for text classification is Term Frequency – Inverse Document Frequency.
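
To make the text-to-value step concrete, here is a minimal sketch of Term Frequency – Inverse Document Frequency weighting. It assumes whitespace tokenization and the unsmoothed variant idf = log(N / df); libraries often use smoothed or normalized variants, and the abstract does not say which one the authors used.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # term frequency (share of the document) times inverse document frequency
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Made-up two-document corpus.
corpus = ["rain rain flood", "rain sun"]
weights = tf_idf(corpus)
```

A term that occurs in every document, such as "rain" here, gets weight zero, while terms specific to one document receive positive weights.
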
Urban Village Clustering in Surabaya City based on Live Birth Rate using K-Means with Principle Component Analysis
Regita Putri Permata; Rifdatun Ni’mah; Amri Muhaimin
International Journal of Data Science, Engineering, and Analytics, Vol. 2 No. 2 (2022)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v2i2.41

Abstract

Pregnancy and childbirth are important times in a mother's life. Mothers and children are vulnerable, so efforts to protect their health should be prioritized. The health level is a useful indicator of the achievement or success of health efforts in an area. The Surabaya City Government is very concerned about the health and safety of mothers and babies. Therefore, this study aims to map and group urban villages in Surabaya based on the number of live births and pregnant women using the K-Means algorithm, with feature reduction by Principal Component Analysis. The variable reduction yields two principal components. The optimal grouping of urban villages in the city of Surabaya is 3 clusters: cluster 0 consists of 99 villages, cluster 1 of 42 villages, and cluster 2 of 12 villages.
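
The clustering step described above can be sketched as a plain K-Means loop. This is a minimal illustration that assumes the features have already been reduced to two components (the PCA step is not shown) and uses fixed initial centroids for reproducibility. The points are made up and are not the Surabaya data.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))

def kmeans(points, initial_centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest_index(point, centroids)].append(point)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    labels = [nearest_index(point, centroids) for point in points]
    return labels, centroids

# Made-up two-component scores for six villages.
villages = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0), (9.2, 1.2)]
labels, centroids = kmeans(villages, initial_centroids=villages[::2])
```

In practice the number of clusters is chosen by comparing several values of k; the abstract reports 3 as optimal but does not state the selection criterion.
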