Model Selection For Forecasting Rainfall Dataset
Amri Muhaimin; Hendri Prabowo; Suhartono
International Journal of Data Science, Engineering, and Analytics Vol. 1 No. 1 (2021)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v1i1.2

Abstract

The objective of this research is to find the best method for forecasting rainfall at the Wonorejo reservoir in Surabaya. Time series and causal approaches, using both statistical and machine learning methods, are compared. Time series regression (TSR), autoregressive integrated moving average (ARIMA), linear regression (LR), and transfer function (TF) models serve as the statistical methods. A feedforward neural network (FFNN) and a deep feedforward neural network (DFFNN) serve as the machine learning methods. The statistical methods are used to capture linear patterns, whereas the machine learning methods are used to capture nonlinear patterns. Hourly rainfall data from the Wonorejo reservoir is used as a case study. The data has a seasonal pattern, i.e. monthly seasonality. Based on cross-validation and information criteria, the DFFNN using the time series approach produces more accurate forecasts than the other methods. In general, the machine learning methods are more accurate than the statistical methods. The research also identifies the parameter settings that work best for building the neural network model. Moreover, these results are not in line with the results of the M3 and M4 competitions, i.e. that more complex methods do not necessarily produce better forecasts than simpler methods.
Negative Binomial Time Series Regression – Random Forest Ensemble in Intermittent Data
Amri Muhaimin; Prismahardi Aji Riyantoko; Hendri Prabowo; Trimono Trimono
International Journal of Data Science, Engineering, and Analytics Vol. 1 No. 2 (2021)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v1i2.10

Abstract

Intermittent data is challenging to forecast because it contains many zeros. Sales data and rainfall data are typical examples, since in both cases some periods have no recorded values. This research builds a model to address that problem using an ensemble approach. Intermittent data is often modeled with the Negative Binomial distribution because its variance exceeds its mean. We use two datasets: rainfall and sales. Our approach builds a base model from a Negative Binomial time series regression and then augments it with a tree-based model, namely random forest. We compare the results with two benchmark methods, the Croston method and Single Exponential Smoothing (SES). Our approach outperforms the benchmarks on the metric value by 1.79 and 7.18.
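
The two benchmark methods named in this abstract can be sketched in a few lines. This is a minimal illustration of the textbook forms of Croston's method and SES, not the authors' implementation; the smoothing constant `alpha=0.1` and the sample series are assumptions chosen for the example.

```python
def croston(demand, alpha=0.1):
    """Croston's method: smooth non-zero demand sizes and the gaps between them."""
    size = None        # smoothed size of non-zero demands
    interval = None    # smoothed number of periods between non-zero demands
    periods_since = 1  # periods elapsed since the last non-zero demand
    for y in demand:
        if y > 0:
            if size is None:
                # Initialise from the first non-zero observation.
                size, interval = float(y), float(periods_since)
            else:
                size += alpha * (y - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    # Forecast of mean demand per period; 0.0 if no demand was ever seen.
    return size / interval if size is not None else 0.0


def ses(series, alpha=0.1):
    """Single Exponential Smoothing: returns the final smoothed level."""
    level = float(series[0])
    for y in series[1:]:
        level += alpha * (y - level)
    return level


# Hypothetical intermittent series: a demand of 3 every third period.
demand = [0, 0, 3, 0, 0, 3, 0, 0, 3]
croston_forecast = croston(demand)
ses_forecast = ses(demand)
```

Croston's method forecasts the ratio of smoothed demand size to smoothed inter-demand interval, which is why it tends to behave more sensibly than SES on series dominated by zeros.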
Water Availability Forecasting Using Univariate and Multivariate Prophet Time Series Model for ACEA (European Automobile Manufacturers Association)
Prismahardi Aji Riyantoko; Tresna Maulana Fahrudin; Kartika Maulida Hindrayani; Amri Muhaimin; Trimono
International Journal of Data Science, Engineering, and Analytics Vol. 1 No. 2 (2021)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v1i2.12

Abstract

Time series modeling is one way to forecast data. The ACEA company ran a competition in which it opened its water availability data for forecasting. The Aquifer Petrignano dataset from Italy, in the water resources field, has five parameters: rainfall, temperature, depth to groundwater, drainage volume, and river hydrometry. In this research we forecast depth to groundwater using univariate and multivariate time series approaches with the Prophet method. Prophet is a library developed by a team at Facebook. We also apply several steps to clean the data and prepare it for forecasting: handling missing data, transformation, differencing, time series decomposition, lag determination, stationarity checks, and the Augmented Dickey-Fuller (ADF) test. These steps are used to make sure the data causes no problems during forecasting. We obtained results with both the univariate and multivariate Prophet methods. The multivariate approach gave an MAE of 0.82 and an RMSE of 0.99, which is better than the univariate Prophet forecast.
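
For readers unfamiliar with the two error metrics reported here, the sketch below shows how MAE and RMSE are usually computed. The function names and the sample values are illustrative assumptions; they are not taken from the paper.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: penalises large errors more than MAE does."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical observed and forecast values, for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 6.0]
example_mae = mae(actual, predicted)
example_rmse = rmse(actual, predicted)
```

RMSE is never smaller than MAE on the same data, so a gap between the two (as in 0.82 versus 0.99) indicates that some errors are noticeably larger than the typical one.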
Metric Comparison For Text Classification
Amri Muhaimin; Tresna Maulana Fahrudin; Trimono; Prismahardi Aji Riyantoko; Kartika Maulida Hindrayani
International Journal of Data Science, Engineering, and Analytics Vol. 2 No. 1 (2022)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v2i1.34

Abstract

Text classification has been popular in recent years. To classify text, the first step is to convert it into numeric values. Candidate values include Term Frequency, Inverse Document Frequency, Term Frequency – Inverse Document Frequency, and the raw frequency of the word itself. This study aims to determine which metric is best for text classification. The methods used are Naïve Bayes, Logistic Regression, and Random Forest. The evaluation scores are accuracy and Area Under Curve (AUC). Some metrics turn out to produce similar evaluation scores. Random Forest is the best of the three methods, and the best metric for text classification is Term Frequency – Inverse Document Frequency.
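
The TF-IDF weighting mentioned above can be sketched as follows. This uses one common unsmoothed variant, tf × log(N / df), with whitespace tokenisation; the paper does not say which variant it used, and library implementations typically add smoothing and normalisation. The sample documents are made up.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-document corpus.
docs = ["rain rain forecast", "sales forecast"]
weights = tf_idf(docs)
```

A term that appears in every document ("forecast" here) gets weight zero, while a term concentrated in one document ("rain") gets a positive weight, which is the property that makes TF-IDF useful as a classification feature.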
Urban Village Clustering in Surabaya City based on Live Birth Rate using K-Means with Principle Component Analysis
Regita Putri Permata; Rifdatun Ni’mah; Amri Muhaimin
International Journal of Data Science, Engineering, and Analytics Vol. 2 No. 2 (2022)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v2i2.41

Abstract

Pregnancy and childbirth are important times in a mother's life. Mothers and children are vulnerable, so efforts to protect their health should be prioritized. The health level is a useful indicator of how well an area's health efforts are succeeding. The Surabaya City Government is very concerned about the health and safety of mothers and babies. Therefore, this study aims to map and group urban villages in Surabaya based on the number of live births and pregnant women, using the K-Means algorithm with feature reduction by Principal Component Analysis. Two principal components are formed as a result of the variable reduction. The optimal grouping of urban villages in the city of Surabaya is 3 clusters. Based on the number of live births and pregnant women, cluster 0 consists of 99 villages, cluster 1 of 42 villages, and cluster 2 of 12 villages.
Stock Price Modeling with Geometric Brownian Motion and Value with Risk PT Ciputra Development TBK
Amri Muhaimin; Trimono Trimono
Nusantara Science and Technology Proceedings, 7th International Seminar of Research Month 2022
Publisher : Future Science

DOI: 10.11594/nstp.2023.3329

Abstract

Financial sector investment attracts a lot of public interest. One form is investing funds in a company's shares. The profit from stock investment can be seen from the value of stock returns. If past stock returns follow a Normal distribution, the future stock price can be predicted with the Geometric Brownian Motion method. An estimate of the investment risk can then be measured from the stock price prediction. The results show that the stock price prediction for PT. Ciputra Development Tbk for the period December 1, 2016, to January 31, 2017, has very good accuracy, with a MAPE of 1.98191%. Further, the Value at Risk method with Monte Carlo simulation at a significance level of α = 5% was used to measure the investment risk of PT. Ciputra Development Tbk shares. This method is only useful if it predicts accurately, so backtesting is needed. Backtesting on the processed data gives a violation ratio of 0, which means that at significance level α = 5% the Value at Risk method with Monte Carlo simulation can be used at all levels of probability violation.
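
The three ingredients named in this abstract (a Geometric Brownian Motion price path, MAPE, and a Monte Carlo Value at Risk) can be sketched as below. This is a generic illustration, not the authors' procedure: the drift, volatility, and starting price are hypothetical, and the VaR function assumes one-period returns drawn from a Normal distribution.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, steps, rng, dt=1.0):
    """One GBM path: S_next = S * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z)."""
    path = [s0]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        drift = (mu - 0.5 * sigma ** 2) * dt
        shock = sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(drift + shock))
    return path


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def monte_carlo_var(mu, sigma, alpha=0.05, n_sims=10_000, rng=None):
    """alpha-quantile of simulated Normal returns (negative when it is a loss)."""
    rng = rng or random.Random(0)
    returns = sorted(rng.gauss(mu, sigma) for _ in range(n_sims))
    return returns[int(alpha * n_sims)]


# Hypothetical daily drift and volatility, for illustration only.
mu, sigma = 0.0005, 0.02
path = simulate_gbm(1000.0, mu, sigma, steps=30, rng=random.Random(42))
var_5 = monte_carlo_var(mu, sigma, alpha=0.05, rng=random.Random(1))
```

With these inputs the simulated 5% VaR should land near the analytic Normal quantile, mu - 1.645 × sigma, which is a convenient sanity check on the simulation.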
A Simple Data Sentiment Analysis using Bjorka phenomenon on Twitter
Prismahardi Aji Riyantoko; Amri Muhaimin
Nusantara Science and Technology Proceedings, 7th International Seminar of Research Month 2022
Publisher : Future Science

DOI: 10.11594/nstp.2023.3353

Abstract

Social media is one of the channels netizens use to access, share, and discuss the latest and most talked-about news. Twitter is a real-time platform that is often chosen for this purpose. Through sentiment analysis with text mining on Twitter, we can understand how people describe and express their perceptions of obesity, whether positive, negative, or neutral. This analysis is important for seeing the extent to which social media such as Twitter is used today as one of the instruments for disseminating information about data security in Indonesia. The objective of this research is to identify sentiment on Twitter related to the Bjorka phenomenon in Indonesia using the text mining method. The research design is cross-sectional. This design was chosen because the data was taken from Twitter over the last four months (June 2022 - October 2022). Web scraping on Twitter yielded 998 Indonesian tweets. Data was collected using the Twitter Scraping extension pack and analyzed using Python 3.7.2. The sentiment analysis found 744 (75%) neutral tweets, followed by 175 (18%) negative tweets and 75 (8%) positive tweets, out of a total of 994 tweets. Finally, topic modelling produced three topics, with the most relevant terms for topics 0, 1, and 2 covering 35.3%, 33%, and 31.7% of tokens, respectively.
Batas Atas Ukuran Risiko Agregat Pada Portofolio Saham INDF.JK dan ICBP.JK [Upper Bound of Aggregate Risk Measures for the INDF.JK and ICBP.JK Stock Portfolio]
Trimono Trimono; Amri Muhaimin; Andreas Nugroho Sihananto
Statistika Vol. 21 No. 2 (2021)
Publisher : Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Islam Bandung

DOI: 10.29313/statistika.v21i2.340

Abstract

In aggregate investment in financial assets, each single asset can give rise to a potential loss that the investor must bear. In this setting, the concept of aggregate risk can be used to predict the size of the loss. Risk can be quantified through a risk measure, one of which is Value at Risk (VaR). However, VaR does not always satisfy subadditivity, so it is not a coherent risk measure. An alternative to VaR is Expected Shortfall (ES). The main advantage of ES over VaR is that ES satisfies subadditivity, so ES is a coherent risk measure. Predicting aggregate risk with either VaR or ES requires the joint distribution function of the aggregate risk. However, it is quite difficult to determine the joint distribution function of an aggregate risk composed of several single risks that are not independent. When the joint distribution function is hard to obtain, an alternative is to compute an upper bound on the aggregate risk by exploiting comonotonicity and convex order. This study aims to measure the bound on aggregate risk using the ES risk measure for an aggregate investment in the shares of PT. Indofood Sukses Makmur Tbk (INDF.JK) and PT Indofood CBP Sukses Makmur Tbk (ICBP.JK). Based on an analysis of INDF.JK and ICBP.JK stock return data for the period 02/01/21 – 17/09/21, the upper bounds of the aggregate risk measures VaR and ES for the stock portfolio, at a 95% confidence level and a 1-day holding period, are -0.05231 and -0.07731, respectively.
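
The comonotonic upper bound referred to in this abstract is usually stated as follows. This is the standard textbook form for two risks, written with X_1 and X_2 as losses; the symbols are assumptions introduced here, and the paper's own notation and sign convention (its reported values are negative returns) may differ.

```latex
% Aggregate risk and its comonotonic counterpart (same marginals, one common uniform driver)
S = X_1 + X_2, \qquad
S^{c} = F_{X_1}^{-1}(U) + F_{X_2}^{-1}(U), \quad U \sim \mathrm{Uniform}(0,1)

% Convex order gives an upper bound for ES, which is additive for comonotonic sums
S \preceq_{cx} S^{c}
\;\Longrightarrow\;
\mathrm{ES}_{p}(S) \le \mathrm{ES}_{p}(S^{c}) = \mathrm{ES}_{p}(X_1) + \mathrm{ES}_{p}(X_2)
```

The practical appeal is that the right-hand side needs only the marginal distributions of the two stocks, not their joint distribution.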
PEMBENTUKAN KLASTER TERHADAP INDEKS PEMBANGUNAN MANUSIA DI WILAYAH JAWA TIMUR [Cluster Formation for the Human Development Index in East Java]
Nine Alvariqati Varqa Ansori; Amri Muhaimin; Aviolla Terza
Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika Vol. 5 No. 1 (2024)
Publisher : LPPM Universitas Bina Bangsa

DOI: 10.46306/lb.v5i1.589

Abstract

The Human Development Index (HDI) is one of the efforts to achieve the Sustainable Development Goals (TPB), with HDI components including health, education, and a decent standard of living. According to the Central Statistics Agency, the HDI increased every year from 2010 to 2022, but several regions in East Java still have low HDI values. A clustering analysis is therefore needed to understand the characteristics of each district or city in East Java in more depth. The research collects data that includes HDI, UHH, RLS, HLS per capita expenditure values, income inequality, HDI, city minimum wage, GRDP, number of poor people, open unemployment rate, population, and labor force participation rate. After the data was obtained, the study carried out data exploration and cleaning, principal component analysis (PCA), factor analysis, modeling with K-Means and DBSCAN, and silhouette calculations. The results show that the K-Means model has 3 clusters. Cluster 1 (purple) has superior characteristics in all aspects except the HDI value, GRDP, and the number of poor people. Cluster 2 (green) has less superior characteristics in terms of UHH, RLS, HLS, real per capita expenditure, income inequality, and IPG. Cluster 3 (yellow) has superior characteristics in terms of UHH, RLS, HLS, real per capita expenditure, income inequality, and IPG. HDI values in East Java Province keep increasing, but there are still disparities between one district or city and another, as shown by the K-Means silhouette value of 0.43.
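
The silhouette value reported above can be read more easily with the definition in hand. The sketch below computes the mean silhouette coefficient with Euclidean distance; it is a generic illustration on made-up points, it assumes at least two clusters, and it is not the authors' code.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean distance, at least two clusters)."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Singleton cluster: the silhouette is defined as 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters and values near 0 indicate overlapping ones, so a score of 0.43 suggests a moderate cluster structure.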
Identifikasi Penyakit Daun Jeruk Siam Menggunakan Convolutional Neural Network (CNN) dengan Arsitektur EfficientNet [Identification of Siam Orange Leaf Diseases Using a Convolutional Neural Network (CNN) with the EfficientNet Architecture]
Burhan Syarif Acarya; Amri Muhaimin; Kartika Maulida Hindrayani
G-Tech: Jurnal Teknologi Terapan Vol. 8 No. 2 (2024), April 2024
Publisher : Universitas Islam Raden Rahmat, Malang

DOI: 10.33379/gtech.v8i2.4120

Abstract

Siam orange (jeruk siam) is one of the horticultural commodities that plays a major role in Indonesia's agricultural sector, with production reaching 2 million tons per year. However, siam orange production is vulnerable to pests and diseases, especially on the leaves. Common diseases include Blackspot Leaf, Canker Leaf, Greening Leaf, Powdery Mildew, and Citrus Leafminer. Disease identification on citrus plants is generally done manually, so the diagnosis tends to be subjective. An automated solution for detecting diseases on citrus leaves is therefore needed. The aim of this research is to identify diseases that attack citrus leaves using a deep learning method, namely a CNN with the EfficientNetB3 architecture. The dataset consists of images of diseased citrus leaves taken directly from citrus orchards and divided into 6 classes, as in the diseases mentioned above. The experiment with a 10-epoch scenario and the Adam optimizer achieved the best accuracy of 0.98 (98%).