PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND OFFICIAL STATISTICS
The International Conference on Data Science and Official Statistics (ICDSOS) 2023 is organized by Politeknik Statistika STIS and Statistics Indonesia (BPS), in collaboration with Forum Pendidikan Tinggi Statistika (FORSTAT), Ikatan Statistisi Indonesia (ISI), the United Nations Economic and Social Commission for Asia and the Pacific (UNESCAP), and the United Nations Statistics Division (UNSD). ICDSOS will bring together statisticians and data scientists from academia, official statistics, the health sector, and business, both junior and senior professionals, in an inviting hybrid environment on November 24th - 25th, 2023. The theme of the conference is Harnessing Innovation in Data Science and Official Statistics to Address Global Challenges towards the Sustainable Development Goals.
DATA SCIENCE
Machine Learning and Deep Learning
Data Science and Artificial Intelligence (AI)
Data Mining and Big Data
Statistical Software
Information System Development for Official Statistics
Remote Sensing to Strengthen Official Statistics
Other data science relevant topic
APPLIED STATISTICS
Applied Multivariate Analysis
Applied Time Series Analysis
Applied Spatial Statistics
Applied Bayesian Statistics
Microeconomics Modelling and Applications
Macroeconomics Modelling and Applications
Econometrics Modelling and Applications
Quantitative Public Policy and Statistical Analysis
Applied Statistics on Demography
Applied Statistics on Population Studies
Applied Statistics on Biostatistics and Public health
Other applied statistics relevant topic
OFFICIAL STATISTICS
Official Statistics Survey Methodology Developments
Data Collection Improvements
Sustainable Development Goals (SDGs) Indicators Estimation
Small Area Estimation (SAE)
Non Response and Imputation Methods
Sampling Error and Non Sampling Error Evaluation
Benchmarking Regional Official Statistics
Other official statistics relevant topic
Articles
151 Documents
Performance Comparison of Hot-Deck Imputation, K-Nearest Neighbor Imputation, and Predictive Mean Matching in Missing Value Handling, Case Study: March 2019 SUSENAS Kor Dataset
Tsasya Raudhatunnisa;
Nori Wilantika
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS
DOI: 10.34123/icdsos.v2021i1.93
Missing values can introduce bias and make a dataset unrepresentative of the actual situation. The choice of method for handling missing values matters because it affects the resulting estimates. This study therefore compares three imputation methods: Hot-Deck Imputation, K-Nearest Neighbor Imputation (KNNI), and Predictive Mean Matching (PMM). Because the three methods work differently, they produce different estimates. The comparison criteria are the Root Mean Squared Error (RMSE), Unsupervised Classification Error (UCE), Supervised Classification Error (SCE), and the algorithm's running time. The study has two parts: a comparative analysis and a scoring analysis. The comparative analysis uses a simulation that accounts for the missing-data mechanism: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). The scoring analysis then narrows down the comparative results by assigning a score to the imputation results of each method. Based on these scores, Hot-Deck Imputation performs best at handling missing values.
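As a rough illustration of the simulation idea in the abstract above, the following sketch masks values under MCAR, imputes them with a random hot-deck and a one-variable nearest-neighbor rule, and scores the result by RMSE on the masked cells. All function names are hypothetical, the data layout (one auxiliary variable `xs`, one target `ys`) is an assumption, and this is not the authors' implementation.

```python
import math
import random


def make_mcar(values, rate, rng):
    """Mask each value with probability `rate`, independently of the data (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


def hot_deck_impute(values, rng):
    """Random hot-deck: fill each gap with a value drawn from the observed donors."""
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]


def knn_impute(xs, ys, k):
    """Fill each missing y with the mean y of the k records closest in x."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            nearest = sorted(observed, key=lambda d: abs(d[0] - x))[:k]
            y = sum(d[1] for d in nearest) / len(nearest)
        filled.append(y)
    return filled


def rmse_on_missing(truth, masked, imputed):
    """RMSE computed only over the cells that were masked."""
    errors = [(t - i) ** 2 for t, m, i in zip(truth, masked, imputed) if m is None]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0
```

A typical run would generate synthetic `xs` and `ys`, call `make_mcar(ys, 0.2, random.Random(0))`, impute with both functions, and compare the two RMSE values. MAR and MNAR would need masking rules that depend on `xs` or on `ys` itself.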
Does Palapa Ring Project Infrastructure Bridging Connectivity and Economic Activity?
Realita Eschachasthi;
Taly Purwa;
Diyang Gita Cendekia
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS
DOI: 10.34123/icdsos.v2021i1.99
This study examines the impact of the Indonesian Palapa Ring Project (PRP) infrastructure on connectivity and economic activity in 46 districts covered by the West, Central, and East packages of the PRP in 2015-2020. Connectivity is measured by the percentage of internet use, and economic activity by Gross Regional Domestic Product (GRDP). A fixed-effects staggered difference-in-differences model is applied to panel data obtained from Badan Pusat Statistik (BPS)-Statistics Indonesia. An examination of the parallel trend assumption, a robustness check, and a heterogeneity analysis are also presented. The results show that PRP infrastructure has a positive and significant impact on connectivity but no significant effect on economic activity. In response to these findings, policy should intensify the coverage and quality of the internet, expand Information and Communication Technology (ICT) facilities in rural areas, and extend education and digital literacy programs.
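The core idea behind difference-in-differences can be sketched with the simplest two-group, two-period case. This is a deliberate simplification: the study above uses a staggered design with fixed effects, which this sketch does not reproduce. The function name and record layout are hypothetical.

```python
from statistics import mean


def did_estimate(records):
    """Two-by-two difference-in-differences.

    `records` is a list of (treated, post, outcome) tuples, where `treated`
    and `post` are booleans. The estimate is the change in the treated group
    minus the change in the control group.
    """
    def cell_mean(treated, post):
        return mean(o for t, p, o in records if t == treated and p == post)

    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    return treated_change - control_change
```

The estimate is only interpretable as a causal effect if the two groups would have followed parallel trends without the treatment, which is why the abstract reports a parallel trend check.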
Determinant of Labor Force Resilience Against The Employment Impact of The Covid-19 Pandemic in Bali Province, Indonesia: An Application of Survival Analysis
Ni Luh Putu Yayang Septia Ningsih;
Mohammad Dokhi
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS
DOI: 10.34123/icdsos.v2021i1.101
The coronavirus disease 2019 (Covid-19) pandemic has affected not only health but also the economy. The sector hit hardest economically is tourism and its derivatives. Because it depends heavily on tourism, Bali is the province where the most members of the labor force stopped working during the pandemic. In this study, data from the national labor force survey were analyzed using the Weibull-Gamma Shared Frailty Survival Model to explore the determinants of labor force resilience against the event of stopping work due to the Covid-19 pandemic. The results show that gender, education level, training experience, marital status, and age are variables that significantly affect how quickly a member of the labor force stops working. Moreover, variation among the regions where they work (regencies/cities) also has a significant effect on how quickly work stops.
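For orientation, a Weibull hazard with a multiplicative frailty term can be written in a few lines. The parameterization below (baseline hazard `scale * shape * t**(shape - 1)`) is one common convention and an assumption here; the abstract does not state which one the authors used, and the function names are hypothetical.

```python
import math


def weibull_hazard(t, scale, shape, linear_predictor=0.0, frailty=1.0):
    """Hazard at time t: frailty * baseline Weibull hazard * exp(x'beta)."""
    baseline = scale * shape * t ** (shape - 1)
    return frailty * baseline * math.exp(linear_predictor)


def weibull_survival(t, scale, shape, linear_predictor=0.0, frailty=1.0):
    """Probability of not yet having experienced the event by time t."""
    cumulative_hazard = frailty * scale * t ** shape * math.exp(linear_predictor)
    return math.exp(-cumulative_hazard)
```

In a shared frailty model, every worker in the same regency or city would share one `frailty` value, typically assumed to follow a gamma distribution with mean 1, so a frailty above 1 means work stops faster in that region.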
Individual and Province-level Determinants of Unemployed NEET as Young People’s Productivity Indicator in Indonesia During 2020: A Multilevel Analysis Approach
Ni Putu Gita Naraswati;
Yogo Aryo Jatmiko
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS
DOI: 10.34123/icdsos.v2021i1.102
Employment has become a focus of attention for developing countries, including Indonesia. The issue is urgent because the Indonesian population is entering the demographic dividend period. Success in achieving the demographic dividend depends heavily on the employment conditions of young people, which determine whether a low level of dependence can be realized. However, young people still face obstacles in education and employment, as shown by the year-on-year (YoY) percentage of NEET, which was exacerbated in 2020 by the Covid-19 pandemic. Given these problems, research on NEET in Indonesia in 2020 is needed. This study uses 2020 National Labor Force Survey (Sakernas) data, analyzed with multilevel binary logistic regression. The unemployed status of young NEETs is influenced by gender, age, marital status, highest education completed, disability status, classification of the area of residence, and recent migrant status. There is a multilevel effect in the NEET assessment of young people, as evidenced by the influence of Gross Domestic Product (GDP) and the Human Development Index (HDI). The results are expected to serve as a reference for policies targeted at the NEET problem in Indonesia, such as optimizing the mismatch program on the pre-employment card to connect young job seekers with available job opportunities. At the province level, provincial governments are expected to make the most of the province-level variables that affect the tendency of NEETs to remain active in the labor market.
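Two quantities commonly reported for a two-level logistic model can be sketched as follows: the predicted probability for an individual with a province-level random intercept, and the latent-variable intraclass correlation, which uses the fixed level-1 variance of pi^2/3 implied by the logistic distribution. Function and argument names are hypothetical, and no coefficients from the study are used.

```python
import math


def predicted_probability(intercept, coefficients, covariates, province_effect=0.0):
    """Inverse-logit of the linear predictor plus a province random intercept."""
    eta = intercept + province_effect
    eta += sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-eta))


def latent_icc(province_variance):
    """Share of latent-scale variance that lies between provinces."""
    return province_variance / (province_variance + math.pi ** 2 / 3)
```

An intraclass correlation clearly above zero is the usual indication that a multilevel specification is warranted rather than a single-level logistic regression.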
Estimation of Total Fertility Rate (TFR) Using Small Area Estimation (SAE) in Nusa Tenggara Timur (NTT) Province
Mellinda Mellinda;
Cucu Sumarni
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS
DOI: 10.34123/icdsos.v2021i1.107
Indonesia's large population makes the provision of basic services less than optimal, so the condition and distribution of the population must be addressed through fertility control. The Total Fertility Rate (TFR) is one of the fertility measures used in Indonesia. Estimating TFR at the district level is very important, especially for Nusa Tenggara Timur (NTT) Province, the province with the highest TFR in Indonesia. Annual TFR figures at the district level are difficult to obtain because of data limitations. This study uses the National Socio-Economic Survey to address these problems. Estimating TFR directly from survey data (direct estimation) generally yields a large Relative Standard Error (RSE), so an indirect estimate in the form of Small Area Estimation (SAE) is needed. Using the SAE Restricted Maximum Likelihood (REML) procedure, TFR estimates with a lower RSE than the direct estimates are obtained. Five districts have a medium-high TFR: Sumba Barat Daya, Sumba Tengah, Sabu Raijua, Sumba Barat, and Manggarai Barat. The government is advised to focus on these five districts to bring down the high TFR in NTT.
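The reason an SAE estimate can have a lower RSE than the direct estimate is easiest to see in an area-level (Fay-Herriot style) composite estimator, sketched below. This assumes an area-level model whose between-area variance `model_var` has already been estimated (for example by REML); the abstract does not spell out the exact model, and the function names are hypothetical.

```python
def rse_percent(estimate, standard_error):
    """Relative standard error as a percentage of the estimate."""
    return 100.0 * standard_error / estimate


def composite_estimate(direct, sampling_var, synthetic, model_var):
    """Shrink a direct estimate toward a regression-synthetic estimate.

    gamma is the weight on the direct estimate: close to 1 when the survey
    estimate is precise, close to 0 when its sampling variance is large.
    Returns (estimate, gamma, leading MSE term gamma * sampling_var).
    The MSE term ignores the uncertainty from estimating the regression
    coefficients and model_var, so it understates the full MSE.
    """
    gamma = model_var / (model_var + sampling_var)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    return estimate, gamma, gamma * sampling_var
```

Because `gamma` is below 1 whenever the sampling variance is positive, the leading MSE term is smaller than the direct estimator's variance, which is what drives the lower RSE.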
Study of Exchange Rate Volatility and Its Effect on Indonesian Economic Indicators With Potential Exchange Rate Crisis
Adin Nugroho;
Nasrudin Nasrudin
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS
DOI: 10.34123/icdsos.v2021i1.108
Exchange rate volatility occurs when the exchange rate fluctuates widely, which signals uncertainty. Because Indonesia has an open economy, exchange rate fluctuations need to be kept in check because of their crisis potential. This research analyzes the effect of exchange rate volatility on the Indonesian economy in general, and on a few related cases, using time series analysis. ARIMA (Autoregressive Integrated Moving Average) and EGARCH (Exponential Generalized Autoregressive Conditional Heteroscedasticity) models were used to measure volatility over the period 1997-2021. Regressions were then applied to analyze the impact of exchange rate volatility on a few macroeconomic indicators. The results show that exchange rate volatility has a significant negative effect on the GDP growth rate, exports, and imports. Logistic regression was used to analyze the factors affecting crisis potential. The results show that only a negative GDP growth rate and high volatility raise the risk of a crisis. It is therefore important to keep exchange rate volatility stable.
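To make the EGARCH idea concrete, the sketch below runs the EGARCH(1,1) log-variance recursion over a series of mean-zero returns, assuming standard normal innovations. The parameter values would normally come from maximum likelihood estimation, which is not shown; the function name and the choice of starting value are assumptions, not the authors' setup.

```python
import math


def egarch_variances(returns, omega, alpha, gamma, beta):
    """Conditional variances from an EGARCH(1,1) recursion.

    log(var_t) = omega + beta * log(var_{t-1})
                 + alpha * (|z_{t-1}| - E|z|) + gamma * z_{t-1}
    where z is the standardized return. Requires |beta| < 1 for the
    starting value, the unconditional mean of the log-variance.
    """
    expected_abs_z = math.sqrt(2.0 / math.pi)  # E|z| for a standard normal
    log_var = omega / (1.0 - beta)
    variances = []
    for r in returns:
        variances.append(math.exp(log_var))
        z = r / math.sqrt(variances[-1])
        log_var = omega + beta * log_var + alpha * (abs(z) - expected_abs_z) + gamma * z
    return variances
```

Modelling the logarithm keeps the variance positive without parameter constraints, and a negative `gamma` lets a negative shock raise next-period volatility more than a positive shock of the same size.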
Determine Sample Size for Precision Results on Quick Count
Yusep Ridwan;
Rizqon Halal Syah Aji
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS
DOI: 10.34123/icdsos.v2021i1.121
This research addresses the question of what sample size is appropriate for an election quick count so that the results come close to the actual outcome. Although practical procedures for calculating the sample size are widely used in quick count methodology, the results often deviate from the actual outcome, so precision remains a recurring topic of discussion. The sample size and the resulting level of precision of the forecast are therefore important issues to examine. The research is experimental, and the analysis uses the Kruskal-Wallis test. The data are primary data from the real count results of the Sumedang regency election, collected by consultants and teams. The results show a significant difference between the seven sample size groups in vote acquisition and in the percentage of votes at the polling station (TPS), with the sample sizes n=408, n=500, n=875 and n=1674 being the most appropriate for conducting the quick count.
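The Kruskal-Wallis test named in the abstract above compares groups through the ranks of the pooled observations. A minimal sketch of the H statistic follows; it assigns average ranks to tied values but omits the tie correction factor, and the function name is hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups of numeric values."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Map each distinct value to the average of the ranks it occupies.
    rank_of = {}
    start = 0
    while start < n:
        end = start
        while end < n and pooled[end] == pooled[start]:
            end += 1
        rank_of[pooled[start]] = (start + 1 + end) / 2.0
        start = end

    weighted = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    return 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)
```

Under the null hypothesis that all groups share one distribution, H is approximately chi-squared with (number of groups - 1) degrees of freedom, so seven sample size groups would be compared against 6 degrees of freedom.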
Short-Term Forecasting of Air Travellers Outflows from Bali Using Web Search Data
Parma Dwi Widy Oktama
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS
DOI: 10.34123/icdsos.v2021i1.122
The number of air travelers has become one of the strategic indicators in the transportation sector. The official data are released by Statistics Indonesia (BPS) with a thirty-day lag, so the indicator cannot be known in real time. Using web search data, which has been evolving rapidly in recent years, this study explores the possibility of short-term forecasting to obtain an earlier general outlook of the indicator. The study finds that web search data and official statistics are strongly correlated and show similar movement patterns over time. Applying web search data as a predictor in time series modeling, specifically time series regression and autoregressive models (SARIMA and SARIMAX), yields predicted values that closely approach the actual values of the response variable. In addition, the use of web search data is shown to increase model accuracy. The results from the SARIMAX model indicate that the number of air traveler outflows from Bali in September and October 2021 will generally be higher than in August 2021. The increase is thought to be due to a decrease in Covid-19 cases, which has restored the public's confidence in travelling.
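A first check before using search data as an exogenous predictor is how strongly it correlates with the official series, possibly at a lead. The sketch below computes a Pearson correlation between the search index at month t - lag and the official figure at month t. The series and function names are hypothetical, and this is only the exploratory step, not a SARIMAX fit.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def lagged_correlation(search, official, lag):
    """Correlate search[t - lag] with official[t] for a non-negative lag."""
    if lag == 0:
        return pearson(search, official)
    return pearson(search[:-lag], official[lag:])
```

A high correlation at a positive lag would suggest that searches lead departures, which is the property that makes the series useful for forecasting ahead of the official release.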
Determining the Stopping Point on GPS Data Using Density Based Spatial Clustering of Application with Noise and Gaussian Mixture Model Cluster
You Ari Faeni
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS
DOI: 10.34123/icdsos.v2021i1.123
GPS data is a rich subject for research, and various studies have been conducted to extract information from it. In this paper, we propose a novel model for determining stopping points in GPS data for cases of human movement without transportation modes. This information can further be used to characterize human behavior, such as fraud and favorite spots. The GPS data used in this research is the travel data of SUSENAS survey officers recorded while updating the census block for 27 households. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Gaussian Mixture Model (GMM) clustering are used to build the model. The model is specified as a flowchart and applied to the collected GPS data. The results show that the stopping points generated with the DBSCAN cluster model are better than those generated with the GMM cluster model. The results of this study will subsequently be used to build a model of surveyor fraud.
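The DBSCAN step can be sketched in plain Python. The version below assumes the GPS fixes have already been projected to planar coordinates in metres, so Euclidean distance is meaningful; `eps` and `min_pts` are tuning parameters whose values the abstract does not give. Treating each cluster's centroid as a stopping point is an illustrative choice, not necessarily the authors' definition.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


def stop_points(points, labels):
    """Centroid of each cluster, taken here as the candidate stopping point."""
    centroids = {}
    for cluster in sorted(set(labels) - {-1}):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        centroids[cluster] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids
```

Fixes recorded while walking tend to be sparse and end up labelled as noise, while fixes recorded during a stop pile up within `eps` of each other and form a cluster.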
Revisiting Local Walking Based on Social Network Trust (LWSNT): Friends Recommendation Algorithm in Facebook Social Networks
Wahidya Nurkarim;
Arie Wahyu Wijayanto
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS
DOI: 10.34123/icdsos.v2021i1.124
In recent decades, the internet penetration rate and the number of online social network users have grown very fast. An online social network, such as Facebook, is a platform where one can find friends without having to meet face to face. A social network is represented by a large graph because it involves many participants, which makes it hard to find potential friends who share the same thoughts and interests. The Local Walking Based on Social Network Trust (LWSNT) algorithm is one of the popular algorithms for social friend recommendation. This study re-examines whether correlation between attributes produces mismatched rankings across cases (with and without correlation). We assess the performance of LWSNT on Facebook networks in a supervised setting by comparing its F-score against similar methods. Using Kendall's tau correlation, the results show that the correlation of attributes has no significant effect on the order of friend recommendations. In addition, LWSNT performs somewhat worse than the Common Neighbors algorithm and the Jaccard index.
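The two baselines named in the abstract above, and the rank comparison, are simple enough to sketch. The graph is assumed to be a dictionary mapping each user to a set of friends, and Kendall's tau is computed in its tau-a form for two orderings of the same items without ties. These representations and function names are assumptions for illustration; LWSNT itself is not implemented here.

```python
from itertools import combinations


def common_neighbors(graph, u, v):
    """Number of friends that u and v share."""
    return len(graph[u] & graph[v])


def jaccard(graph, u, v):
    """Shared friends divided by the size of the combined friend set."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def recommend(graph, user, score):
    """Rank non-friends of `user` by descending score, ties broken by name."""
    candidates = [v for v in graph if v != user and v not in graph[user]]
    return sorted(candidates, key=lambda v: (-score(graph, user, v), v))


def kendall_tau(order_a, order_b):
    """Kendall's tau-a between two orderings of the same items (at least two)."""
    pos_a = {item: i for i, item in enumerate(order_a)}
    pos_b = {item: i for i, item in enumerate(order_b)}
    balance = 0
    for x, y in combinations(order_a, 2):
        agreement = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        balance += 1 if agreement > 0 else -1
    return balance / (len(order_a) * (len(order_a) - 1) / 2)
```

A tau close to 1 between the recommendation lists produced with and without attribute correlation would mean the two settings rank candidates in nearly the same order.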