Sharkawy, Abdel Nasser
Unknown Affiliation

Published: 2 Documents


Play Store Data Scrapping and Preprocessing done as Sentiment Analysis Material
Hasanah, Rakyatul; Sulistiani, Sulistiani; Nurhikmayani, Nurhikmayani; Hasanah, Zakiyah; Wijaya, Setiawan Ardi; Abdennasser, Dahmani; Sharkawy, Abdel Nasser
Indonesian Journal of Modern Science and Technology Vol. 1 No. 1 (2025): January
Publisher : CV. Abhinaya Indo Group

DOI: 10.64021/ijmst.1.1.16-21.2025

Abstract

Sentiment analysis is a computational technique used to interpret user opinions about a product through textual reviews. This research aims to prepare a dataset for further research, such as sentiment analysis. A total of 12,000 recent reviews from July 2024 to January 2025 were collected through web scraping. The research process includes data preprocessing steps such as case folding and data cleaning to transform the raw data into a usable format. The data, from the raw reviews through each preprocessing step, have been uploaded to the Mendeley Data repository so that they can be reused in further research, such as sentiment analysis.
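
The case folding and data cleaning steps mentioned in the abstract could look roughly like the sketch below. It is a minimal illustration, not the authors' pipeline: the function name `preprocess_review`, the specific cleaning rules (dropping URLs, digits, punctuation, and emoji), and the sample reviews are all assumptions, since the abstract does not list the exact rules used.

```python
import re

def preprocess_review(text: str) -> str:
    """Case-fold and clean one raw review string (hypothetical rules)."""
    text = text.casefold()                      # case folding: lowercase everything
    text = re.sub(r"https?://\S+", " ", text)   # assumed rule: drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # assumed rule: keep only letters a-z
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace

# Made-up sample reviews, for illustration only.
reviews = [
    "GREAT app!!! 10/10 😍",
    "  Crashes   after update... see https://example.com ",
]
cleaned = [preprocess_review(r) for r in reviews]
print(cleaned)
```

Keeping only the letters a-z is a deliberately blunt choice; a real pipeline might preserve negation markers or emoji, which can carry sentiment.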

EVALUATING CLUSTERING METHODS FOR SEMANTIC REPRESENTATION OF DISASTER NEWS USING BERT EMBEDDINGS AND HBDSCAN
Ningrum, Ariska Fitriyana; Purwanto, Dannu; Sharkawy, Abdel Nasser
JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) Vol. 11 No. 3 (2026): JITK Issue February 2026
Publisher : LPPM Nusa Mandiri

DOI: 10.33480/jitk.v11i3.7204

Abstract

Natural disasters that frequently occur in Indonesia demand a fast and accurate information monitoring and analysis system through online news sources. This study aims to identify topic patterns related to natural disasters in Indonesia using news articles from Detik.com through a semantic clustering approach. A total of 1,000 articles were collected, preprocessed, and represented using the Sentence-BERT (SBERT) model to capture contextual relationships between sentences. The vector representations were then clustered using three methods: K-Means, Agglomerative Hierarchical Clustering, and HDBSCAN. The performance of each method was evaluated using the Silhouette Score, Davies–Bouldin (DB) Index, and Calinski–Harabasz (CH) Index. The results show that HDBSCAN achieved the best performance with a Silhouette Score of 0.215, a DB Index of 1.557, and a CH Index of 18.102, outperforming Agglomerative (0.028, 3.945, 29.669) and K-Means (0.055, 3.678, 36.778). Moreover, the HDBSCAN model achieved the highest coherence score of 0.8669, indicating strong semantic consistency within clusters. Five coherent clusters emerged, representing major disaster themes: landslides, earthquakes, tornadoes, flash floods, and volcanic activity. The visualization of word clouds for each cluster reinforced the interpretation of these disaster topics. Overall, the combination of SBERT and HDBSCAN effectively groups news articles based on semantic similarity. These findings highlight the potential of Natural Language Processing (NLP) to enhance data-driven media monitoring, support early warning systems, and strengthen disaster communication and mitigation strategies in Indonesia.
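
To make the first of the evaluation metrics named in the abstract concrete, the sketch below computes a mean Silhouette Score in plain Python. It is an illustration under stated assumptions, not the study's code: the function name `mean_silhouette`, the use of Euclidean distance, and the two-dimensional toy points are all hypothetical stand-ins for the SBERT embeddings and cluster labels described above. For reference, a higher Silhouette Score and CH Index indicate better-separated clusters, while a lower DB Index is better.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy 2-D points standing in for sentence embeddings (made up for illustration).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels match the two visible groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

The labeling that follows the two visible groups scores close to 1, while the labeling that cuts across them scores below 0, which is the behavior the metric is meant to capture.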