Articles

A Novel Fusion of Machine Learning Methods for Enhancing Named Entity Recognition in Indonesian Language Text
Widyawan, Widyawan; Utomo, Bayu Prasetiyo; Rizala, Muhammad Nur
Jurnal Sistem Informasi Bisnis Vol 14, No 4 (2024): Volume 14 Number 4, 2024
Publisher : Diponegoro University

DOI: 10.21456/vol14iss4pp311-320

Abstract

Named Entity Recognition (NER) is an important application of machine learning that processes text to extract entities such as people, organizations, laws, religions, and locations. NER for Indonesian still faces significant challenges because high-quality labelled datasets are scarce, which limits the development of more advanced models. To address this issue, we used several pre-trained BERT models (bert-base-uncased, indobenchmark/indobert-base-p1, and indolem/indobert-base-uncased) and datasets (NERGRIT-IndoNLU, NERGRIT-Corpus, NERUGM, and NERUI). This study proposes a novel fusion approach that integrates deep learning architectures, namely CNN, Bi-LSTM, Bi-GRU, and CRF, to detect 19 entity types. The fusion enhances BERT's sequence modelling and feature extraction capabilities, while the CRF layer improves entity prediction by enforcing global constraints on the predicted label sequence. Experimental results show that the fusion approach outperforms previous methods: accuracy reached 94.75% with bert-base-uncased, 95.75% with indobenchmark/indobert-base-p1, and 95.85% with indolem/indobert-base-uncased. This study highlights the effectiveness of combining deep learning architectures with pre-trained transformers to improve NER performance for Indonesian, and the proposed methodology offers a significant advance in entity extraction for languages with limited datasets, such as Indonesian.
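The CRF layer's role of enforcing global constraints can be illustrated with Viterbi decoding over per-token label scores. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the three-label BIO set, the start scores, the transition scores, and the emission values are all hypothetical, and the forbidden transition O → I-PER is given a score of negative infinity so the decoder can never produce it.

```python
import math
from typing import List, Sequence

# Hypothetical BIO label set for a single entity type (PERSON); the paper uses 19 entity types.
LABELS = ["O", "B-PER", "I-PER"]
NEG_INF = -math.inf

# Hypothetical start scores: a sentence may not begin with an inside tag.
START = [0.0, 0.0, NEG_INF]

# Hypothetical transition scores: TRANS[i][j] scores moving from LABELS[i] to LABELS[j].
# "O -> I-PER" is forbidden because an inside tag must continue an entity.
TRANS = [
    [0.0, 0.0, NEG_INF],  # from O
    [0.0, 0.0, 0.5],      # from B-PER
    [0.0, 0.0, 0.5],      # from I-PER
]


def viterbi(
    emissions: Sequence[Sequence[float]],
    trans: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Return the label indices of the highest-scoring path (emission + transition scores)."""
    n_labels = len(start)
    # score[j] = best total score of any label path ending in label j at the current token
    score = [s + e for s, e in zip(start, emissions[0])]
    backptrs: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + trans[i][j])
            new_score.append(score[best_i] + trans[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Follow back-pointers from the best final label to recover the full path.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical per-token scores, e.g. from a BERT + Bi-LSTM encoder.
    emissions = [
        [2.0, 1.5, 0.0],  # token 1: greedy argmax would pick "O"
        [0.0, 0.2, 1.0],  # token 2: greedy argmax would pick "I-PER"
    ]
    print([LABELS[k] for k in viterbi(emissions, TRANS, START)])  # ['B-PER', 'I-PER']
```

Choosing the best label for each token independently would yield the invalid sequence O, I-PER; the decoder instead returns B-PER, I-PER because the forbidden transition can never win. In a trained CRF the transition scores are learned from data rather than hard-coded as they are here.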
Toward a Modular, Low-Latency Architecture with BERT-based Big Media Data Analysis
Widyawan, Widyawan; Murti, Handoko Wisnu; Putra, Guntur Dharma; Nurmanto, Eddy; Affandi, Achmad
Telematika Vol 18, No 2: August (2025)
Publisher : Universitas Amikom Purwokerto

DOI: 10.35671/telematika.v18i2.3151

Abstract

The rapid growth of digital and social media platforms has produced massive streams of unstructured media data. However, current big data approaches are not tailored to the high volume and velocity of media data, which consists of lengthy, unstructured full-text messages. This study proposes a modular, stream-oriented big data architecture for media data. The architecture consists of data crawlers, a message broker, machine learning modules, persistent storage, and analytical dashboards, which communicate through a publish-subscribe pattern to enable asynchronous, decoupled data processing. The system integrates IndoBERT, a transformer-based model fine-tuned for Indonesian, to perform real-time semantic tagging within the streaming pipeline. The solution has been implemented as a prototype using open-source technologies on an on-premise cluster. The primary novelty is therefore the successful integration and operationalization of a large transformer-based language model (IndoBERT) within a low-latency streaming pipeline. The experimental results demonstrate the feasibility of deploying scalable, vendor-neutral media analytics platforms for institutions that are highly sensitive to privacy and cost. Architectural quality is evaluated quantitatively with Martin's Instability Metric and Coupling Between Objects (CBO), which confirm high modularity across components. The system achieves an end-to-end latency of 3.121 seconds, a deep learning inference latency of 2.333 seconds, and a throughput of 32,102 messages per day. The 2.333-second inference accounts for most of the end-to-end latency and is an explicit trade-off made in exchange for deeper semantic analysis. This study presents a reference architecture for scalable, intelligent, real-time media analytics systems that supports public sector and academic deployments requiring data privacy and control over infrastructure.
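Martin's Instability metric for a component is I = Ce / (Ca + Ce), where Ce (efferent coupling) counts the components it depends on and Ca (afferent coupling) counts the components that depend on it; I ranges from 0 (maximally stable) to 1 (maximally unstable). The sketch below applies the formula to a hypothetical dependency graph built from the component kinds listed in the abstract. The edges and component names are assumptions for illustration, not the prototype's measured dependencies.

```python
from typing import Dict, List, Tuple

# Hypothetical dependency edges (source depends on target) between the component kinds
# named in the abstract; the paper's actual dependency graph is not reproduced here.
DEPENDENCIES: List[Tuple[str, str]] = [
    ("crawler", "broker"),     # crawler publishes raw messages to the broker
    ("ml_module", "broker"),   # ML module subscribes to raw messages and publishes tags
    ("storage", "broker"),     # storage subscribes to tagged messages
    ("dashboard", "storage"),  # dashboard queries persistent storage
]


def instability(deps: List[Tuple[str, str]]) -> Dict[str, float]:
    """Martin's Instability I = Ce / (Ca + Ce) for every component in the graph.

    Ce (efferent coupling): number of components this one depends on.
    Ca (afferent coupling): number of components that depend on this one.
    """
    components = {c for edge in deps for c in edge}
    ce = {c: 0 for c in components}
    ca = {c: 0 for c in components}
    for src, dst in set(deps):  # count each distinct dependency once
        ce[src] += 1
        ca[dst] += 1
    # Every component appears in at least one edge, so Ca + Ce is never zero here.
    return {c: ce[c] / (ca[c] + ce[c]) for c in components}


if __name__ == "__main__":
    for name, value in sorted(instability(DEPENDENCIES).items()):
        print(f"{name:10s} I = {value:.2f}")
```

With these assumed edges the broker scores I = 0.0 (everything depends on it and it depends on nothing), storage scores 0.5, and the crawler, ML module, and dashboard score 1.0. That is the pattern a publish-subscribe hub is meant to produce: peripheral components can change without forcing changes elsewhere.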
Socio-user Context Aware-Based Recommender System: Context Suggestions for A Better Tourism Recommendation
Kusuma Adi Achmad; Lukito Edi Nugroho; Achmad Djunaedi; Widyawan
International Journal on Information and Communication Technology (IJoICT) Vol. 9 No. 2 (2023): Vol.9 No. 2 Dec 2023
Publisher : School of Computing, Telkom University

DOI: 10.21108/ijoict.v9i2.858

Abstract

Most existing tourism recommender system models perform predictive analytics for destination recommendations (item recommendation). Little research has addressed recommender system models for context suggestion. It is therefore necessary to develop a recommender system model that not only predicts tourism destinations but also suggests contexts that suit tourist preferences (context suggestion). A deep learning method was used to build a socio-user context aware-based recommender system model for context suggestion. The attributes used as labels for context suggestion were uHijos, uCuisine, uAmbience, and uTransport. For uHijos, uAmbience, and uTransport, the model suggested the context with 100% accuracy and a 0% error rate. Only for uCuisine was the model less accurate: its accuracy was below 30%, with a classification error above 70%. The performance evaluation of the socio-user context aware-based recommender system was considered efficient, particularly in terms of accuracy, completeness (recall/sensitivity), precision, and the harmonic mean of precision and recall (F-score), mainly for the uHijos, uAmbience, and uTransport labels/contexts.
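The evaluation above reports accuracy, error rate, precision, recall, and F-score for each context label. The sketch below shows how these quantities are computed for one label such as uTransport, assuming single-label classification; the label values "yes"/"no" and the predictions are hypothetical, not the paper's data, and the averaging scheme the authors used is not specified in the abstract.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label; error rate = 1 - accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_label_scores(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, f_score)} for every label in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the model predicted p, but p was wrong
            fn[t] += 1  # the model missed the true label t
    scores = {}
    for label in set(y_true) | set(y_pred):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        # F-score: harmonic mean of precision and recall
        f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f_score)
    return scores


if __name__ == "__main__":
    # Hypothetical true and predicted values for one context label.
    y_true = ["yes", "yes", "no", "no"]
    y_pred = ["yes", "no", "no", "no"]
    print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.75
    for label, (p, r, f) in sorted(per_label_scores(y_true, y_pred).items()):
        print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

A label predicted perfectly, as reported for uHijos, uAmbience, and uTransport, would give accuracy, precision, recall, and F-score of 1.0 and an error rate of 0.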