Lucia D. Krisnawati
Universitas Kristen Duta Wacana

Published: 3 Documents

Indonesian-English Textual Similarity Detection Using Universal Sentence Encoder (USE) and Facebook AI Similarity Search (FAISS)
Krisnawati, Lucia D.; Mahastama, Aditya W.; Haw, Su-Cheng; Ng, Kok-Why; Naveen, Palanichamy
CommIT (Communication and Information Technology) Journal Vol. 18 No. 2 (2024): CommIT Journal
Publisher : Bina Nusantara University

DOI: 10.21512/commit.v18i2.11274

Abstract

Rapid progress in Natural Language Processing (NLP) has enabled the detection of bilingual and multilingual textual similarity. One of the main challenges for a Textual Similarity Detection (TSD) system lies in learning an effective text representation. This research focuses on identifying similar texts between Indonesian and English across a broad spectrum of semantic similarity. The primary challenge is generating dense vector representations (embeddings) for English and Indonesian that share a single vector space. After trial and error, the research proposes using the Universal Sentence Encoder (USE) model to construct bilingual embeddings and FAISS to index the bilingual dataset. Query vectors are compared with index vectors using two approaches: a heuristic comparison with Euclidean distance and a clustering algorithm, Approximate Nearest Neighbors (ANN). The system is tested with four semantic granularities, two text granularities, and evaluation metrics with cutoff values of k={2,10}. The four semantic granularities are highly similar or near duplicate, Semantic Entailment (SE), Topically Related (TR), and Out of Topic (OOT), while the text granularities are the sentence and paragraph levels. The experimental results demonstrate that the proposed system successfully ranks similar texts in different languages within the top ten. This is evidenced by the highest F1@2 score of 0.96 for the near-duplicate category at the sentence level. For the SE and TR categories, the highest F1 scores are 0.77 and 0.89, respectively. The experimental results also show a high correlation between text and semantic granularity.
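
The retrieval and evaluation steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for USE embeddings, the exact brute-force search stands in for a FAISS index, and the F1@k formula follows the standard information-retrieval definition, which the abstract does not spell out. All names (`top_k`, `f1_at_k`, the document ids) are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, index, k):
    """Return the ids of the k index vectors closest to the query (exact search)."""
    ranked = sorted(index, key=lambda doc_id: euclidean(query, index[doc_id]))
    return ranked[:k]

def f1_at_k(retrieved, relevant, k):
    """F1@k built from precision@k and recall@k (standard definitions, assumed)."""
    hits = len(set(retrieved[:k]) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

# Toy vectors standing in for English sentence embeddings (hypothetical data).
index = {
    "en_1": [0.9, 0.1, 0.0],
    "en_2": [0.8, 0.2, 0.1],
    "en_3": [0.0, 0.9, 0.4],
    "en_4": [0.1, 0.0, 0.9],
}
# Stands in for the embedding of an Indonesian query sentence.
query = [0.88, 0.12, 0.02]
# Ground-truth ids judged similar to the query (hypothetical labels).
relevant = {"en_1", "en_2"}

retrieved = top_k(query, index, k=2)
score = f1_at_k(retrieved, relevant, k=2)
```

Because both languages are assumed to share one vector space, the same distance function ranks English candidates for an Indonesian query without any translation step.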

Hybrid-Based Recommender System Based on Electronic Product Reviews
Muhammad Syafiq Chelvam, Nor Liyana Natasha; Haw, Su-Cheng; Krisnawati, Lucia D.; Mahastama, Aditya
JOIV : International Journal on Informatics Visualization Vol 9, No 4 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.4.3561

Abstract

The abundance of information and the continuous introduction of new products and services have made it increasingly challenging for users to navigate numerous options. Recommender systems have emerged as essential tools to help users find personalized and relevant information quickly. This paper proposes a hybrid recommender system that processes online customer reviews using word embedding and clustering techniques. The system generates product-feature words, detects sentiment words and their intensity, analyzes word correlations, and extracts variables from the reviews for each product. Word embedding models, such as Word2Vec, are employed to capture the semantic content of product reviews and descriptions. The attributes extracted from the text data and the word embeddings are combined to create a hybrid representation of products. Based on this hybrid representation, the system calculates the similarity among products using cosine similarity and other measures. Finally, it returns a ranked list of recommended products based on how similar they are to either an input product or user preferences. The system was implemented and evaluated experimentally on the "Datafiniti Electronics Product Data" dataset. The developed prototype aims to provide personalized recommendations to users based on online reviews, ultimately enhancing the user experience and addressing the challenge of information overload in the digital age.
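
The similarity-ranking step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the hybrid representation is assumed to be a simple concatenation of attribute scores and an embedding vector, and the product ids, feature values, and function names (`hybrid_vector`, `recommend`) are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def hybrid_vector(attributes, embedding):
    """Combine review-derived attributes with an embedding by concatenation (assumed)."""
    return list(attributes) + list(embedding)

def recommend(target_id, products, top_n):
    """Rank all other products by cosine similarity to the target product."""
    target = products[target_id]
    scores = [
        (pid, cosine(target, vec))
        for pid, vec in products.items()
        if pid != target_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Hypothetical products: two attribute scores plus a two-dimensional embedding each.
products = {
    "headphones_a": hybrid_vector([0.9, 0.8], [0.7, 0.1]),
    "headphones_b": hybrid_vector([0.8, 0.9], [0.6, 0.2]),
    "laptop_c": hybrid_vector([0.1, 0.2], [0.1, 0.9]),
}

ranking = recommend("headphones_a", products, top_n=2)
```

Cosine similarity compares direction rather than magnitude, so products with similar feature profiles rank close together even when their raw scores differ in scale.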

Penerapan Framework Rasa untuk Membangun Sistem FAQ Bot Sebagai Layanan Informasi BIRO 3 [Applying the Rasa Framework to Build an FAQ Bot System as an Information Service for BIRO 3]
Talenggoran, Rivai; Krisnawati, Lucia D.; Virginia, Gloria
Jurnal Terapan Teknologi Informasi Vol 9 No 2 (2025): Jurnal Terapan Teknologi Informasi
Publisher : Fakultas Teknologi Informasi

DOI: 10.21460/jutei.2025.92.431

Abstract

Duta Wacana Christian University (UKDW), specifically Bureau 3 (BIRO 3), provides a variety of information services accessible through both online channels, such as Instagram, WhatsApp, and email, and offline methods. However, BIRO 3 has not yet implemented an FAQ Bot system, so frequently asked questions are still answered manually. This makes the question-and-answer process for campus information repetitive and time-consuming. This research implements the RASA Open Source framework to develop an FAQ Bot system that automates the retrieval of information for frequently asked questions at BIRO 3, thereby enhancing the efficiency of its information services. The system was developed using RASA Open Source and deployed on the Telegram messaging platform. The evaluation followed a two-fold approach. First, internal testing of the RASA model on the validation dataset achieved accuracy, precision, and F1-scores of 1.000. On the test data, the model demonstrated strong performance with an accuracy of 0.915, a precision of 0.928, and an F1-score of 0.912. Second, functional testing was performed by engaging users in predefined scenarios. This second phase yielded a functional accuracy of 95% based on 200 collected data points, indicating that the FAQ Bot system was successfully developed. Despite this high performance, limitations were identified in the form of eleven false positive and false negative cases out of the 200 data points. This suggests that the model has not yet learned to comprehend all variations of user input. Therefore, recommendations such as expanding the training dataset and exploring modifications to the RASA Open Source framework are proposed to refine the system's capabilities, enabling it to handle all types of inquiries accurately.
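
The reported functional accuracy can be checked against the error count with a short calculation. The sketch assumes that the eleven false positive and false negative cases are the only incorrect responses among the 200 data points; the variable names are illustrative.

```python
# Figures taken from the abstract.
total_cases = 200
misclassified = 11  # false positives and false negatives combined

# Assumption: every case that is not misclassified counts as correct.
correct_cases = total_cases - misclassified
functional_accuracy = correct_cases / total_cases
```

Under that assumption, 189 of 200 cases are correct, a ratio of 0.945, which is consistent with the reported figure of 95% when rounded to the nearest whole percent.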