An Approach for Automatically Generate Star Schema from Natural Language
Rosni Lumbantoruan; Elisa Margareth Sibarani; Monica Verawati Sitorus; Ayunisa Mindari; Suhendrowan Putra Sinaga
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 12, No 2: June 2014
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/telkomnika.v12i2.63

Abstract

The star schema is a form of data warehouse modelling that acts as primary storage for dimensional data and enables efficient retrieval of business information for decision making. Star schemas can be generated either from business needs, which we refer to as a user business key, or from the relational schema of the operational system. Many tools, such as BIRST and SAMSTAR, can automatically generate a star schema from a relational schema; however, no application can generate one automatically from a user business key expressed in natural language. In this paper, we offer an approach for automatically generating a star schema from user business key(s). It begins by processing the user business key with syntactic parsing to identify nouns. The identified nouns are then used to generate dimension table candidates and a fact table. The evaluation indicates that the tool can generate a star schema from the input user business key(s), with the limitation that the star schema is not formed if the dimension tables do not have a direct relationship.

TopC-CAMF: Sistem Perekomendasi Matrix Factorization Berbasis Top Context [TopC-CAMF: A Top-Context-Based Matrix Factorization Recommender System]
Rosni Lumbantoruan; Paulus Simanjuntak; Inggrid Aritonang; Erika Simaremare
Jurnal Nasional Teknik Elektro dan Teknologi Informasi Vol 11 No 4: November 2022
Publisher : Departemen Teknik Elektro dan Teknologi Informasi, Fakultas Teknik, Universitas Gadjah Mada

DOI: 10.22146/jnteti.v11i4.5399

Abstract

Online activities have become increasingly vital as digital business has expanded. Users can conduct most activities online, such as shopping, hotel booking, or education and courses. A large number of social users are drawn to the abundance of goods available on the Internet. The huge amount of information makes it impossible for them to navigate it properly and efficiently. Many companies offer personalization to tackle this issue. Personalized recommendation systems have been shown to suggest items that best suit users' interests and needs, which can be captured from the user’s contextual information. However, most studies capture this contextual information from predefined contexts such as location and time. In this study, the personalized user context was obtained from the text reviews that users posted when they rated an item. To this end, a new approach based on the matrix factorization recommendation model, TopC-CAMF, was proposed. TopC-CAMF finds the most important contexts or needs for each user by leveraging a deep learning model. First, all important contexts were extracted from the users’ text reviews. Next, user preferences were represented with variations of the most important contexts, namely the top 5, top 10, top 15, top 20, and top 25 contexts. Then, the top-context variations were evaluated and the optimal one was used as the input to the matrix factorization method to provide better recommendations. Extensive experiments using three real datasets were conducted to demonstrate the effectiveness of TopC-CAMF in terms of root mean square error (RMSE), mean absolute error (MAE), mean squared error (MSE), normalized discounted cumulative gain (NDCG), and Recall.
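
As a reading aid for the rating-error metrics named in this abstract, the following minimal sketch computes MSE, RMSE, and MAE for paired actual and predicted ratings. The function name `rating_errors` and the sample ratings are illustrative assumptions and are not taken from the paper.

```python
import math


def rating_errors(actual, predicted):
    """Return (MSE, RMSE, MAE) for two equally long lists of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    diffs = [a - p for a, p in zip(actual, predicted)]
    mse = sum(d * d for d in diffs) / len(diffs)   # mean squared error
    mae = sum(abs(d) for d in diffs) / len(diffs)  # mean absolute error
    return mse, math.sqrt(mse), mae                # RMSE is the square root of MSE


# Hypothetical ratings on a 1-5 scale, for illustration only.
mse, rmse, mae = rating_errors([5, 3, 4, 1], [4, 3, 5, 3])
print(mse, round(rmse, 4), mae)
```

Lower values are better for all three; RMSE penalizes large individual errors more heavily than MAE.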

Penilaian Kesamaan Entity Relationship Diagram dengan Algoritme Tree Edit Distance [Assessing Entity Relationship Diagram Similarity with the Tree Edit Distance Algorithm]
Humasak Simanjuntak; Rosni Lumbantoruan; Wiwin Banjarnahor; Erisha Sitorus; Magdalena Panjaitan; Sintong Panjaitan
Jurnal Nasional Teknik Elektro dan Teknologi Informasi Vol 6 No 1: February 2017
Publisher : Departemen Teknik Elektro dan Teknologi Informasi, Fakultas Teknik, Universitas Gadjah Mada

Abstract

A main competency in database learning is the ability to design an Entity Relationship Diagram (ERD). Typically, a lecturer assigns students the task of designing an ERD that meets given requirements. These ERDs are then assessed by comparing them with the reference answers. In practice, this process takes a long time, and the lecturer may grade students inconsistently. Furthermore, plagiarism could occur without being noticed by the lecturer. This research aims to design and build an application that assesses the similarity of ERDs. The application applies the tree edit distance algorithm to check ERD similarity. An ERD is exported to an XMI document and then processed using the tree edit distance algorithm. The results show that the ERD similarity value depends on the number of insert, delete, and rename operations in the tree edit distance algorithm rather than on the number of differing components.
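
To illustrate how insert, delete, and rename operations determine the distance, here is a minimal sketch of a unit-cost edit distance between ordered labeled trees. It is not the paper's implementation: the nested-tuple tree encoding, the two toy ERDs, and the unit costs are assumptions, and XMI parsing is left out.

```python
from functools import lru_cache

# A tree is (label, children) where children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for caching.


def tree_edit_distance(tree_a, tree_b):
    """Unit-cost edit distance (insert, delete, rename) between ordered trees."""

    @lru_cache(maxsize=None)
    def size(forest):
        return sum(1 + size(children) for _, children in forest)

    @lru_cache(maxsize=None)
    def dist(forest_a, forest_b):
        if not forest_a:
            return size(forest_b)          # insert everything that remains
        if not forest_b:
            return size(forest_a)          # delete everything that remains
        (label_a, kids_a), (label_b, kids_b) = forest_a[-1], forest_b[-1]
        # Delete the rightmost root of A; its children move up into the forest.
        delete = dist(forest_a[:-1] + kids_a, forest_b) + 1
        # Insert the rightmost root of B.
        insert = dist(forest_a, forest_b[:-1] + kids_b) + 1
        # Match the two rightmost roots, renaming if the labels differ.
        rename = (dist(kids_a, kids_b)
                  + dist(forest_a[:-1], forest_b[:-1])
                  + (0 if label_a == label_b else 1))
        return min(delete, insert, rename)

    return dist((tree_a,), (tree_b,))


# Hypothetical ERDs encoded as trees: diagram -> entities -> attributes.
reference = ("ERD", (("Student", (("id", ()), ("name", ()))),
                     ("Course", (("code", ()),))))
submission = ("ERD", (("Student", (("id", ()), ("full_name", ()))),
                      ("Course", ())))

print(tree_edit_distance(reference, submission))  # one rename plus one delete
```

A distance of zero means the two trees are identical; how the distance is normalized into a similarity score is a separate design choice that this sketch does not cover.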

Two-step convolutional neural network classification of plant disease
Lumbantoruan, Rosni; Rajagukguk, Nico; Lubis, Anju Ucok; Claudia, Marwani; Simanjuntak, Humasak
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 1: February 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i1.pp584-591

Abstract

Indonesia is primarily an agricultural country, with farming being the main source of income for most of its people. Unfortunately, crop production is vulnerable to plant diseases, which are usually caused by plant pests and reduce both the quantity and quality of the expected harvest. Given the large number of classes to predict, detecting and accurately classifying each disease on different plants can be difficult. We believe that limiting the number of classes to identify may improve classification accuracy. Thus, in this research, we propose a new approach, the two-step convolutional neural network (CNN), which reduces the number of classes through two-step classification. First, we identify the number of classes that can be reduced by categorizing them by different characteristics, namely, plant type classification and plant condition classification. Second, we deal with unbalanced datasets, which can result in poor performance if overlooked. Finally, we compare the proposed two-step CNN to a baseline CNN in terms of efficiency and effectiveness. Extensive experiments show that the two-step CNN outperforms the baselines, CNN and jellyfish-residual network (JF-ResNet), increasing accuracy by 4% and 2% to 99%, respectively. We also provide a simulation evaluation to ensure that this approach is applicable.
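
The two-step idea can be sketched as a composition of two classifiers. In the sketch below, simple rule-based functions stand in for the trained CNNs, and every name, feature, and label is hypothetical. Whether the paper's second step takes the predicted plant type as input is not stated in the abstract; the sketch assumes it does.

```python
def two_step_predict(image, classify_plant, classify_condition):
    """Step 1 predicts the plant type, step 2 predicts the plant condition."""
    plant = classify_plant(image)
    condition = classify_condition(image, plant)
    return plant, condition


def class_counts(joint_labels):
    """Compare one joint label space with the two smaller per-step spaces."""
    plants = {plant for plant, _ in joint_labels}
    conditions = {condition for _, condition in joint_labels}
    return len(set(joint_labels)), len(plants), len(conditions)


# Stand-in classifiers (assumptions): a real system would call trained CNNs.
def classify_plant(image):
    return "tomato" if image["leaf_shape"] == "lobed" else "corn"


def classify_condition(image, plant):
    return "diseased" if image["spots"] else "healthy"


# Hypothetical (plant, condition) label set, for illustration only.
labels = [("tomato", "healthy"), ("tomato", "early_blight"),
          ("potato", "healthy"), ("potato", "late_blight"),
          ("corn", "healthy"), ("corn", "rust")]

print(two_step_predict({"leaf_shape": "lobed", "spots": True},
                       classify_plant, classify_condition))
print(class_counts(labels))  # (joint classes, plant types, conditions)
```

Each step chooses among fewer classes than a single joint classifier would, which is the reduction the abstract describes.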

A Benchmark Study of Protein Embeddings in Sequence-Based Classification
Simanjuntak, Humasak Tommy Argo; Siahaan, Lamsihar; Margaretha, Patricia Dian; Manurung, Ruth Christine; Purba, Susi; Lumbantoruan, Rosni; Barus, Arlinta; Gonzales, Helen Grace B.
Elinvo (Electronics, Informatics, and Vocational Education) Vol. 9 No. 2 (2024): November 2024
Publisher : Department of Electronic and Informatic Engineering Education, Faculty of Engineering, UNY

DOI: 10.21831/elinvo.v9i2.77389

Abstract

Proteins play a vital role in various tissue and organ activities and a key role in cell structure and function. Humans can produce thousands of proteins, each consisting of tens or hundreds of interconnected amino acids. The sequence of amino acids determines the protein's 3D structure and conformational dynamics, which in turn affect its biological function. Understanding protein function is very important, especially for biological processes at the molecular level. However, extracting or studying features from protein sequences that can predict protein function is still challenging: it takes a long time, is expensive, and has yet to reach its maximum accuracy, resulting in a large gap between protein sequence and function. Protein embedding is essential in protein function prediction with a deep learning model. Therefore, this study benchmarks three protein embedding models, ProtBert, T5, and ESM-2, as part of protein function prediction using an LSTM model. We delve into protein embedding performance and how to leverage it to find optimal embeddings for a given use case. We experimented with the CAFA-5 dataset to find the optimal embedding model for protein function prediction. Experimental results show that ESM-2 outperforms ProtBert and T5. In training, the accuracy of ESM-2 is above 0.99, almost the same as T5, but still above ProtBert. Furthermore, testing on five sample protein sequences shows that ESM-2 has an average hit rate of 93.33% (100% for four samples and 66.67% for one sample).
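
The reported average hit rate can be reproduced arithmetically. The sketch below assumes that a sample's hit rate is the fraction of its true function labels that appear among the predicted labels; the abstract does not define the metric, and the labels `F1` to `F3` are placeholders rather than real annotations.

```python
def hit_rate(true_labels, predicted_labels):
    """Fraction of true labels that also appear in the predictions."""
    true_set = set(true_labels)
    if not true_set:
        raise ValueError("true_labels must not be empty")
    return len(true_set & set(predicted_labels)) / len(true_set)


# Five hypothetical samples: four fully matched, one with 2 of 3 labels found.
samples = [(["F1", "F2", "F3"], ["F1", "F2", "F3"])] * 4
samples.append((["F1", "F2", "F3"], ["F1", "F3"]))

rates = [hit_rate(true, predicted) for true, predicted in samples]
average = sum(rates) / len(rates)
print(f"{average:.2%}")  # prints 93.33%
```

Under this assumed definition, 66.67% on one sample corresponds to two of three labels being recovered, and the mean over the five samples matches the 93.33% stated in the abstract.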

Studi dan Analisis Hyperparameter Tuning IndoBERT Dalam Pendeteksian Berita Palsu [A Study and Analysis of IndoBERT Hyperparameter Tuning for Fake News Detection]
Anugerah Simanjuntak; Rosni Lumbantoruan; Kartika Sianipar; Rut Gultom; Mario Simaremare; Samuel Situmeang; Erwin Panggabean
Jurnal Nasional Teknik Elektro dan Teknologi Informasi Vol 13 No 1: February 2024
Publisher : Departemen Teknik Elektro dan Teknologi Informasi, Fakultas Teknik, Universitas Gadjah Mada

DOI: 10.22146/jnteti.v13i1.8532

Abstract

The rapid advancement of communication technology has transformed how information is shared, but it has also brought concerns about the proliferation of false information. A recent report by the Ministry of Communication and Informatics in Indonesia revealed that around 800,000 websites were involved in spreading false information, underscoring the seriousness of the problem. To address this issue, researchers have focused on developing techniques to detect and combat fake news. This research centers on using IndoBERT-base-p1 for fake news detection and aims to enhance its performance by tuning the model's hyperparameter values with three methods: Bayesian optimization, grid search, and random search. After comparing the outcomes of the three hyperparameter tuning methods, Bayesian optimization emerged as the most effective approach. With a precision of 88.79%, recall of 94.5%, and F1-score of 91.56% for the “fake” label, Bayesian optimization outperformed the other hyperparameter tuning methods as well as the model using the fine-tuning hyperparameter values. These findings emphasize the importance of hyperparameter tuning in improving the accuracy of fake news detection models. With Bayesian optimization of the specified hyperparameters, the model demonstrated superior performance in accurately identifying instances of fake news, providing a valuable tool in the ongoing battle against disinformation in the digital realm.
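
The reported F1-score follows from the reported precision and recall, since F1 is their harmonic mean. The short check below uses only the figures stated in the abstract; the function name is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the "fake" label, as stated in the abstract.
reported_f1 = f1_score(0.8879, 0.945)
print(f"{reported_f1:.2%}")  # prints 91.56%
```

The result agrees with the 91.56% given in the abstract.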