Articles
Prediksi Harapan Hidup Penderita Hepatitis Kronik Menggunakan Metode-Metode Klasifikasi
Khomsah, Siti
Seminar Nasional Informatika Medis (SNIMed) 2018
Publisher : Magister Teknik Informatika, Universitas Islam Indonesia
Data from the 2013 Riskesdas survey show that 28 million Indonesians are infected with hepatitis B or C. An estimated fourteen million of them may develop chronic hepatitis, and 1.4 million of those may go on to develop liver cancer. Treatment for chronic hepatitis B patients aims to extend their life expectancy. Hepatitis C is a leading cause of liver cancer and cirrhosis. No suitable vaccine for chronic hepatitis patients has been found, so treatment aims only to extend the patient's life expectancy. The future health of a chronic or acute hepatitis patient can be gauged from symptoms found in physical and laboratory examinations. Based on the examination results, a doctor can predict whether the patient is at risk of dying from the disease and provide appropriate treatment. Data mining is one technique for discovering information patterns in hepatitis patient datasets. These patterns are used to build a model that predicts the mortality risk of hepatitis patients. Classification is one data mining technique for predictive analysis. This study applies classification methods to predict the life expectancy of chronic hepatitis patients, with a focus on comparing several classification methods and their accuracy. The proposed methods are K-NN, Naive Bayes, decision tree (D-Tree), and Random Forest. The models are tested on 155 records of chronic or acute hepatitis patients. Model performance is measured by accuracy and AUC. Validation uses k-fold cross-validation with k = 10. The results show that Random Forest is the most accurate method, reaching 79.35%. The AUC values of Naive Bayes, D-Tree, and Random Forest exceed 0.8, which means these three models are good classifiers, whereas the AUC of K-NN is 0.7, which means K-NN is only at the fair level.
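The evaluation described in this abstract (10-fold cross-validation, scored by AUC) can be illustrated with a minimal sketch. The function names `k_fold_indices` and `auc` are illustrative, the interleaved fold assignment is an assumption, and none of this reproduces the study's actual tooling or data.

```python
def k_fold_indices(n_samples, k=10):
    """Split sample indices 0..n_samples-1 into k folds (interleaved assignment).

    Assumption: the study's actual fold assignment is not described.
    """
    return [list(range(start, n_samples, k)) for start in range(k)]


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for the positive class, 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# With 155 records and k = 10, each fold serves once as the test set.
folds = k_fold_indices(155, k=10)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values above 0.8 are read as "good" and 0.7 as "fair" in the abstract.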
Implementation Of Text Mining For Emotion Detection Using The Lexicon Method (Case Study: Tweets About Covid-19)
Agus Sasmito Aribowo;
Siti Khomsah
Telematika Vol 18, No 1 (2021): Edisi Februari 2021
Publisher : Jurusan Teknik Informatika
DOI: 10.31315/telematika.v18i1.4341
Information and news about Covid-19 received various responses from social media users, including Twitter users. Changes in netizen opinion over time are interesting to analyze, especially the patterns of public sentiment and emotion contained in these opinions. Sentiment and emotional conditions can illustrate the public's response to the Covid-19 pandemic in Indonesia. This research has two objectives: first, to reveal the types of public emotions that emerged during the Covid-19 pandemic in Indonesia; second, to reveal the topics or words that appear most frequently in each emotion class. Seven types of emotions are detected: anger, fear, disgust, sadness, surprise, joy, and trust. The dataset consists of Indonesian-language tweets downloaded from April to August 2020. Emotional features are extracted with a lexicon-based method using the EmoLex dictionary. The result is a monthly graph of public emotional conditions related to the Covid-19 pandemic in the dataset.
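The lexicon-based approach can be sketched as a dictionary lookup over the tokens of each tweet. The tiny lexicon below is a hypothetical stand-in, not the EmoLex dictionary, and the tokenization is deliberately simplistic.

```python
from collections import Counter

# Hypothetical toy lexicon mapping Indonesian words to emotion classes.
# The real EmoLex dictionary is far larger and is NOT reproduced here.
TOY_LEXICON = {
    "takut": {"fear"},
    "marah": {"anger"},
    "sedih": {"sadness"},
    "senang": {"joy"},
    "percaya": {"trust"},
}


def emotion_counts(tweet, lexicon=TOY_LEXICON):
    """Count how many tokens of the tweet fall into each emotion class."""
    counts = Counter()
    for token in tweet.lower().split():
        word = token.strip(".,!?\"'")  # drop surrounding punctuation
        for emotion in lexicon.get(word, ()):
            counts[emotion] += 1
    return counts


counts = emotion_counts("Saya takut dan sedih, tapi tetap percaya!")
```

Summing such counts per month would give the kind of monthly emotion graph the abstract mentions.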
Cross-domain sentiment analysis model on Indonesian YouTube comment
Agus Sasmito Aribowo;
Halizah Basiron;
Noor Fazilla Abd Yusof;
Siti Khomsah
International Journal of Advances in Intelligent Informatics Vol 7, No 1 (2021): March 2021
Publisher : Universitas Ahmad Dahlan
DOI: 10.26555/ijain.v7i1.554
This study examines cross-domain sentiment analysis (CDSA) in the Indonesian language using tree-based ensemble machine learning. CDSA is useful for supporting the labeling of cross-domain sentiment and reducing dependence on experts; however, how it behaves on opinions made unstructured by stop words, language expressions, and Indonesian slang words has not yet been identified. This study aimed to obtain the best CDSA model for opinions in Indonesian, which are commonly full of stop words and slang words in Indonesian dialects. It examined the benefits of stop word cleaning and slang word conversion for CDSA in Indonesian, and which machine learning method suits this model. The study started by crawling five datasets of YouTube comments from five different domains. The datasets were copied into two groups: one without stop word cleaning and slang word conversion, and one with both applied. A CDSA model was built for each dataset group and tested using two tree-based ensemble classifiers, Random Forest (RF) and Extra Tree (ET), and, for comparison, three non-ensemble classifiers: Naïve Bayes (NB), SVM, and Decision Tree (DT). The results suggest that the accuracy of CDSA in Indonesian increases when stop words are removed and slang words are converted. The best classifier was built using tree-based ensemble machine learning, particularly ET, which achieved the highest accuracy in this study at 91.19%. This model is expected to serve as an alternative CDSA technique for the Indonesian language.
PENALARAN BERBASIS KASUS UNTUK DETEKSI DINI PENYAKIT LEUKEMIA
Agus Sasmito Aribowo;
Siti Khomsah
Seminar Nasional Informatika (SEMNASIF) Vol 1, No 3 (2012): Intelligent System dan Application
Publisher : Jurusan Teknik Informatika
Case-Based Reasoning (CBR) is an approach in which a reasoner solves a new problem by considering its similarity to one or more solutions to previous problems. Leukemia, or blood cancer, has at least four main types. Each type has symptoms that closely resemble those of the others as well as symptoms specific to it. Leukemia is currently diagnosed mostly through physical examination, blood tests, immunophenotyping, cytogenetic analysis, and bone marrow sampling. This kind of diagnosis requires extensive laboratory equipment and adequate specialist staff, so it can only be carried out in large hospitals. Large hospitals hold fairly complete leukemia case databases covering patient condition, observed symptoms, and the type of treatment. The question is how to diagnose the type of leukemia earlier by comparing a patient's symptoms with similar symptoms in an existing leukemia case database, so that medical staff far from large hospitals can still classify the type of leukemia and provide first aid. This study aims to develop a system for the initial diagnosis of leukemia types that draws on previous case data using Case-Based Reasoning (CBR) methods. The system is developed with the Waterfall methodology. Case storage in the CBR system uses an indexing method to simplify similarity search. The CBR system uses the Nearest Neighbor method for case retrieval. The result is a prototype system that can assist in the initial diagnosis of leukemia types. The system can also give suggestions on treatment, patient care, and prevention.
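Nearest Neighbor case retrieval can be sketched as a weighted match between a new case and each stored case. The symptom names, weights, and case labels below are hypothetical placeholders with no medical meaning, and the similarity formula is one common choice, not necessarily the one used in the paper.

```python
def similarity(new_case, stored_case, weights):
    """Weighted share of symptoms on which two cases agree (0.0 to 1.0)."""
    total = sum(weights.values())
    matched = sum(
        weight
        for symptom, weight in weights.items()
        if new_case.get(symptom) == stored_case.get(symptom)
    )
    return matched / total


def retrieve(new_case, case_base, weights):
    """Return the stored case most similar to the new case."""
    return max(
        case_base,
        key=lambda case: similarity(new_case, case["symptoms"], weights),
    )


# Hypothetical weights and cases, for illustration only.
WEIGHTS = {"fever": 1, "fatigue": 1, "bruising": 2, "swollen_nodes": 2}
CASE_BASE = [
    {"label": "type A",
     "symptoms": {"fever": True, "fatigue": True,
                  "bruising": True, "swollen_nodes": False}},
    {"label": "type B",
     "symptoms": {"fever": True, "fatigue": False,
                  "bruising": False, "swollen_nodes": True}},
]

new_patient = {"fever": True, "fatigue": True,
               "bruising": True, "swollen_nodes": True}
best = retrieve(new_patient, CASE_BASE, WEIGHTS)
```

In a full CBR cycle the retrieved case's solution would then be reused, revised, and retained; only the retrieval step is shown here.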
SISTEM PAKAR DENGAN BEBERAPA KNOWLEDGE BASE MENGGUNAKAN PROBABILITAS BAYES DAN MESIN INFERENSI FORWARD CHAINING
Agus Sasmito Aribowo;
Siti Khomsah
Seminar Nasional Informatika (SEMNASIF) Vol 1, No 4 (2011): Intelligent System dan Application
Publisher : Jurusan Teknik Informatika
The advantage of expert systems that can hold multiple areas of expertise deserves further study, because an expert system with many areas of expertise can be used to solve more problems. For example, an expert system for disease diagnosis can be loaded with knowledge bases for diseases of cattle, goats, chickens, and several other kinds of animal disease. An expert system for diagnosing faults in electronic devices can be given knowledge bases for faults in televisions, mobile phones, and radios. A system with many areas of expertise naturally needs adequate knowledge base storage. The question is therefore how to design a suitable database system, forward chaining inference technique, and working memory management so that an expert system with several knowledge bases can work. This study uses three knowledge bases, for diagnosing faults in televisions, mobile phones, and computers. The result is an expert system with several knowledge bases that can diagnose faults in several electronic devices. The expert system accepts three kinds of answers to diagnostic questions: "YA" (yes), "TIDAK" (no), and "TIDAK TAHU" (don't know). The expert system handles uncertainty using Bayesian probability, so it can still reach a conclusion even when the facts entered by the user are incomplete.
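One way to see how Bayesian probability tolerates incomplete facts is a sketch in which a "TIDAK TAHU" answer simply contributes no evidence. The fault names, priors, and likelihoods below are invented for illustration, and the update rule (a naive Bayes posterior over mutually exclusive faults) is an assumption; the abstract does not specify the paper's exact formulation.

```python
def posterior(priors, likelihoods, answers):
    """Posterior probability of each fault given YA / TIDAK / TIDAK TAHU answers.

    priors:      {fault: P(fault)}
    likelihoods: {fault: {symptom: P(symptom present | fault)}}
    answers:     {symptom: "YA" | "TIDAK" | "TIDAK TAHU"}
    """
    scores = {}
    for fault, prior in priors.items():
        score = prior
        for symptom, answer in answers.items():
            p = likelihoods[fault][symptom]
            if answer == "YA":
                score *= p
            elif answer == "TIDAK":
                score *= 1.0 - p
            # "TIDAK TAHU": the symptom is skipped, so it adds no evidence.
        scores[fault] = score
    total = sum(scores.values())
    return {fault: score / total for fault, score in scores.items()}


# Hypothetical television knowledge base, for illustration only.
PRIORS = {"power_supply": 0.5, "screen": 0.5}
LIKELIHOODS = {
    "power_supply": {"no_power": 0.9, "no_picture": 0.5},
    "screen": {"no_power": 0.1, "no_picture": 0.9},
}

result = posterior(PRIORS, LIKELIHOODS,
                   {"no_power": "YA", "no_picture": "TIDAK TAHU"})
```

Because unknown answers are skipped rather than treated as "TIDAK", the system can still rank faults from whatever facts the user does supply.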
Pemanfaatan Algoritma WIT-Tree dan HITS untuk Klasifikasi Tingkat Keberhasilan Pemberdayaan Keluarga Miskin
Siti Khomsah;
Edi Winarko
IJCCS (Indonesian Journal of Computing and Cybernetics Systems) Vol 11, No 1 (2017): January
Publisher : IndoCEISS in collaboration with Universitas Gadjah Mada, Indonesia.
DOI: 10.22146/ijccs.15927
The success rate of poor-family empowerment can be classified using characteristic patterns extracted from a database of poor-family empowerment data. The purpose of this research is to build a classification model that predicts the level of success of poor families who will receive empowerment assistance. The classification model is built with WARM, which combines two methods, HITS and WIT-tree. HITS is used to obtain the weights of the attributes in the database. These weights are then used as the attribute weights in the WIT-tree method. WIT-tree is used to generate association rules that satisfy a minimum weighted support and a minimum weighted confidence. The data consist of 831 sample records of poor families divided into two classes: poor families at the "developing" level and poor families at the "underdeveloped" level. The classification results show that attribute weighting using HITS reaches an accuracy of 86.45%, while attribute weights defined by the user reach an accuracy of 66.13%. This study shows that attribute weights obtained from HITS perform better than attribute weights specified by the user.
Model Text-Preprocessing Komentar Youtube Dalam Bahasa Indonesia
Siti Khomsah;
Agus Sasmito Aribowo
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 4 No 4 (2020): Agustus 2020
Publisher : Ikatan Ahli Informatika Indonesia (IAII)
DOI: 10.29207/resti.v4i4.2035
YouTube is the most widely used platform in Indonesia, reaching 88% of the country's internet users. The volume of Indonesian-language YouTube comments produced by users has increased massively, and these datasets can be used to examine the polarization of public opinion on government policies. The main challenge in opinion analysis is preprocessing, especially normalizing noise such as stop words and slang words. This research designs several preprocessing models for the YouTube comment dataset and examines their effect on the accuracy of sentiment analysis. The types of preprocessing used include standard Indonesian text processing, deleting stop words and subjects or objects, and converting slang according to the Indonesian Dictionary (KBBI). Four preprocessing scenarios are designed to see the impact of each type of preprocessing on model accuracy. The investigation uses two feature sets, unigram and a combination of unigram and bigram. Count-Vectorizer and TF-IDF-Vectorizer are used to extract features. The experiments show that unigram features perform better than the combination of unigram and bigram features. Converting slang words to standard words raises the accuracy of the model. Removing stop words also contributes to increased accuracy. In conclusion, the combination of standard preprocessing, stop word removal, and conversion of Indonesian slang to standard words based on the Indonesian Dictionary (KBBI) raises accuracy by almost 3.5% on the unigram feature.
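The two preprocessing steps highlighted in this abstract, slang conversion followed by stop word removal, can be sketched as below. Both word lists are tiny hypothetical samples, not the study's dictionaries or KBBI, and the order of the steps is an assumption.

```python
# Hypothetical sample dictionaries; the study's actual lists are not given here.
SLANG_TO_STANDARD = {"gak": "tidak", "bgs": "bagus", "yg": "yang"}
STOP_WORDS = {"yang", "dan", "di"}


def preprocess(comment, slang=SLANG_TO_STANDARD, stop_words=STOP_WORDS):
    """Lowercase, convert slang to standard words, then drop stop words."""
    tokens = comment.lower().split()
    normalized = [slang.get(token, token) for token in tokens]
    return [token for token in normalized if token not in stop_words]


tokens = preprocess("Videonya bgs dan gak membosankan")
```

Converting slang before removing stop words matters in this sketch: a slang form such as "yg" only becomes recognizable as the stop word "yang" after conversion.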
Analisis Sentimen Pelanggan Hotel di Purwokerto Menggunakan Metode Random Forest dan TF-IDF (Studi Kasus: Ulasan Pelanggan Pada Situs TRIPADVISOR)
Boma Bayu Baskoro;
Irwan Susanto;
Siti Khomsah
Journal of INISTA Vol 3 No 2 (2021): Mei 2021
Publisher : LPPM INSTITUT TEKNOLOGI TELKOM PURWOKERTO
DOI: 10.20895/inista.v3i2.218
E-tourism applications are widely used in Indonesia, especially for tourist accommodation services such as hotels and lodgings. One well-known e-tourism application is tripadvisor.co.id. It makes booking hotels online faster, more practical, and easier. One important factor in choosing the best hotel at an affordable price is the opinion of previous guests, taken from reviews in the comment section. With so many customer reviews, determining manually which reviews are positive and which are negative takes a long time. An accurate sentiment analysis model that classifies customer reviews as positive or negative is therefore needed. This study proposes a hotel customer sentiment analysis model using the Random Forest Classifier and Term Frequency–Inverse Document Frequency (TF–IDF). The dataset used to build the model consists of hotel customer comments for Purwokerto downloaded from tripadvisor.co.id. Preprocessing involves converting slang words into standard words according to KBBI, stemming, and adding new stop words beyond those in the Sastrawi library. The results show that the model reaches an accuracy of 87.23%. Without stemming, however, the model's accuracy is only 76.07%.
Sentiment Analysis On YouTube Comments Using Word2Vec and Random Forest
Siti Khomsah
Telematika Vol 18, No 1 (2021): Edisi Februari 2021
Publisher : Jurusan Teknik Informatika
DOI: 10.31315/telematika.v18i1.4493
Purpose: This study aims to determine the accuracy of sentiment classification using Random Forest, with Word2Vec Skip-gram used for feature extraction. Word2Vec is an effective method for representing aspects of word meaning, and it helps to improve sentiment classification accuracy. Methodology: The research data consist of 31947 comments downloaded from the YouTube channel for the 2019 presidential election debate. The dataset consists of 23612 positive comments and 8335 negative comments. To avoid bias, we balance the amount of positive and negative data using oversampling. We use Skip-gram to extract word features. Skip-gram produces several features for the context around each input word, and each feature carries a weight. The feature weight of each comment is calculated with an average-based approach. Random Forest is used to build the sentiment classification model. Experiments were carried out several times with different epoch and window parameters. The performance of each experiment was measured by cross-validation. Result: Experiments using epochs 1, 5, and 20 and window sizes of 3, 5, and 10 yield an average model accuracy of 90.1% to 91%. Testing, however, reaches an accuracy between 88.77% and 89.05%. The testing accuracy is slightly lower than the model accuracy, and the difference was not significant. For future experiments, it is recommended to use more than twenty epochs and a window size greater than ten so that accuracy increases significantly. Value: The number of epochs and the window size in Skip-gram affect accuracy. Larger numbers of epochs and larger window sizes increase accuracy.
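The average-based approach for turning word vectors into one feature vector per comment can be sketched as follows. The two-dimensional toy embeddings are hypothetical, and skipping out-of-vocabulary words is an assumption; real Skip-gram vectors would come from a trained Word2Vec model.

```python
def comment_vector(tokens, embeddings, dim):
    """Average the vectors of all tokens that have an embedding.

    Tokens without an embedding are skipped (an assumption); a comment with
    no known tokens maps to the zero vector.
    """
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical toy embeddings, for illustration only.
TOY_EMBEDDINGS = {"bagus": [1.0, 0.0], "debat": [0.0, 1.0]}

features = comment_vector(["debat", "bagus", "xyz"], TOY_EMBEDDINGS, dim=2)
```

The resulting fixed-length vector is what a classifier such as Random Forest would receive as input for each comment.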
The Accuracy Comparison Between Word2Vec and FastText On Sentiment Analysis of Hotel Reviews
Siti Khomsah;
Rima Dias Ramadhani;
Sena Wijaya
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 6 No 3 (2022): Juni 2022
Publisher : Ikatan Ahli Informatika Indonesia (IAII)
DOI: 10.29207/resti.v6i3.3711
Word embedding vectorization is more efficient than Bag-of-Words in word vector size. Word embedding also overcomes the loss of information related to sentence context, word order, and semantic relationships between words in sentences. Several kinds of word embedding are often considered for sentiment analysis, such as Word2Vec and FastText. FastText works on n-grams, while Word2Vec works on whole words. This research aims to compare the accuracy of sentiment analysis models using Word2Vec and FastText. Both models are tested on the sentiment analysis of Indonesian hotel reviews using a dataset from TripAdvisor. Word2Vec and FastText both use the Skip-gram model. Both methods use the same parameters: number of features, minimum word count, number of parallel threads, and context window size. The vectorizers are combined with ensemble learners: Random Forest, Extra Tree, and AdaBoost. The Decision Tree is used as a baseline for measuring the performance of both models. The results show that both FastText and Word2Vec increase accuracy with Random Forest and Extra Tree. FastText reached higher accuracy than Word2Vec when using Extra Tree and Random Forest as classifiers. FastText raises accuracy by 8 percentage points over the baseline (Decision Tree, 85%), reaching 93% with 100 estimators.
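The difference between word-level and n-gram-level units can be illustrated with a sketch of character n-gram extraction. The boundary markers `<` and `>` follow the general FastText design; the function name is illustrative and the n-gram length used in the study is not stated, so `n=3` is an assumption.

```python
def char_ngrams(word, n=3):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    if len(wrapped) < n:
        return [wrapped]
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]


ngrams = char_ngrams("hotel", n=3)
```

Because a word's vector is built from its n-grams, words that share subword pieces can obtain related vectors even when one of them was never seen during training, which Word2Vec's whole-word lookup cannot do.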