Articles
Racing Bib Number Recognition Method using Deep Learning
Rayhan, Muhammad Aditya;
Lhaksmana, Kemas Muslim
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 4, No 3 (2020): Juli 2020
Publisher : STMIK Budi Darma
DOI: 10.30865/mib.v4i3.2270
Mass running events have gained popularity as recreational running has become more common, and they are often held annually by various organizers. Because image documentation plays a large part in showcasing an event, many thousands of images are generated during it. With so many images, a participant is unlikely to find an image of themselves. To solve this problem, images can be annotated with specific tags such as participant attributes, for example the racing bib number (RBN). Manually annotating thousands of images is time-consuming and labor-intensive. To tackle this problem, this paper proposes an automatic image annotation system using an RBN recognition method based on the YOLOv3 algorithm. The experimental results show 83.0% precision, 81.5% recall, and an 82.2% F1 score for the proposed method on a running event dataset. The implemented method therefore makes image annotation more efficient, because it does not require manual annotation of thousands of running event images.
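As a quick consistency check on the figures reported in the abstract above, the F1 score can be recomputed as the harmonic mean of precision and recall. The function name `f1_score` is only an illustrative helper, not something taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.830, 0.815)
print(f"F1 = {f1:.1%}")  # prints "F1 = 82.2%"
```

The recomputed value agrees with the reported 82.2%.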
Typo handling in searching of Quran verse based on phonetic similarities
Purwita, Naila Iffah;
Bijaksana, Moch Arif;
Lhaksmana, Kemas Muslim;
Naf’an, Muhammad Zidny
Register: Jurnal Ilmiah Teknologi Sistem Informasi Vol 6, No 2 (2020): July
Publisher : Information Systems - Universitas Pesantren Tinggi Darul Ulum
DOI: 10.26594/register.v6i2.2065
The Quran search system was built to make it easier for Indonesians to find a verse by typing text according to Indonesian pronunciation; it is a solution for users who have difficulty writing or typing Arabic characters. A Quran search system based on phonetic similarity can make it easier for Indonesian Muslims to find a particular verse. Lafzi was one of the systems that developed this kind of search, and it was later developed further under the name Lafzi+. The Lafzi+ system can handle queries containing typos, but it covers only a limited range of typing error types. This research presents Lafzi++, an improvement on the previous development that handles more typographical error types. It applies typo correction using the autocomplete method to correct incorrect queries and the Damerau-Levenshtein distance to calculate the edit distance, so that the system can provide query suggestions when a user mistypes a search, whether through substitution, insertion, deletion, or transposition. Users can also search easily because they use Latin characters according to Indonesian pronunciation. The evaluation shows that Lafzi++ improves on the previous system: the accuracy for each tested query surpasses that of its predecessor, with the highest recall at 96.20% and the highest Mean Average Precision (MAP) at 90.69%.
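The edit distance named in the abstract above can be illustrated with a minimal sketch. It implements the restricted variant of Damerau-Levenshtein distance (often called optimal string alignment), which counts substitution, insertion, deletion, and transposition of adjacent characters. Whether Lafzi++ uses this variant or the unrestricted one is not stated in the abstract, and the function names and the sample vocabulary below are assumptions made for illustration only.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suggest(query: str, vocabulary: list[str], max_distance: int = 1) -> list[str]:
    """Return vocabulary entries within max_distance of the query, closest first."""
    scored = [(osa_distance(query, word), word) for word in vocabulary]
    return [word for dist, word in sorted(scored) if dist <= max_distance]


# Hypothetical Latin transliterations, not taken from the Lafzi++ data.
vocabulary = ["alhamdu", "arrahman", "maliki"]
print(suggest("alhmadu", vocabulary))  # prints ['alhamdu']
```

Here the mistyped query `alhmadu` differs from `alhamdu` by one transposition, so it is suggested at distance 1.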
Implementation of TF-IDF Method and Support Vector Machine Algorithm for Job Applicants Text Classification
Luthfi, Muhammad Faris;
Lhaksamana, Kemas Muslim
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 4, No 4 (2020): Oktober 2020
Publisher : STMIK Budi Darma
DOI: 10.30865/mib.v4i4.2276
Tens of thousands of people apply for jobs at PT. Telkom each year. The goal of the recruitment process is to find new employees who fit PT. Telkom's working culture. Due to the high number of applicants, the recruitment process takes a lot of time and drives up costs. We propose a popular combination of Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction method and Support Vector Machine (SVM) as the classifier to filter the applicants' interview texts. SVM generally produces better accuracy in text classification than the Random Forest or K-Nearest Neighbors (KNN) algorithms. However, several variants have been developed to address the flaws of TF-IDF, one of which is Term Frequency-Relevance Frequency (TF-RF). For comparison, this study uses three extraction methods: TF only (without IDF), TF-IDF, and TF-RF. We use interview texts from PT. Telkom as the data source. SVM achieves 86.31% accuracy with TF-IDF, 85.06% with TF only, and 83.61% with TF-RF. The results show that TF-IDF extraction still outperforms TF-RF in terms of accuracy.
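To make the TF-IDF weighting mentioned in the abstract above concrete, the following is a minimal sketch of the textbook formulation (relative term frequency multiplied by the logarithm of inverse document frequency). The exact weighting variant and preprocessing used in the paper are not given in the abstract, and the sample sentences are invented placeholders rather than PT. Telkom interview data. The SVM classification step is not shown.

```python
import math
from collections import Counter


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Compute textbook TF-IDF weights for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Invented toy documents for illustration only.
docs = ["team work team", "work alone"]
weights = tf_idf(docs)
print(weights[0])
```

A term that appears in every document, such as `work` here, receives weight 0, while a term concentrated in one document, such as `team`, receives a positive weight. Dropping the logarithmic factor gives the "TF only" baseline that the abstract compares against.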
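Pencarian Potongan Ayat Al-Qur'an dengan Perbedaan Bunyi pada Tanda Berhenti Berdasarkan Kemiripan Fonetis [Searching for Quran Verse Fragments with Sound Differences at Stop Signs Based on Phonetic Similarity]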
Naufal Rasyad;
Moch. Arif Bijaksana;
Kemas Muslim Lhaksmana
Jurnal Linguistik Komputasional Vol 2 No 2 (2019): Vol. 2, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)
DOI: 10.26418/jlk.v2i2.25
The Quran is the main holy book of Muslims and is written in Arabic. As technology has advanced, Quran verse search systems based on phonetic similarity have been developed, one of which is Lafzi. However, Lafzi does not yet handle well the differences in sound that occur at stop signs in the middle of a verse. A system is therefore needed that helps users search for Quran verses, particularly where the sound differs at a stop sign, so that the search can find words that are pronounced differently at a stop. To address this, Lafzi was extended so that its search can handle sound differences at stop signs. Trigram indexing is used to estimate the string match between the query and the transliteration of the Quran verses, and a rule is applied to the input that changes a final letter 'T' to 'H'. The existing system achieved a recall of 81% and a MAP of 65%, whereas this research achieved a recall of 100% and a MAP of 84%.
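The two ingredients named in the abstract above, trigram matching and the rule that turns a final 'T' into 'H', can be sketched roughly as follows. This is an illustrative sketch under assumptions, not the Lafzi implementation: the scoring formula (share of query trigrams found in the candidate), the function names, and the sample transliterations are all hypothetical.

```python
def trigrams(text: str) -> list[str]:
    """Split a transliteration into overlapping character trigrams (spaces removed)."""
    s = text.replace(" ", "").lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]


def stop_variants(query: str) -> list[str]:
    """Return the query plus a variant whose final 't' is replaced by 'h'."""
    q = query.strip().lower()
    variants = [q]
    if q.endswith("t"):
        variants.append(q[:-1] + "h")
    return variants


def trigram_score(query: str, candidate: str) -> float:
    """Fraction of the query's trigrams that also occur in the candidate."""
    q = trigrams(query)
    if not q:
        return 0.0
    c = set(trigrams(candidate))
    return sum(1 for gram in q if gram in c) / len(q)


def best_score(query: str, candidate: str) -> float:
    """Score the candidate against every stop variant and keep the best match."""
    return max(trigram_score(v, candidate) for v in stop_variants(query))


# Hypothetical transliterations for illustration only.
print(trigram_score("rahmat", "rahmah"))  # prints 0.75
print(best_score("rahmat", "rahmah"))     # prints 1.0
```

Without the rule, the query `rahmat` shares three of its four trigrams with `rahmah`. With the final-letter variant, all trigrams match.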
Topic Classification of Islamic Question and Answer Using Naïve Bayes and TF-IDF Method
Aura Sukma Andini;
Danang Triantoro Murdiansyah;
Kemas Muslim Lhaksmana
Computer Engineering and Applications Journal Vol 10 No 3 (2021)
Publisher : Universitas Sriwijaya
DOI: 10.18495/comengapp.v10i3.385
Information on the internet is widely used by people to find almost anything, and information about Islamic religious knowledge is among the most searched. However, the large amount of information available from various sources makes it difficult for people to find the correct information. Previous researchers have studied this topic, but with datasets drawn from only one source. In this study, therefore, a classification system for Islamic question and answer topics was built using the Naïve Bayes and TF-IDF methods. The study uses 1000 question and answer articles taken from two Islamic consultation websites, rumahfiqih.com and islamqa.info. The multi-class classification uses five categories, labeled manually using the category classes on the websites. Across the test scenarios in this study, the Naïve Bayes classification method using TF-IDF (n-gram level) with a maximum of 1000 features and a 70:30 data split produces the highest accuracy, 81%. The SVM classification method also reached 81% accuracy, but it achieved its highest accuracy using TF-IDF (word level). Future research is expected to use more website sources and other classification and feature extraction methods that may yield better results than this study.
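The abstract above distinguishes word-level from n-gram-level features and caps the feature set at 1000. A rough sketch of what that could look like is given below, assuming that "n-gram level" refers to word n-grams and that the cap keeps the most frequent features. Both are assumptions, since the abstract does not spell them out, and the function names and sample text are hypothetical.

```python
from collections import Counter


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> list[str]:
    """Extract word n-grams for every n between n_min and n_max (inclusive)."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def top_features(texts: list[str], max_features: int = 1000,
                 n_min: int = 1, n_max: int = 2) -> list[str]:
    """Keep the max_features most frequent n-grams across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(word_ngrams(text, n_min, n_max))
    return [gram for gram, _ in counts.most_common(max_features)]


# Invented placeholder text, not from the paper's dataset.
print(word_ngrams("hukum puasa sunnah", 2, 2))
# prints ['hukum puasa', 'puasa sunnah']
```

Setting `n_min = n_max = 1` reduces this to word-level features, which is the setting the abstract reports as best for SVM.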
Kategorisasi Berita Menggunakan Metode Pembobotan TF.ABS dan TF.CHI [News Categorization Using the TF.ABS and TF.CHI Weighting Methods]
Muhammad Arif Kurniawan;
Yuliant Sibaroni;
Kemas L Muslim
Indonesia Journal on Computing (Indo-JC) Vol. 3 No. 2 (2018): September, 2018
Publisher : School of Computing, Telkom University
DOI: 10.21108/INDOJC.2018.3.2.236
With today's technological advances, news is easy to find and available in very large quantities in digital form, which creates the need for a technique to categorize news items into particular topics so that readers can more easily find news on the topic they want. Text categorization is a technique that can automatically assign news to predetermined topics. One important step in categorization is feature extraction. Unigram binary is one of the basic feature extraction methods, and this study compares it with term weighting, using the TF.ABS and TF.CHI weighting methods, to obtain optimal news categorization results. Based on the test results, the average accuracy across three data sources was 90.44% for unigram binary feature extraction, 95.74% for the TF.ABS weighting method, and 95.87% for TF.CHI. From these accuracy results, it can be concluded that term weighting performs better than unigram binary. The TF.ABS and TF.CHI weighting methods are equally good for categorization because their performance does not differ significantly. Other test results show that stemming has little effect on news categorization accuracy, but it can reduce processing time by up to 45%.
Analysis of the Commutative Method Approach on English Thesaurus for Developing Synonym Sets
Arini Rohmawati;
Moch. Arif Bijaksana;
Kemas Muslim Lhaksmana
Indonesia Journal on Computing (Indo-JC) Vol. 4 No. 2 (2019): September, 2019
Publisher : School of Computing, Telkom University
DOI: 10.34818/INDOJC.2019.4.2.332
WordNet is a lexical database for languages. Unlike general dictionaries, WordNet focuses on synonyms. The main unit of WordNet is the synonym set (synset), a set of one or more words that have the same meaning and can replace one another in certain contexts. The synset is a very important element in implementing WordNet. This paper analyzes the synonym extraction process using a commutative approach, with test data obtained from the Oxford Paperback Thesaurus by taking 51 word entries. The commutative method shares a property with synonym sets: the members of a synonym set can replace each other in certain contexts. The test data is processed from extraction through to performance evaluation using the F1 score. The system generates synonym sets that match the manual extraction, while the F1 score between the program's output and the Princeton synonym sets is 10%.
Pembangunan Synonym Set untuk WordNet Bahasa Indonesia dengan Menggunakan Metode Komutatif [Building Synonym Sets for Indonesian WordNet Using the Commutative Method]
Dina Juni Restina;
Moch. Arif Bijaksana;
Kemas Muslim Lhaksamana
Indonesia Journal on Computing (Indo-JC) Vol. 4 No. 2 (2019): September, 2019
Publisher : School of Computing, Telkom University
DOI: 10.34818/INDOJC.2019.4.2.334
In Natural Language Processing (NLP), many lexical-semantic relations can be applied, one of which is WordNet. This WordNet is built using the commutative method in constructing its synonym sets (synsets). A synset must have the commutative property to be considered valid, meaning that if a word w1 has the synonym w2, then w2 must also have the synonym w1, as is the case in Princeton WordNet. WordNet was first created in English, before researchers translated it into various languages such as Japanese, Arabic, Turkish, Indonesian, and others. WordNet development is therefore needed to help other researchers, so that in the future the existing Indonesian WordNet has a more complete vocabulary. The WordNet to be built focuses on synset extraction, the first stage of WordNet construction, which precedes the stages of relations between synsets and word glosses. The synsets are built using the Tesaurus Bahasa Indonesia as the word source. The F-measure of synset construction using the commutative method is 66 percent. Keywords: Indonesian WordNet, Synset, Commutative method
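The commutative property described in the abstract above (w1 lists w2 as a synonym and w2 lists w1) can be checked with a few lines of code. The sketch below only extracts symmetric pairs and does not show how the paper merges pairs into larger synsets, which the abstract does not describe. The thesaurus entries are invented toy data, not taken from the Tesaurus Bahasa Indonesia.

```python
def commutative_pairs(thesaurus: dict[str, set[str]]) -> set[frozenset[str]]:
    """Return word pairs where each word lists the other as a synonym."""
    pairs = set()
    for word, synonyms in thesaurus.items():
        for syn in synonyms:
            if syn != word and word in thesaurus.get(syn, set()):
                pairs.add(frozenset((word, syn)))
    return pairs


# Invented toy entries for illustration only.
thesaurus = {
    "besar": {"agung", "raya"},
    "agung": {"besar"},
    "raya": {"luas"},
}
pairs = commutative_pairs(thesaurus)
print(pairs)
```

In this toy data only `besar` and `agung` list each other, so they form the single valid pair. `raya` is listed under `besar` but does not list `besar` back, so that pair is rejected.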
Sistem Pencarian Lintas Ayat Al-Qur'an Berdasarkan Kesamaan Fonetis [Cross-Verse Quran Search System Based on Phonetic Similarity]
Eki Rifaldi;
Moch Arif Bijaksana;
Kemas Muslim Lhaksamana
Indonesia Journal on Computing (Indo-JC) Vol. 4 No. 2 (2019): September, 2019
Publisher : School of Computing, Telkom University
DOI: 10.34818/INDOJC.2019.4.2.342
Searching for Arabic text in the Quran is not easy for users who lack sufficient knowledge of the Arabic language and script. The large number of verses and the language difference make it particularly difficult for Indonesian Muslims to search for verses. A phonetic-based Quran verse search system is needed that lets users search for verses using Latin alphabet text that represents how the user pronounces them. For example, if the user searches for الْحَمْدُ لِلَّـهِ, the system displays all verses that sound similar to the keyword. Quran verse search systems that use phonetic string matching already exist, but they can only find verses for queries that do not span verses. If a cross-verse phrase such as يَوْمِ الدِّينِ (4) إِيَّاكَ is searched by string matching against the database, the system cannot return two verses at once. A Quran verse search system based on sound (phonetic) similarity that can cross verse boundaries was therefore built. An N-gram algorithm in the form of trigrams is used to find verses with similar sounds because it has a high MAP for long keywords. To search across verses, five trigrams from the following verse are appended to the end of the preceding verse's trigrams. The system achieved a MAP of 0.9 and a recall of 0.93.
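The cross-verse step described in the abstract above, appending five trigrams of the following verse to the trigram list of the preceding one, can be sketched as below. This follows the abstract's wording literally and is not the authors' implementation. The function names, the `carry` parameter, and the Latin transliterations are assumptions for illustration.

```python
def trigrams(text: str) -> list[str]:
    """Split a transliteration into overlapping character trigrams (spaces removed)."""
    s = text.replace(" ", "").lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]


def cross_verse_index(verses: list[str], carry: int = 5) -> list[list[str]]:
    """Build one trigram list per verse, extended with the first
    `carry` trigrams of the verse that follows it."""
    index = []
    for i, verse in enumerate(verses):
        grams = trigrams(verse)
        if i + 1 < len(verses):
            grams = grams + trigrams(verses[i + 1])[:carry]
        index.append(grams)
    return index


# Hypothetical transliterations of two consecutive verses.
verses = ["maliki yaumiddin", "iyyaka nabudu"]
index = cross_verse_index(verses)
print(index[0][-5:])  # prints ['iyy', 'yya', 'yak', 'aka', 'kan']
```

With this extension, a query that ends one verse and begins the next can match the trigram list of the first verse, which is the behaviour the abstract describes. Note that this literal sketch does not create trigrams that straddle the verse boundary itself.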
Topic Classification of Islamic Question and Answer Using Naive Bayes Classifier
Naufal Furqan Hardifa;
Kemas Muslim Lhaksmana;
Jondri Jondri
Indonesia Journal on Computing (Indo-JC) Vol. 4 No. 2 (2019): September, 2019
Publisher : School of Computing, Telkom University
DOI: 10.34818/INDOJC.2019.4.2.346
Topic classification is one of the most important components in an automatic Islamic question-answering system, which is capable of automatically providing the most relevant answers to a question about an Islamic issue. In our research, the Islamic question-answering system to be built collects existing Islamic questions and answers from trusted online Islamic consultation websites. To speed up the search for appropriate answers, each Q & A entry should be classified into a topic. However, the question-answering system cannot directly adopt the topic classes provided by the online Islamic consultation websites, because different websites use different classifications. Since the number of Q & A entries could reach tens of thousands, an automatic topic classification method is required. In this paper, a naive Bayes classifier is implemented to classify Q & A entries. The classifier gives a satisfactory result with 0.88 precision.
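For readers unfamiliar with the classifier named in the abstract above, here is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing. The smoothing choice, the whitespace tokenization, the class name `NaiveBayes`, and the toy questions and topic labels are all assumptions for illustration. The paper's actual features, preprocessing, and topic classes are not given in the abstract.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data and labels for illustration only.
texts = [
    "how to pray at night",
    "rules of fasting in ramadan",
    "is bank interest allowed",
    "zakat on trade profit",
]
labels = ["worship", "worship", "finance", "finance"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("fasting rules"))  # prints "worship"
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and the add-one smoothing keeps unseen tokens from zeroing out a class.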