JuTISI (Jurnal Teknik Informatika dan Sistem Informasi)
Vol 4 No 3 (2018): JuTISI

Pembentukan Dataset Topik Kata Bahasa Indonesia pada Twitter Menggunakan TF-IDF & Cosine Similarity (Building an Indonesian-Language Word Topic Dataset on Twitter Using TF-IDF & Cosine Similarity)

Kristian Adi Nugraha (Fakultas Teknologi Informasi, Universitas Kristen Duta Wacana)
Danny Sebastian (Fakultas Teknologi Informasi, Universitas Kristen Duta Wacana)



Article Info

Publish Date
21 Dec 2018

Abstract

Social media is evidently more popular than other kinds of web applications. Indonesians spend an average of 3 hours and 15 minutes per day on social media, which generates a substantial flow of information. Although information retrieval research on social media data is common, only a small share of it focuses on content in the Indonesian language. Our research aims to build an Indonesian-language topic dataset from Twitter data. The methods used are TF-IDF for data formation and cosine similarity for grouping the Twitter data. In our tests, the system produced fairly accurate results, with its best result, 64%, obtained when processing Tweets in batches of 200.
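The two techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the `tf * log(N / df)` weighting variant, the function names, and the three sample Tweets are all assumptions made for illustration, and the paper's preprocessing and grouping thresholds are not given in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample Tweets, invented for this sketch.
tweets = [
    "harga bensin naik lagi",
    "harga bensin turun minggu depan",
    "timnas menang di final",
]
vectors = tf_idf_vectors(tweets)
sim_fuel = cosine_similarity(vectors[0], vectors[1])   # share "harga", "bensin"
sim_other = cosine_similarity(vectors[0], vectors[2])  # no shared terms
```

In this sketch the first two Tweets share weighted terms and so score above zero, while the first and third share none and score exactly zero. A grouping step could then assign Tweets to the same topic when their similarity exceeds a chosen threshold, though the threshold the authors used is not stated here.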

Copyright © 2018






Journal Info

Abbrev

jutisi

Subject

Computer Science & IT

Description

Paper topics that can be included in JuTISI are as follows, but are not limited to: • Artificial Intelligence • Business Intelligence • Cloud & Grid Computing • Computer Networking & Security • Data Analytics • Datawarehouse & Datamining • Decision Support System • E-Systems (E-Gov, ...