Abstract

This research develops a machine learning model to detect sensitive information in text documents within the financial services industry. The main issues identified are the potential misuse of information by departing employees, the limitations of traditional detection methods, and the need for an effective machine learning model. The scope of the research covers the development of a Convolutional Neural Network (CNN) model combined with two term-weighting methods: Term Frequency-Inverse Document Frequency (TF-IDF) and Term Frequency-Relevance Frequency (TF-RF). The study follows a quantitative, experimental approach with five stages: data collection, preprocessing, application of the weighting methods, model training and evaluation, and validation of the results. The data consist of text documents from financial services companies, such as financial reports and customer data. Preprocessing removes noise and irrelevant information, after which the weighting methods assign weights to important words. The CNN model is then trained to detect patterns that indicate sensitive information. The results show that TF-IDF performed better than TF-RF in detecting sensitive information, with a highest accuracy of 93.26%. The CNN model was able to recognize complex patterns and detect sensitive information with high accuracy. Evaluation using accuracy, precision, recall, and F1-score indicates that the model is reliable and applicable in real-world situations. This research contributes to information security and to the use of term-weighting methods for improving the performance of machine learning models.
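The abstract does not state which variants of the two weighting schemes were used. As a minimal sketch of how they differ, the code below assumes the common tf * ln(N / df) form of TF-IDF and the tf * log2(2 + a / max(1, c)) form of TF-RF, where a and c count the sensitive and non-sensitive documents that contain a term. The function names, toy documents, and labels are illustrative assumptions and are not taken from the study.

```python
import math
from collections import Counter


def tf_idf(docs):
    """One {term: weight} dict per tokenized document, using tf * ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def tf_rf(docs, labels):
    """One {term: weight} dict per document, using tf * log2(2 + a / max(1, c)).

    a = number of sensitive (label 1) documents containing the term,
    c = number of non-sensitive (label 0) documents containing the term.
    """
    pos_df, neg_df = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos_df if label == 1 else neg_df).update(set(doc))
    return [
        {term: count * math.log2(2 + pos_df[term] / max(1, neg_df[term]))
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical toy corpus: 1 = sensitive, 0 = not sensitive.
docs = [
    ["account", "number", "balance", "account"],
    ["customer", "account", "balance"],
    ["meeting", "agenda", "lunch"],
    ["meeting", "balance", "report"],
]
labels = [1, 1, 0, 0]

idf_weights = tf_idf(docs)
rf_weights = tf_rf(docs, labels)
```

TF-IDF is unsupervised and depends only on how rare a term is across the corpus, whereas TF-RF uses the class labels, so a term concentrated in sensitive documents receives a larger weight. Either set of weights would still need to be arranged into a fixed-size input before being passed to a CNN; how the study did that is not described here.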
Copyrights © 2025