Computer Science and Information Technologies
Vol 5, No 3: November 2024

Vector space model, term frequency-inverse document frequency with linear search, and object-relational mapping Django on hadith data search

Taufik, Ichsan
Agra, Agra
Gerhana, Yana Aditia



Article Info

Publish Date
01 Nov 2024

Abstract

For Muslims, the Hadith is the second-highest legal authority after the Quran. This research aims to make searching the hadith collections of the nine imams more efficient and effective by pre-filtering documents before applying the vector space model (VSM). Specifically, it demonstrates how linear search and Django object-relational mapping (ORM) filters can reduce search times and improve retrieval performance, giving quicker and more accurate access to relevant hadiths. Prior studies have indicated that VSM is inefficient for large data sets because it assigns weights to every term across all documents, regardless of whether they contain the search keywords. Consequently, the more documents there are, the longer the weighting phase takes. To address this, the current research filters the documents before weighting, using either linear search or the Django ORM as the filter. Testing on 62,169 hadiths with 20 keywords showed an average search time of 51 seconds for unfiltered VSM. With the linear search and Django ORM filters, the average times fell to 7.93 and 8.41 seconds, respectively. The recall@10 rates were 79% and 78.5%, with mean average precision (MAP) scores of 0.819 and 0.814, respectively.
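The pre-filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the toy documents are all assumptions made for illustration, and the abstract does not say whether IDF is computed before or after filtering (here it is computed over the filtered candidates only). A linear scan keeps only documents containing at least one query term, and TF-IDF weighting with cosine similarity is then applied to that smaller set.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would
    # likely add stemming and stop-word removal.
    return text.lower().split()


def linear_prefilter(docs, query_terms):
    """Linear search: keep documents that contain at least one query term."""
    wanted = set(query_terms)
    return [(doc_id, text) for doc_id, text in docs
            if wanted & set(tokenize(text))]


def tfidf_rank(docs, query):
    """Rank docs by cosine similarity between TF-IDF vectors and the query."""
    tokenized = [(doc_id, tokenize(text)) for doc_id, text in docs]
    n = len(tokenized)
    df = Counter()
    for _, tokens in tokenized:
        df.update(set(tokens))
    # Assumption: IDF smoothed with +1 so that a term present in every
    # candidate (common after filtering) does not get zero weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vectorize(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = vectorize(tokenize(query))
    scored = [(doc_id, cosine(q_vec, vectorize(tokens)))
              for doc_id, tokens in tokenized]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def search(docs, query):
    # Filter first, then weight only the surviving candidates.
    candidates = linear_prefilter(docs, tokenize(query))
    return tfidf_rank(candidates, query)


# Toy placeholder strings, not actual hadith text.
docs = [
    (1, "fasting during the month"),
    (2, "charity given in secret"),
    (3, "fasting and charity"),
    (4, "travel by sea"),
]
results = search(docs, "fasting")
print([doc_id for doc_id, _ in results])
```

In this sketch the weighting step only touches the two documents that mention the query term, which is the source of the time saving the abstract reports at larger scale. With Django, the same candidate set could plausibly be obtained from the database using a `filter()` call with an `icontains` field lookup on a hypothetical hadith model, instead of scanning in Python.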

Copyrights © 2024






Journal Info

Abbrev

csit

Publisher

Subject

Computer Science & IT Engineering

Description

Computer Science and Information Technologies (ISSN 2722-323X, e-ISSN 2722-3221) is an open-access, peer-reviewed international journal that publishes original research articles, review papers, and short communications that will have an immediate impact on the ongoing research in all areas of Computer ...