Journal of Applied Data Sciences
Vol 6, No 2: MAY 2025

An Artificial Ant-Based Approach Using Polynomial Algorithms to Tackle the Text Aspect of Clustering Web Pages

Moufok, Souad
Belkadi, Khaled
Lebbah, Fatima Zohra



Article Info

Publish Date
05 Mar 2025

Abstract

Web clustering is a growing research area that relies on in-depth study and efficient analysis of users' browsing behavior. Managing the huge amounts of unstructured data delivered through web pages is a difficult and fundamental task. In this article, we group users based on the similarity of the web pages they have visited. Our work focuses on cleaning, analyzing, and clustering web data to facilitate users’ access to relevant content. We propose a novel algorithm, called WCLARTANT, to cluster web pages by finding groups of sessions according to their web access patterns. The approach is based on the ANTTREE algorithm, which is inspired by the self-assembling behavior observed in real ants, combined with the binary search tree concept. This combination is applied here for the first time to clustering in web usage mining. More precisely, different topologies are built using different similarity measures: SBS, Euclidean, Jaccard, and Cosine. The clusters are then extracted from the binary tree, which is built by the prefix depth algorithm. In other words, the proposed algorithms produce the binary tree corresponding to the session matrix, where each node models a web session and each branch represents a cluster. In addition, we use the Silhouette index to evaluate and analyze the clustering performance of WCLARTANT relative to the DBSCAN algorithm. WCLARTANT combined with the SBS similarity measure provides the best results compared with DBSCAN. The performance of our algorithm ranges from 0.39 to 0.62, values that are considered good. The log files used come from NASA and contain all HTTP requests for a one-month period, from 1 July 1995 to 31 July 1995, for a total of 65,194 entries.
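As an illustration of two ingredients named in the abstract, the sketch below computes the Jaccard, Cosine, and Euclidean measures between web sessions and evaluates a given grouping with the Silhouette index. It is not the WCLARTANT algorithm itself. It assumes, purely for illustration, that each session is represented as a set of visited page paths; the toy sessions, the labels, and all function names are hypothetical. The SBS measure is not defined in the abstract and is therefore omitted.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity between two sets of visited pages."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cosine(a, b):
    """Cosine similarity of the binary page-visit vectors implied by two sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def euclidean(a, b):
    """Euclidean distance between the binary page-visit vectors of two sets."""
    # Binary vectors differ exactly on the symmetric difference of the sets.
    return sqrt(len(a ^ b))


def silhouette(sessions, labels, dist):
    """Mean Silhouette coefficient of a labelling under the distance `dist`."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the Silhouette index needs at least two clusters")
    scores = []
    for i, (s, li) in enumerate(zip(sessions, labels)):
        # Distances to the other members of the session's own cluster.
        own = [dist(s, t)
               for j, (t, lj) in enumerate(zip(sessions, labels))
               if lj == li and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)
        # Smallest mean distance to any other cluster.
        b = min(
            sum(d) / len(d)
            for other in clusters - {li}
            for d in [[dist(s, t)
                       for t, lj in zip(sessions, labels) if lj == other]]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: four sessions, two obvious groups.
sessions = [
    {"/a", "/b"},
    {"/a", "/b", "/c"},
    {"/x", "/y"},
    {"/x", "/y", "/z"},
]
labels = [0, 0, 1, 1]

# Turn the Jaccard similarity into a distance for the Silhouette index.
score = silhouette(sessions, labels, lambda p, q: 1.0 - jaccard(p, q))
print(round(score, 3))
```

Silhouette values lie between -1 and 1, with higher values indicating tighter, better-separated clusters, which is the scale on which the reported range of 0.39 to 0.62 should be read.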

Copyrights © 2025






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management
