TELKOMNIKA (Telecommunication Computing Electronics and Control)
Vol 10, No 1: March 2012

CT-FC: more Comprehensive Traversal Focused Crawler

Siti Maimunah (Surabaya Adhitama Institute of Technology)
Husni S Sastramihardja (Bandung Institute of Technology)
Dwi H Widyantoro (Bandung Institute of Technology)
Kuspriyanto Kuspriyanto (Bandung Institute of Technology)



Article Info

Publish Date
01 Mar 2012

Abstract

In today’s world, people depend increasingly on information from the WWW, including professionals who must analyze data in their domain to maintain and improve their business. Such data analysis requires information that is comprehensive and relevant to the domain. A focused crawler, a topic-based Web indexing agent, is used to meet this information need. In raising precision, however, focused crawlers face the problem of low recall. Studies of the WWW hyperlink structure indicate that many Web documents are not strongly connected but are related through co-citation and co-reference. A conventional focused crawler that uses a forward crawling strategy cannot visit documents linked in this way. This study proposes a more comprehensive traversal framework. As a proof of concept, CT-FC (a focused crawler with the new traversal framework) was run on DMOZ data, which is representative of WWW characteristics. The results show that this strategy can increase recall significantly.
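The difference between forward-only crawling and a traversal that also follows co-citation and co-reference can be illustrated with a minimal sketch. This is not the CT-FC implementation, which the abstract does not describe. It assumes that "co-citation" means two pages linked from the same parent and "co-reference" means two pages linking to the same target, and that a backlink index is available to the crawler. The names `invert`, `crawl`, `recall`, `LINKS`, and `ON_TOPIC`, and the toy graph itself, are hypothetical.

```python
from collections import deque


def invert(out_links):
    """Build a backlink index: page -> set of pages that link to it."""
    back = {}
    for src, targets in out_links.items():
        for dst in targets:
            back.setdefault(dst, set()).add(src)
    return back


def crawl(out_links, seeds, is_relevant, comprehensive=False):
    """Breadth-first focused crawl; returns the relevant pages reached."""
    back = invert(out_links)  # assumption: backlinks are known to the crawler
    visited, queue, found = set(seeds), deque(seeds), set()
    while queue:
        page = queue.popleft()
        if not is_relevant(page):
            continue  # focused crawling: off-topic pages are not expanded
        found.add(page)
        candidates = set(out_links.get(page, ()))  # forward links
        if comprehensive:
            # Co-citation: other pages linked from a parent of this page.
            for parent in back.get(page, ()):
                candidates.update(out_links.get(parent, ()))
            # Co-reference: other pages linking to a target of this page.
            for child in out_links.get(page, ()):
                candidates.update(back.get(child, ()))
        for nxt in sorted(candidates - visited):
            visited.add(nxt)
            queue.append(nxt)
    return found


def recall(found, relevant):
    """Fraction of all relevant pages that the crawl retrieved."""
    return len(found & relevant) / len(relevant)


# Hypothetical toy graph: A, B, C are on topic; hub H and target T are not.
LINKS = {"A": ["T"], "C": ["T"], "H": ["A", "B"], "T": [], "B": []}
ON_TOPIC = {"A", "B", "C"}

forward = crawl(LINKS, ["A"], ON_TOPIC.__contains__)
full = crawl(LINKS, ["A"], ON_TOPIC.__contains__, comprehensive=True)
```

In this toy graph the forward-only crawl started at `A` stops at the off-topic page `T` and retrieves only `A`, whereas the extended traversal also reaches `B` (co-cited with `A` by `H`) and `C` (which references `T`, as `A` does). The graph was built to show the effect and says nothing about the magnitude of the recall gain reported in the paper.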

Copyrights © 2012






Journal Info

Abbrev

TELKOMNIKA

Publisher

Subject

Computer Science & IT

Description

Submitted papers are evaluated by anonymous referees through single-blind peer review for contribution, originality, relevance, and presentation. The Editor shall inform you of the results of the review as soon as possible, hopefully within 10 weeks. Please note that because of the great number of ...