LDL: Librarian's Digital Library

Fast web page categorization without the web page

dc.contributor.author Indra Devi, M.
dc.contributor.author Selvakuberan, K.
dc.date.accessioned 2007-05-04T06:31:26Z
dc.date.available 2007-05-04T06:31:26Z
dc.date.issued 2007-02
dc.identifier.citation International Conference on Semantic Web and Digital Libraries (ICSD-2007). ARD Prasad & Devika P. Madalli (Eds.): ICSD-2007 en
dc.identifier.uri http://drtc.isibang.ac.in/ldl/handle/1849/391
dc.description.abstract The World Wide Web grows day by day, making it necessary to classify web pages. We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is faster than typical web page classification, as the pages do not have to be fetched and analyzed. URLs mark the address of a resource on the World Wide Web, are often human-readable, and can indicate metadata about the resource [11]. Our approach segments the URL into meaningful tokens. We construct a binary tree for the entire set of tokens used in the hyperlinks and use the J48 classifier. Our results show that in certain scenarios, URL-based methods show better performance. en
dc.format.extent 47413 bytes
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher DRTC en
dc.subject Uniform Resource Locator en
dc.subject metadata extraction en
dc.subject text categorization en
dc.subject classification en
dc.subject Feature selection en
dc.title Fast web page categorization without the web page en
dc.type Article en
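
The abstract describes segmenting a URL into tokens and classifying on those tokens alone. The following is a minimal sketch of that first step, not the authors' implementation: the splitting rule (break on any non-alphanumeric character), the function names `tokenize_url` and `binary_features`, and the sample vocabulary are all assumptions made for illustration. The record does not specify how the paper segments URLs or encodes features, and the J48 classifier itself (Weka's decision tree learner) is not reproduced here.

```python
import re
from urllib.parse import urlsplit


def tokenize_url(url):
    """Split a URL into lowercase tokens (assumed rule: break on non-alphanumerics)."""
    parts = urlsplit(url)
    # Only host, path, and query are used; the scheme ("http") is ignored.
    text = " ".join([parts.netloc, parts.path, parts.query])
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def binary_features(tokens, vocabulary):
    """Encode token presence as a 0/1 vector over a fixed vocabulary (hypothetical encoding)."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


tokens = tokenize_url("http://drtc.isibang.ac.in/ldl/handle/1849/391")
print(tokens)
# ['drtc', 'isibang', 'ac', 'in', 'ldl', 'handle', '1849', '391']

# A made-up vocabulary, only to show the shape of the feature vector.
print(binary_features(tokens, ["handle", "pdf", "ldl"]))
# [1, 0, 1]
```

Vectors of this kind could then be passed to any decision tree learner; because nothing is fetched, the cost per page is limited to string processing on the URL.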

