Abstract:
The World Wide Web is growing enormously day by
day, which makes classifying web pages
necessary. We demonstrate the
usefulness of the uniform resource locator (URL) alone
in performing web page classification. This approach is
faster than typical web page classification, as the pages
do not have to be fetched and analyzed. Uniform
Resource Locators (URLs) mark the address of a
resource on the World Wide Web, are often human-readable,
and can indicate metadata about the resource [11].
Our approach segments the URL into meaningful tokens.
We construct a binary tree for the entire set of tokens
used in the hyperlinks and use the J48 classifier. Our results
show that, in certain scenarios, the URL-based
approach performs better.
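
The segmentation step can be illustrated with a minimal sketch. The abstract does not specify how URLs are split into tokens, so the rule used here (lowercase the host, path, and query, then split on any non-alphanumeric character) is an assumption, and the function name tokenize_url is hypothetical. The tree construction and the J48 classifier are not shown.

```python
import re
from urllib.parse import urlsplit


def tokenize_url(url):
    """Split a URL into lowercase alphanumeric tokens.

    Assumed rule for illustration only: the scheme is dropped, and the
    host, path, and query are split on every non-alphanumeric character.
    """
    parts = urlsplit(url)
    raw = " ".join([parts.netloc, parts.path, parts.query]).lower()
    # Split on runs of non-alphanumeric characters; drop empty strings.
    return [tok for tok in re.split(r"[^a-z0-9]+", raw) if tok]


if __name__ == "__main__":
    print(tokenize_url("http://www.cs.example.edu/~smith/courses/cs101.html"))
```

The resulting token list could then be turned into features for a classifier; how the tokens are weighted or filtered is left open here.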