University of Nebraska-Lincoln, Computer Science and Engineering
Technical Report # TR-UNL-CSE-2002-0005


By expressing web page content in a format that machines can understand, the semantic web opens up significant possibilities for the Internet and for machine reasoning. Unfortunately, a considerable distance separates the present-day World Wide Web from the semantic web of the future: annotating the Web to make it semantic web-ready is a long process, and one that meets resistance. This paper presents one mechanism for semanticizing the Web, a system known as AutoSHOE. It is capable of categorizing pages according to one of the existing HTML semantic representations, Simple HTML Ontology Extensions (SHOE) by Heflin et al. We are also extending the system to other semantic web representations, such as the Resource Description Framework (RDF). AutoSHOE includes mechanisms to train classifiers to identify web pages that belong in an ontology, as well as methods to classify pages within an ontology and to learn relations between pages with respect to an ontology. Its modular design allows for the addition of new ontologies as well as new algorithms for feature extraction, classifier learning, and rule learning. The system promises to help transparently bridge traditional web technology and the semantic web using contemporary machine learning techniques rather than tedious manual annotation.