Knowledge-based information retrieval and classification

Sanjiv K Bhatia, University of Nebraska - Lincoln

Abstract

A user of an information retrieval system formulates a query to express his/her information requirements. Query formulation is difficult because the vocabulary of the user differs from that of the system, yet for the system to perform effective retrieval, the query should be expressed in terms of keywords in the system vocabulary. Past efforts to solve this problem have concentrated on relevance feedback, thesaurus construction, and classification based on matching keywords extracted from the documents in the collection. This dissertation proposes an alternative approach to improve the query formulation and classification process. The approach applies knowledge acquisition techniques to determine a user's vocabulary and his/her view of the different documents in a training set. Machine learning techniques are then used to develop a representation of each phrase/concept given by the user in terms of keywords that the system extracts from those documents. A query given by the user in his/her own vocabulary can then be easily translated into the system vocabulary. Computing the relationships between the phrases given by the user also helps in developing a user profile and creating a classification of documents. The resulting system automatically identifies the phrases in a user query and correlates them to the keywords computed by the system through the conventional indexing process. In addition, keywords extracted from an incoming document are compared with the representations of the various clusters to identify the most appropriate cluster for the document. The application of the developed techniques to message routing and message understanding is also investigated. The system is evaluated with the standard performance measures of precision and recall, by comparing its performance against that of the SMART system for individual queries. The classification results are shown to satisfy the performance criterion for satisfactory classification published in the literature.
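The abstract describes the method only in outline. As a rough illustration, the sketch below assumes a simple vector-space representation in which each user phrase and each document cluster is a dictionary of weighted system keywords; the weighting scheme and matching function actually used in the dissertation are not stated here. It shows query translation (summing phrase representations), routing an incoming document to the most similar cluster by cosine similarity, and set-based precision and recall. All function names (`cosine`, `translate_query`, `route`, `precision_recall`) and the toy keywords are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical vector-space sketch: each user phrase and each document
# cluster is a dict mapping system keywords to weights. The dissertation's
# actual weighting and matching scheme is not given in the abstract.


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def translate_query(phrases: list[str],
                    phrase_reprs: dict[str, dict[str, float]]) -> dict[str, float]:
    """Translate user phrases into one system-keyword vector by summing the
    learned representation of each known phrase; unknown phrases are skipped."""
    query: Counter = Counter()
    for phrase in phrases:
        for keyword, weight in phrase_reprs.get(phrase, {}).items():
            query[keyword] += weight
    return dict(query)


def route(doc: dict[str, float], clusters: dict[str, dict[str, float]]) -> str:
    """Assign an incoming document to the cluster whose representation is
    most similar (by cosine) to the document's extracted keywords."""
    return max(clusters, key=lambda name: cosine(doc, clusters[name]))


def precision_recall(retrieved, relevant) -> tuple[float, float]:
    """Set-based precision = |ret & rel| / |ret|, recall = |ret & rel| / |rel|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    phrase_reprs = {
        "text search": {"retriev": 0.9, "index": 0.4},
        "machine learning": {"learn": 0.8, "induct": 0.5},
    }
    clusters = {
        "ir": {"retriev": 1.0, "index": 0.7},
        "ml": {"learn": 1.0, "induct": 0.6},
    }
    print(translate_query(["text search", "unknown phrase"], phrase_reprs))
    print(route({"retriev": 0.5, "queri": 0.5}, clusters))  # ir
    print(precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"]))
```

Cosine similarity is chosen here because it is the matching function of the classic vector-space model on which SMART is based; summing phrase vectors to form the translated query is only one simple assumption among many possible combination rules.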

Subject Area

Computer science

Recommended Citation

Bhatia, Sanjiv K, "Knowledge-based information retrieval and classification" (1991). ETD collection for University of Nebraska-Lincoln. AAI9200130.
https://digitalcommons.unl.edu/dissertations/AAI9200130
