Date of this Version
Electronic Journal of Academic and Special Librarianship (Summer 2005) 6(1-2). Also available at http://southernlibrarianship.icaap.org/content/v06n01/weiner_j01.htm.
Index terms are an important component in considering a scientific topic. In a real sense, the indexing terms represent the vocabulary and language of the topic. These critical terms have been studied with both human and machine techniques. Computerized indexing systems can recognize terms accurately and completely, but their different strategies for organizing and evaluating the concepts (i.e., informative terms) and related issues may not be effective in producing the desired description. This paper explores the results of two computer-supported approaches to indexing scientific documents against a background of simple random generation of informative terms in text blocks of varying size. One method (RefViz) is based on the Latent Semantic Indexing with Multidimensional Scaling (LSI, MDS) approach. This system identifies potential indexing terms from the natural language text. The terms are stripped from the text, organized using statistical criteria, and used to classify documents by assigned ‘meanings’ or themes. These themes are based on frequency and correlation, and each document is assigned one theme. The second method (Idea Analysis) also uses the natural language text. In this system, the terms are identified within the authors’ sentences and term couplets are extracted. A single document may be represented by 100 or more term couplets representing the authors’ thoughts. Both systems are superior to random sampling. The results suggest that indexing based on the authors’ thoughts may be better than indexing based on statistical criteria.
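The idea of sentence-level term couplets can be illustrated with a minimal Python sketch. It is not the Idea Analysis implementation, whose details are not given here. It assumes that a couplet is any unordered pair of informative terms found in the same sentence, and that "informative" simply means "not in a small stopword list". The names `STOPWORDS`, `informative_terms`, and `sentence_couplets` are hypothetical and introduced only for illustration.

```python
import re
from itertools import combinations

# Hypothetical stopword list (assumption); a real system would use a far larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are",
             "from", "by", "on", "with"}

def informative_terms(sentence):
    """Return the lowercase non-stopword tokens of a sentence, in order, without repeats."""
    terms = []
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token not in STOPWORDS and token not in terms:
            terms.append(token)
    return terms

def sentence_couplets(text):
    """Collect unordered pairs of informative terms that share a sentence."""
    couplets = set()
    # Naive sentence split on terminal punctuation (assumption).
    for sentence in re.split(r"[.!?]+", text):
        for a, b in combinations(informative_terms(sentence), 2):
            couplets.add(tuple(sorted((a, b))))
    return couplets

sample = "Index terms represent vocabulary. Terms classify documents."
couplets = sentence_couplets(sample)
for pair in sorted(couplets):
    print(pair)
```

Under these assumptions, terms are paired only when the author placed them in the same sentence, so "documents" and "vocabulary" are never coupled in the sample. A purely frequency-based scheme would treat both simply as terms of the same document.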