Paper on “Interactive Thesaurus Assessment for Automatic Document Annotation” accepted for K-CAP 2007

The paper “Interactive Thesaurus Assessment for Automatic Document Annotation”, written by Heiner Stuckenschmidt, Magnus Pfeffer and me, has been accepted for the Fourth International Conference on Knowledge Capture (K-CAP 2007) in Whistler, Canada.

The use of thesaurus-based indexing is a common approach for increasing the performance of document retrieval. With the growing number of documents available, manual indexing is not a feasible option. Statistical methods for automated document indexing are an attractive alternative. We argue that the quality of the thesaurus used as a basis for indexing, in particular its ability to adequately cover the contents to be indexed, is of crucial importance in automatic indexing, because there is no human in the loop who can spot and avoid indexing errors. We propose a method for thesaurus evaluation that combines statistical measures with appropriate visualization techniques and supports the detection of potential problems in a thesaurus. We describe this method and show its application in the context of two automatic indexing tasks. The examples show that the method indeed eases the detection and correction of errors, leading to a better indexing result.

More information and the downloadable PDF can be found at http://www.kaiec.org/0706_paper.html.
