Unsupervised data-driven taxonomy learning
Effectively organizing textual information is a major challenge in intelligent text processing, and the task becomes more essential as the amount of generated textual data grows. In this paper we present an unsupervised, computer-aided tool that automatically builds classification schemes and taxonomies to enhance automated text classification. The tool relies on the Wikipedia knowledge base and its categorization system. We validated the tool on a subset of a large language dataset obtained from the Google Moderator series (Egypt 2.0) idea bank. The tool's output was evaluated by measuring the similarity between the results it produced automatically and those manually annotated by three different human evaluators. Against these annotations, the tool achieved a precision of 88.6% and a recall of 81.2%. © 2015 IEEE.
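As an illustration of the evaluation measures reported above, the following minimal sketch computes set-based precision and recall of automatically assigned categories against a human reference. The function name `precision_recall` and the toy category labels are hypothetical and are not taken from the paper's tool or dataset; the paper does not state exactly how similarity to the three annotators was aggregated, so this shows only the standard definitions.

```python
def precision_recall(predicted, reference):
    """Set-based precision and recall of predicted labels against reference labels."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: share of predicted labels that the reference also contains.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference labels that were predicted.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy labels (assumption for illustration, not the paper's data).
tool_categories = {"Education", "Health care", "Transport", "Tourism"}
human_categories = {"Education", "Health care", "Transport", "Energy", "Agriculture"}

precision, recall = precision_recall(tool_categories, human_categories)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With several annotators, one plausible choice would be to compute these values per annotator and average them, though that aggregation is an assumption here.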