A computer-assisted method for discovering topics and categorizing content in a document includes the steps of calculating an importance score for each term in the document based on the term's grammatical role, part of speech, and semantic attributes; selecting one or more terms based on their respective importance scores; outputting the selected terms to represent topics in the document; and building a category structure based on the selected terms.
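
The four steps can be illustrated with a minimal Python sketch. It is not the claimed implementation, only one possible reading under stated assumptions: the tokens are assumed to arrive already annotated with a grammatical role, a part of speech, and a semantic attribute; the weight tables, the additive scoring rule, the threshold, and the sentence-based category structure are all hypothetical, as are the names `Token`, `importance_scores`, `select_terms`, and `build_categories`.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical weights: the source names the three factors but gives no values.
ROLE_WEIGHTS = {"subject": 3.0, "object": 2.0, "modifier": 1.0}
POS_WEIGHTS = {"noun": 2.0, "verb": 1.0, "adjective": 0.5}
SEMANTIC_WEIGHTS = {"entity": 1.5, "action": 1.0, "property": 0.5}


@dataclass(frozen=True)
class Token:
    """One pre-annotated term occurrence (annotation itself is out of scope)."""
    term: str
    role: str       # grammatical role, e.g. "subject"
    pos: str        # part of speech, e.g. "noun"
    semantic: str   # semantic attribute, e.g. "entity"
    sentence: int   # index of the sentence containing the term


def importance_scores(tokens):
    """Step 1: sum the three factor weights over every occurrence of a term."""
    scores = defaultdict(float)
    for t in tokens:
        scores[t.term] += (
            ROLE_WEIGHTS.get(t.role, 0.0)
            + POS_WEIGHTS.get(t.pos, 0.0)
            + SEMANTIC_WEIGHTS.get(t.semantic, 0.0)
        )
    return dict(scores)


def select_terms(scores, threshold):
    """Steps 2 and 3: keep terms at or above the threshold, highest score first."""
    kept = [term for term, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda term: (-scores[term], term))


def build_categories(tokens, selected):
    """Step 4: map each selected term to the sentences in which it occurs."""
    categories = {term: [] for term in selected}
    for t in tokens:
        if t.term in categories and t.sentence not in categories[t.term]:
            categories[t.term].append(t.sentence)
    return categories


# Assumed annotations for "The camera captures sharp images."
# and "The camera has a long battery."
DOC_TOKENS = [
    Token("camera", "subject", "noun", "entity", 0),
    Token("captures", "predicate", "verb", "action", 0),
    Token("sharp", "modifier", "adjective", "property", 0),
    Token("images", "object", "noun", "entity", 0),
    Token("camera", "subject", "noun", "entity", 1),
    Token("has", "predicate", "verb", "action", 1),
    Token("long", "modifier", "adjective", "property", 1),
    Token("battery", "object", "noun", "entity", 1),
]

if __name__ == "__main__":
    scores = importance_scores(DOC_TOKENS)
    topics = select_terms(scores, threshold=5.0)
    print(topics)
    print(build_categories(DOC_TOKENS, topics))
```

In this sketch a term that recurs as a grammatical subject accumulates the highest score, so "camera" ranks first among the selected topics, and the category structure is a flat mapping from each topic to the sentences it covers. A hierarchical structure could be built from the same selected terms.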