Patent attributes
The disclosed embodiments provide a system for processing data. During operation, the system obtains validated training data containing a first set of content items and a first set of classification tags for the first set of content items. Next, the system uses the validated training data to produce a statistical model for classifying content along a set of dimensions represented by the first set of classification tags. The system then uses the statistical model to generate a second set of classification tags for a second set of content items. Finally, the system outputs one or more groupings of the second set of content items, organized by the second set of classification tags, to improve understanding of content related to the set of dimensions without requiring a user to manually analyze the second set of content items.
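The four steps (obtain validated training data, produce a statistical model, tag new content, output groupings) can be illustrated with a minimal sketch. The abstract does not name a model type, so this example assumes a multinomial naive Bayes text classifier with add-one smoothing and covers a single dimension with one tag per item. The functions `tokenize`, `train`, `classify`, and `group_by_tag`, along with the sample data, are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical feature extraction: lowercase whitespace tokens only.
    return text.lower().split()


def train(validated):
    """Fit a multinomial naive Bayes model from validated (content, tag) pairs."""
    tag_counts = Counter()              # number of training items per tag
    word_counts = defaultdict(Counter)  # token frequencies per tag
    vocab = set()
    for text, tag in validated:
        tag_counts[tag] += 1
        for tok in tokenize(text):
            word_counts[tag][tok] += 1
            vocab.add(tok)
    return tag_counts, word_counts, vocab


def classify(model, text):
    """Return the tag with the highest log-posterior score for `text`."""
    tag_counts, word_counts, vocab = model
    total_items = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag in sorted(tag_counts):  # sorted for deterministic tie-breaking
        total_words = sum(word_counts[tag].values())
        score = math.log(tag_counts[tag] / total_items)  # log prior
        for tok in tokenize(text):
            # Laplace (add-one) smoothing over the training vocabulary
            score += math.log((word_counts[tag][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


def group_by_tag(model, items):
    """Tag each new content item, then group the items by their generated tag."""
    groups = defaultdict(list)
    for item in items:
        groups[classify(model, item)].append(item)
    return dict(groups)


# Hypothetical validated training data (first set of items and tags)
validated = [
    ("slow page load times", "performance"),
    ("app crashes on startup", "reliability"),
    ("pages load slowly", "performance"),
    ("crashes after update", "reliability"),
]
model = train(validated)
# Second set of content items, grouped by the tags the model generates
print(group_by_tag(model, ["load times are slow", "it crashes constantly"]))
# prints {'performance': ['load times are slow'], 'reliability': ['it crashes constantly']}
```

Naive Bayes is used here only because it is a compact, well-known statistical classifier. Covering several dimensions would likely mean training one such model per dimension, or using a multi-label model in its place.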