To identify emphasized text, bounding boxes are based on clusters resulting from horizontal compression and horizontal morphological dilation. The bounding boxes are processed to determine if any contain words or characters in bold. A bounding box is eliminated based on a comparison of its density and an average density across all bounding boxes. If its density is greater, text elements within the bounding box are evaluated to determine whether the text element is bold.