A respective set of features, including emotion-related features, is extracted from each segment of a video for which a preview is to be generated. A subset of the segments is chosen using the features and filtering criteria that include at least one emotion-based filtering criterion. Respective weighted preview-suitability scores are assigned to the segments of the subset using at least a metric of similarity between individual segments and a plot summary of the video. The scores are used to select and combine segments to form a preview for the video.
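
The flow described above (filter by an emotion-based criterion, score by similarity to the plot summary, then select and combine) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the source: the `Segment` fields, the bag-of-words cosine similarity used as the similarity metric, the `min_emotion` threshold, the weights `w_sim` and `w_emotion`, and the choice to combine the selected segments in chronological order.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    index: int            # position of the segment within the video
    transcript: str       # hypothetical text feature (e.g. dialogue)
    emotion_score: float  # hypothetical emotion-related feature in [0, 1]


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; an assumed stand-in for the similarity metric."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_preview(segments, plot_summary, min_emotion=0.5,
                  w_sim=0.7, w_emotion=0.3, max_segments=2):
    # Step 1: keep only segments that pass the (assumed) emotion-based filter.
    candidates = [s for s in segments if s.emotion_score >= min_emotion]

    # Step 2: assign a weighted preview-suitability score to each candidate,
    # using similarity to the plot summary as one of the weighted terms.
    scored = [
        (w_sim * cosine_similarity(s.transcript, plot_summary)
         + w_emotion * s.emotion_score, s)
        for s in candidates
    ]

    # Step 3: select the highest-scoring segments, then combine them in
    # chronological order (an assumed combination strategy).
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:max_segments]
    return sorted((s for _, s in top), key=lambda s: s.index)
```

In this sketch the filter runs before scoring, so a segment with low emotional content is excluded even if its text closely matches the plot summary. A real system would presumably use richer features and a learned similarity model; the weights shown are arbitrary placeholders.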