Embodiments of the present disclosure provide a method and apparatus for aligning a paragraph and a video. The method may include: acquiring a commentary and a candidate material resource set corresponding to the commentary, where each candidate material resource is a video or an image; acquiring a matching degree between each paragraph in the commentary and each candidate material resource in the candidate material resource set; and determining, for each paragraph in the commentary, a corresponding candidate material resource sequence based on the matching degrees between the paragraphs and the candidate material resources, the playing durations of the candidate material resources, and the text lengths of the paragraphs, where the playing duration of an image is a preset image playing duration.
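The abstract does not say how the matching degrees, playing durations and text lengths are combined. The following is a minimal sketch of one possible combination, under stated assumptions. It converts each paragraph's text length into a target duration using an assumed narration speed. It then greedily picks the best-matching unused resources until their total playing duration covers that target, and each image contributes the preset image playing duration. The names `CHARS_PER_SECOND`, `PRESET_IMAGE_DURATION`, `Material` and `align`, the greedy strategy, and the rule that a resource is used at most once are all hypothetical illustrations, not the disclosed method.

```python
from dataclasses import dataclass

# Hypothetical parameters (assumptions, not taken from the disclosure).
CHARS_PER_SECOND = 4.0       # assumed narration speed: text length -> seconds
PRESET_IMAGE_DURATION = 3.0  # assumed preset playing duration of an image, in seconds


@dataclass(frozen=True)
class Material:
    name: str
    kind: str                    # "video" or "image"
    video_duration: float = 0.0  # only used when kind == "video"

    def playing_duration(self) -> float:
        # An image plays for the preset duration; a video plays for its own length.
        return PRESET_IMAGE_DURATION if self.kind == "image" else self.video_duration


def align(paragraphs, materials, matching):
    """Return one material sequence per paragraph.

    matching[i][j] is the (already computed) matching degree between
    paragraph i and material j. Greedy stand-in: for each paragraph, take
    unused materials in descending matching degree until their summed
    playing duration reaches the paragraph's target duration.
    """
    used = set()  # assumption: a material is assigned to at most one paragraph
    sequences = []
    for i, text in enumerate(paragraphs):
        target = len(text) / CHARS_PER_SECOND
        order = sorted(
            (j for j in range(len(materials)) if j not in used),
            key=lambda j: matching[i][j],
            reverse=True,
        )
        seq, total = [], 0.0
        for j in order:
            if total >= target:
                break
            seq.append(materials[j])
            used.add(j)
            total += materials[j].playing_duration()
        sequences.append(seq)
    return sequences


# Small illustrative input (made-up data).
paragraphs = ["a" * 12, "b" * 8]  # target durations: 3.0 s and 2.0 s
materials = [
    Material("v1", "video", 2.0),
    Material("img1", "image"),
    Material("v2", "video", 5.0),
]
matching = [
    [0.9, 0.2, 0.5],
    [0.1, 0.8, 0.3],
]
sequences = align(paragraphs, materials, matching)
print([[m.name for m in seq] for seq in sequences])  # [['v1', 'v2'], ['img1']]
```

Because the greedy choice is made paragraph by paragraph, an early paragraph can consume a resource that would have matched a later one better. A global assignment, for example dynamic programming over the whole commentary, would avoid that at a higher computational cost.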