A media edit point selection process can include a media editing software application that programmatically converts speech to text and stores a timestamp-to-text map. The map correlates text corresponding to speech extracted from an audio track of a media clip with timestamps for the media clip. The timestamps correspond to words and to some gaps in the speech from the audio track. Using the timestamp-to-text map and a semantic model, the application determines the probability that each identified gap corresponds to a grammatical pause by the speaker. Potential edit points corresponding to grammatical pauses in the speech are stored for display or for additional use by the media editing software application. Text can optionally be displayed to a user during media editing.
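The flow described above (timestamp-to-text map, gap identification, pause probability, edit point selection) can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the patented method: it assumes word-level timestamps have already been produced by some speech-to-text step, and the names `Word`, `Gap`, `build_timestamp_to_text_map`, `find_gaps`, `pause_probability`, `select_edit_points`, and the `min_gap` and `threshold` parameters are all hypothetical. The abstract does not describe its semantic model, so `pause_probability` is a deliberately simple stand-in heuristic based on punctuation, a small word list, and gap length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str     # recognized word, possibly with trailing punctuation
    start: float  # seconds from the start of the clip
    end: float


@dataclass(frozen=True)
class Gap:
    start: float  # end time of the word before the gap
    end: float    # start time of the word after the gap
    before: str   # text of the preceding word
    after: str    # text of the following word


def build_timestamp_to_text_map(words):
    """Map each word's (start, end) timestamps to its text, in time order."""
    return {(w.start, w.end): w.text for w in sorted(words, key=lambda w: w.start)}


def find_gaps(ts_map, min_gap=0.15):
    """Return silences between consecutive words lasting at least min_gap seconds."""
    items = sorted(ts_map.items())
    gaps = []
    for ((_, e0), t0), ((s1, _), t1) in zip(items, items[1:]):
        if s1 - e0 >= min_gap:
            gaps.append(Gap(e0, s1, t0, t1))
    return gaps


# Hypothetical stand-in for a semantic model: words after which a pause is
# unlikely to be a grammatical boundary.
CONTINUATION_WORDS = {"and", "but", "so", "because", "the", "a", "of", "to"}


def pause_probability(gap):
    """Toy score in [0, 1] that a gap is a grammatical pause (assumed heuristic)."""
    score = 0.2
    if gap.before.endswith((".", "?", "!")):
        score += 0.5   # sentence-final punctuation strongly suggests a boundary
    elif gap.before.endswith(","):
        score += 0.25  # clause boundary
    if gap.before.rstrip(".,?!").lower() in CONTINUATION_WORDS:
        score -= 0.3   # pausing mid-phrase, e.g. after "the"
    score += min(gap.end - gap.start, 1.0) * 0.2  # longer silences score higher
    return max(0.0, min(1.0, score))


def select_edit_points(words, threshold=0.5, min_gap=0.15):
    """Return (edit_time, probability) pairs for gaps likely to be grammatical pauses."""
    ts_map = build_timestamp_to_text_map(words)
    points = []
    for gap in find_gaps(ts_map, min_gap):
        p = pause_probability(gap)
        if p >= threshold:
            midpoint = (gap.start + gap.end) / 2  # cut in the middle of the silence
            points.append((round(midpoint, 3), round(p, 3)))
    return points


if __name__ == "__main__":
    words = [
        Word("Welcome", 0.00, 0.40), Word("everyone.", 0.45, 0.95),
        Word("Today", 1.60, 1.90), Word("we", 1.95, 2.05),
        Word("look", 2.10, 2.35), Word("at", 2.40, 2.50),
        Word("the", 2.52, 2.60), Word("results,", 3.20, 3.70),
        Word("and", 4.10, 4.25), Word("then", 4.30, 4.50),
    ]
    print(select_edit_points(words))  # [(1.275, 0.83), (3.9, 0.53)]
```

In this example three gaps exceed `min_gap`, but the silence after "the" is rejected because a pause there falls mid-phrase, so only the gaps after "everyone." and "results," are kept as candidate edit points. A real implementation would replace the heuristic with whatever language or semantic model is available; the map-then-score structure is the part that mirrors the description above.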