Techniques are described that enable a user to edit and customize captions generated by a social networking system, such as transcriptions of an audio clip. In some cases, a social networking system receives, from a first user account, a video and an audio clip associated with the video, and determines that the audio clip contains speech. The social networking system may leverage a speech-to-text component to generate a first text caption based at least in part on the speech in the audio clip. The social networking system provides the first text caption to the first user account, and receives a user input to modify a word included in the first text caption. The social networking system generates a second text caption based at least in part on the user input, and provides the video, including the second text caption, to a second user account.
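The caption-editing flow described above can be illustrated with a minimal Python sketch. Every name in it (`TextCaption`, `Video`, `generate_first_caption`, `apply_word_edit`, `publish_with_caption`) is hypothetical and does not come from the abstract. Several representation choices are also assumptions: the speech detector and the speech-to-text component are injected as plain callables rather than any real speech-recognition API, a caption is stored as a tuple of words, and the user input is modeled as replacing the word at a given index. The sketch shows only the data flow: first caption, user edit, second caption, and video delivered with the second caption.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class TextCaption:
    """Hypothetical caption representation: an ordered tuple of words."""
    words: tuple[str, ...]

    def text(self) -> str:
        return " ".join(self.words)


@dataclass(frozen=True)
class Video:
    """Hypothetical video record with its associated audio clip."""
    video_id: str
    audio_clip: bytes
    caption: Optional[TextCaption] = None


def generate_first_caption(
    video: Video,
    contains_speech: Callable[[bytes], bool],
    speech_to_text: Callable[[bytes], list[str]],
) -> Optional[TextCaption]:
    """Return a first caption if the audio clip contains speech, else None."""
    if not contains_speech(video.audio_clip):
        return None
    return TextCaption(tuple(speech_to_text(video.audio_clip)))


def apply_word_edit(caption: TextCaption, index: int, new_word: str) -> TextCaption:
    """Build a second caption by replacing one word; the first caption is left unchanged."""
    if not 0 <= index < len(caption.words):
        raise IndexError(f"word index {index} out of range")
    words = list(caption.words)
    words[index] = new_word
    return TextCaption(tuple(words))


def publish_with_caption(video: Video, caption: TextCaption) -> Video:
    """Return a copy of the video carrying the given caption, ready to share."""
    return replace(video, caption=caption)


# Stand-in components (assumptions, not a real ASR system).
def fake_speech_to_text(audio: bytes) -> list[str]:
    return ["hello", "wrld"]  # simulated transcription error


def fake_contains_speech(audio: bytes) -> bool:
    return len(audio) > 0


video = Video("v1", b"\x01\x02")
first = generate_first_caption(video, fake_contains_speech, fake_speech_to_text)
second = apply_word_edit(first, 1, "world")  # user corrects the second word
shared = publish_with_caption(video, second)
print(shared.caption.text())  # hello world
```

Keeping captions immutable means the machine-generated first caption survives alongside the user-edited second caption. This is one plausible way to let a system retain both versions, not something the abstract specifies.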