One example method includes: capturing audio data at a client engine while the client engine outputs an output video, the output video being based upon an original video stored at the client engine; delivering the captured audio data to a prediction engine once the audio data has been captured for a predetermined time; receiving, from the prediction engine, substitute frame data that the client engine uses to stitch one or more frames into the original video stored at the client engine; and, after stitching the one or more frames into the output video to generate an altered output video, outputting the captured audio data and the altered output video from the client engine.
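The flow of the example method can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `run_client`, `stitch`, and `toy_predict` are hypothetical, frames are represented as plain strings, the predetermined time is approximated by a fixed number of audio samples, and the substitute frame data is assumed to be a mapping from frame index to replacement frame.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = str  # assumption: a string stands in for decoded frame data
Substitutes = Dict[int, Frame]  # assumption: frame index -> replacement frame


def stitch(video: Sequence[Frame], substitutes: Substitutes) -> List[Frame]:
    """Return a copy of the video with the substitute frames stitched in."""
    altered = list(video)
    for index, frame in substitutes.items():
        if 0 <= index < len(altered):  # ignore indices outside the video
            altered[index] = frame
    return altered


def run_client(
    original_video: Sequence[Frame],
    audio_samples: Sequence[float],
    predict: Callable[[List[float]], Substitutes],
    window: int,
) -> Tuple[List[float], List[Frame]]:
    """Capture audio, consult the prediction engine once per full window,
    and return the captured audio together with the altered output video."""
    captured: List[float] = []
    output_video = list(original_video)  # the stored original is left untouched
    window_start = 0
    for sample in audio_samples:
        captured.append(sample)  # capture while the video is being output
        if len(captured) - window_start >= window:
            # The predetermined capture time has elapsed: deliver the audio
            # and receive substitute frame data in return.
            substitutes = predict(captured[window_start:])
            output_video = stitch(output_video, substitutes)
            window_start = len(captured)
    return captured, output_video


def toy_predict(audio_window: List[float]) -> Substitutes:
    """Hypothetical stand-in for the prediction engine: replace frame 1
    whenever the window is loud on average."""
    loud = sum(abs(s) for s in audio_window) / len(audio_window) > 0.5
    return {1: "f1_alt"} if loud else {}


audio_out, video_out = run_client(["f0", "f1", "f2"], [0.9, 0.8, 0.1, 0.0], toy_predict, window=2)
```

In this sketch the prediction engine is an ordinary callable, so a network-backed engine could be swapped in without changing the capture and stitching steps.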