The present disclosure generally relates to performance recreation and, in particular, to the recreation of observed human performance using reinforcement learning. In this regard, a first object is identified from a plurality of objects. Manipulation of the first object is tracked from a first position to a second position. A characterization of the manipulation is generated. A policy that controls a mechanical gripper to recreate the manipulation is generated based on an iteratively increasing cumulative reward. The mechanical gripper iteratively recreates the manipulation to increase the cumulative reward with each recreation.
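
The iterative policy improvement described above can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes a one-dimensional gripper, a characterization given as a list of tracked object positions (`demo`), a policy given as per-step displacements, and a simple random-search hill climber standing in for the reinforcement learning method, which the disclosure does not specify here. All function names and parameters (`rollout`, `cumulative_reward`, `train`, `step_scale`) are hypothetical.

```python
import random


def rollout(policy, start):
    """Apply the policy's per-step displacements and return the visited positions."""
    pos = start
    path = []
    for delta in policy:
        pos += delta
        path.append(pos)
    return path


def cumulative_reward(path, demo):
    """Sum of per-step rewards; each step is penalized by squared distance to the demo."""
    return -sum((p - d) ** 2 for p, d in zip(path, demo))


def train(demo, start, iterations=2000, step_scale=0.05, seed=0):
    """Hill-climb the policy, keeping a candidate only if it raises the cumulative reward.

    Assumption: random-search hill climbing is used purely for illustration.
    """
    rng = random.Random(seed)
    policy = [0.0] * len(demo)  # initial policy: the gripper does not move
    best = cumulative_reward(rollout(policy, start), demo)
    history = [best]
    for _ in range(iterations):
        # Perturb every displacement, then recreate the manipulation with the candidate.
        candidate = [d + rng.gauss(0.0, step_scale) for d in policy]
        reward = cumulative_reward(rollout(candidate, start), demo)
        if reward > best:
            policy, best = candidate, reward
        history.append(best)
    return policy, history


if __name__ == "__main__":
    # Hypothetical characterization: object tracked from position 0.0 to position 1.0.
    demo = [0.2, 0.4, 0.6, 0.8, 1.0]
    policy, history = train(demo, start=0.0)
    print("initial cumulative reward:", history[0])
    print("final cumulative reward:", history[-1])
```

Because a candidate policy is kept only when it scores higher, the recorded cumulative reward never decreases from one recreation to the next, which mirrors the "iteratively increasing" behavior in the abstract. A practical system would replace the squared-error reward and the hill climber with whatever reward signal and learning algorithm the gripper hardware calls for.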