Deep reinforcement learning may be used for vehicle repositioning on mobility-on-demand platforms. Information including a current location of a vehicle on a ride-sharing platform may be obtained. A set of paths originating from the current location of the vehicle may be obtained, each path having a length less than a preset maximum path length. A set of expected cumulative rewards along the set of paths may be obtained based on a trained deep value-network. A best path may be selected from the set of paths based on a heuristic tree search of the set of expected cumulative rewards. A next step along the best path may be recommended as a reposition action for the vehicle.
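The flow described above (enumerate bounded paths, score them, pick the best, recommend its first step) might be sketched as follows. This is only an illustration under stated assumptions, not the claimed method: the rectangular grid, the five-move neighborhood, the discount factor `gamma`, and the names `neighbors`, `enumerate_paths`, `path_score`, `recommend_reposition`, and `toy_value` are all hypothetical. A plain callable stands in for the trained deep value-network, exhaustive enumeration with an arg-max stands in for the heuristic tree search, and "length" is assumed to mean the number of steps in a path.

```python
from typing import Callable, List, Tuple

Cell = Tuple[int, int]                  # hypothetical grid cell (x, y)
ValueFn = Callable[[Cell, int], float]  # stand-in for the trained value-network


def neighbors(cell: Cell, width: int, height: int) -> List[Cell]:
    """Cells reachable in one step: stay in place or move to a 4-neighbor."""
    x, y = cell
    candidates = [(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def enumerate_paths(start: Cell, max_path_len: int,
                    width: int, height: int) -> List[List[Cell]]:
    """All paths from `start` with at least 1 and fewer than `max_path_len` steps."""
    paths: List[List[Cell]] = []
    frontier: List[List[Cell]] = [[start]]
    for _ in range(max_path_len - 1):
        # Extend every path in the frontier by one step.
        frontier = [p + [n] for p in frontier
                    for n in neighbors(p[-1], width, height)]
        paths.extend(frontier)
    return paths


def path_score(path: List[Cell], value_fn: ValueFn, gamma: float = 0.9) -> float:
    """Assumed scoring: discounted sum of values of cells visited after the start."""
    return sum(gamma ** t * value_fn(cell, t)
               for t, cell in enumerate(path) if t > 0)


def recommend_reposition(start: Cell, value_fn: ValueFn, max_path_len: int,
                         width: int, height: int) -> Cell:
    """Return the first step of the highest-scoring path (or stay if none exist)."""
    paths = enumerate_paths(start, max_path_len, width, height)
    if not paths:
        return start
    best = max(paths, key=lambda p: path_score(p, value_fn))
    return best[1]


def toy_value(cell: Cell, t: int) -> float:
    """Toy value function: only cell (2, 0) is valuable."""
    return 1.0 if cell == (2, 0) else 0.0


print(recommend_reposition((0, 0), toy_value, max_path_len=3, width=3, height=3))
# prints (1, 0)
```

In this toy run the only rewarding cell is two steps east of the vehicle, so the recommended action is the first step toward it. Exhaustive enumeration grows exponentially with the maximum path length, which is presumably why a heuristic search would be preferred in practice.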