Systems provided herein include a learning environment and an agent. The learning environment includes an avatar and an object. A state signal corresponding to a state of the learning environment includes the locations and orientations of the avatar and the object. The agent is adapted to receive the state signal, to issue an action capable of generating at least one change in the state of the learning environment, to produce a set of observations relevant to a task, to hypothesize a set of action models configured to explain the observations, and to vet the set of action models to identify a learned model for the task.
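The following is a minimal sketch of the arrangement described above, not the claimed implementation: all class names, action names, and the grid-world dynamics are assumptions chosen for illustration. It shows a learning environment whose state signal carries the location and orientation of an avatar and an object, and an agent that issues actions, records observations, hypothesizes candidate action models, and vets them against the observations to select a learned model for the task.

```python
# Hypothetical sketch only: names, actions, and dynamics are illustrative assumptions.
import random
from dataclasses import dataclass


@dataclass
class State:
    avatar_pos: tuple      # (x, y) location of the avatar
    avatar_heading: int    # orientation in degrees (0, 90, 180, 270)
    object_pos: tuple      # (x, y) location of the object
    object_heading: int


class LearningEnvironment:
    """Holds the avatar and the object; emits a state signal and applies actions."""

    def __init__(self):
        self.state = State((0, 0), 0, (2, 2), 90)

    def state_signal(self) -> State:
        return self.state

    def apply(self, action: str) -> State:
        # Each action generates at least one change in the environment state.
        x, y = self.state.avatar_pos
        if action == "move_east":
            self.state.avatar_pos = (x + 1, y)
        elif action == "move_north":
            self.state.avatar_pos = (x, y + 1)
        elif action == "turn_right":
            self.state.avatar_heading = (self.state.avatar_heading + 90) % 360
        return self.state


class Agent:
    """Receives the state signal, issues actions, and learns an action model."""

    def __init__(self, env: LearningEnvironment):
        self.env = env
        self.observations = []  # (pose_before, action, pose_after) triples

    def explore(self, actions, steps=20):
        # Produce a set of observations relevant to the task by acting in the environment.
        for _ in range(steps):
            before = self.env.state_signal()
            prev = (before.avatar_pos, before.avatar_heading)  # snapshot before mutation
            action = random.choice(actions)
            after = self.env.apply(action)
            self.observations.append((prev, action, (after.avatar_pos, after.avatar_heading)))

    def hypothesize(self):
        """Hypothesize candidate action models that predict the next avatar pose."""
        def model_translate(pose, action):
            (x, y), h = pose
            if action == "move_east":
                return ((x + 1, y), h)
            if action == "move_north":
                return ((x, y + 1), h)
            if action == "turn_right":
                return ((x, y), (h + 90) % 360)
            return pose

        def model_noop(pose, action):
            return pose  # competing model: actions change nothing

        return {"translate": model_translate, "noop": model_noop}

    def vet(self, models):
        """Vet the hypothesized models: count how many observations each one explains."""
        scores = {}
        for name, model in models.items():
            scores[name] = sum(model(prev, a) == nxt for prev, a, nxt in self.observations)
        learned = max(scores, key=scores.get)
        return learned, scores


env = LearningEnvironment()
agent = Agent(env)
agent.explore(["move_east", "move_north", "turn_right"])
learned_model, scores = agent.vet(agent.hypothesize())
print("learned model:", learned_model, scores)
```

In this sketch the model that best reproduces the observed state transitions is retained as the learned model for the task; the specific exploration policy and scoring rule are placeholders for whichever hypothesis-generation and vetting procedures a given embodiment uses.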