A method for visual-based recognition of objects is described. Depth data for at least one pixel of an image of an object is received, the depth data comprising information relating to the distance from a visual sensor to the portion of the object visible at that pixel. At least one plan-view image is generated based on the depth data. At least one plan-view template is extracted from the plan-view image. The plan-view template is processed by at least one classifier, each classifier being trained to make a decision according to pre-configured parameters.
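
The first three steps (depth data, plan-view image, plan-view template) can be illustrated with a minimal sketch. It is not the described method itself: it assumes a pinhole camera looking horizontally, so that the camera X-Z plane stands in for the ground plane, and it assumes the plan-view image is a simple per-cell point count. The function names and the parameters `fx`, `fy`, `cx`, `cy`, `cell_size`, `grid_w`, `grid_h`, and `size` are hypothetical and chosen only for illustration. The classifier stage is omitted.

```python
import math


def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with a given depth to camera-frame (X, Y, Z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth


def plan_view_occupancy(depth_image, fx, fy, cx, cy, cell_size, grid_w, grid_h):
    """Bin back-projected points into a top-down grid over the X-Z plane.

    Assumption: the camera looks horizontally, so X-Z approximates the ground plane.
    Each cell holds the number of depth pixels that fall into it.
    """
    grid = [[0] * grid_w for _ in range(grid_h)]
    for v, row in enumerate(depth_image):
        for u, d in enumerate(row):
            if d is None or d <= 0:
                continue  # skip pixels with no valid depth reading
            x, _, z = back_project(u, v, d, fx, fy, cx, cy)
            col = math.floor(x / cell_size + grid_w / 2)  # X = 0 maps to the grid centre
            r = math.floor(z / cell_size)                 # Z = 0 maps to the first row
            if 0 <= col < grid_w and 0 <= r < grid_h:
                grid[r][col] += 1
    return grid


def extract_template(grid, size):
    """Crop a size x size window centred on the densest cell, zero-padded at the borders."""
    peak_r, peak_c, peak_val = 0, 0, -1
    for r, row in enumerate(grid):
        for c, val in enumerate(row):
            if val > peak_val:
                peak_r, peak_c, peak_val = r, c, val
    half = size // 2
    template = []
    for r in range(peak_r - half, peak_r - half + size):
        out_row = []
        for c in range(peak_c - half, peak_c - half + size):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            out_row.append(grid[r][c] if inside else 0)
        template.append(out_row)
    return template


# Toy 2 x 2 depth image, every pixel 2.0 units from the sensor.
depth_image = [[2.0, 2.0], [2.0, 2.0]]
grid = plan_view_occupancy(depth_image, fx=1.0, fy=1.0, cx=0.5, cy=0.5,
                           cell_size=1.0, grid_w=4, grid_h=4)
template = extract_template(grid, size=3)
```

In a fuller pipeline, the extracted template would be flattened into a feature vector and passed to one or more trained classifiers; a height map (maximum height per cell) could be built alongside the count map in the same loop.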