A method for representing an input image includes the steps of applying a trained neural network on the input image, selecting a plurality of feature maps, determining a location of each of the plurality of feature maps in an image space of the input image, defining a plurality of interest points of the input image, and employing the plurality of interest points for representing the input image for performing a visual task. The plurality of feature maps are selected of an output of at least one selected layer of the trained neural network according to values attributed to the plurality of feature maps by the trained neural network. The plurality of interest points of the input image are defined based on the locations corresponding to the plurality of feature maps.