The present disclosure relates to an image processing device including: a memory configured to store one or more instructions; and a processor configured to execute the one or more instructions stored in the memory to: extract one or more input patches based on an input image; extract one or more pieces of feature information respectively corresponding to the one or more input patches, based on a dictionary including mapping information indicating mappings between a plurality of patches and pieces of feature information respectively corresponding to the plurality of patches; and obtain a final image by performing a convolution operation between the extracted one or more pieces of feature information and a filter kernel.
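The three operations recited above (patch extraction, dictionary lookup, and convolution with a filter kernel) can be illustrated with a minimal NumPy sketch. The disclosure does not specify how a patch is matched to a dictionary entry, so everything below is an assumption made for illustration: the function names (`extract_patches`, `patch_key`, `lookup_features`, `convolve_features`, `process`), the 3x3 patch size, the key scheme that binarizes each patch against its mean, and the representation of the dictionary as an array indexed by that key. The convolution is written in the CNN sense (no kernel flip) and reduces the feature channels to a single output channel.

```python
import numpy as np


def extract_patches(image, size=3):
    """Return one size x size patch per pixel (edge-padded), flattened.

    Output shape: (H, W, size * size). The patch size is an assumption.
    """
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    patches = np.empty((h, w, size * size), dtype=float)
    for y in range(h):
        for x in range(w):
            patches[y, x] = padded[y:y + size, x:x + size].ravel()
    return patches


def patch_key(patch):
    """Hypothetical dictionary key: binarize the patch against its mean
    and read the resulting bits as an integer (least significant first)."""
    bits = (patch >= patch.mean()).astype(np.int64)
    return int(bits @ (2 ** np.arange(bits.size)))


def lookup_features(patches, dictionary):
    """Map every patch to its feature vector via the dictionary.

    `dictionary` is assumed to be an array of shape (num_keys, n_features),
    where row k holds the feature information for patches whose key is k.
    """
    h, w, _ = patches.shape
    feats = np.zeros((h, w, dictionary.shape[1]))
    for y in range(h):
        for x in range(w):
            feats[y, x] = dictionary[patch_key(patches[y, x])]
    return feats


def convolve_features(feats, kernel):
    """CNN-style convolution (no kernel flip) with zero padding.

    `kernel` has shape (k, k, n_features); the output has shape (H, W).
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(feats, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    h, w, _ = feats.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(padded[y:y + k, x:x + k] * kernel)
    return out


def process(image, dictionary, kernel, patch_size=3):
    """Patches -> dictionary features -> convolution -> final image."""
    patches = extract_patches(image, patch_size)
    feats = lookup_features(patches, dictionary)
    return convolve_features(feats, kernel)
```

In this sketch the dictionary for 3x3 patches would need 2^9 = 512 rows, one per possible key. Because the lookup replaces whatever computation would otherwise produce the feature information, only the final convolution remains as arithmetic at run time; how the dictionary entries and the filter kernel are obtained is outside the scope of this example.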