The method may include processing input feature maps of an image with a neural network to obtain output feature maps of the image. The neural network may include a convolution part and/or a pooling part, together with an aggregation part. The convolution part may include at least one parallel unit, each unit containing two parallel paths, and each of the two paths contains two cascaded convolution layers. The convolution kernels are one-dimensional, and their sizes differ from one unit to another. The pooling part includes at least one parallel unit, each unit containing two parallel paths, and each of the two paths contains two cascaded pooling layers. The pooling filters are one-dimensional, and their sizes differ from one unit to another. The aggregation part is configured to concatenate the results of the convolution part and/or the pooling part to obtain the output feature maps of the image.
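The structure described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the source does not say how the two paths of a unit differ, so the sketch assumes one path applies a horizontal (1×k) layer followed by a vertical (k×1) layer and the other applies them in the opposite order. It also assumes stride 1 with "same" padding so all outputs can be concatenated, one kernel shared across channels, random weights in place of learned ones, and max pooling. The function names (`conv1d_same`, `maxpool1d_same`, `conv_unit`, `pool_unit`, `feature_block`) and the unit sizes (3, 5, 7 for convolution; 3, 5 for pooling) are hypothetical.

```python
import numpy as np

# Hypothetical helpers; feature maps have shape (channels, height, width).

def conv1d_same(x, kernel, axis):
    """Cross-correlate every feature map with a 1-D kernel along `axis` (zero padding, same size)."""
    k = len(kernel)
    pad = [(0, 0)] * x.ndim
    pad[axis] = (k // 2, k - 1 - k // 2)
    xp = np.pad(x, pad)
    n = x.shape[axis]
    out = np.zeros_like(x, dtype=float)
    for i, w in enumerate(kernel):
        # Shifted view of the padded input, weighted by kernel tap i.
        out += w * np.take(xp, np.arange(i, i + n), axis=axis)
    return out

def maxpool1d_same(x, size, axis):
    """Stride-1 max pooling with a 1-D window along `axis` (-inf padding, same size)."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (size // 2, size - 1 - size // 2)
    xp = np.pad(x.astype(float), pad, constant_values=-np.inf)
    n = x.shape[axis]
    windows = [np.take(xp, np.arange(i, i + n), axis=axis) for i in range(size)]
    return np.max(np.stack(windows), axis=0)

def conv_unit(x, k, rng):
    """Parallel unit: two paths, each with two cascaded 1-D convolution layers of size k."""
    kh1, kv1, kv2, kh2 = (rng.standard_normal(k) for _ in range(4))  # assumed random weights
    path_a = conv1d_same(conv1d_same(x, kh1, axis=2), kv1, axis=1)  # 1xk then kx1
    path_b = conv1d_same(conv1d_same(x, kv2, axis=1), kh2, axis=2)  # kx1 then 1xk
    return [path_a, path_b]

def pool_unit(x, s):
    """Parallel unit: two paths, each with two cascaded 1-D pooling layers of size s."""
    path_a = maxpool1d_same(maxpool1d_same(x, s, axis=2), s, axis=1)  # 1xs then sx1
    path_b = maxpool1d_same(maxpool1d_same(x, s, axis=1), s, axis=2)  # sx1 then 1xs
    return [path_a, path_b]

def feature_block(x, conv_sizes=(3, 5, 7), pool_sizes=(3, 5), seed=0):
    """Run each unit (a different 1-D size per unit), then aggregate by channel concatenation."""
    rng = np.random.default_rng(seed)
    outputs = []
    for k in conv_sizes:
        outputs += conv_unit(x, k, rng)
    for s in pool_sizes:
        outputs += pool_unit(x, s)
    return np.concatenate(outputs, axis=0)

x = np.random.default_rng(1).standard_normal((4, 16, 16))  # 4 input feature maps
out = feature_block(x)
print(out.shape)  # (40, 16, 16): 5 units x 2 paths x 4 channels
```

Passing `pool_sizes=()` or `conv_sizes=()` gives the convolution-only or pooling-only variant allowed by the "and/or" wording. Note that stride-1 max pooling is separable, so in this sketch the two pooling paths produce identical maps. A real design would need the paths to differ in some way the source does not specify.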