A learning-based visual attention prediction method is disclosed. In the method, a correlation relationship between fixation density and at least one item of feature information is first learned by training. A test video sequence of test frames is then received, and at least one tested feature map is generated for each test frame based on the feature information. Finally, according to the learned correlation relationship, the tested feature map is mapped into a saliency map that indicates the fixation strength of the corresponding test frame.
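
The train-then-map flow described above can be illustrated with a minimal sketch. The disclosure does not specify the learner or the features at this point, so everything concrete here is an assumption made for illustration: the function names (`feature_maps`, `learn_correlation`, `predict_saliency`) are hypothetical, the two features (pixel intensity and gradient magnitude) are placeholders, an ordinary least-squares fit stands in for the trained correlation relationship, and the final normalization of the saliency map is a convenience rather than part of the method.

```python
import numpy as np

def feature_maps(frame):
    """Return per-pixel feature maps for one grayscale frame, shape (H, W, 2).

    Assumption: intensity and gradient magnitude are placeholder features;
    the source does not name the feature information used.
    """
    frame = frame.astype(np.float64)
    gy, gx = np.gradient(frame)          # derivatives along rows and columns
    contrast = np.hypot(gx, gy)          # gradient magnitude as a contrast cue
    return np.stack([frame, contrast], axis=-1)

def _design_matrix(frame):
    """Flatten the feature maps to (H*W, K+1), appending a bias column."""
    feats = feature_maps(frame)
    flat = feats.reshape(-1, feats.shape[-1])
    return np.hstack([flat, np.ones((flat.shape[0], 1))])

def learn_correlation(train_frames, fixation_densities):
    """Fit weights relating features to fixation density.

    Assumption: a linear least-squares fit stands in for whatever
    training procedure the method actually uses.
    """
    X = np.concatenate([_design_matrix(f) for f in train_frames])
    y = np.concatenate([np.asarray(d, dtype=np.float64).reshape(-1)
                        for d in fixation_densities])
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def predict_saliency(frame, weights):
    """Map a test frame's feature maps into a saliency map of shape (H, W)."""
    saliency = (_design_matrix(frame) @ weights).reshape(frame.shape)
    saliency = np.clip(saliency, 0.0, None)   # fixation strength is non-negative
    total = saliency.sum()
    # Assumption: normalize to sum to 1 so the map reads as a density.
    return saliency / total if total > 0 else saliency
```

In use, `learn_correlation` would be called once on training frames paired with their measured fixation density maps, and `predict_saliency` would then be applied to each frame of the test video sequence in turn.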