A method, system, and product for automated quality assessment of a programming task. Programming activity of a developer is monitored to obtain measurements of a plurality of metrics in each of a plurality of time segments. Functional correctness of a program written by the developer is determined at a last time segment of the plurality of time segments. Based on the measurements of each of the metrics in the plurality of time segments, a plurality of features is computed, the features being indicative of a behavior of the developer while programming. A prediction model is utilized to provide an automated assessment based on values of the plurality of features.
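
By way of illustration only, the flow described above may be sketched as follows. The metric names (`keystrokes`, `compile_errors`), the chosen features (mean, maximum, and first-to-last change per metric), and the rule-based `toy_model` are hypothetical placeholders and are not taken from the disclosure; passing the functional-correctness result to the model as an additional feature is likewise an assumption. A trained prediction model would take the place of `toy_model`.

```python
from statistics import mean
from typing import Callable, Dict, List

# Measurements: metric name -> one value per time segment, in order.
Measurements = Dict[str, List[float]]
Features = Dict[str, float]


def compute_features(measurements: Measurements) -> Features:
    """Aggregate per-segment measurements into behavioral features.

    The three aggregates used here are illustrative assumptions.
    """
    features: Features = {}
    for metric, values in measurements.items():
        features[f"{metric}_mean"] = mean(values)
        features[f"{metric}_max"] = max(values)
        # Change between the first and the last time segment.
        features[f"{metric}_trend"] = values[-1] - values[0]
    return features


def assess(
    measurements: Measurements,
    functionally_correct: bool,
    model: Callable[[Features], int],
) -> int:
    """Compute features and delegate the assessment to a prediction model."""
    features = compute_features(measurements)
    # Assumption: correctness at the last segment is fed to the model.
    features["functionally_correct"] = 1.0 if functionally_correct else 0.0
    return model(features)


def toy_model(features: Features) -> int:
    """Hypothetical stand-in for a trained model; returns a 0-100 score."""
    score = 0
    if features["functionally_correct"] == 1.0:
        score += 50
    if features["compile_errors_trend"] <= 0:
        score += 30  # compile errors did not increase over the session
    if features["keystrokes_mean"] > 0:
        score += 20  # some typing activity was observed
    return score


# Three time segments of hypothetical monitoring data.
session = {
    "keystrokes": [10.0, 20.0, 30.0],
    "compile_errors": [3.0, 1.0, 0.0],
}
result = assess(session, functionally_correct=True, model=toy_model)
```

In this sketch the model is passed in as a callable so that the feature computation stays independent of whichever prediction model is used.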