A method of enhancing a video bit stream using temporal scalability, wherein the number of bits or a temporal position of a bidirectionally predicted picture in an enhancement layer is determined with reference to a corresponding characteristic of pictures in another layer or layers, such as a base layer, of the video bit stream, and the peak signal-to-noise ratio of the B picture is matched to that of the pictures in the layer below. By endeavouring to align the characteristics of the bidirectionally predicted picture or pictures with the existing picture or pictures in the lower layer or layers, an improved video sequence can be encoded and decoded for viewing by a user.
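The matching of peak signal-to-noise ratio described above can be illustrated with a minimal sketch. The abstract does not say how the match is achieved, so the sketch assumes the encoder trial-encodes the B picture at several settings and keeps the one whose PSNR is closest to that of the lower-layer pictures. The function names `psnr` and `pick_matching_candidate`, the 8-bit peak value of 255, and the use of a mean as the target are all illustrative assumptions, not part of the described method.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences.

    Assumes samples are plain numbers (e.g. 8-bit luma values, hence peak=255).
    """
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the source picture and its decoded version.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak ** 2 / mse)


def pick_matching_candidate(target_psnr, candidate_psnrs):
    """Return the index of the candidate whose PSNR is closest to the target.

    Hypothetical helper: each candidate stands for one trial encoding of the
    B picture (for example, at a different quantiser step size).
    """
    return min(
        range(len(candidate_psnrs)),
        key=lambda i: abs(candidate_psnrs[i] - target_psnr),
    )


# Usage sketch with made-up numbers: two lower-layer pictures and three
# trial encodings of the enhancement-layer B picture.
base_layer_psnrs = [34.1, 33.8]
target = sum(base_layer_psnrs) / len(base_layer_psnrs)
b_picture_candidates = [36.5, 34.2, 31.0]
chosen = pick_matching_candidate(target, b_picture_candidates)
```

In this sketch the second trial encoding is selected, because its PSNR lies nearest the lower-layer average; a real encoder would more likely adjust its rate control directly than encode each picture several times.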