A system configured to perform scalable video encoding is provided. The system includes a memory and a processing unit. The processing unit is configured to: receive inter-layer data and a current picture, wherein the current picture has a base layer and the inter-layer data includes a base mode flag; upsample the inter-layer data to generate upsampled residual data and upsampled reconstruction data; and encode the current picture into an enhancement layer using the upsampled inter-layer data, based on a block type of the base layer and the base mode flag.
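The per-block decision described above can be illustrated with a minimal sketch. It assumes dyadic (2x) spatial scalability, uses simple sample repetition in place of a normative interpolation filter, and assumes the following mapping, modelled loosely on H.264/SVC inter-layer prediction rather than taken from the described system: with the base mode flag set, an intra-coded base-layer block contributes its upsampled reconstruction as the predictor, and an inter-coded base-layer block contributes its upsampled residual on top of the enhancement layer's own prediction. The names `InterLayerData`, `upsample_2x`, `predict_block`, and `encode_block` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

# A block of samples, stored row-major as a list of rows.
Block = List[List[int]]


class BlockType(Enum):
    INTRA = "intra"
    INTER = "inter"


@dataclass
class InterLayerData:
    """Hypothetical container for per-block base-layer information."""
    reconstruction: Block   # reconstructed base-layer samples
    residual: Block         # decoded base-layer residual samples
    block_type: BlockType   # coding type of the co-located base-layer block
    base_mode_flag: bool


def upsample_2x(block: Block) -> Block:
    """Dyadic 2x upsampling by sample repetition.

    Simplification (assumption): real scalable codecs specify
    interpolation filters instead of plain repetition.
    """
    out: Block = []
    for row in block:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def predict_block(data: InterLayerData, el_prediction: Block) -> Block:
    """Choose the enhancement-layer predictor for one block.

    el_prediction is the enhancement layer's ordinary (e.g. motion-
    compensated) prediction, assumed to match the upsampled block size.
    """
    if not data.base_mode_flag:
        # No inter-layer prediction: keep the enhancement layer's own predictor.
        return el_prediction
    if data.block_type is BlockType.INTRA:
        # Intra base block (assumed mapping): upsampled reconstruction is the predictor.
        return upsample_2x(data.reconstruction)
    # Inter base block (assumed mapping): add the upsampled base residual.
    up_residual = upsample_2x(data.residual)
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(el_prediction, up_residual)]


def encode_block(current: Block, data: InterLayerData, el_prediction: Block) -> Block:
    """Return the enhancement-layer residual (current minus predictor),
    which a real encoder would then transform, quantize, and entropy-code."""
    prediction = predict_block(data, el_prediction)
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]
```

For a 2x2 base-layer block, `upsample_2x` yields the co-located 4x4 enhancement-layer block, and `encode_block` returns the difference that remains to be coded. Folding motion inheritance and residual prediction into one flag is a simplification: in H.264/SVC, for example, residual prediction is signalled separately by `residual_prediction_flag`.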