Image-based key point detection using a convolutional neural network (CNN) may be degraded if the key points are occluded in the image. Images obtained from additional imaging modalities, such as depth images and/or thermal images, may be used in conjunction with RGB images to reduce or minimize the impact of the occlusion. The additional images may be used to determine adjustment values that are then applied to the weights of the CNN, so that the convolution operations may be performed in a modality-aware manner to increase the robustness, accuracy, and efficiency of key point detection.
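
By way of illustration only, the following is a minimal sketch of how adjustment values derived from a depth image might be applied to the weights of a single convolution filter. The abstract does not specify how the adjustment values are computed. The exponential depth-similarity term, the function name `modality_aware_conv2d`, and the parameter `alpha` are assumptions introduced here for illustration and are not taken from the source. The sketch assumes an odd, square kernel, a stride of one, and no padding.

```python
import numpy as np

def modality_aware_conv2d(rgb, depth, weights, alpha=1.0):
    """Hypothetical depth-aware convolution for one filter.

    rgb:     (H, W, C) image
    depth:   (H, W) depth image aligned with rgb
    weights: (k, k, C) filter weights, k odd
    alpha:   assumed sharpness of the depth-similarity adjustment
    """
    k = weights.shape[0]
    r = k // 2
    H, W, _ = rgb.shape
    out = np.zeros((H - 2 * r, W - 2 * r))
    for i in range(r, H - r):
        for j in range(r, W - r):
            patch = rgb[i - r:i + r + 1, j - r:j + r + 1, :]
            d_patch = depth[i - r:i + r + 1, j - r:j + r + 1]
            # Assumed adjustment: neighbors whose depth differs from the
            # center pixel (e.g. an occluder) get a value close to 0,
            # neighbors at a similar depth get a value close to 1.
            adjust = np.exp(-alpha * np.abs(d_patch - depth[i, j]))
            # Apply the adjustment values to the filter weights, then
            # perform the usual multiply-accumulate over the patch.
            adjusted_w = weights * adjust[:, :, None]
            out[i - r, j - r] = np.sum(patch * adjusted_w)
    return out
```

When the depth image is constant, every adjustment value in this sketch equals one and the operation reduces to an ordinary convolution. Where the depth changes sharply within a patch, pixels on the far side of the discontinuity contribute very little to the output.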