A system may be configured for joint defect discovery and optical mode selection. Defects are detected during a defect discovery step and accumulated into a mode selection dataset. Mode selection is performed on the mode selection dataset to determine a mode combination, and the mode combination may then be used to train a defect detection model. Additional defects detected by the trained defect detection model may be added to the mode selection dataset, so that mode selection and training of the defect detection model can be performed again. One or more run-time modes may then be determined. The system may be configured for mode selection and defect detection at an image pixel level.
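The iterative loop described above may be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the toy per-mode signal values, the mode names "A", "B" and "C", the scoring rule used for mode selection, the thresholds, and the simple threshold that stands in for a trained defect detection model. Each candidate site could equally represent an image pixel.

```python
from itertools import combinations

# Hypothetical toy data: each candidate site (e.g. a pixel) maps an
# optical mode name to a signal strength. Values are illustrative only.
CANDIDATES = [
    {"A": 0.9, "B": 0.2, "C": 0.1},
    {"A": 0.1, "B": 0.8, "C": 0.3},
    {"A": 0.2, "B": 0.1, "C": 0.5},
    {"A": 0.6, "B": 0.7, "C": 0.1},
    {"A": 0.05, "B": 0.1, "C": 0.1},
]
MODES = ("A", "B", "C")


def best_signal(site, combo):
    """Strongest signal of a site across the modes in a combination."""
    return max(site[m] for m in combo)


def discover_defects(candidates, threshold=0.75):
    """Defect discovery step: sites with a strong signal in any mode."""
    return {i for i, c in enumerate(candidates) if best_signal(c, MODES) >= threshold}


def select_modes(candidates, dataset, k=2):
    """Mode selection: the k-mode combination with the highest mean
    best-mode signal over the mode selection dataset (assumed score)."""
    def score(combo):
        return sum(best_signal(candidates[i], combo) for i in dataset) / len(dataset)
    return max(combinations(MODES, k), key=score)


def train_model(candidates, dataset, combo, margin=0.8):
    """Stand-in for training: a detection threshold set somewhat below
    the weakest known defect as seen in the selected modes."""
    return margin * min(best_signal(candidates[i], combo) for i in dataset)


def detect(candidates, combo, threshold):
    """Apply the stand-in model to every candidate site."""
    return {i for i, c in enumerate(candidates) if best_signal(c, combo) >= threshold}


def joint_discovery_and_mode_selection(candidates, max_rounds=5):
    dataset = discover_defects(candidates)      # initial mode selection dataset
    combo = None
    for _ in range(max_rounds):
        combo = select_modes(candidates, dataset)
        threshold = train_model(candidates, dataset, combo)
        additional = detect(candidates, combo, threshold) - dataset
        if not additional:                      # no new defects: stop iterating
            break
        dataset |= additional                   # feed new defects back in
    return combo, dataset                       # combo serves as the run-time modes


if __name__ == "__main__":
    run_time_modes, defects = joint_discovery_and_mode_selection(CANDIDATES)
    print(run_time_modes, sorted(defects))
```

In this sketch the loop ends when a round finds no additional defects, and the last selected combination is taken as the run-time modes. An actual system may use a different stopping criterion, selection score, and model.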