The present disclosure provides systems and methods for test pattern generation to detect a hardware Trojan. One such method includes determining, by a computing device, a set of initial test patterns to activate the hardware Trojan within an integrated circuit design; evaluating nodes of the integrated circuit design and assigning a rareness attribute value and a testability attribute value associated with respective nodes of the integrated circuit design; and generating a set of additional test patterns to activate the hardware Trojan within the integrated circuit design using a reinforcement learning model. The set of initial test patterns is applied as an input along with rareness attribute values and testability attribute values associated with the nodes of the integrated circuit, and the reinforcement learning model is trained with a stochastic learning scheme to identify optimal test patterns for triggering nodes of the integrated circuit design.