The present disclosure provides an approach in which a domain corpus subset generator correlates documents from a document corpus to domain discernible attributes associated with domain corpus subsets. The domain corpus subset generator analyzes the correlation results and stores each document in the corresponding domain corpus subset. In turn, a question-answer system uses the documents in a specific domain corpus subset to provide relevant and accurate answers to an input question.
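
The flow described above can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes that domain discernible attributes can be modeled as keyword sets, that correlation is a simple keyword-overlap count, and that a document joins a subset when its score meets a threshold. All names (`DOMAIN_ATTRIBUTES`, `correlate`, `build_subsets`, `answer`) and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: each domain's discernible attributes are modeled as a keyword set.
DOMAIN_ATTRIBUTES = {
    "medical": {"patient", "dosage", "diagnosis", "symptom"},
    "finance": {"loan", "interest", "portfolio", "dividend"},
}


def tokenize(text):
    """Return the set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def correlate(text):
    """Score the text against every domain by counting shared keywords."""
    tokens = tokenize(text)
    return {domain: len(tokens & attrs) for domain, attrs in DOMAIN_ATTRIBUTES.items()}


def build_subsets(corpus, threshold=1):
    """Store each document in every domain subset whose score meets the threshold."""
    subsets = defaultdict(list)
    for doc in corpus:
        for domain, score in correlate(doc).items():
            if score >= threshold:
                subsets[domain].append(doc)
    return dict(subsets)


def answer(question, subsets):
    """Pick the question's best-matching domain, then search only that subset."""
    scores = correlate(question)
    domain = max(scores, key=scores.get)
    if scores[domain] == 0 or domain not in subsets:
        return None  # no domain matched the question
    q_tokens = tokenize(question)
    # Stand-in for a real answer pipeline: return the most overlapping document.
    return max(subsets[domain], key=lambda d: len(tokenize(d) & q_tokens))


corpus = [
    "The recommended dosage depends on the patient and the diagnosis.",
    "A fixed interest rate keeps loan payments stable.",
    "The weather was mild this spring.",
]
subsets = build_subsets(corpus)
best = answer("What dosage should a patient take?", subsets)
print(best)
```

In this sketch the third document matches no domain and is left out of every subset, so the question about dosage is answered from the medical subset alone rather than from the full corpus.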