Embodiments relate to a system and a method for identifying, from contractual documents, (i) standard exact clauses that match clause examples and (ii) non-standard clauses that are semantically related to, but do not match, the clause examples. A standard feature data set comprising the standard exact clauses matching the clause examples is obtained. In addition, a mirror feature data set comprising clauses semantically related to the clause examples is obtained using semantic language analysis, where the mirror feature data set encompasses the standard feature data set. The non-standard clauses are obtained by extracting the difference between the mirror feature data set and the standard feature data set.
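
The relationship among the three data sets can be illustrated with a minimal sketch. It is not the claimed implementation: the function names (`normalize`, `jaccard`, `classify_clauses`), the 0.5 threshold, and the sample clauses are all hypothetical, and a simple token-overlap (Jaccard) score stands in for the semantic language analysis, which the text above does not specify. Exact matching is assumed here to mean equality after case and punctuation normalization.

```python
import re


def normalize(text):
    """Lowercase and keep only alphanumeric tokens (assumed notion of 'exact')."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap score; a stand-in for real semantic language analysis."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def classify_clauses(clauses, examples, threshold=0.5):
    """Return (standard, mirror, non_standard) sets of clauses.

    standard:     clauses that exactly match a clause example
    mirror:       clauses semantically related to a clause example;
                  built so that it always encompasses the standard set
    non_standard: mirror minus standard
    """
    norm_examples = {normalize(e) for e in examples}

    standard = {c for c in clauses if normalize(c) in norm_examples}

    related = {
        c
        for c in clauses
        if any(jaccard(normalize(c), e) >= threshold for e in norm_examples)
    }
    mirror = related | standard  # guarantee mirror encompasses standard

    non_standard = mirror - standard  # the extracted difference
    return standard, mirror, non_standard


# Hypothetical sample data for illustration only.
examples = [
    "This Agreement shall be governed by the laws of the State of New York.",
]
clauses = [
    "This Agreement shall be governed by the laws of the State of New York.",
    "This Agreement shall be governed by the laws of the State of Delaware.",
    "Each party shall pay its own costs.",
]

standard, mirror, non_standard = classify_clauses(clauses, examples)
```

With this sample data, the New York clause falls in the standard set, the Delaware clause is related but not an exact match and so falls in the non-standard set, and the costs clause is excluded from both. Taking the union with the standard set when building the mirror set is a design choice made in this sketch so that the set difference is always well defined.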