Aspects include generating a matrix based on a first sample of source code. Each cell in the matrix can correspond to a unique element in the source code, and each unique element can be encoded as a predetermined value according to an encoding rule. A first waveform is generated by combining a left-side curve and a right-side curve. The left-side curve encodes the position of the first non-zero cell in each row of the matrix, and the right-side curve encodes the position of the last non-zero cell in each row of the matrix. A second sample of source code that matches the first sample of source code is identified based on a comparison of the first waveform to a second waveform constructed from the second sample of source code.
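
The steps above can be illustrated with a minimal sketch. It is not the disclosed implementation, and several details are assumptions made only for illustration: each character is treated as one element, the hypothetical encoding rule maps whitespace to 0 and every other character to 1, rows with no non-zero cell are given the sentinel -1, the two curves are combined by simple concatenation, and two waveforms are said to match when their mean absolute difference is within a tolerance. All function names are hypothetical.

```python
from typing import List


def encode_char(ch: str) -> int:
    """Hypothetical encoding rule: whitespace -> 0, any other character -> 1."""
    return 0 if ch.isspace() else 1


def build_matrix(source: str) -> List[List[int]]:
    """One row per source line, one cell per character, zero-padded to equal width."""
    lines = source.splitlines()
    width = max((len(line) for line in lines), default=0)
    return [
        [encode_char(c) for c in line] + [0] * (width - len(line))
        for line in lines
    ]


def left_curve(matrix: List[List[int]]) -> List[int]:
    """Column index of the first non-zero cell in each row (-1 if the row is all zero)."""
    return [next((i for i, v in enumerate(row) if v != 0), -1) for row in matrix]


def right_curve(matrix: List[List[int]]) -> List[int]:
    """Column index of the last non-zero cell in each row (-1 if the row is all zero)."""
    return [max((i for i, v in enumerate(row) if v != 0), default=-1) for row in matrix]


def waveform(source: str) -> List[int]:
    """Assumed combination: the left-side curve followed by the right-side curve."""
    matrix = build_matrix(source)
    return left_curve(matrix) + right_curve(matrix)


def matches(first: str, second: str, tolerance: float = 0.0) -> bool:
    """Assumed comparison: mean absolute difference of equal-length waveforms."""
    wa, wb = waveform(first), waveform(second)
    if len(wa) != len(wb):
        return False
    if not wa:
        return True
    mean_diff = sum(abs(a - b) for a, b in zip(wa, wb)) / len(wa)
    return mean_diff <= tolerance


if __name__ == "__main__":
    first = "def f(x):\n    return x + 1\n"
    second = "def g(y):\n    return y + 2\n"  # renamed identifiers, same layout
    print(waveform(first))
    print(matches(first, second))
```

Under these assumptions, two samples that differ only in identifier names or literal values but share the same line layout produce identical waveforms, so the comparison does not depend on the exact text of either sample. A different encoding rule or a different way of combining the curves would change which differences the waveform is sensitive to.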