Patent attributes
A method and system that proactively generate alerts for updating a scraping script to avoid scraping script errors. A predetermined number of webpages targeted by the scraping script are randomly sampled, and the scraping script is appended to each webpage in the sample. A structured list of the text fragments extracted across the sampled webpages is then generated. At predetermined time intervals, a fresh set of webpages is sampled, the scraping script is appended to those webpages, and a new structured list is generated. If the new structured list does not match the previous one, the webpages may have changed and the scraping script may need to be updated. An alert is then generated indicating that such an update is required; the alert may include the location of the mismatch. Scraping script errors are thus detected proactively and can be corrected before an actual error occurs and propagates.
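The sampling-and-comparison loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The "scraping script" is modeled as a hypothetical mapping from field names to regex extractors, and "appending the script to a webpage" is modeled as running each extractor on the page's HTML. Because two random samples contain different pages (and therefore different text), the sketch reduces the text fragments to a per-field signature (was a fragment extracted from every sampled page?) so that the two structured lists can be compared. The names `regex_field`, `sample_pages`, `structured_list`, and `check_for_drift` are all hypothetical.

```python
import random
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a scraping script: field name -> extractor.
Extractor = Callable[[str], Optional[str]]
ScrapingScript = Dict[str, Extractor]
Signature = List[Tuple[str, bool]]


def regex_field(pattern: str) -> Extractor:
    """Build an extractor returning the first regex group, or None if absent."""
    compiled = re.compile(pattern, re.S)

    def extract(html: str) -> Optional[str]:
        match = compiled.search(html)
        return match.group(1).strip() if match else None

    return extract


def sample_pages(pages: List[str], k: int, rng: random.Random) -> List[str]:
    """Randomly sample a predetermined number k of target pages."""
    return rng.sample(pages, min(k, len(pages)))


def structured_list(pages: List[str], script: ScrapingScript) -> Signature:
    """Run the script on every sampled page and record, per field (sorted
    by name), whether a text fragment was extracted from all of them."""
    return [
        (field, all(extract(html) is not None for html in pages))
        for field, extract in sorted(script.items())
    ]


def check_for_drift(previous: Signature, current: Signature) -> List[str]:
    """Return the fields (mismatch locations) whose signature changed."""
    return [f for (f, old), (_, new) in zip(previous, current) if old != new]


# Demo with synthetic pages: the site later renames the price element.
script = {
    "title": regex_field(r"<h1>(.*?)</h1>"),
    "price": regex_field(r'<span class="price">(.*?)</span>'),
}
old_pages = [f'<h1>Item {i}</h1><span class="price">{i}.99</span>' for i in range(10)]
new_pages = [f'<h1>Item {i}</h1><span class="cost">{i}.99</span>' for i in range(10)]

rng = random.Random(0)
baseline = structured_list(sample_pages(old_pages, 3, rng), script)
later = structured_list(sample_pages(new_pages, 3, rng), script)
mismatches = check_for_drift(baseline, later)
if mismatches:
    print(f"ALERT: scraping script may need updating; mismatch at {mismatches}")
```

In this demo the `title` field still matches, while `price` no longer does, so the alert names `price` as the mismatch location. In a real deployment the second sampling step would run on a schedule (the "predetermined time intervals"), for example from a cron job; that scheduler is not shown here. A richer signature (for instance, the number or type of fragments per field) could be swapped in without changing the comparison logic.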