Patent attributes
A plurality of events is extracted from a set of sources, including an event representing a transfer of a first data set from one data storage stage of a data pipeline to another stage to form a second data set, and another event representing a completion of a computation performed on the second data set. Based on analysis of the plurality of events, a graph is stored; the nodes of the graph represent data sets at respective stages of the data pipeline, and edges represent the events. In response to a request for lineage information pertaining to a particular data set at a particular stage of the pipeline, an indication of a sequence of events represented in the graph is provided, including a particular event which led to the presence of the particular data set at the particular stage.
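The graph structure and lineage request described above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Node`, `Event`, `LineageGraph`, the `timestamp` field, and the sample data set and stage names are all hypothetical, and modeling the computation as an edge to a derived result node is an assumption made for illustration. Nodes are (data set, stage) pairs, edges are events, and a lineage query walks incoming edges backward from the requested node.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A data set at a particular stage of the pipeline (hypothetical model)."""
    data_set: str
    stage: str


@dataclass(frozen=True)
class Event:
    """An edge of the graph: something that happened between two nodes."""
    kind: str        # e.g. "transfer" or "computation_completed"
    source: Node
    target: Node
    timestamp: int   # assumed ordering key; not specified in the abstract


class LineageGraph:
    def __init__(self):
        # Maps each node to the events (edges) that point into it.
        self._incoming = defaultdict(list)

    def add_event(self, event):
        self._incoming[event.target].append(event)

    def lineage(self, node):
        """Return every event that led to `node`, oldest first."""
        events = []
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            # .get avoids creating empty entries for nodes with no incoming edges.
            for event in self._incoming.get(current, ()):
                events.append(event)
                stack.append(event.source)
        return sorted(events, key=lambda e: e.timestamp)


# Hypothetical pipeline: a transfer forms a second data set, then a
# computation on that second data set completes.
raw = Node("orders_v1", "landing")
staged = Node("orders_v2", "staging")
summary = Node("orders_summary", "staging")

graph = LineageGraph()
graph.add_event(Event("transfer", raw, staged, timestamp=1))
graph.add_event(Event("computation_completed", staged, summary, timestamp=2))

for event in graph.lineage(summary):
    print(event.kind, event.source.data_set, "->", event.target.data_set)
```

Indexing events by their target node keeps the backward walk simple: a request for the lineage of a data set at a given stage only visits the edges that could have contributed to it.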