Patent attributes
The invention provides a method, apparatus and system for classifying and clustering electronic data streams, such as email, images and sound files, for identification, sorting and efficient storage. The inventive systems label a document as belonging to a predefined class through computer methods that comprise the steps of identifying an electronic data stream using one or more learning machines and comparing the outputs from the machines to determine the label to associate with the data. The method further utilizes learning machines in combination with hashing schemes to cluster and classify documents. In one embodiment, hash apparatuses and methods taxonomize clusters. In yet another embodiment, clusters of documents utilize a geometric hash to contain the documents in a data corpus without the overhead of search and storage.
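The labeling and clustering steps described above might be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the three "learning machines" are hypothetical hand-written rules standing in for trained models, majority voting is one assumed way of comparing their outputs, and the SHA-1 token signature is an assumed stand-in for the hashing scheme (it is not the geometric hash of the final embodiment). All function names are invented for illustration.

```python
import hashlib
from collections import Counter, defaultdict


# Hypothetical stand-ins for trained learning machines. Each one maps
# the text of a data stream to a candidate class label.
def keyword_machine(text):
    return "spam" if "free" in text.lower() else "ham"


def length_machine(text):
    return "spam" if len(text) < 40 else "ham"


def punctuation_machine(text):
    return "spam" if text.count("!") >= 2 else "ham"


MACHINES = [keyword_machine, length_machine, punctuation_machine]


def label_stream(text, machines=MACHINES):
    """Compare the machines' outputs and return the majority label (assumed rule)."""
    votes = Counter(machine(text) for machine in machines)
    return votes.most_common(1)[0][0]


def cluster_key(text, n_tokens=3):
    """Hash a coarse signature of the text to a short bucket id.

    The signature is the first n_tokens distinct lowercase tokens in
    sorted order, so word order does not change the bucket.
    """
    tokens = sorted(set(text.lower().split()))[:n_tokens]
    digest = hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()
    return digest[:8]


def cluster_and_label(docs, machines=MACHINES):
    """Group documents by (label, hash bucket) without pairwise search."""
    clusters = defaultdict(list)
    for doc in docs:
        key = (label_stream(doc, machines), cluster_key(doc))
        clusters[key].append(doc)
    return dict(clusters)
```

Because each document is placed by computing its own key, assigning it to a cluster in this sketch requires no comparison against the documents already stored, which is the kind of saving the hashing embodiments point to.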