Patent attributes
A system is disclosed for managing large datasets. The system comprises a physical network that includes a plurality of computing devices having a plurality of processors. The system further comprises a logical peer-to-peer (P2P) network having a plurality of nodes, and a distributed file system that randomly distributes data and jobs received by the system across the plurality of nodes of the P2P network. The system replicates the data to neighboring nodes, and the nodes monitor one another to reduce data loss. The system further comprises a task scheduler that balances load across the plurality of nodes as tasks derived from the jobs are distributed to the nodes. The task scheduler redistributes and forwards tasks so that each task is processed by a node best suited to process it.
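The abstract does not specify how placement, replication, monitoring, or scheduling are implemented. The Python sketch below is one illustrative reading under stated assumptions. Nodes form a logical ring, and a node's next `replicas` successors count as its "neighbors". Each block is kept on `replicas + 1` nodes. Mutual monitoring is reduced to a single global `heal()` check rather than true peer-to-peer heartbeats. A "best suited" node is taken to be the least-loaded live node that already holds the task's data. All class, method, and parameter names (`Cluster`, `store`, `heal`, `submit`, `fail`, `replicas`) are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical P2P node: the data blocks it stores and its task queue."""
    node_id: int
    blocks: set = field(default_factory=set)
    tasks: list = field(default_factory=list)  # (task_id, block) pairs
    alive: bool = True


class Cluster:
    """Toy model of the P2P network described in the abstract.

    Assumption: nodes sit on a logical ring, and a node's "neighbors" are
    its next `replicas` successors on that ring.
    """

    def __init__(self, num_nodes, replicas=2, seed=None):
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.replicas = replicas  # hypothetical: extra copies per block
        self.rng = random.Random(seed)

    def live(self):
        return [n for n in self.nodes if n.alive]

    def neighbors(self, node_id):
        n = len(self.nodes)
        return [self.nodes[(node_id + k) % n] for k in range(1, self.replicas + 1)]

    def store(self, block):
        # Random placement, then copies on the primary's live ring neighbors.
        primary = self.rng.choice(self.live())
        primary.blocks.add(block)
        for nb in self.neighbors(primary.node_id):
            if nb.alive:
                nb.blocks.add(block)
        return primary.node_id

    def heal(self):
        # Simplified stand-in for mutual monitoring: any block held by fewer
        # than replicas + 1 live nodes is copied to additional live nodes.
        live = self.live()
        for block in set().union(*(n.blocks for n in live)):
            holders = [n for n in live if block in n.blocks]
            spare = [n for n in live if block not in n.blocks]
            while len(holders) < self.replicas + 1 and spare:
                extra = spare.pop(0)
                extra.blocks.add(block)
                holders.append(extra)

    def submit(self, task_id, block):
        # A task arrives at a random live node, which forwards it to the
        # least-loaded live node holding the task's block (assumed meaning
        # of "best suited": data-local and lightly loaded).
        entry = self.rng.choice(self.live())
        holders = [n for n in self.live() if block in n.blocks]
        target = min(holders or self.live(), key=lambda n: (len(n.tasks), n.node_id))
        target.tasks.append((task_id, block))
        return entry.node_id, target.node_id

    def fail(self, node_id):
        # Mark a node dead (losing its data), restore replication,
        # and redistribute its queued tasks to surviving nodes.
        node = self.nodes[node_id]
        node.alive = False
        orphaned, node.tasks = node.tasks, []
        node.blocks.clear()
        self.heal()
        for task_id, block in orphaned:
            self.submit(task_id, block)


if __name__ == "__main__":
    cluster = Cluster(num_nodes=5, replicas=2, seed=42)
    for block in ("b0", "b1", "b2"):
        cluster.store(block)
    for task_id in range(6):
        cluster.submit(task_id, f"b{task_id % 3}")
    cluster.fail(0)  # simulate a node failure
    for n in cluster.nodes:
        print(n.node_id, n.alive, sorted(n.blocks), n.tasks)
```

Preferring data-local nodes avoids moving blocks across the physical network, while the load tiebreak spreads tasks over the replicas of a block. A fuller scheduler could also weigh processor capacity or network distance, which this sketch ignores.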