US Patent 8042112 Scheduler for search engine crawler

Is a

Patent

Patent attributes

Current Assignee

Google

Patent Jurisdiction

United States Patent and Trademark Office

Patent Number

8042112

Date of Patent

October 18, 2011

Patent Application Number

10882956

Date Filed

June 30, 2004

Patent Citations Received

US Patent 11704376 Retrieval of content using link-based search

US Patent 11763013 Transaction document management system and method

US Patent 11709900 Automated web page accessing

Patent Primary Examiner

Emerson Puente

Patent abstract

A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.
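
The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. The function name `schedule_next_crawl`, the history layout (a mapping from each URL to a list of reachability flags, oldest first), and the choice to keep URLs with fewer than X recorded crawls are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Sequence


def schedule_next_crawl(
    starting_set: Iterable[str],
    crawl_history: Dict[str, Sequence[bool]],
    x: int,
) -> List[str]:
    """Return the URLs to schedule for the next crawl cycle.

    A URL is dropped only if it was unreachable in each of its last
    `x` recorded crawls. `crawl_history` maps a URL to reachability
    flags (True = reachable), oldest first; this layout is hypothetical.
    """
    if x < 1:
        raise ValueError("x must be at least 1")

    scheduled = []
    for url in starting_set:
        recent = list(crawl_history.get(url, []))[-x:]
        # Assumption: a URL with fewer than x recorded crawls is kept,
        # because it has not yet failed x times in a row.
        if len(recent) == x and not any(recent):
            continue
        scheduled.append(url)
    return scheduled
```

In a distributed setup like the one the abstract describes, each scheduler would run such a filter over its own segment of document identifiers and write the result to its scheduled output file. How identifiers are assigned to segments is not specified in the abstract and is left out of this sketch.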

